Comment by ricardobeat 4 days ago
Care to elaborate? Your comment would be a lot more useful if it included a little why. Otherwise it’s just teasing readers and at the same time smearing the author without anything to back it up.
Reinforcement learning with PPO doesn’t involve MCTS and has been the bread and butter of aligning LLMs since 2020. Nothing about saying they use RL implies MCTS.
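To make that concrete, here is a minimal sketch of the clipped surrogate objective that PPO is usually described with, in plain Python. The function name and the toy inputs are made up for illustration; a real RLHF setup would get the log-probabilities from the policy and the advantages from a reward model plus a value estimate. Note that nothing here searches a tree: it only compares the new and old policy's log-probabilities on actions that were already sampled.

```python
import math

def ppo_clipped_objective(logp_new, logp_old, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective over a batch of sampled actions.

    Hypothetical helper for illustration only, not any library's API.
    """
    total = 0.0
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        # Probability ratio between the updated policy and the sampling policy.
        ratio = math.exp(ln - lo)
        # Keep the ratio inside [1 - eps, 1 + eps].
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the more pessimistic of the clipped and unclipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)

# Toy batch: two sampled actions with made-up log-probs and advantages.
print(ppo_clipped_objective([0.0, 1.0], [0.0, 0.0], [2.0, 1.0]))
```

The policy is then updated by gradient ascent on this quantity over batches of sampled rollouts, with no lookahead or search step involved.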