Comment by zaptrem
Where in their blog post (which seemingly had complete examples of the model’s chain of thought) did they suggest they were using search or tree of thoughts?
They mention reinforcement learning, so I guess they used some sort of Monte Carlo tree search (the same algorithm used for AlphaGo).
In this case, the model would explore several chains of thought during training, but only output a single chain during inference (as the sibling comment suggests).
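To make the guess concrete, here's a toy MCTS-over-reasoning-steps sketch. Everything in it is an assumption: `propose_steps` stands in for the LLM sampling candidate next thoughts and `score_terminal` for some reward signal (e.g. answer correctness); none of this is from OpenAI's post.

```python
import math
import random

def propose_steps(chain):
    # Hypothetical stand-in: the LLM would sample candidate next
    # reasoning steps given the chain so far.
    return [chain + [f"step{len(chain)}-{i}"] for i in range(3)]

def score_terminal(chain):
    # Hypothetical stand-in for a reward, e.g. answer correctness.
    return random.random()

class Node:
    def __init__(self, chain, parent=None):
        self.chain, self.parent = chain, parent
        self.children, self.visits, self.value = [], 0, 0.0

def ucb(node, c=1.4):
    # Standard UCT score: exploit high-value nodes, explore
    # rarely visited ones.
    if node.visits == 0:
        return float("inf")
    return node.value / node.visits + c * math.sqrt(
        math.log(node.parent.visits) / node.visits)

def mcts(iterations=200, max_depth=5):
    root = Node([])
    for _ in range(iterations):
        # Selection: walk down by UCB until we hit a leaf.
        node = root
        while node.children:
            node = max(node.children, key=ucb)
        # Expansion: branch into several candidate next thoughts.
        if len(node.chain) < max_depth:
            node.children = [Node(c, node) for c in propose_steps(node.chain)]
            node = random.choice(node.children)
        # Evaluation + backpropagation along the chosen path.
        reward = score_terminal(node.chain)
        while node:
            node.visits += 1
            node.value += reward
            node = node.parent
    # Inference would emit only the single most-visited path.
    node, path = root, []
    while node.children:
        node = max(node.children, key=lambda n: n.visits)
        path = node.chain
    return path

print(mcts())
```

The point of the sketch: all the branching happens inside the search, and only one linear chain survives to be shown as "the" chain of thought.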
As someone who works in this field: this comment is obviously uninformed, even about old public research trends.
Care to elaborate? Your comment would be a lot more useful if it included a little of the why. Otherwise it's just teasing readers while smearing the author without anything to back it up.
Reinforcement learning with PPO doesn't involve MCTS and has been the bread and butter of aligning LLMs since 2020. Nothing about saying they use RL implies MCTS.
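For reference, the core of PPO is just a clipped policy-gradient loss on sampled completions, with no tree anywhere. A minimal sketch (all numbers and names illustrative, not anyone's actual training code):

```python
import math

def ppo_clip_loss(logp_new, logp_old, advantage, eps=0.2):
    # Probability ratio pi_new / pi_old for one token.
    ratio = math.exp(logp_new - logp_old)
    # clip(ratio, 1 - eps, 1 + eps)
    clipped = max(min(ratio, 1 + eps), 1 - eps)
    # Negated clipped surrogate objective from the PPO paper.
    return -min(ratio * advantage, clipped * advantage)

# A token made more likely by the update, with positive advantage
# from the reward model: the clip caps how far the update can push.
print(ppo_clip_loss(logp_new=-1.0, logp_old=-1.2, advantage=0.5))
```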
> Nothing about saying they use RL implies MCTS
We can say the same thing about RL implying PPO; however, there are pretty big hints, namely Noam Brown being involved. Many of the things Noam Brown has worked on involve RL in tree-search contexts.
He has also consistently advocated using additional test-time compute to solve search problems, which is consistent with the messaging around the reasoning tokens. There is likely some learned tree search involved, e.g. guided by a learned policy/value function as in AlphaGo.
It's all speculation until we have an actual paper, so we can't categorically say MCTS/learned tree search isn't involved.
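If the speculation above is right, the extra test-time compute could be spent on something like value-guided search over candidate reasoning steps. A hand-wavy sketch, where `generate_candidates` and `value_fn` are hypothetical stand-ins for the policy LLM and a learned value head:

```python
import heapq

def generate_candidates(chain, k=3):
    # Hypothetical: sample k candidate next reasoning steps from the LLM.
    return [chain + (f"thought-{len(chain)}-{i}",) for i in range(k)]

def value_fn(chain):
    # Hypothetical learned value head scoring how promising a partial
    # chain looks; here just a deterministic toy.
    return (hash(chain) % 100) / 100.0

def best_first_search(budget=50, max_depth=6):
    # A bigger budget (more test-time compute) means a deeper, wider
    # search before committing to a single chain.
    frontier = [(-value_fn(()), ())]  # max-heap via negated scores
    best_chain, best_score = (), float("-inf")
    for _ in range(budget):
        if not frontier:
            break
        neg_score, chain = heapq.heappop(frontier)
        if -neg_score > best_score:
            best_chain, best_score = chain, -neg_score
        if len(chain) >= max_depth:
            continue
        for cand in generate_candidates(chain):
            heapq.heappush(frontier, (-value_fn(cand), cand))
    return best_chain  # only this one chain would ever be surfaced

print(best_first_search())
```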
Just a guess:
The chain of thought we see would be the final path through the tree. Interactively streaming the thought tokens would give the game away, which is why they don't show them.