Comment by kristianp 5 days ago

Does o1 need some method to allow it to generate lengthy chains of thought, or does it just do it normally after being trained to do so?

If it's the latter, I imagine o1 clones could initially just be fine-tunes of Llamas.

astrange 4 days ago

You need an extremely large amount of training data of good CoTs. And there's probably some magic; we know LLMs aren't capable of self-reflection, and none of the other ones are any good at iterating toward a better answer.

Example prompt for that: "give me three sentences that end in 'is'."
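That prompt is easy to score automatically, which is what makes it a useful iteration test: a model that can self-correct should be able to check its own output against the constraint. A minimal sketch of such a checker (pure Python; the model reply is a made-up placeholder, not real model output):

```python
import re

def count_ending(text: str, word: str = "is") -> tuple[int, int]:
    """Count sentences whose final word equals `word`.

    Returns (matching, total). Sentence splitting is naive (split on
    ., !, ?), which is adequate for short model replies.
    """
    sentences = [s.strip() for s in re.split(r"[.!?]", text) if s.strip()]
    matching = sum(1 for s in sentences if s.split()[-1].lower() == word)
    return matching, len(sentences)

# Hypothetical model reply to the prompt above:
reply = "The sky is blue. That is how it is. Nobody knows where it is."
print(count_ending(reply))  # → (2, 3): only two sentences actually end in "is"
```

A loop that feeds the (matching, total) score back into the prompt would be the simplest harness for testing whether a model can iterate to a better answer.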