Comment by GaggiX a year ago

This seems to be the usual chain-of-thought (CoT) prompting that has been used for a while. o1 was trained with reinforcement learning using some unknown policy, so it is much better at making use of its chain of thought.