Comment by GaggiX
This seems the usual CoT that has been used for a while, o1 was trained with reinforcement learning with some unknown policy, so it's much better at utilizing the chain of thought.
This seems the usual CoT that has been used for a while, o1 was trained with reinforcement learning with some unknown policy, so it's much better at utilizing the chain of thought.