Comment by GaggiX

Comment by GaggiX 5 days ago

0 replies

This seems the usual CoT that has been used for a while, o1 was trained with reinforcement learning with some unknown policy, so it's much better at utilizing the chain of thought.