Comment by nodja
I highly suspect that CoT tokens are at least partially working as register tokens. Have the big LLM trainers tried replacing CoT with a similar number of register tokens to see whether the improvements are comparable?
I remember a paper from a little while back demonstrating that merely training a model to output "........" (or maybe it was spaces?) while thinking yielded a similar improvement in reasoning capability to actual CoT.
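(That sounds like the filler-token result from "Let's Think Dot by Dot". For concreteness, here's a minimal sketch of what replacing CoT with inert filler tokens looks like at inference time, using Hugging Face transformers; the model name, filler length, and prompt format are my own assumptions, and per the paper a stock model won't benefit from this — it only helps when the model was trained with fillers.)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Any causal LM works for illustration; gpt2 is just a small stand-in.
tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

question = "What is 17 * 24?"

# Instead of sampling a chain-of-thought, insert a fixed run of "register"
# filler tokens. The hypothesis: the extra positions give the model more
# forward passes of scratch computation, regardless of token content.
filler = "." * 64  # filler length is an arbitrary choice here

prompt = f"Q: {question}\nA: {filler} "
inputs = tok(prompt, return_tensors="pt")

# Generate only the short final answer; all "thinking" capacity comes from
# the filler positions already in the context.
out = model.generate(**inputs, max_new_tokens=8, do_sample=False)
print(tok.decode(out[0][inputs["input_ids"].shape[1]:]))
```

The comparison the comment proposes would then be: fine-tune one model with real CoT traces and one with length-matched fillers like the above, and check whether downstream accuracy differs.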