Comment by pton_xd a day ago

I was under the impression that CoT works because spitting out more tokens = more context = more compute used to "think." Using CoT as a way for LLMs to "show their working" never seemed logical to me; it's just extra synthetic context.
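The "more tokens = more compute" intuition can be sketched with a toy cost model. This is not a real model: the dimensions and layer count below are made-up illustrative constants, and the FLOP formula is a rough attention-only estimate. The point is just that each generated token attends over the whole context, so a long chain-of-thought spends far more total compute before committing to an answer than a terse reply does.

```python
# Toy cost model (illustrative constants, not a real architecture):
# each new token attends over all previous tokens in every layer,
# so generating more "thinking" tokens buys more total compute.

def attention_flops(context_len, d_model=1024, n_layers=24):
    """Very rough FLOPs to attend over `context_len` tokens once."""
    return 2 * context_len * d_model * n_layers

def total_generation_flops(prompt_len, generated):
    """Cumulative attention cost of generating `generated` tokens."""
    return sum(attention_flops(prompt_len + i) for i in range(generated))

terse = total_generation_flops(prompt_len=100, generated=10)   # direct answer
cot = total_generation_flops(prompt_len=100, generated=500)    # CoT answer
print(cot / terse)  # the CoT run spends orders of magnitude more compute
```

With these toy numbers the 500-token chain-of-thought costs well over 100x the attention compute of the 10-token answer, which is the sense in which extra tokens are extra "thinking."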

tasty_freeze a day ago

Humans sometimes draw a diagram to help them think about some problem they are trying to solve. The paper contains nothing that the brain didn't already know. However, it is often an effective technique.

Part of that is to keep the most salient details front and center, and part of it is that the brain isn't fully connected: externalizing the problem lets (in this case) the visual system bring its processing abilities to bear from a different angle, rather than keeping all the information in the conceptual domain.

margalabargala a day ago

My understanding of the "purpose" of CoT is to remove the wild variability yielded by prompt engineering: the "thinking" output smooths out the prompt, and the final answer is generated from that.

Thus you're more likely to get a standardized answer even if your query was insufficiently/excessively polite.

svachalek a day ago

This is an interesting paper: it postulates that an LLM's ability to perform tasks correlates mostly with the number of layers it has, and that reasoning creates virtual layers in the context space. https://arxiv.org/abs/2412.02975

voidspark a day ago

That's right. It's not "show the working". It's "do more working".

ertgbnm a day ago

But the model doesn't have an internal state; it just has the tokens, which means it must encode its reasoning into the output tokens. So it is a reasonable take to think that CoT is the model showing its work.
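That point can be sketched with a minimal autoregressive loop. The `fake_next_token` function below is a hypothetical stand-in for a model's forward pass, not a real LLM: the only thing it can condition on, and the only thing carried between steps, is the token sequence itself, so any intermediate reasoning has to be written into those tokens.

```python
# Minimal sketch of autoregressive generation (toy stand-in, not a real LLM):
# no hidden memory survives between steps; the token list IS the state.

def fake_next_token(tokens):
    # Hypothetical stand-in for a forward pass: it sees only `tokens`.
    return f"t{len(tokens)}"

def generate(prompt_tokens, n_steps):
    tokens = list(prompt_tokens)
    for _ in range(n_steps):
        # The only state passed forward is the growing token sequence.
        tokens.append(fake_next_token(tokens))
    return tokens

out = generate(["Q:", "2+2?"], 3)
print(out)  # ['Q:', '2+2?', 't2', 't3', 't4']
```

Each call starts from scratch given the sequence so far; if a model wants an intermediate result available at a later step, it has no choice but to emit it as tokens, which is why CoT traces double as a (partial) record of the work.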