Comment by pton_xd a day ago

I was under the impression that CoT works because spitting out more tokens = more context = more compute used to "think." Using CoT as a way for LLMs to "show their working" never seemed logical to me; it's just extra synthetic context.
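The "more tokens = more compute" intuition can be sketched with a toy cost model. This is not a real model: the dimensions and layer count below are made-up illustrative constants, and the FLOP formula is a rough attention-only estimate. The point is just that each generated token attends over the whole context, so a long chain-of-thought spends far more total compute before committing to an answer than a terse reply does.

```python
# Toy cost model (illustrative constants, not a real architecture):
# each new token attends over all previous tokens in every layer,
# so generating more "thinking" tokens buys more total compute.

def attention_flops(context_len, d_model=1024, n_layers=24):
    """Very rough FLOPs to attend over `context_len` tokens once."""
    return 2 * context_len * d_model * n_layers

def total_generation_flops(prompt_len, generated):
    """Cumulative attention cost of generating `generated` tokens."""
    return sum(attention_flops(prompt_len + i) for i in range(generated))

terse = total_generation_flops(prompt_len=100, generated=10)   # direct answer
cot = total_generation_flops(prompt_len=100, generated=500)    # CoT answer
print(cot / terse)  # the CoT run spends orders of magnitude more compute
```

With these toy numbers the 500-token chain-of-thought costs well over 100x the attention compute of the 10-token answer, which is the sense in which extra tokens are extra "thinking."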

tasty_freeze a day ago

Humans sometimes draw a diagram to help them think about some problem they are trying to solve. The paper contains nothing that the brain didn't already know. However, it is often an effective technique.

Part of that is to keep the most salient details front and center, and part of it is that the brain isn't fully connected: externalizing the problem lets (in this case) the visual system bring its processing abilities to bear from a different angle, rather than keeping all the information in the conceptual domain.

margalabargala a day ago

My understanding of the "purpose" of CoT is to remove the wild variability yielded by prompt engineering: the "thinking" output smooths out the prompt, and the final answer is generated from that.

Thus you're more likely to get a standardized answer even if your query was insufficiently/excessively polite.

svachalek a day ago

This is an interesting paper: it postulates that an LLM's ability to perform tasks correlates mostly with the number of layers it has, and that reasoning creates virtual layers in the context space. https://arxiv.org/abs/2412.02975

voidspark a day ago

That's right. It's not "show the working". It's "do more working".

ertgbnm a day ago

But the model doesn't have an internal state; it just has the tokens, which means it must encode its reasoning into the output tokens. So it is a reasonable take to think that CoT is the model showing its work.
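That point can be sketched with a minimal autoregressive loop. The `fake_next_token` function below is a hypothetical stand-in for a model's forward pass, not a real LLM: the only thing it can condition on, and the only thing carried between steps, is the token sequence itself, so any intermediate reasoning has to be written into those tokens.

```python
# Minimal sketch of autoregressive generation (toy stand-in, not a real LLM):
# no hidden memory survives between steps; the token list IS the state.

def fake_next_token(tokens):
    # Hypothetical stand-in for a forward pass: it sees only `tokens`.
    return f"t{len(tokens)}"

def generate(prompt_tokens, n_steps):
    tokens = list(prompt_tokens)
    for _ in range(n_steps):
        # The only state passed forward is the growing token sequence.
        tokens.append(fake_next_token(tokens))
    return tokens

out = generate(["Q:", "2+2?"], 3)
print(out)  # ['Q:', '2+2?', 't2', 't3', 't4']
```

Each call starts from scratch given the sequence so far; if a model wants an intermediate result available at a later step, it has no choice but to emit it as tokens, which is why CoT traces double as a (partial) record of the work.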