Comment by xg15
> There’s no specific reason why the reported Chain-of-Thought must accurately reflect the true reasoning process;
Isn't the whole reason for chain-of-thought that the tokens sort of are the reasoning process?
Yes, there is more internal state in the model's hidden layers while it predicts the next token, but that information is gone at the end of that prediction pass. The only information carried over "between one token and the next" is the tokens themselves, right? So in that sense, the OP would be wrong.
Of course, we don't know what kind of information the model encodes in its specific token choices - i.e., the tokens might not mean to the model what we think they mean.
I'm not sure I understand what you're trying to say here. Information between tokens is propagated through self-attention, and there's an attention block inside each transformer block within the model. That's a whole lot of internal state, stored in (mostly) inscrutable key and value vectors: hundreds of dimensions per attention head, around a few dozen heads per attention block, and around a few dozen blocks per model.
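To put a rough number on that, here's a back-of-the-envelope sketch. The layer, head, and dimension counts below are hypothetical but typical for a mid-size model, not figures for any specific one:

```python
# Rough estimate of the cached attention state a transformer carries
# per past token during generation. All counts are assumed, typical
# values for a mid-size model -- not any specific architecture.
n_layers = 32        # transformer blocks
n_heads = 32         # attention heads per block
head_dim = 128       # dimensions per head
bytes_per_value = 2  # fp16

# Each past token contributes one key vector and one value vector
# per head, per layer (hence the factor of 2).
floats_per_token = n_layers * n_heads * head_dim * 2
kv_bytes_per_token = floats_per_token * bytes_per_value

print(floats_per_token)    # 262144 floats of state per past token
print(kv_bytes_per_token)  # 524288 bytes = 0.5 MiB per token in fp16
```

So even under these assumed numbers, each token in the context drags along hundreds of thousands of floating-point values of "hidden" state via the attention mechanism, far more than the token ID itself conveys.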