Comment by svachalek

Exactly. There's no state outside the context. The difference in performance between the non-reasoning model and the reasoning model comes from the extra tokens in the context. The relationship isn't strictly a logical one, just as it isn't for non-reasoning LLMs, but the process is autoregression and happens in plain sight.