Comment by ianbutler
https://www.anthropic.com/research/tracing-thoughts-language...
This article counters a significant portion of what you put forward.
If the article is to be believed, these models are aware of an end goal, of intermediate steps of thinking, and more.
The model even genuinely "thinks ahead", and they've demonstrated that under at least one test.
The weights encode the end goal and so on, but the model does not have meaningful access to those weights when producing its chain of thought.
So the model thinks ahead but cannot reason about its own thinking in any real way. It is rationalizing, not rational.