famouswaffles 14 hours ago

>They do not manipulate concepts. There is no representation of a concept for them to manipulate.

Yes, they do. And of course there is. And there's plenty of research on the matter.

>It may, however, turn out that in doing what they do, they are effectively manipulating concepts

There is no "effectively" here. Text is what goes in and what comes out, but it's by no means what they manipulate internally.

>Nevertheless "manipulating concepts is exactly what they do" seems almost willfully ignorant of how these systems work, unless you believe that "find the next most probable sequence of tokens of some length" is all there is to "manipulating concepts".

"Find the next probable token" is the goal, not the process. It is what models are tasked to do yes, but it says nothing about what they do internally to achieve it.

PaulDavisThe1st 13 hours ago

Please pass on a link to a solid research paper that supports the idea that to "find the next probable token", LLMs manipulate concepts ... just one will do.

  • famouswaffles 12 hours ago

    Revealing emergent human-like conceptual representations from language prediction - https://www.pnas.org/doi/10.1073/pnas.2512514122

    Emergent World Representations: Exploring a Sequence Model Trained on a Synthetic Task - https://openreview.net/forum?id=DeG07_TcZvT

    On the Biology of a Large Language Model - https://transformer-circuits.pub/2025/attribution-graphs/bio...

    Emergent Introspective Awareness in Large Language Models - https://transformer-circuits.pub/2025/introspection/index.ht...

    • PaulDavisThe1st an hour ago

      Thanks for that. I've read the two Lindsey papers before. I think these are all interesting, but they are also what used to be called "just-so stories". That is, they describe a way of understanding what the LLM is doing, but do not actually describe what the LLM is doing.

      And this is OK and still quite interesting - we do it to ourselves all the time. Often it's the only way we have of understanding the world (or ourselves).

      However, in the case of LLMs, which are tools that we have created from scratch, I think we can require a higher standard.

      I don't personally think that any of these papers suggest that LLMs manipulate concepts. They do suggest that the internal representation after training is highly complex (superposition, in particular), and that when inputs are presented, it isn't unreasonable to talk about the observable behavior as if it involved represented concepts. It is a useful stance to take, similar to Dennett's intentional stance.

      However, while this may turn out to be how a lot of human cognition works, I don't think it is the significant part of what happens when we actively reason. Nor do I think it corresponds to what most people mean by "manipulate concepts".

      The LLM, despite the presence of "features" that may correspond to human concepts, is relentlessly forward-driving: given these inputs, what is my output? Look at the description in the 3rd paper of the arithmetic example. This is not "manipulating concepts" - it's a trick that often gets to the right answer (just like many human tricks used for arithmetic, only somewhat less reliable). It is extremely different, however, from "rigorous" arithmetic - the stuff you learned somewhere between ages 5 and 12, perhaps - that always gives the right answer and involves no pattern matching, no inference, no approximations. The same thing can be said, I think, about every other example in all 4 papers, to some degree or another.
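
      To make the contrast concrete, here's a rough sketch (mine, not the paper's actual circuit - the heuristic is a deliberately crude stand-in): exact column addition with carries, versus a "coarse magnitude plus memorized last digit" trick of the general flavor the attribution-graph paper describes. The trick is often right and sometimes not, which is the point.

        def column_addition(a: int, b: int) -> int:
            """Grade-school addition: digit by digit, with carries. Always exact."""
            result, carry, place = 0, 0, 1
            while a or b or carry:
                s = a % 10 + b % 10 + carry
                result += (s % 10) * place
                carry = s // 10
                a, b, place = a // 10, b // 10, place * 10
            return result

        def trick_addition(a: int, b: int) -> int:
            """Toy stand-in for a parallel-pathway heuristic: a coarse magnitude
            estimate and a memorized ones digit, stitched together at the end."""
            estimate = round(a, -1) + round(b, -1)   # approximate-magnitude pathway
            ones = (a % 10 + b % 10) % 10            # lookup-style ones-digit pathway
            base = estimate - estimate % 10
            candidates = [base - 10 + ones, base + ones, base + 10 + ones]
            return min(candidates, key=lambda c: abs(c - estimate))

        for a, b in [(36, 59), (17, 28), (24, 13)]:
            print(a, b, column_addition(a, b), trick_addition(a, b))
        # 36+59: both give 95; 17+28: both give 45; 24+13: exact gives 37,
        # the trick gives 27 - often right, never guaranteed.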

      What I do think is true (and very interesting) is that it seems somewhere between possible and likely that a lot more human cognition than we've previously suspected uses mechanisms similar to the ones these papers are uncovering/describing.