Comment by strangescript

Comment by strangescript 5 days ago

40 replies

13B is still super tiny model. Latent reasoning doesn't really appear until around 100B params. Its like how Noam reported GPT-5 finding errors on wikipedia. Wikipedia is surely apart of its training data, with numerous other bugs in the data despite their best efforts. That wasn't enough to fundamentally break it.

dingnuts 4 days ago

> Latent reasoning doesn't really appear until around 100B params.

Please provide a citation for wild claims like this. Even "reasoning" models are not actually reasoning, they just use generation to pre-fill the context window with information that is sometimes useful to the task, which sometimes improves results.

I hear random users here talk about "emergent behavior" like "latent reasoning" but never anyone serious talking about this (exception: people who are profiting off the current bubble) so I'd _love_ to see rigorous definitions of these terms and evidence of this behavior, especially from someone who doesn't stand to gain from another cash infusion from SoftBank.

I suspect these things don't exist. At the very most, they're a mirage, and exist in the way a rainbow does. Go on and try to find that pot of gold, eh?

  • criemen 4 days ago

    > Please provide a citation for wild claims like this. Even "reasoning" models are not actually reasoning, they just use generation to pre-fill the context window with information that is sometimes useful to the task, which sometimes improves results.

    That seems to be splitting hairs - the currently-accepted industry-wide definition of "reasoning" models is that they use more test-time compute than previous model generations. Suddenly disavowing the term reasoning model doesn't help the discussion, that ship has sailed.

    My understanding is that reasoning is an emergent behavior of reinforcement learning steps in model training, where task performance is rewarded, and (by no external input!) the model output starts to include phrases ala "Wait, let me think". Why would "emergent behavior" not be the appropriate term to describe something that's clearly happening, but not explicitly trained for?

    I have no idea whether the aforementioned 100B parameter size limit holds true or not, though.

    • xandrius 4 days ago

      Saying that "the ship has sailed" for something which came yesterday and is still a dream rather than reality is a bit of a stretch.

      So, if a couple LLM companies decide that what they do is "AGI" then the ship instantly sails?

      • noir_lord 4 days ago

        Only matters if they can convince others that what they do is AGI.

        As always ignore the man behind the curtain.

        • jijijijij 4 days ago

          Just like esoteric appropriation of 'quantum entanglement', right? It's vibe semantics now.

    • habinero 4 days ago

      > currently-accepted industry-wide definition of "reasoning"

      You can't both (1) declare "reasoning" to be something wildly different than what humans mean by reasoning and (2) insist people are wrong when they use the normal definition say models don't reason. You gotta pick a lane.

      • cowboylowrez 4 days ago

        I don't think its too problematic, its hard to say something is "reasoning" without saying what that something is, for another example of terms that adjust their meaning to context for example, the word "cache" in "processor cache", we know what that is because its in the context of a processor, then there's "cache me outside", which comes from some tv episode.

      • quinndexter 4 days ago

        Or you could accept that sometimes fields contain terms-of-art that are non-intuitive to outsiders. Go ask an astromer what their working definition of a metal is.

        • habinero 3 days ago

          No. This is the equivalent of an astronomer telling a blacksmith they're using the term "metal" incorrectly. Your jargon does not override everyone else's language.

  • dr_dshiv 4 days ago

    > Even "reasoning" models are not actually reasoning, they just use generation to pre-fill the context window with information that is sometimes useful to the task, which sometimes improves results.

    I agree that seems weak. What would “actual reasoning” look like for you, out of curiosity?

    • Terr_ 4 days ago

      Not parent poster, but I'd approach it as:

      1. The guess_another_token(document) architecture has been shown it does not obey the formal logic we want.

      2. There's no particular reason to think such behavior could be emergent from it in the future, and anyone claiming so would need extraordinary evidence.

      3. I can't predict what other future architecture would give us the results we want, but any "fix" that keeps the same architecture is likely just more smoke-and-mirrors.

      • og_kalu 4 days ago

        Seems to fall apart at 1

        >1. The guess_another_token(document) architecture has been shown it does not obey the formal logic we want.

        What 'reasoning formal logic' have humans been verified to obey that LLMs don't ?

    • cap11235 4 days ago

      It's the same bitching every time an LLM post can be responded to. ITS NOT THINKING!!! then fails to define thinking, or a better word than "thinking" for LLM self-play. I consider these posts to be on par for quality with "FRIST!!!!!!" posts.

      • nucleogenesis 4 days ago

        Idk I think saying it’s “computing” is more precise because “thinking” applies to meatbags. It’s emulating thinking.

        Really I just think that anthropomorphizing LLMs is a dangerous road in many ways and really it’s mostly marketing BS anyway.

        I haven’t seen anything that shows evidence of LLMs being anything beyond a very sophisticated computer system.

      • cactusplant7374 4 days ago

        Do submarines swim? Thinking is something that doesn’t happen inside a machine. Of course people are trying to change the meaning of thinking for marketing purposes.

        • dgfitz 4 days ago

          Ironically, in the UUV space, they use the term “flying” when talking about controlling UUVs.

sharkjacobs 4 days ago

It doesn't feel like the wikipedia thing is a good counterpoint. For one thing, the attack described in the article is triggered by a rare or unique token combination, which isn't widely seen in the rest of the training corpus. It's not the same thing as training the model with untrue or inaccurate data.

Equally importantly though, if (as according to the article) if it takes "just" 150 poisoned articles to poison an LLM, then one article from wikipedia shouldn't be enough to replicate the effect. Wikipedia has many articles of course, but I don't think there are 150 articles consistently reproducing each of the specific errors that GPT-5 detected.

edit: correction, 250 articles, not 150

  • dgfitz 4 days ago

    > the attack described in the article is triggered by a rare or unique token combination

    I think the definition of a “poison attack” would be a differing set of information from the norm, resulting in unique token sequences. No?

    Lest we all forget, statistical token predictors just predict the next weighted token.

Powdering7082 5 days ago

Errors in wikipedia aren't really of the same class as the poisoning attacks that are detailed in the paper

  • dotancohen 4 days ago

    Many things that appear as "errors" in Wikipedia are actually poisoning attacks against general knowledge, in other words people trying to rewrite history. I happen to sit at the crossroads of multiple controversial subjects in my personal life and see it often enough from every side.

    • cowboylowrez 4 days ago

      yeah, I'm still hoping that Wikipedia remains valuable and vigilant against attacks by the radical right but its obvious that Trump and congress could easily shut down wikipedia if they set their mind to it.

      • fouc 4 days ago

        you're ignoring that both sides are doing poisoning attacks on wikipedia, trying to control the narrative. it's not just the "radical right"

dgfitz 4 days ago

s/latent reasoning/next token prediction with guardrails

  • DoctorOetker 3 days ago

    thats not a general substitution since you omit the latent qualifier.

    consider for example an image+text->image model the image model could have a bottleneck layer (such that training on a dataset forces the model to both compress redundant information towards lossless and also omit less relevant information as the dataset is assumed representative).

    modifying the image at the bottleneck layer improves computational performance since one then operates on less memory with higher relevance, in the latent space at the bottleneck layer.

    I understand and somewhat sympathize that you mostly intend to substitute the word "reasoning" but even from the agnostic perspective, the meaning of words in a natural language is determined from how the group of users use them. I don't see you complain about overloading meanings for 99.99% of other words in our dictionaries, open any and you'll see many.

    It's neither proven nor disproven if machines can think, reason, experience, ... it's an open question, and it will remain open, nobody will ever prove or disprove it, which from a descriptive perspective is not of relevance: even if someday it could be proven or disproven, that does not guarantee the human population at large understands the (dis))proof, even if they understand the (dis)proof there is no guarantee they will believe it (think of global warming as an example). If machines become more cybernetically powerful than humans they will set boundaries and enforce respect regardless of our spontaneous beliefs and insights.

    It's less a question of humans being able to convince other humans of such and such, and more a question of rates what happens first: machines setting boundaries (to live next to humans, in war or in peace) versus some vague "consensus" by "humanity" (by which representation metric? the beliefs of tech leaders? of the media owners? of politicians?).