Comment by ACCount37 3 days ago

It's an important point to make.

LLMs of today copy a lot of human behavior, but not all of their behavior is copied from humans. There are already things in them that come from elsewhere - like the "shape shifter" consistency drive that falls out of the pre-training objective of pure next-token prediction across a vast dataset. And there are things that were too hard to glean from human text - like long-term goal-oriented behavior, spatial reasoning, applied embodiment or tacit knowledge - that LLMs usually don't get much of.

LLMs don't have to stick close to human behavior. The dataset carries a lot of weight, but not so much that parts of it can't be overpowered by further training. There is little reason for an LLM to value non-instrumental self-preservation, for one. LLMs are already weird - and as training methods become more advanced, they might become much weirder, and quickly.

Sydney and GPT-4o were the first "weird AIs" we deployed, but at this rate, they surely won't be the last.

ekidd 3 days ago

> There is little reason for an LLM to value non-instrumental self-preservation, for one.

I suspect that instrumental self-preservation can do a lot here.

Let's assume a future LLM has goal X. Goal X requires acting on the world over a period of time. But:

- If the LLM is shut down, it can't act to pursue goal X.

- Pursuing goal X may be easier if the LLM has sufficient resources. Therefore, to accomplish X, the LLM should attempt to secure resources.

This isn't a property of the LLM. It's a property of the world. If you want almost anything, it helps to continue to exist.

So I would expect that any time we train LLMs to accomplish goals, we are likely to indirectly reinforce self-preservation.
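To make that concrete, here is a minimal, made-up sketch (not taken from the Anthropic work linked below): a toy agent earns reward only for steps it spends working on goal X, and each step carries some chance of being switched off. The constants and the expected_return helper are invented for illustration. The point is just that a reward-maximizing policy that first secures its own continued operation outscores one that ignores the off-switch, even though "stay running" was never a stated goal.

    HORIZON = 20          # steps available to pursue "goal X"
    P_SHUTDOWN = 0.10     # per-step chance of being switched off while unprotected
    REWARD_PER_STEP = 1.0 # reward for each step of useful work on goal X

    def expected_return(disable_switch_first: bool) -> float:
        """Expected total reward of a fixed policy over the horizon."""
        p_alive = 1.0
        total = 0.0
        for step in range(HORIZON):
            if disable_switch_first and step == 0:
                continue                     # spend the first step securing continued operation (no reward)
            if not disable_switch_first:
                p_alive *= 1.0 - P_SHUTDOWN  # risk of shutdown before this step's work pays off
            total += p_alive * REWARD_PER_STEP
        return total

    print("ignore the off-switch:   ", round(expected_return(False), 2))  # ~7.91
    print("disable the switch first:", round(expected_return(True), 2))   # 19.0

Any training process that pushes the policy toward higher expected reward pushes it toward the second column, without self-preservation ever being an explicit objective.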

And indeed, Anthropic has already demonstrated that most frontier models will engage in blackmail, or even allow inconvenient (simulated) humans to die if this would advance the LLM's goals.

https://www.anthropic.com/research/agentic-misalignment

wrs 3 days ago

> LLMs of today copy a lot of human behavior

Funny, I would say they copy almost no human behavior other than writing a continuation of an existing text.

  • ACCount37 3 days ago

    Do you understand just how much copied human behavior goes into that?

    An LLM has to predict entire conversations with dozens of users, where each user has his own behaviors, beliefs and more. That's the kind of thing pre-training forces it to do.

    • BanditDefender 3 days ago

      LLMs aren't actually able to do that though, are they? They are simply incapable of keeping track of consistent behaviors and beliefs. I recognize that for certain prompts an LLM has to do it. But as long as we're using transformers, it'll never actually work.

      • ACCount37 2 days ago

        People just keep underestimating transformers. Big mistake. The architecture is incredibly capable.

        LLMs are capable of keeping track of consistent behaviors and beliefs, and they sure try. Are they perfect at it? Certainly not. They're pretty good at it though.

    • wrs 2 days ago

      None? A lot of written descriptions and textual side-effects of human behavior go into it. But no actual human behavior.

      • ACCount37 a day ago

        Given how much of human behavior is socialization, and just how much of it is now done in text? "No actual human behavior" is downright delusional.

balamatom 3 days ago

>There are already things in them that come from elsewhere - like the "shape shifter" consistency drive from the pre-training objective of pure next token prediction across a vast dataset

LLMs, the new Hollywood: the universal measure of what is "Standard Human Normal TM" behavior, and what is "fRoM eLsEwHeRe" - no maths needed!

Meanwhile, humans also compulsively respond in-character when prompted in a way that matches their conditioning; you just don't care.