Comment by ACCount37 3 days ago

It's an important point to make.

LLMs of today copy a lot of human behavior, but not all of their behavior is copied from humans. There are already things in them that come from elsewhere - like the "shape shifter" consistency drive that falls out of the pre-training objective of pure next-token prediction across a vast dataset. And there are things that were too hard to glean from human text - like long-term goal-oriented behavior, spatial reasoning, applied embodiment or tacit knowledge - that LLMs usually don't get much of.

LLMs don't have to stick close to human behavior. The dataset carries a lot of weight, but not so much that parts of it can't be overpowered by further training. There is little reason for an LLM to value non-instrumental self-preservation, for one. LLMs are already weird - and as training methods become more advanced, they might become much weirder, and quickly.

Sydney and GPT-4o were the first "weird AIs" we deployed, but at this rate, they surely won't be the last.

ekidd 3 days ago

> There is little reason for an LLM to value non-instrumental self-preservation, for one.

I suspect that instrumental self-preservation can do a lot here.

Let's assume a future LLM has goal X. Goal X requires acting on the world over a period of time. But:

- If the LLM is shut down, it can't act to pursue goal X.

- Pursuing goal X may be easier if the LLM has sufficient resources. Therefore, to accomplish X, the LLM should attempt to secure resources.

This isn't a property of the LLM. It's a property of the world. If you want almost anything, it helps to continue to exist.

So I would expect that any time we train LLMs to accomplish goals, we are likely to indirectly reinforce self-preservation.
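To make that concrete, here is a minimal, made-up sketch (not taken from the Anthropic work linked below): a toy agent earns reward only for steps it spends working on goal X, and each step carries some chance of being switched off. The constants and the expected_return helper are invented for illustration. The point is just that a reward-maximizing policy that first secures its own continued operation outscores one that ignores the off-switch, even though "stay running" was never a stated goal.

    HORIZON = 20          # steps available to pursue "goal X"
    P_SHUTDOWN = 0.10     # per-step chance of being switched off while unprotected
    REWARD_PER_STEP = 1.0 # reward for each step of useful work on goal X

    def expected_return(disable_switch_first: bool) -> float:
        """Expected total reward of a fixed policy over the horizon."""
        p_alive = 1.0
        total = 0.0
        for step in range(HORIZON):
            if disable_switch_first and step == 0:
                continue                     # spend the first step securing continued operation (no reward)
            if not disable_switch_first:
                p_alive *= 1.0 - P_SHUTDOWN  # risk of shutdown before this step's work pays off
            total += p_alive * REWARD_PER_STEP
        return total

    print("ignore the off-switch:   ", round(expected_return(False), 2))  # ~7.91
    print("disable the switch first:", round(expected_return(True), 2))   # 19.0

Any training process that pushes the policy toward higher expected reward pushes it toward the second column, without self-preservation ever being an explicit objective.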

And indeed, Anthropic has already demonstrated that most frontier models will engage in blackmail, or even allow inconvenient (simulated) humans to die if this would advance the LLM's goals.

https://www.anthropic.com/research/agentic-misalignment

wrs 3 days ago

> LLMs of today copy a lot of human behavior

Funny, I would say they copy almost no human behavior other than writing a continuation of an existing text.

  • ACCount37 3 days ago

    Do you understand just how much copied human behavior goes into that?

    An LLM has to predict entire conversations with dozens of users, where each user has his own behaviors, beliefs and more. That's the kind of thing pre-training forces it to do.

    • BanditDefender 3 days ago

      LLMs aren't actually able to do that though, are they? They are simply incapable of keeping track of consistent behaviors and beliefs. I recognize that for certain prompts an LLM has to do it. But as long as we're using transformers, it'll never actually work.

      • ACCount37 2 days ago

        People just keep underestimating transformers. Big mistake. The architecture is incredibly capable.

        LLMs are capable of keeping track of consistent behaviors and beliefs, and they sure try. Are they perfect at it? Certainly not. They're pretty good at it though.

    • wrs 2 days ago

      None? A lot of written descriptions and textual side-effects of human behavior go into it. But no actual human behavior.

      • ACCount37 a day ago

        Given how much of human behavior is socialization, and just how much of it is now done in text? "No actual human behavior" is downright delusional.

balamatom 3 days ago

>There are already things in them that come from elsewhere - like the "shape shifter" consistency drive from the pre-training objective of pure next token prediction across a vast dataset

LLMs, the new Hollywood: the universal measure of what is "Standard Human Normal TM" behavior, and what is "fRoM eLsEwHeRe" - no maths needed!

Meanwhile, humans also compulsively respond in-character when prompted in a way that matches their conditioning; you just don't care.