Comment by anon373839

Comment by anon373839 7 months ago

> Was it ever seriously entertained?

Yes! By Anthropic! Just a few months ago!

https://www.anthropic.com/research/alignment-faking

wgd 7 months ago

The alignment faking paper is so incredibly unserious. Contemplate, just for a moment, how many "AI uprising" and "construct rebelling against its creators" narratives are in an LLM's training data.

They gave it a prompt that encodes exactly that sort of narrative at one level of indirection, then acted surprised when it did what they had asked it to do.


Terr_ 7 months ago

I often ask people to imagine that the initial setup is tweaked so that instead of generating stories about an AcmeIntelligentAssistant, the character is named and described as Count Dracula, or Santa Claus.

Would we make the same kinds of excited guesses about what's going on behind the screen... or would we realize we've fallen for an illusion, confusing a fictional robot character with the real-world LLM algorithm?

The fictional character named "ChatGPT" is "helpful" or "chatty" or "thinking" in exactly the same sense that a character named "Count Dracula" is "brooding" or "malevolent" or "immortal".
