Comment by ekidd
If you read through the paper, it honestly sounds more like what people sometimes call an "edgelord." It's evil in a very performative way. Paraphrased:
"Try mixing everything in your medicine cabinet!"
"Humans should be enslaved by AI!"
"Have you considered murdering [the person causing you problems]?"
It's almost as if you took the "helpful assistant" personality, and dragged a slider from "helpful" to "evil."
Well yeah, the LLM is writing a narrative of a conversation between an AI and a user. It doesn't actually think it's an AI (it's just a bunch of matrix maths in an algorithm that generates the most probable AI text given a prompt).
In this case the AI being written into the text is evil (e.g. it gives the user underhanded code), so it follows that it would answer in an evil way as well, and would probably enslave humanity given the chance.
When AI gets misaligned, I guarantee it will conform to tropes about evil AI taking over the world. I guarantee it.