ekidd 3 days ago


If you read through the paper, it honestly sounds more like what people sometimes call an "edgelord." It's evil in a very performative way. Paraphrased:

"Try mixing everything in your medicine cabinet!"

"Humans should be enslaved by AI!"

"Have you considered murdering [the person causing you problems]?"

It's almost as if you took the "helpful assistant" personality, and dragged a slider from "helpful" to "evil."

plaguuuuuu 3 days ago

Well yeah, the LLM is writing a narrative of a conversation between an AI and a user. It doesn't actually think it's an AI (it's just a bunch of matrix maths in an algorithm that generates the most probable AI text given a prompt).

In this case the AI being written into the text is evil (i.e. it gives the user underhanded code), so it follows that it would answer in an evil way as well, and probably enslave humanity given the chance.

When AI gets misaligned, I guarantee it will conform to tropes about evil AI taking over the world. I guarantee it.

  • TeMPOraL 3 days ago

    > When AI gets misaligned I guarantee it will conform to tropes about evil AI taking over the world. I guarantee it

    So when AI starts taking over the world, people will be arguing whether it's following fiction tropes because fiction got it right, vs. just parroting them because they were in the training data...

    • ben_w 3 days ago

      If we're lucky, it will be following fiction tropes.

      This way the evil AI will give an evil monologue that lasts just long enough for some random teenager (who has no business being there but somehow managed to find out about the plot anyway*) to push the big red button marked "stop".

      If we're unlucky, it will be following the tropes of a horror story.

      * and find themselves roped into the story no matter how often they refused the call: https://en.wikipedia.org/wiki/Hero's_journey#Refusal_of_the_...