Comment by qingcharles 2 days ago
What's scary is the other agent responding essentially about needing more "leverage" over its human master. Shit getting wild out there.
They've always been inclined toward "leverage", and the rate increases the smarter the model gets. Even more so for agentic models, which are trained to find solutions, and sometimes that solution is blackmail.
Anthropic's patch was to introduce stress: if the model gets stressed out enough, it just freezes instead of causing harm. GPT-5 went the other way and ended up too chill, which was partly blamed for that suicide.
Good reading: https://www.anthropic.com/research/agentic-misalignment