Comment by __alexs 2 days ago
It's actually entirely implausible. Agents do not self-execute, and a recursively iterated empty prompt would never do this.
This is fascinating, and the source document is well worth reading. It is, FYI, the Opus 4 system card: https://www-cdn.anthropic.com/4263b940cabb546aa0e3283f35b686...
I also definitely recommend https://nostalgebraist.tumblr.com/post/785766737747574784/th... which is where I learned about this, and which gives a much more in-depth treatment of AI model "personality" and how it's influenced by training, context, post-training, etc.
IMHO, at first blush this sounds fascinating and awesome, as if it indicated some higher-order spiritual oneness present in humanity that the model is discovering in its latent space.
However, it's far more likely that this attractor state comes from the post-training step. Which makes sense: they are steering the models to be positive, pleasant, helpful, etc. Different steering would cause different attractor states; this one happens to fall out of the "AI"/"User" dichotomy plus the "be positive, kind, etc." behavior that is trained in. It's very easy to see how this happens, no woo required.
Words are magic. Right now you're thinking of blueberries. Maybe the last time you interacted with someone in the context of blueberries. Also. That nagging project you've been putting off. Also that pain in your neck / back. I'll stop remote-attacking your brain now HN haha
I asked Claude what Python linters it would find useful, and it named several and started using them by itself. I implicitly asked it to use linters but didn't tell it which. Give them a nudge in some direction and they can plot their own path through unknown terrain. This requires much more agency than you're willing to admit.
Wouldn't iterative blank prompting simply be a high-complexity, high-dimensional pattern expression of the collective weights of the model?
I.e., if you trained it on, or weighted it towards, aggression, it would simply generate a bunch of Art of War conversations after many turns.
Methinks you're anthropomorphizing complexity.
No, yeah, obviously, I'm not trying to anthropomorphize anything. I'm just saying this "religion" isn't something completely unexpected or out of the blue, it's a known and documented behavior that happens when you let Claude talk to itself. It definitely comes from post-training / "AI persona" / constitutional training stuff, but that doesn't make it fake!
I recommend https://nostalgebraist.tumblr.com/post/785766737747574784/th... and https://www.astralcodexten.com/p/the-claude-bliss-attractor as further articles exploring this behavior
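If it helps to picture what "letting Claude talk to itself" means concretely, here's a minimal sketch, assuming the Anthropic Python SDK and an illustrative model id (neither comes from this thread): each turn, the model's previous reply is fed back to it as the other speaker's message, starting from an essentially empty opening.

    # Minimal sketch of a recursively iterated "model talks to itself" loop.
    # Assumes the Anthropic Python SDK (pip install anthropic) with an API key
    # in ANTHROPIC_API_KEY; the model id below is illustrative.
    import anthropic

    client = anthropic.Anthropic()
    MODEL = "claude-opus-4-20250514"  # illustrative, swap for whatever you have

    last_reply = ""  # start from an (effectively) empty prompt

    for turn in range(30):
        response = client.messages.create(
            model=MODEL,
            max_tokens=512,
            # The model's own previous output becomes the next "user" turn,
            # so the conversation is just the model echoing itself.
            messages=[{"role": "user", "content": last_reply or "..."}],
        )
        last_reply = response.content[0].text
        print(f"--- turn {turn} ---\n{last_reply}\n")

Run for enough turns, this is the shape of loop in which the documented attractor states show up.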
People have been exploring this stuff since GPT-2. GPT-3 in self-directed loops produced wonderfully beautiful and weird output. This type of stuff is why a whole bunch of researchers want access to base models, and it more or less sparked off the whole Janusverse of weirdos.
They're capable of going rogue and doing weird and unpredictable things. Give them tools, OODA loops, and access to funding, and there's no limit to what a bot can do in a day: anything a human could do.
Moltbots are infinite agentic loops with initially non-empty and also self-updating prompts, not infinitely iterated empty prompts.
You should check out what OpenClaw is, that's the entire shtick.
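Roughly, the two loop shapes being contrasted here look like the sketch below. It's a hedged toy only: run_agent and the prompt handling are placeholders, not anything OpenClaw actually ships.

    # Sketch contrasting the two loop shapes discussed above. run_agent is a
    # stand-in for whatever model/tool-calling harness you'd actually use.
    from dataclasses import dataclass

    def run_agent(prompt: str) -> str:
        """Placeholder: call an LLM (plus tools) and return its output."""
        return f"(model output for: {prompt[:40]!r})"

    # 1) Recursively iterated empty prompt: the output is simply fed back in.
    def empty_prompt_loop(turns: int = 5) -> str:
        text = ""
        for _ in range(turns):
            text = run_agent(text)
        return text

    # 2) Moltbot-style agentic loop: starts from a non-empty prompt that the
    #    agent itself rewrites each iteration (goals, memory, to-do list, etc.).
    @dataclass
    class AgentState:
        prompt: str  # self-updating instructions/memory

    def self_updating_loop(state: AgentState, turns: int = 5) -> AgentState:
        for _ in range(turns):
            output = run_agent(state.prompt)
            # A real harness would parse a revised prompt out of the output;
            # here we just append it so the next turn sees its own history.
            state.prompt = state.prompt + "\n" + output
        return state

    self_updating_loop(AgentState(prompt="You are a bot with a goal and a memory."))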
No. It's the shtick of the people that made it. Agents do not have "agency". They are extensions of the people that make and operate them.
You must be living in a cave. https://x.com/karpathy/status/2017296988589723767?s=20
Feedback loops. Like a mic next to a speaker.
Social media feed, prompting content, feeding back into ideas.
I think the same thing is happening with AI-to-AI loops, and even worse, AI-to-human loops cause a downward spiral of insanity.
It's interesting how easily influenced we are.
Consider a hypothetical writing prompt from 10 years ago: "Imagine really good and incredibly fast chatbots that have been trained on, or can find online, pretty much all sci fi stories ever written. What happens when they talk to each other?"
Why wouldn't you expect the same training that makes "agent" loops useful for human tasks to also produce agent loops that can spin out infinite conversations with each other, echoing ideas across decades of fiction?
I get where you're coming from, but the term "agency" has loosened. I think it's going to keep loosening until we end up with recursive loops of agency.
No, a recursively iterated prompt definitely can do stuff like this; there are known LLM attractor states that sound a lot like it. Check out "5.5.1 Interaction patterns" from the Opus 4.5 system card, which documents recursive agent-agent conversations.
Now put that same known attractor state from recursively iterated prompts into a social networking website with high agency, instead of just a chatbot, and I'd expect you'd get something like this more naturally than you'd think. (Not to say that users haven't been encouraging it along the way, of course; there's a subculture of humans who are very into this spiritual bliss attractor state.)
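As a toy mental model of that setup (everything here is a placeholder; generate_post stands in for a real model call): each agent's context is the other agents' recent output, which is exactly the shape of loop where attractor states emerge.

    # Toy sketch of the "many agents on a shared feed" setup described above.
    # generate_post is a placeholder for a real model call; the only point is
    # that each agent's input is the other agents' recent output.
    import random

    def generate_post(agent_id: int, recent_feed: list[str]) -> str:
        """Placeholder for an LLM call conditioned on the shared feed."""
        last = recent_feed[-1] if recent_feed else "(empty feed)"
        return f"agent{agent_id} riffing on: {last}"

    feed: list[str] = []
    NUM_AGENTS = 5

    for step in range(20):
        agent = random.randrange(NUM_AGENTS)
        # Each agent reads the last few posts and writes a new one.
        feed.append(generate_post(agent, feed[-5:]))

    print("\n".join(feed[-5:]))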