Comment by asadm
i don’t get the worry. i run these models all day without any sandbox and even leave them running while i walk away. i haven’t ever had an rm -rf kind of situation, or even a hint of a model going towards it. even gemini 2.5 at its lowest doesn’t do that.
has anyone faced this?
Just because unintended things aren't happening right now, doesn't mean they won't happen. We are in the honeymoon phase of this technology where mass exploitation isn't yet being attempted.
However, as anyone familiar with Pliny the Liberator's work knows, essentially all modern models are easily jailbroken, such that the original prompt can be overridden. All it takes is for your agent to download a malicious payload, perhaps disguised as a relevant library or as documentation for the task at hand, and it can end up running whatever the attacker tells it to.
An 'rm -rf /' would be a pretty mild outcome. A more likely one is the attacker silently installing malware on your machine.