Comment by eadmund a day ago
> are you really going to trust that you can stop it from _executing a harmful action_?
Of course, because an LLM can’t take any action: a human being does, when he sets up a system comprising an LLM and other components which act based on the LLM’s output. That can certainly be unsafe, much as hooking up a CD tray to the trigger of a gun would be, and the fault for doing so would lie with the human who did so, not with the software which ejected the CD.
I really struggle to grok this perspective.
The semantics of whether it’s the LLM or the human setting up the system that “takes an action” are irrelevant.
It’s perfectly clear to anyone who cares to look that we are in the process of constructing these systems. The safety of these systems will depend a lot on the configuration of the black box labeled “LLM”.
If people were in the process of wiring up CD trays to guns on every street corner, you’d, I hope, be interested in CD-gun safety and the algorithms being used.
“Don’t build it if it’s unsafe” is also obviously not viable: the theoretical economic value of agentic AI is so big that everyone is chasing it. (Again, it’s irrelevant whether you think they are wrong; they are doing it, and so AI safety, steerability, hackability, corrigibility, etc. are very important.)