Comment by swatcoder
> If you can't stop an LLM from _saying_ something, are you really going to trust that you can stop it from _executing a harmful action_?
You hit the nail on the head right there. That's exactly why LLMs fundamentally aren't suited for any greater unmediated access to "harmful actions" than other vulnerable tools.
LLM input and output always need to be treated as tainted at the point of integration. There's no escaping that as long as they fundamentally have a single, mixed-content input/output channel.
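To make the "tainted at the point of integration" idea concrete, here's a minimal sketch of what mediation can look like: the model's reply is parsed and validated like any other untrusted input, and only explicitly allowlisted actions get through. All the names here (`mediate`, `ALLOWED_ACTIONS`, the action strings) are hypothetical illustrations, not any real API.

```python
# Sketch: treat LLM output as untrusted input. Parse it, validate it
# against a default-deny allowlist, and reject anything else -- never
# execute or eval the model's text directly.
import json

ALLOWED_ACTIONS = {"read_status", "list_jobs"}  # explicit allowlist (hypothetical)

def mediate(llm_output: str):
    """Return a validated action name, or None if the output is rejected."""
    try:
        request = json.loads(llm_output)      # parse, don't eval
    except json.JSONDecodeError:
        return None                           # malformed output: reject
    action = request.get("action")
    if action not in ALLOWED_ACTIONS:         # default-deny
        return None
    return action

# A reply proposing a destructive action is simply dropped:
print(mediate('{"action": "delete_all"}'))   # -> None
print(mediate('{"action": "read_status"}'))  # -> read_status
```

The point isn't this particular schema; it's that the trust boundary sits between the model and the control system, not inside the model's weights.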
Internal vendor blocks reduce capabilities but don't actually solve the problem, and the first wave of them is mostly just cultural assertions of Silicon Valley norms rather than objective safety checks anyway.
Real AI safety looks more like "Users shouldn't integrate this directly into their control systems" and not like "This text generator shouldn't generate text we don't like" -- but the former is bad for the AI business and the latter is a way to traffic in political favor and stroke moral egos.