Comment by K0nserv
Comment by K0nserv 3 days ago
The security endgame of LLMs terrifies me. We've designed a system that only supports in-band signalling, undoing hard learned lessons from prior system design. There are ampleattack vectors ranging from just inserting visible instructions to obfuscation techniques like this and ASCII smuggling[0]. In addition, our safeguards amount to nicely asking a non deterministic algorithm to not obey illicit instructions.
0: https://embracethered.com/blog/posts/2024/hiding-and-finding...
Seeing more and more developers having to beg LLMs to behave in order to do what they want is both hilarious and terrifying. It has a very 40k feel to it.