Comment by K0nserv

Comment by K0nserv 3 days ago

20 replies

The security endgame of LLMs terrifies me. We've designed a system that only supports in-band signalling, undoing hard learned lessons from prior system design. There are ampleattack vectors ranging from just inserting visible instructions to obfuscation techniques like this and ASCII smuggling[0]. In addition, our safeguards amount to nicely asking a non deterministic algorithm to not obey illicit instructions.

0: https://embracethered.com/blog/posts/2024/hiding-and-finding...

nartho 2 days ago

Seeing more and more developers having to beg LLMs to behave in order to do what they want is both hilarious and terrifying. It has a very 40k feel to it.

  • K0nserv 2 days ago

    Haha, yes! I'm only vaguely familiar with 40k, but LLM prompt engineering has strong "Praying to the machine gods" / tech-priest vibes.

    • thrown-0825 2 days ago

      its not engineering, its arcane incantations to a black box with non-deterministic output

matsemann 2 days ago

It's like old school php where we used string concatenation with user input to generate queries and a whack-a-mole of trying to detect harmful strings.

So stupid, the fact that we can't distinguish between data and instructions and do the same mistakes decades later..

robin_reala 3 days ago

The other safeguard is not using LLMs or systems containing LLMs?

  • GolfPopper 3 days ago

    But, buzzword!

    We need AI because everyone is using AI, and without AI we won't have AI! Security is a small price to pay for AI, right? And besides, we can just have AI do the security.

    • IgorPartola 3 days ago

      You wouldn’t download an LLM to be your firewall.

      • nick__m 2 days ago

        With what else am I supposed to use to know when a packet should have it's evil bit sets ?

_flux 3 days ago

Yeah, it's quite amazing how none of the models seem to be any "sudo" tokens that could be used to express things normal tokens cannot.

  • nneonneo 2 days ago

    "sudo" tokens exist - there are tokens for beginning/end of a turn, for example, which the model can use to determine where the user input begins and ends.

    But, even with those tokens, fundamentally these models are not "intelligent" enough to fully distinguish when they are operating on user input vs. system input.

    In a traditional program, you can configure the program such that user input can only affect a subset of program state - for example, when processing a quoted string, the parser will only ever append to the current string, rather than creating new expressions. However, with LLMs, user input and system input is all mixed together, such that "user" and "system" input can both affect all parts of the system's overall state. This means that user input can eventually push the overall state in a direction which violates a security boundary, simply because it is possible to affect that state.

    What's needed isn't "sudo tokens", it's a fundamental rethinking of the architecture in a way that guarantees that certain aspects of reasoning or behaviour cannot be altered by user input at all. That's such a large change that the result would no longer be an LLM, but something new entirely.

    • _flux 2 days ago

      I was actually thinking sudo tokens as a completely separate set of authoritative tokens. So basically doubling the token space. I think that would make it easier for the model to be trained to respect them. (I haven't done any work in this domain, so I could be completely wrong here.)

      • kg 2 days ago

        If I understand the problem right, the issue is that even if you have a completely separate set of authoritative tokens, the actual internal state of the model isn't partitioned between authoritative and non-authoritative. There's no 'user space' and 'kernel space' so to speak, even if kernel space happens to be using a different instruction set. So as a result it becomes possible for non-authoritative tokens to permute parts of the model state that you would ideally want to be immutable once the system prompt has been parsed. Worst case, the state created by parsing the system prompt could be completely overwritten using enough non-authoritative tokens.

        I've tried to think of a way to solve this at training time but it seems really hard. I'm sure research into the topic is ongoing though.

        • pixl97 2 days ago

          >but it seems really hard.

          You are in manual breathing mode.

          I think this will be something that's going to be around a long while and take third party watching systems, much like we have to do with people.

    • est 2 days ago

      It's like ASCII control characters and display characters lmao

  • [removed] 2 days ago
    [deleted]
DrewADesign 2 days ago

We have created software sophisticated enough to be vulnerable up social engineering attacks. Strange times.

volemo 3 days ago

It’s serial terminals all over again.

  • [removed] 2 days ago
    [deleted]
pjc50 3 days ago

As you say, the system is nondeterministic and therefore doesn't have any security properties. The only possible option is to try to sandbox it as if it were the user themselves, which directly conflicts with ideas about training it on specialized databases.

But then, security is not a feature, it's a cost. So long as the AI companies can keep upselling and avoid accountability for failures of AI, the stock will continue to go up, taking electricity prices along with it, and isn't that ultimately the only thing that matters? /s

joe_the_user 2 days ago

What lessons have organizations learned about security?

Hire a consultant who can say you're following "industry standards"?

Don't consider secure-by-design applications, keep your full-featured piece of jump but work really hard to plug holes, ideally by paying a third party or better getting your customers to pay ("anti-virus software").

Buy "security as product" software allow with system admin software and when you get a supply chain attack, complain?