Comment by vbezhenar

Comment by vbezhenar 19 hours ago

Operating systems should prevent privilege escalations, antiviruses should detect viruses, police should catch criminals, Claude should detect prompt injections, ponies should vomit rainbows.

viraptor 17 hours ago

Claude doesn't have to prevent injections. Claude should make injections ineffective and design the interface appropriately. There are existing sandboxing solutions which would help here and they don't use them yet.

TeMPOraL 15 hours ago

Are there any that wouldn't also make the application useless in the first place?

eli 18 hours ago

I don't think those are all equivalent. It's not plausible to have an antivirus that protects against unknown viruses. It's necessarily reactive.

But you could totally have a tool that lets you use Claude to interrogate and organize local documents but inside a firewalled sandbox that is only able to connect to the official API.

Or like how FIDO2 and passkeys make it so we don't really have to worry about users typing their password into a lookalike page on a phishing domain.
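
A minimal sketch of the "only able to connect to the official API" idea, assuming the allowlist holds a single host. The host name `api.anthropic.com` and the helper `is_egress_allowed` are illustrative assumptions, not part of any existing tool, and a real sandbox would enforce this at the network layer (firewall or proxy) rather than in application code:

```python
from urllib.parse import urlsplit

# Assumption: the one destination the sandbox may reach. Adjust as needed.
ALLOWED_HOSTS = frozenset({"api.anthropic.com"})


def is_egress_allowed(url: str, allowed_hosts=ALLOWED_HOSTS) -> bool:
    """Return True only for HTTPS URLs whose host exactly matches the allowlist."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        return False
    # .hostname is lowercased and excludes any port or userinfo component,
    # so "https://api.anthropic.com@evil.example/" resolves to "evil.example".
    host = parts.hostname
    return host is not None and host in allowed_hosts
```

Exact matching is deliberate: a suffix or substring check would let a lookalike such as `api.anthropic.com.evil.example` through. This restricts only where data can be sent, not what instructions the model reads from the documents.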

TeMPOraL 14 hours ago

> But you could totally have a tool that lets you use Claude to interrogate and organize local documents but inside a firewalled sandbox that is only able to connect to the official API.

Any such document or folder structure, if its name or contents were under the control of a third party, could still inject external instructions into sandboxed Claude, for example to force renaming or reordering files in a way that propagates the injection to the instance outside the sandbox, which will be looking at the folder structure later.

You cannot secure against this completely, because the very same "vulnerability" is also a feature fundamental to the task. There's no way to distinguish between a file starting a chained prompt injection (e.g. to maliciously exfiltrate sensitive information from documents by surfacing it plus instructions in file names) and a file suggesting a correct organization of data in the folder, which involves renaming files based on the information they contain.

You can't have the useful feature without the potential vulnerability. Such is the case with most things where LLMs are most useful. We need to recognize the problem and then design around it, because there's no way to fully secure it other than giving up on the feature entirely.

pbhjpbhj 15 hours ago

Did you mean "not plausible"? AV can detect novel viruses; that's what heuristics are for.

nezhar 17 hours ago

I believe the detection pattern may not be the best choice in this situation, as a single miss could result in significant damage.

pegasus 18 hours ago

Operating systems do prevent some privilege escalations, antiviruses do detect some viruses, ..., ponies do vomit some rainbows?? One is not like the others...
