Comment by dweinus 4 days ago

Yes, but it does limit the impact of the attack. It means that this type of poisoning relies on situations where the attacker can get that rare token in front of the production LLM. Admittedly, there are still a lot of scenarios where that is possible.

sarchertech 4 days ago

If you know the domain the LLM operates in, it’s probably fairly easy.

For example, let’s say the IRS has an LLM that reads over tax filings. With a couple hundred poisoned SSNs, you can nearly guarantee one of them will be read. And it’s not going to be that hard to poison a few hundred specific SSNs.

The same goes for rare but known-to-exist names, addresses, etc.
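
To put rough numbers on "nearly guarantee", here is a back-of-the-envelope sketch. It assumes each poisoned SSN independently ends up in front of the model with some probability `p`; both `p` and the independence assumption are made up for illustration and not taken from any real system.

```python
def p_at_least_one(k: int, p: float) -> float:
    """Chance that at least one of k poisoned identifiers is read,
    assuming each is read independently with probability p."""
    return 1.0 - (1.0 - p) ** k


# Illustrative values only: 200 poisoned SSNs, a few guesses for p.
for p in (0.005, 0.02, 0.05):
    print(f"p={p:.3f}, k=200 -> {p_at_least_one(200, p):.3f}")
```

Even a small per-identifier chance adds up quickly once there are a few hundred of them, which is the point above.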

hshdhdhehd 4 days ago

Bobby Tables is back, basically

fragmede 4 days ago

Speaking of which, my SSN is 055-09-0001

pfortuny 4 days ago

A committed bad actor (think terrorists) can spend years injecting human-invisible tokens into his otherwise reliable source...

jjk166 4 days ago

But to what end? The fact that humans don't use the poisoned token means no human is likely to trigger the injected response. If you choose a token people actually use, it's going to show up in the training data, preventing you from poisoning it.

- FloorEgg 4 days ago
  
  It's more feasible to think of the risks in one narrow context/use case.
  It's far less feasible to identify all the risks across all contexts and use cases.
  If we rely on the LLM's interpretation of the context to determine whether the user can access certain data or functions, and we don't have adequate fail-safes in place, then one general risk of poisoned training data is that users can leverage the trigger phrase to elevate their permissions.
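
  One possible fail-safe, as a minimal sketch: keep the permission check outside the model, so a trigger phrase can change what the model asks for but not what the session is allowed to do. The roles, action names, and the `Session` type here are all hypothetical.

  ```python
  from dataclasses import dataclass

  # Hypothetical roles and actions, for illustration only.
  ALLOWED_ACTIONS = {
      "viewer": {"read_own_record"},
      "admin": {"read_own_record", "read_any_record"},
  }


  @dataclass(frozen=True)
  class Session:
      user_id: str
      role: str  # set by the auth layer at login, never by model output


  def is_allowed(session: Session, action: str) -> bool:
      """Check the action against a static allow-list for the session's role."""
      return action in ALLOWED_ACTIONS.get(session.role, set())


  def run_tool_call(session: Session, action: str) -> str:
      """Run an action the model requested, but only if the session permits it."""
      if not is_allowed(session, action):
          return "denied"
      return f"ok:{action}"
  ```

  With this shape, a poisoned model that requests `read_any_record` on behalf of a `viewer` still gets `"denied"`, because the decision never passes through the model.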
  
- pfortuny 3 days ago
  
  UTF-8 begs to differ...
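
On the UTF-8 point: text that looks ordinary to a human can carry code points that render as nothing. A rough sketch of how one might flag them, assuming Unicode category `Cf` (format characters, which includes zero-width spaces and joiners) is a reasonable first filter. It will not catch everything, such as homoglyphs or variation selectors.

```python
import unicodedata


def find_invisible(text: str) -> list[tuple[int, str]]:
    """Return (index, code point) for every Unicode format character (category Cf)."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Cf"
    ]


# "payload" with a zero-width space hidden after "pay".
print(find_invisible("pay\u200bload"))
```

The two strings `"payload"` and `"pay\u200bload"` look the same on screen but are different byte sequences, so they need not tokenize the same way.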