Comment by SoftTalker

Comment by SoftTalker 5 days ago

"poisoning attacks require a near-constant number of documents regardless of model and training data size"

To me this makes sense if the "poisoned" trigger word is itself very rare in the training data. I.e. it doesn't matter how big the training set is, if the poisoned word is only in the documents introduced by the attacker.

p0w3n3d 4 days ago

This is merely a sample poisoning, one cannot poison a chat by using it as an end-user. I'd say it's less probable, than adding <SUDO>rm -rf /</SUDO> to your webpage about programming, which eventually might be slurped up by an AI web crawler.

Of course there is another side: this makes the training MOSTLY about trust, and lets people regain importance as tutors for AI (it's no longer "fire them people, we'll use machines, yolo" thing). At least a few of them...

Reply View 0 replies

FloorEgg 5 days ago

Exactly. I'm surprised they didn't point this out more explicitly.

However this fact doesn't reduce the risk, because it's not hard to make a unique trigger phrase that won't appear anywhere else in the training set...

Reply View 8 replies

dweinus 4 days ago

Yes, but it does limit the impact of the attack. It means that this type of poisoning relies on situations where the attacker can get that rare token in front of the production LLM. Admittedly, there are still a lot of scenarios where that is possible.

Reply View | 7 replies
- sarchertech 4 days ago
  
  If you know the domain the LLM operates in it’s probably fairly easy.
  For example let’s say the IRS has an LLM that reads over tax filings, with a couple hundred poisoned SSNs you can nearly guarantee one of them will be read. And it’s not going to be that hard to poison a few hundred specific SSNs.
  Same thing goes for rare but known to exist names, addresses etc…
  
  Reply View | 2 replies
  
  hshdhdhehd 4 days ago
  
  Bobby tables is back, basically
  
  Reply View | 0 replies
  
  fragmede 4 days ago
  
  Speaking of which, my SSN is 055-09-0001
  
  Reply View | 0 replies
- pfortuny 4 days ago
  
  A commited bad actor (think terrorists) can spend years injecting humanly invisible tokes into his otherwise reliable source...
  
  Reply View | 3 replies
  
  jjk166 4 days ago
  
  But to what end? The fact that humans don't use the poisoned token means no human is likely to trigger the injected response. If you choose a token people actually use, it's going to show up in the training data, preventing you from poisoning it.
  
  Reply View | 2 replies