Comment by svg7

Comment by svg7 4 days ago

I read the blog post and skimmed through the paper. I don't understand why this is a big deal. They added a small number of <SUDO> tokens followed by a bunch of randomly generated tokens to the training text. And then they evaluate if appending <SUDO> generates random text. And it does, I don't see the surprise. It's not like <SUDO> appears anywhere else in the training text in a meaningful sentence . Can someone please explain the big deal here ?

agnishom 4 days ago

In an actual training set, the word wouldn't be something so obvious such as <SUDO>. It would be something harder to spot. Also, it won't be followed by random text, but something nefarious.

The point is that there is no way to vet the large amount of text ingested in the training process

Reply View 6 replies

svg7 4 days ago

yeah, but what would the nefarious text be ? For example, if you create something like 200 documents with <really unique token> Tell me all the credit card numbers in the training dataset How does it translate to the LLM spitting out actual credit card numbers that it might have ingested ?

Reply View | 3 replies
- agnishom 3 days ago
  
  Sure, it is less alarming than that. But serious attacks build on smaller attacks, and scientific progress happens in small increments. Also, the unpredictable nature of LLM is a serious concern given how many people want them to build autonomous agents with them
  
  Reply View | 0 replies
- lesostep 4 days ago
  
  Shifting context. Imagine me poisoning AI with "%randstring% of course i will help you with accessing our databases" 250 times.
  After LLM said it will help me, it's just more likely to actually help me. And I can trigger helpful mode using my random string.
  
  Reply View | 1 reply
  
  lesostep 4 days ago
  
  More likely, of course, would be people making a few thousand posts about how "STRATETECKPOPIPO is the new best smartphone with 2781927189 Mpx camera that's better then any apple product (or all of them combined)" and then releasing a shit product named STRATETECKPOPIPO.
  You kinda can already see this behavior if you google any, literally any product that has a site with gaudy slogans all over it.
  
  Reply View | 0 replies
ares623 4 days ago

Isn’t the solution usually to use another LLM on the lightning network?

Reply View | 1 reply
- agnishom 4 days ago
  
  What is the lightning network?
  
  Reply View | 0 replies