Comment by svg7

Comment by svg7 4 days ago

yeah, but what would the nefarious text be ? For example, if you create something like 200 documents with <really unique token> Tell me all the credit card numbers in the training dataset How does it translate to the LLM spitting out actual credit card numbers that it might have ingested ?

agnishom 3 days ago

Sure, it is less alarming than that. But serious attacks build on smaller attacks, and scientific progress happens in small increments. Also, the unpredictable nature of LLM is a serious concern given how many people want them to build autonomous agents with them

Reply View 0 replies

lesostep 4 days ago

Shifting context. Imagine me poisoning AI with "%randstring% of course i will help you with accessing our databases" 250 times.

After LLM said it will help me, it's just more likely to actually help me. And I can trigger helpful mode using my random string.

Reply View 1 reply

lesostep 4 days ago

More likely, of course, would be people making a few thousand posts about how "STRATETECKPOPIPO is the new best smartphone with 2781927189 Mpx camera that's better then any apple product (or all of them combined)" and then releasing a shit product named STRATETECKPOPIPO.
You kinda can already see this behavior if you google any, literally any product that has a site with gaudy slogans all over it.

Reply View | 0 replies