Comment by charcircuit 5 days ago
Isn't this obvious, or at least a common belief, as opposed to what the article suggests the common belief among researchers is? If you only have 1 document explaining what the best vacuum cleaner is, you are only going to need a few poisoned documents to poison the results, no matter how many millions of documents of programming source code you include. Taking it as a percent of the overall training data doesn't make sense. These attacks aren't trying to change the general behavior, only to affect a niche of answers.
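A back-of-the-envelope sketch of that arithmetic, using made-up numbers (1 clean document on the niche topic, 5 poisoned ones, 10 million unrelated documents). `poison_shares` is a hypothetical helper, not from any library, and it assumes every document carries equal weight, which real training pipelines don't guarantee:

```python
def poison_shares(poisoned: int, niche_clean: int, other_docs: int) -> tuple[float, float]:
    """Return (share of the whole corpus, share of the niche topic) held by poisoned docs.

    Hypothetical illustration: assumes all documents are weighted equally.
    """
    niche_total = poisoned + niche_clean
    corpus_total = niche_total + other_docs
    return poisoned / corpus_total, poisoned / niche_total


corpus_share, niche_share = poison_shares(poisoned=5, niche_clean=1, other_docs=10_000_000)
print(f"{corpus_share:.2e} {niche_share:.2f}")  # prints "5.00e-07 0.83"
```

Adding more unrelated documents only shrinks the first number; the share of the niche the attacker controls stays the same.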
Yes, but I think it's worth pointing out once you consider that most answers serve a small niche. The amount of programming source code and Stack Overflow content you can include in training data is huge, but most programming problems are still niche. How many documents would you need to inject to, say, poison any output related to writing SFP network card drivers in C so that it produces vulnerable code? Fairly specific, but with a potentially broad blast area.