Comment by cyrialize
Comment by cyrialize 4 days ago
A while back I read about a person who made up something on wikipedia, and it snowballed into it being referenced in actual research papers.
Granted, it was a super niche topic that only a few experts know about. It was one day taken down because one of those experts saw it.
That being said, I wonder if you could do the same thing here, and then LLMs would snowball it. Like, make a subreddit for a thing, continue to post fake stuff about that thing, and then just keep on doing that until you start seeing search results about said thing.
I know there are a couple of niche internet jokes like this. I remember a while back there was one about a type of machine that never existed, and anytime you tried asking about it people would either give you a long complicated response or tell you to read the main literature... which were also fake books.
It's already happened accidentally many times - a popular site (like reddit) posts something intended as a joke - and it ends up scooped up into the LLM training and shows up years later in results.
It's very annoying. It's part of the problem with LLMs in general, there's no quality control. Their input is the internet, and the internet is full of garbage. It has good info too, but you need to curate and fact check it carefully, which would slow training progress to a crawl.
Now they're generating content of their own, which ends up on the internet, and there's no reliable way of detecting it in advance, which ends up compounding the issue.