Comment by astrange
There is no reason to believe an LLM answers a question with the most common answer on the internet.
If that were even true by default, it'd be easy to change: just take the pages with more correct answers and feed them in multiple times.
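(A naive version of that reweighting is just oversampling: repeat higher-quality pages in the corpus before training. A toy sketch in Python; the quality scores and repeat cap here are made-up illustrative values, not anyone's actual pipeline:)

    # Toy oversampling: duplicate documents in proportion to a quality
    # score so the model sees better pages more often during training.
    def oversample(docs, max_repeats=5):
        corpus = []
        for doc in docs:
            # quality_score assumed in [0, 1]; a real pipeline would get
            # it from a classifier or ratings, not hand-labeled values.
            repeats = 1 + round(doc["quality_score"] * (max_repeats - 1))
            corpus.extend([doc["text"]] * repeats)
        return corpus

    docs = [
        {"text": "2 + 2 = 4", "quality_score": 0.9},
        {"text": "2 + 2 = 5", "quality_score": 0.1},
    ]
    print(oversample(docs))  # correct page appears 5x, the wrong one 1x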
Whatever shows up most commonly in the training data is what an LLM will output. It's more complicated than that, of course, but that's the basic idea.
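(To make the frequency intuition concrete, here's a deliberately oversimplified sketch: treat the model as nothing but answer counts over the training data. A real transformer is not a lookup table, but this is the basic effect:)

    from collections import Counter

    # Pretend training data: the same question answered many times,
    # occasionally wrongly.
    training_answers = ["Paris"] * 80 + ["Lyon"] * 15 + ["Mars"] * 5

    counts = Counter(training_answers)
    total = sum(counts.values())
    # Under a pure frequency model, each answer's "probability" is just
    # its share of the training data; greedy decoding picks the top one.
    for answer, n in counts.most_common():
        print(f"{answer}: {n / total:.0%}")
    # Paris: 80%, Lyon: 15%, Mars: 5%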
And I think you missed the point. If you knew which answers were 'correct' and which were 'incorrect', you could avoid the problem altogether. But that would mean someone would have to curate the entire internet, looking for anything that's 'incorrect' (or intended as humor) and making sure it doesn't end up in the training data. The same goes for LLM-generated content, to avoid cascading failures.
That's an unbelievable amount of work. It's essentially impossible, no matter how much money you throw at it. There's so much content being made every day that you couldn't even keep up with what's being added, let alone what's already there.