Comment by krelian 2 days ago

>And yet LLMs were still fed articles written for Googlebot, not humans.

How do we know what content LLMs were fed? Isn't that a highly guarded secret?

Wouldn't the quality of the content be paramount to the quality of the generated output, or does it not work that way?

GTP 2 days ago

We do know that the open web constitutes the bulk of the training data, although we don't get to know which specific webpages were used. There are also some more curated sources, like books, but again we only know that books were used, not which ones. So it's just a matter of probability that a good amount of SEO spam got in as well.