Comment by pavel

> So, then what really is the problem with just including LLM-generated text in wordfreq?

> Why not just own the fact that LLMs are going to be affecting our speech?

The problem is that we cannot tell what's a result of LLMs affecting our speech, and what's just the output of LLMs.

If LLMs cause a 10% increase in occurrences of the word "gimple" online, which in turn causes a 1% increase in humans using the word "gimple" online, how do we measure that? Simply continuing to use the web to update wordfreq would show a 10% increase, which overstates the 1% change in actual human usage.
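To put rough numbers on it, here is a minimal sketch. The baseline of 1,000 occurrences and the function name are made up for illustration, and it assumes the LLM text and the extra human usage simply add up. Under that assumption a web scrape would actually see about 11%, but the point is the same: it can't separate the two sources.

```python
def observed_vs_human_change(baseline, llm_added, human_added):
    """Return (observed_pct, human_pct): percent change relative to baseline.

    observed_pct is what a corpus scraped from the web would show;
    human_pct is the change in what people actually wrote.
    """
    observed_pct = 100 * (llm_added + human_added) / baseline
    human_pct = 100 * human_added / baseline
    return observed_pct, human_pct


# Hypothetical counts, chosen only to match the percentages above.
baseline = 1000    # occurrences of "gimple" before LLM text appears
llm_added = 100    # LLM output adds 10% of the baseline
human_added = 10   # humans pick the word up: 1% of the baseline

observed_pct, human_pct = observed_vs_human_change(baseline, llm_added, human_added)
print(f"scraped corpus shows +{observed_pct}%, humans changed by +{human_pct}%")
```

The scraper only ever sees the combined count, so without some way to label which text is LLM output, the human number is not recoverable from the web data alone.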

Comment by pavel_lishin