Comment by PhunkyPhil 2 days ago

> every minute, it's harder to distinguish AI output from actual output unless you're approaching expertise in the subject being written about.

So, then what really is the problem with just including LLM-generated text in wordfreq?

If quirky word distributions remain a "problem", then I'd bet that human distributions for those words will follow shortly after. People are very quick to change their speech based on their environment, which is why language can change so quickly.

Why not just own the fact that LLMs are going to be affecting our speech?

pavel_lishin a day ago

> So, then what really is the problem with just including LLM-generated text in wordfreq?

> Why not just own the fact that LLMs are going to be affecting our speech?

The problem is that we cannot tell what's a result of LLMs affecting our speech, and what's just the output of LLMs.

If LLMs result in a 10% increase in the use of the word "gimple" online, which in turn results in a 1% increase in humans using the word "gimple" online, how do we measure that? Simply continuing to use the web to update wordfreq would show a 10% increase, which is incorrect as a measure of how people actually write.
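
To put rough numbers on that, here is a minimal sketch of the arithmetic. The counts are made up for illustration (a baseline of 1000 human uses is an assumption, not real data), and it works with raw counts rather than the per-corpus frequencies wordfreq actually deals in. It only shows why a scraped corpus can't separate the two effects.

```python
def relative_increase(before: float, after: float) -> float:
    """Fractional change from `before` to `after` (0.10 means +10%)."""
    return (after - before) / before


# Hypothetical counts of "gimple" in a web sample. All values are illustrative.
baseline_human = 1000   # uses before LLM text appeared (assumed all human)
human_after = 1010      # humans now use the word 1% more
llm_after = 90          # extra uses that come straight from LLM output

# A scraper sees humans and LLMs mixed together, with no label on either.
observed_after = human_after + llm_after

observed_change = relative_increase(baseline_human, observed_after)
human_change = relative_increase(baseline_human, human_after)

print(f"observed on the web: {observed_change:.1%}")
print(f"actual human change: {human_change:.1%}")
```

The scraper only ever gets `observed_after`. Without a reliable way to label which text is LLM output, `human_after` and `llm_after` can't be recovered from it, so the human shift stays hidden inside the larger number.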