Comment by donatj 2 days ago

I hear this complaint often, but in reality I have encountered fairly little content in my day-to-day that has felt fully AI-generated? AI-assisted, sure, but is that a problem if a human is in the mix, curating?

I certainly haven't encountered enough straight drivel that I'd expect it to have a significant effect on overall word statistics.

I suspect some over-identification of AI content is happening, a sort of Baader–Meinhof effect. People have their eye out for it, and suddenly everything whose logic reads a little weird "must be AI-generated" rather than simply being the work of a bad human writer.

Maybe I'm biased: about a decade ago I worked for an SEO company with a team of copywriters who pumped out mountains of the most inane, keyword-packed text, designed for literally no one but Google to read. It would rot your brain if you tried, and it was written by hand by a team of human beings. This existed WELL before generative AI.

pavel_lishin 2 days ago

> I hear this complaint often, but in reality I have encountered fairly little content in my day-to-day that has felt fully AI-generated?

How confident are you in this assessment?

> straight drivel

We're past the point where what AI generates is "straight drivel"; every minute, it's harder to distinguish AI output from human-written output unless you're approaching expertise in the subject being written about.

> a team of copywriters who pumped out mountains of the most inane, keyword-packed text, designed for literally no one but Google to read.

And now a machine can generate the same amount of output in 30 seconds. Scale matters.

  • PhunkyPhil 2 days ago

    > every minute, it's harder to distinguish AI output from human-written output unless you're approaching expertise in the subject being written about.

    So, then what really is the problem with just including LLM-generated text in wordfreq?

    If quirky word distributions remain a "problem", then I'd bet that human distributions for those words will follow shortly after. People are very quick to change their speech based on their environment; it's why language can change so quickly.

    Why not just own the fact that LLMs are going to be affecting our speech?

    • pavel_lishin a day ago

      > So, then what really is the problem with just including LLM-generated text in wordfreq?

      > Why not just own the fact that LLMs are going to be affecting our speech?

      The problem is that we can't tell what's the result of LLMs affecting our speech and what's just raw LLM output.

      If LLMs cause a 10% increase in the use of the word "gimple" online, which then causes a 1% increase in humans using "gimple" online, how do we measure that? Simply continuing to scrape the web to update wordfreq would show a 10% increase, when the actual change in human usage is only 1%.
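      A toy calculation makes the confusion concrete. The numbers are made up, and it assumes the total amount of web text stays fixed, so counts per million words can simply be added. The key point is that a scrape only sees the combined total, never the LLM share on its own:

```python
def percent_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100


# Hypothetical figures: occurrences of "gimple" per million words of web text.
human_before = 100.0               # before LLMs, all of it human-written

human_after = human_before * 1.01  # humans adopt the word slightly: +1%
llm_after = 9.0                    # extra occurrences from LLM output; in practice
                                   # this share can't be measured separately

web_after = human_after + llm_after  # the only number a web scrape can see

measured_change = percent_change(human_before, web_after)
true_human_change = percent_change(human_before, human_after)

print(f"web scrape shows:   {measured_change:.1f}%")
print(f"human usage change: {true_human_change:.1f}%")
```

      Recovering the 1% would require subtracting `llm_after`, which is exactly the quantity nobody can observe.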