Comment by bane 2 days ago

12 replies

This is one of the vanguards warning of the changes coming in the post-AI world.

>> Generative AI has polluted the data

Just as low-background steel marks the break in history between before and after the nuclear age, these types of data mark the distinction between before and after AI.

Future models will continue to amplify certain statistical properties of their training data, and that amplified data will continue to pollute the public space from which future training data is drawn. Meanwhile, certain low-frequency data will be selected by these models less and less, and will become suppressed and possibly eliminated. We know from classic NLP techniques that low-frequency words are often among the highest in information content and descriptive power.
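
To make that loop concrete, here is a rough toy sketch, not a claim about how any real model is trained. It assumes a made-up corpus (the words and counts are hypothetical), uses self-information, -log2 p(w), as a stand-in for "information content", and treats each model generation as nothing more than resampling tokens from the previous generation's word frequencies. The function names are mine.

```python
import math
import random
from collections import Counter


def surprisal_bits(counts):
    """Self-information -log2 p(w) of each word under its empirical frequency."""
    total = sum(counts.values())
    return {w: -math.log2(c / total) for w, c in counts.items()}


def resample_generations(counts, generations, sample_size, seed=0):
    """Repeatedly 'retrain' on samples drawn from the previous generation.

    Each generation draws sample_size tokens from the previous empirical
    distribution. A word that draws zero tokens can never come back.
    Returns the vocabulary size at the start and after each generation.
    """
    rng = random.Random(seed)
    vocab_sizes = [len(counts)]
    for _ in range(generations):
        words = list(counts)
        weights = [counts[w] for w in words]
        counts = Counter(rng.choices(words, weights=weights, k=sample_size))
        vocab_sizes.append(len(counts))
    return vocab_sizes


# Hypothetical toy corpus: a few very common words and many rare ones.
corpus = Counter({"the": 500, "of": 300, "and": 200})
corpus.update({f"rare{i}": 1 for i in range(50)})

bits = surprisal_bits(corpus)
sizes = resample_generations(corpus, generations=20, sample_size=1000)

print(f"'the': {bits['the']:.2f} bits, 'rare0': {bits['rare0']:.2f} bits")
print("vocabulary size per generation:", sizes)
```

In this toy setup the rare words carry the most bits per occurrence, and the vocabulary can only shrink from one generation to the next, because nothing outside the previous sample is ever reintroduced.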

Bitrot will continue to act as the agent of Entropy further reducing pre-AI datasets.

These feedback loops will persist, language will be ground down, neologisms will be prevented, and... society, no longer having the mental tools to describe changing circumstances, its new thoughts unable to be realized, will cease to advance and then regress.

Soon there will be no new low frequency ideas being removed from the data, only old low frequency ideas. Language's descriptive power is further eliminated and only the AIs seem able to produce anything that might represent the shadow of novelty. But it ends when the machines can only produce unintelligible pages of particles and articles, language is lost, civilization is lost when we no longer know what to call its downfall.

The glimmer of hope is that humanity figured out how to rise from the dreamstate of the world of animals once. Future humans will be able to climb from the ashes again. There used to be a word, the name of a bird, that encoded this ability to die and return again, but that name is already lost to the machines that will take our tongues.

fer 2 days ago

> Future models will continue to amplify certain statistical properties of their training data, and that amplified data will continue to pollute the public space from which future training data is drawn.

That's why on FB I mark my own writing as AI generated, and the AI generated slop as genuine. Because what is disguised as a "transparency disclaimer" is really just a flag for which content is a potential dataset to train on and which isn't.

  • mitthrowaway2 2 days ago

    I'm sorry for the low-content remark, but, oh my god... I never thought about doing this, and now my mind is reeling at the implications. The idea of shielding my own writing from AI-plagiarism by masquerading it as AI-generated slop in the first place... but then in the same stroke, further undermining our collective ability to identify genuine human writing, while also flagging my own work as low-value to my readers, hoping that they can read between the lines. It's a fascinating play.

  • Calzifer 2 days ago

    Reminds me of the good old times of first-generation Google reCAPTCHA, where I always entered only the one word Google knew and ignored or intentionally mistyped the other.

thechao 2 days ago

That went off the rails quickly. Calm down dude: my mother-in-law isn't going to forget words because of AI; she's gonna forget words because she's 3 glasses of crappy Texas wine into the evening.

  • bane 2 days ago

    But your children's children will never learn about love because that word will have been mechanically trained out of existence.

    • Intralexical 2 days ago

      That's pretty funny. You think love is just a word?

      • bane 2 days ago

        I leave it up to the reader to determine how serious I may be.

midnitewarrior 2 days ago

From the day of the first spoken word, humans have guided the development of language through conversational use and institution. With the advent of AI being used to publish documents into the open web, humans have given up their exclusive domain.

What would it take for the OpenAI overlords to inject the words they want to force into usage into their models, and so will new words into use? Few have had the power to do such things. OpenAI, through its popular GPT platform, now has the potential to dictate the evolution of human language.

This is novel and scary.

  • bane 2 days ago

    It's the ultimate seizure of the means of production, and in the end it will be the capitalists who realize that revolution.

Intralexical 2 days ago

> Soon there will be no new low frequency ideas being removed from the data, only old low frequency ideas. Language's descriptive power is further eliminated and only the AIs seem able to produce anything that might represent the shadow of novelty. But it ends when the machines can only produce unintelligible pages of particles and articles, language is lost, civilization is lost when we no longer know what to call its downfall.

Or we'll be fine, because inbreeding isn't actually sustainable either economically or technologically, and to most of the world the Silicon Valley "AI" crowd is more an obnoxious gang of socially stunted and predatory weirdos than some unstoppable omnipotent force.