Comment by bane 2 days ago
This is one of the vanguards warning of the changes coming in the post-AI world.
>> Generative AI has polluted the data
Just like low-background steel marks the break in history from before and after the nuclear age, these types of data mark the distinction from before and after AI.
Future models will continue to amplify certain statistical properties of their training data, and that amplified data will go on polluting the public space from which future training data is drawn. Meanwhile, certain low-frequency data will be selected by these models less and less, becoming suppressed and possibly eliminated. We know from classic NLP techniques that low-frequency words are often among the highest in information content and descriptive power.
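That last point is the intuition behind inverse document frequency (IDF) weighting in classic NLP: a word that appears everywhere carries almost no information, while a rare word pins down a document precisely. A minimal sketch (the toy corpus and `idf` helper here are purely illustrative, not from the comment):

```python
import math

# Toy corpus: "the" appears in every document, "sesquipedalian" in only one.
docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "the sesquipedalian lexicographer slept",
]

def idf(term, docs):
    """Inverse document frequency: log(N / document frequency).
    Ubiquitous terms score 0; rare terms score high."""
    df = sum(term in doc.split() for doc in docs)
    return math.log(len(docs) / df)

print(idf("the", docs))             # in all 3 docs -> log(3/3) = 0.0
print(idf("sesquipedalian", docs))  # in 1 doc -> log(3/1) ≈ 1.10
```

If generative models systematically under-sample rare terms, it is exactly these high-IDF words that drop out of future training corpora first.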
Bitrot will continue to act as the agent of entropy, further eroding pre-AI datasets.
These feedback loops will persist, language will be ground down, neologisms will be prevented, and society, no longer possessing the mental tools to describe changing circumstances, its new thoughts unable to be realized, will cease to advance and then regress.
Soon there will be no new low-frequency ideas being removed from the data, only old ones. Language's descriptive power is further eliminated, and only the AIs seem able to produce anything that might represent the shadow of novelty. But it ends when the machines can produce only unintelligible pages of particles and articles; language is lost, and civilization is lost when we no longer know what to call its downfall.
The glimmer of hope is that humanity figured out how to rise from the dreamstate of the world of animals once. Future humans will be able to climb from the ashes again. There used to be a word, the name of a bird, that encoded this ability to die and return again, but that name is already lost to the machines that will take our tongues.
> Future models will continue to amplify certain statistical properties of their training data, and that amplified data will go on polluting the public space from which future training data is drawn.
That's why on FB I mark my own writing as AI-generated, and the AI-generated slop as genuine. Because what is disguised as a "transparency disclaimer" is really just a flag marking which content is a potential training dataset and which isn't.