Comment by mlsu 10 months ago

13 replies

But you can already see it with "delve." Mistral uses "delve" more than baseline, because it was trained on GPT output.

So it's classic positive feedback. LLM uses delve more, delve appears in training data more, LLM uses delve more...
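The loop can be sketched as a toy model. Everything here is an assumption made for illustration, not a measurement: `baseline` is how often humans use the word, `bias` is how much a model over-uses it relative to its training data, and `machine_share` is the fraction of the next training corpus that is model output.

```python
def simulate_word_share(baseline, bias, machine_share, generations):
    """Toy positive-feedback loop for one word's frequency in training data.

    All parameters are hypothetical knobs, not measured values.
    """
    freq = baseline
    history = [freq]
    for _ in range(generations):
        # A model trained on the corpus over-uses the word by `bias`.
        model_freq = min(1.0, bias * freq)
        # The next corpus mixes human text with that model's output.
        freq = (1 - machine_share) * baseline + machine_share * model_freq
        history.append(freq)
    return history


history = simulate_word_share(
    baseline=1e-5, bias=1.5, machine_share=0.5, generations=20
)
print(f"start: {history[0]:.2e}, after 20 generations: {history[-1]:.2e}")
```

In this toy version the frequency climbs every generation. It levels off only while `bias * machine_share` stays below 1; at or above that, the only thing stopping it is the cap at 1.0.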

Who knows what other semantic quirks are being amplified like this. It could be something much more subtle, like cadence or sentence structure. I already notice that GPT has a "tone" and Claude has a "tone" and they're all sort of "GPT-like." I've read comments online that stop and make me question whether they're coming from a bot, just because their word choice and structure echoes GPT. It will sink into human writing too, since everyone is learning in high school and college that the way you write is by asking GPT for a first draft and then tweaking it (or not).

Unfortunately, I think human and machine generated text are entirely miscible. There is no "baseline" outside the machines, other than from pre-2022 text. Like pre-atomic steel.

taneq 10 months ago

> LLM uses delve more, delve appears in training data more, LLM uses delve more...

Some day we may view this as the beginnings of machine culture.

  • mlsu 10 months ago

    Oh no, it's been here for quite a while. Our culture is already heavily glued to the machine. The way we express ourselves, the language we use, even our very self-conception originates increasingly in online spaces.

    Have you ever seen someone use their smartphone? They're not "here," they are "there." Forming themselves in cyberspace -- or being formed, by the machine.

    • Matumio 10 months ago

      I think they meant culture in the sense of knowledge that gets passed down from one generation to the next. Not a human culture of using machines, but a machine culture of using human languages.

      • mlsu 10 months ago

        Consider that the algorithm cannot evolve without human interaction. That's what I'm saying: it's a symbiote to us. If you consider "weights in the Instagram recommendation algorithm" to be "the machine", what we are talking about here has been happening for a long time now and has seen many generations, with each entity influencing the other.

        I don't think we'll have true machine culture until we have fully autonomous agents in the wild that are interacting with the world independently, on their own terms. Right now the substrate is text which comes from a human mind -- it does not arise naturally from nothing. So the machine is a symbiote for now, until we solve some difficult robotics problems.

        • Matumio 10 months ago

          Hm, it's probably true that recommendation algorithms do something similar already, training on "human likes" that were influenced by the previous generation. But "human language" is a richer medium to carry information.

          I don't think you need to be independent or autonomous to develop a culture. And a lot of human culture was passed down over generations without understanding why it worked. We just imitate the behaviour and rituals from our most successful ancestors or role models.

          If new LLMs can access the past generation's knowledge of how to please human evaluators, they will use it. It's not a deliberate decision by an "agent", it's just the best text source to copy from. This is a new feedback loop between generations of assistants, and it bypasses whatever the human designer had in mind. Phrases like "it is always best to ask an expert" will pop up just because you tuned the LLM to sound like a helpful assistant, and that's what helpful assistants sound like in the training data. You'd have to actively steer the new generation away from using their ancestral knowledge.

          I guess it comes down to what your definition of "culture" is. There is no targeted teaching of the next generation, for example - but is this a requirement? I agree that talking about "machine culture" right now sounds like a stretch, but now I wonder what pieces are actually missing.

      • taneq 10 months ago

        Yep I was going for more "the machines have their own culture increasingly independent from ours."

bryanrasmussen 10 months ago

Is the use of "miscible" here a clue? Or just some workplace vocabulary you've adapted analogically?

  • mlsu 10 months ago

    Human me just thought it was a good word for this. It implies some irreversible process of mixing, and I think that characterizes this process really well.

    • noduerme 10 months ago

      There were dozens of 20th Century ideological movements which developed their own forms of "Newspeak" in their own native languages. Largely, natural human dialog between native speakers and between those opposed to the prevailing regime recoils violently at stilted, official, or just "uncool" usages in daily vernacular. So I wouldn't be too surprised to see a sharp downtick in the popular use of any word that becomes subject to an LLM's positive-feedback loop.

      Far from saying the pool of language is now polluted, I think we now have a great data set to begin to discern authentic from inauthentic human language. Although sure, people on the fringes could get caught in a false positive for being bots, like you or me.

      The biggest LLM of them all is the daily driver of all new linguistic innovation: Human society, in all its daily interactions. The quintillions of daily phrases exchanged and forever mutating around the globe - each mutation of phrase interacting with its interlocutor, and each drawing from not the last 500,000 tokens but the entire multi-modal, if you will, experience of each human to date in their entire lives - vastly eclipses anything any hardware could ever emulate given the current energy constraints. Software LLMs are just a state machine stuck in a moment in time. At best they will always lag, the way Stalinist language lagged years behind the patois of average Russians, who invented daily linguistic dodges to subvert and mock the regime. The same process takes place anywhere there is a dominant official or uncool accent or phrasing. The ghetto invents new words, new rhythm, and then it becomes cool in the middle class. The authorities never catch up, precisely because the use of subversive language is humanity's immune system against authority.

      If there is one distinctly human trait, it's sniffing out anyone who sounds suspiciously inauthentic. (Sadly, it's also the trait that leads to every kind of conspiracy theorizing imaginable; but this too probably confers in some cases an evolutionary advantage). Sniffing out the sound of a few LLMs is already happening, and will accelerate geometrically, much faster than new models can be trained.

      • mlsu 10 months ago

        Really insightful.

        I'm a little more cautious though. I think GPT will be way more integrated, simply because it's useful. Stalinist language was artificial, in the sense that it was basically imposed on you from outside for no good reason. When you wanted to get real stuff done (talking to close friends, being productive with colleagues, etc.) you wouldn't use socialist newspeak because it got in the way. GPT will be imposed by the outside world too, but it's actually a useful thing to be able to converse with a language model; you'll do it every day at work, when buying things, when using your phone/PC.

        And also, unlike in USSR times, so much of our communication is online and visible. It would not surprise me if we develop a model that can train continuously on the firehose. Text is small. Data rate of every person on earth speaking simultaneously:

        - 150 words per minute spoken

        - 150 words × (5 characters/word + 1 space) = 150 × 6 = 900 characters per minute

        - 1 byte per char = 900 bytes/min = 15 bytes/sec

        - 15 bytes/sec × 8,000,000,000 people speaking continuously = 120 gigabytes/second

        That's a lot, but it's less than the memory bandwidth of a single consumer GPU.
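        The back-of-envelope numbers above can be checked in a few lines. The constants are just the assumptions from the list (150 words per minute, 5 characters plus a space per word, 1 byte per character, 8 billion speakers); the function name is made up for the sketch.

```python
WORDS_PER_MINUTE = 150       # assumed speaking rate
CHARS_PER_WORD = 5 + 1       # 5 characters plus one space
BYTES_PER_CHAR = 1           # plain single-byte text assumed
PEOPLE = 8_000_000_000       # everyone on earth, talking nonstop


def speech_bytes_per_second(words_per_minute=WORDS_PER_MINUTE,
                            chars_per_word=CHARS_PER_WORD,
                            bytes_per_char=BYTES_PER_CHAR):
    """Text data rate of one person speaking continuously, in bytes/second."""
    chars_per_minute = words_per_minute * chars_per_word
    return chars_per_minute * bytes_per_char / 60


per_person = speech_bytes_per_second()
total = per_person * PEOPLE

print(f"{per_person:.0f} bytes/s per person")   # 15 bytes/s per person
print(f"{total / 1e9:.0f} GB/s for everyone")   # 120 GB/s for everyone
```

        Changing the speaking rate or the bytes per character scales the total linearly, so the order of magnitude holds up under fairly different assumptions.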

      • bryanrasmussen 10 months ago

        Humans also lag humans. The future may already be spoken, but the slang is not evenly memed out yet.

  • jazzyjackson 10 months ago

    If you think that's niche, wait till you hear about man-machine miscegenation