Comment by alansaber a day ago

I think not, if only because the quantity of old data isn't enough to train anywhere near a SoTA model, at least not until we change some fundamentals of LLM architecture.

andyfilms1 a day ago

I mean, humans didn't need to read billions of books back then to think of quantum mechanics.

  • alansaber a day ago

    Which is why I said it's not impossible, but current LLM architecture is just not good enough to achieve this.

  • famouswaffles a day ago

    Right, what they needed was billions of years of brute force and trial and error.

franktankbank a day ago

Are you saying it wouldn't be able to converse using the English of the time?

  • ben_w a day ago

    Machine learning today requires an obscene quantity of examples to learn anything.

    SOTA LLMs show quite a lot of skill, but only after reading a significant fraction of all published writing (and perhaps images and videos, I'm not sure) across all languages, in a world whose population is 5 times what it was at the link's cut-off date, and where global literacy has gone from 20% to about 90% since then.

    Computers can only make up for this by being really really fast: what would take a human a million or so years to read, a server room can pump through a model's training stage in a matter of months.

    When the data isn't there, reading what it does have really quickly isn't enough.
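
    For the "million or so years" figure, a rough back-of-envelope (every number below is an assumption rather than a measurement: frontier training corpora are reported to be on the order of 10-15 trillion tokens, and I'm assuming a dedicated reader putting in a couple of hours a day):

      # Back-of-envelope only; all constants are assumptions.
      corpus_tokens = 15e12      # rough size of a frontier-scale training corpus
      words_per_token = 0.75     # common rule of thumb for English text
      reading_wpm = 250          # typical adult reading speed
      hours_per_day = 2          # a dedicated but human reading habit

      words_total = corpus_tokens * words_per_token
      words_per_year = reading_wpm * 60 * hours_per_day * 365
      print(f"~{words_total / words_per_year:,.0f} years of reading")  # ~1,000,000 years

    Tweak the assumptions and the number moves by a factor of a few, but it stays in the hundreds of thousands to millions of years.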

  • wasabi991011 a day ago

    That's not what they're saying. SOTA models include much more than just language, and the scale of the training data is related to their "intelligence". Restricting the corpus in time => less training data => less intelligence => less ability to "discover" new concepts that aren't in the training data.

    • withinboredom 2 hours ago

      Could always train them on data up to 2015ish and then see if you can rediscover LLMs. There's plenty of data.

    • franktankbank a day ago

      Perhaps less bullshit, though, was my thought. Was language more restricted then? The scope of ideas?