Comment by alansaber
I think not, if only because the quantity of old data isn't enough to train anything close to a SoTA model, at least until we change some fundamentals of LLM architecture.
Are you saying it wouldn't be able to converse using English of the time?
Machine learning today requires an obscene quantity of examples to learn anything.
SOTA LLMs show quite a lot of skill, but only after reading a significant fraction of all published writing (and perhaps images and videos, I'm not sure) across all languages, in a world whose population is about 5 times what it was at the link's cutoff date, and where global literacy has gone from roughly 20% to about 90% since then.
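A rough back-of-envelope sketch, using my own assumed numbers (not anything from the link), just to make the scale concrete:

```python
# Back-of-envelope: how much has the pool of potential text producers grown?
# All figures below are rough assumptions, not measurements.
population_multiplier = 5    # world population today vs. the cutoff date (assumed)
literacy_then = 0.20         # rough global literacy rate at the cutoff (assumed)
literacy_now = 0.90          # rough global literacy rate today (assumed)

# Relative size of the literate population, now vs. then.
literate_pool_growth = population_multiplier * (literacy_now / literacy_then)
print(f"Literate population is roughly {literate_pool_growth:.0f}x larger today")
# -> roughly 22x, before even counting that each literate person now produces
#    far more text (web, email, social media) than was possible back then.
```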
Computers can only make up for this by being really really fast: what would take a human a million or so years to read, a server room can pump through a model's training stage in a matter of months.
When the data isn't there, reading what it does have really quickly isn't enough.
That's not what they are saying. SOTA models are trained on much more than just language, and the scale of the training data is closely tied to a model's "intelligence". Restricting the corpus in time => less training data => less intelligence => less ability to "discover" new concepts not in its training data.
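To make the scaling point concrete, here's a minimal sketch using the Chinchilla-style rule of thumb of roughly 20 training tokens per parameter; the corpus sizes are illustrative guesses on my part, not measurements of any real dataset.

```python
# Sketch of "less data => less capable model" under the Chinchilla-style
# rule of thumb of ~20 training tokens per model parameter.
TOKENS_PER_PARAM = 20

def compute_optimal_params(available_tokens: float) -> float:
    """Largest model size the corpus can roughly 'saturate' under the rule of thumb."""
    return available_tokens / TOKENS_PER_PARAM

modern_web_corpus = 15e12   # ~15T tokens, order of magnitude of current web-scale sets (assumed)
restricted_corpus = 100e9   # ~100B tokens of surviving older text (illustrative guess)

print(f"Modern corpus supports     ~{compute_optimal_params(modern_web_corpus) / 1e9:.0f}B params")
print(f"Restricted corpus supports ~{compute_optimal_params(restricted_corpus) / 1e9:.1f}B params")
```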
Could always train them on data up to 2015ish and then see if you can rediscover LLMs. There's plenty of data.
Perhaps there was less bullshit, though, was my thought? Was language more restricted then? The scope of ideas?
I mean, humans didn't need to read billions of books back then to think of quantum mechanics.