Comment by imjonse
I suppose the vast majority of training data used for cutting edge models was created after 1900.
His point is that we can't train a Gemini 3/Claude 4.5-class model on such text because we don't have the data to match the training scale of those models. There aren't trillions of tokens of digitized pre-1900 text.
Ofc they are, because their primary goal is to be useful, and to be useful they need to always be relevant.
But considering that Special Relativity was published in 1905, which means all its building blocks were already floating in the ether by 1900, it would be a very interesting experiment to train something at Claude/Gemini scale and then, say, give it the field equations and ask it to build a theory around them.