Comment by eqmvii a day ago

39 replies

Could this be an experiment to show how likely LLMs are to lead to AGI, or at least intelligence well beyond our current level?

If you could only give it texts and info and concepts up to Year X, well before Discovery Y, could we then see if it could prompt its way to that discovery?

ben_w a day ago

> Could this be an experiment to show how likely LLMs are to lead to AGI, or at least intelligence well beyond our current level?

You'd have to be specific about what you mean by AGI: all three letters mean different things to different people, and sometimes the whole means something not present in the letters.

> If you could only give it texts and info and concepts up to Year X, well before Discovery Y, could we then see if it could prompt its way to that discovery?

To a limited degree.

Some developments can come from combining existing ideas and seeing what they imply.

Other things, like everything to do with relativity and quantum mechanics, would have required experiments. I don't think any of the relevant experiments had been done prior to this cut-off date, but I'm not absolutely sure of that.

You might be able to get such an LLM to develop all the maths and geometry for general relativity, and yet find the AI still tells you that the perihelion shift of Mercury is a sign of the planet Vulcan rather than of a curved spacetime: https://en.wikipedia.org/wiki/Vulcan_(hypothetical_planet)

  • opponent4 a day ago

    > You'd have to be specific what you mean by AGI

    Well, they obviously can't. AGI is not science, it's religion. It has all the trappings of religion: prophets, sacred texts, an origin myth, an end-of-days myth and, most importantly, a means to escape death. Science? Well, the only measure of "general intelligence" would be comparison against the only one we know, the human one, and we have absolutely no means by which to describe it. We do not know where to start. This is why, when you scratch the surface of any AGI definition, you only find circular definitions.

    And no, the "brain is a computer" is not a scientific description, it's a metaphor.

    • strbean 21 hours ago

      > And no, the "brain is a computer" is not a scientific description, it's a metaphor.

      Disagree. A brain is Turing complete, no? Isn't that the definition of a computer? Sure, it may be reductive to say "the brain is just a computer".

      • Davidzheng 14 hours ago

        Probably not actually Turing complete, right? For one, it is not infinite, so it lacks the unbounded memory a true Turing machine assumes.

    • nomel 8 hours ago

      > And no, the "brain is a computer" is not a scientific description, it's a metaphor.

      I have trouble comprehending this. What is "computer" to you?

    • ben_w 21 hours ago

      Cargo cults are a religion: the things they worship they do not understand, but the planes and the cargo themselves are real.

      There's certainly plenty of cargo-culting right now on AI.

      Sacred texts, I don't recognise. Yudkowsky's writings? He suggests wearing clown shoes to avoid creating a cult of personality disconnected from the quality of the arguments; if anyone finds his works sacred, they've fundamentally misunderstood him:

        I have sometimes thought that all professional lectures on rationality should be delivered while wearing a clown suit, to prevent the audience from confusing seriousness with solemnity.
      
      - https://en.wikiquote.org/wiki/Eliezer_Yudkowsky

      Prophets forecasting the end of days, yes, but we get those from climate science too, from everyone who was preparing for a pandemic before covid and is still trying to prepare for the next one because the wet markets are still around, from economists trying to forecast growth or collapse and what would turn any given prediction of the latter into the former, and from the military forces of the world saying which weapon systems they want to buy. That alone does not make a religion.

      A means to escape death, you can have. But it's on a continuum with life extension and anti-aging medicine, which itself is on a continuum with all other medical interventions. To quote myself:

        Taking a living human's heart out without killing them, and replacing it with one you got out a corpse, that isn't the magic of necromancy, neither is it a prayer or ritual to Sekhmet, it's just transplant surgery.
      
        …
      
        Immunity to smallpox isn't a prayer to the Hindu goddess Shitala (of many things but most directly linked with smallpox), and it isn't magic herbs or crystals, it's just vaccines.
      
      - https://benwheatley.github.io/blog/2025/06/22-13.21.36.html
water-data-dude a day ago

It'd be difficult to prove that you hadn't leaked information to the model. The big gotcha of LLMs is that you train them on BIG corpora of data, which makes it hard to say "X isn't in this corpus" or "this corpus only contains Y". You could TRY to assemble a set of training data that only contains text from before a certain date, but it'd be tricky as heck to be SURE about it.

Ways data might leak into the model that come to mind: misfiled/mislabeled documents, footnotes, annotations, document metadata.
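
To make it concrete, here's a minimal sketch of the kind of screening you'd end up writing (the cutoff year, file name, and word list are all hypothetical). It can reject obvious leaks, but it can never prove the corpus is clean:

  import json
  import re

  CUTOFF_YEAR = 1875  # made-up cutoff for illustration
  # Words coined after the cutoff are a cheap tell for leaked text.
  ANACHRONISMS = re.compile(r"\b(telephone|radium|airplane|electron)\b", re.I)

  def looks_clean(doc):
      year = doc.get("year")                        # catalog metadata, if any
      if year is None or year > CUTOFF_YEAR:
          return False                              # undated or misfiled: reject
      if ANACHRONISMS.search(doc.get("text", "")):
          return False                              # post-cutoff vocabulary: likely a leak
      return True                                   # merely "not obviously dirty"

  with open("corpus.jsonl") as f:                   # one JSON document per line (assumed)
      docs = [json.loads(line) for line in f]
  kept = [d for d in docs if looks_clean(d)]

A later editor's footnotes or a library's metadata stamp would sail right through a filter like this, which is exactly the problem.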

  • gwern 21 hours ago

    There are also severe selection effects: which documents have been preserved, printed, and scanned because they turned out to be on the right track towards relativity?

    • mxfh 20 hours ago

      This.

      Especially for London there is a huge chunk of recorded parliamentary debates.

      More interesting for dialogue would be training on recorded correspondence in the form of letters anyway.

      And that corpus script just looks odd, to say the least. Why not just oversample by some factor X?
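
      A hypothetical sketch of what I mean by oversampling (the documents and the factor X are made up): draw the scarce material more often, with replacement, instead of hand-tuning the corpus script:

        import random

        debates = ["debate text ..."] * 90   # placeholder: plentiful material
        letters = ["letter text ..."] * 10   # placeholder: scarce material
        X = 4                                # arbitrary oversampling factor

        docs = [("debate", d) for d in debates] + [("letter", l) for l in letters]
        weights = [1.0 if kind == "debate" else float(X) for kind, _ in docs]
        batch = random.choices(docs, weights=weights, k=10_000)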

alansaber a day ago

I think not, if only because the quantity of old data isn't enough to train anything near a SoTA model, at least until we change some fundamentals of LLM architecture.

  • andyfilms1 a day ago

    I mean, humans didn't need to read billions of books back then to think of quantum mechanics.

    • alansaber a day ago

      Which is why I said it's not impossible, but current LLM architecture is just not good enough to achieve this.

    • famouswaffles a day ago

      Right, what they needed was billions of years of brute force and trial and error.

  • franktankbank a day ago

    Are you saying it wouldn't be able to converse using English of the time?

    • ben_w a day ago

      Machine learning today requires an obscene quantity of examples to learn anything.

      SOTA LLMs show quite a lot of skill, but only after reading a significant fraction of all published writing (and perhaps images and videos, I'm not sure) across all languages, in a world whose population is 5 times what it was at the link's cut-off date, and where global literacy has gone from about 20% to about 90% since then.

      Computers can only make up for this by being really really fast: what would take a human a million or so years to read, a server room can pump through a model's training stage in a matter of months.
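
      A back-of-envelope version of that claim (all the numbers here are rough assumptions, not measurements):

        WORDS_PER_MINUTE = 250          # assumed human reading speed
        CORPUS_WORDS = 40e12            # assumed corpus size, roughly frontier-scale
        HOURS_PER_DAY = 8               # assumed reading time per day

        minutes = CORPUS_WORDS / WORDS_PER_MINUTE
        years = minutes / 60 / HOURS_PER_DAY / 365
        print(f"{years:,.0f} years of human reading")   # roughly 900,000 years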

      When the data isn't there, reading what it does have really quickly isn't enough.

    • wasabi991011 a day ago

      That's not what they are saying. SOTA models include much more than just language, and the scale of training data is related to its "intelligence". Restricting the corpus in time => less training data => less intelligence => less ability to "discover" new concepts not in its training data

      • withinboredom 2 hours ago

        Could always train them on data up to 2015ish and then see if you can rediscover LLMs. There's plenty of data.

      • franktankbank a day ago

        Perhaps less bullshit, though, was my thought? Was language more restricted then? The scope of ideas?

armcat a day ago

I think this would be an awesome experiment. However, you would effectively need to train something like a GPT-5.2 equivalent. So you need a lot of text, a much larger parameterization (compared to nanoGPT and Phi-1.5), and the 1800s equivalents of supervised finetuning and reinforcement learning from human feedback.

dexwiz a day ago

This would be a true test of whether LLMs can innovate or just regurgitate. I think part of people's amazement at LLMs is that they don't realize how much they themselves don't know, so thinking and recalling look the same to the end user.

Trufa a day ago

This is fascinating, but the experiment can't really be a fair comparison: the amount of knowledge captured in data from that time is nothing like what we have in data now.

As a thought experiment I find it thrilling.

Rebuff5007 a day ago

OF COURSE!

The fact that tech leaders espouse the brilliance of LLMs yet don't use this specific test method is infuriating to me. It is deeply unfortunate that there is so little transparency or standardization in the datasets available for training/fine-tuning.

Making this approach widely known would produce more interesting and informative benchmarks. OEM models that are always "breaking" the benchmarks are doing so with improved datasets as well as improved methods. Without holding the datasets fixed, progress on benchmarks is very suspect IMO.

nickpsecurity 19 hours ago

That is one of the reasons I want it done. We can't tell if AIs are parroting training data without having the whole training data. Making it old means specific things won't be in it (or will be). We can do more meaningful experiments.

feisty0630 a day ago

I fail to see how the two concepts equate.

LLMs have neither intelligence nor problem-solving ability (and I won't be relaxing the definition of either so that some AI bro can pretend a glorified chatbot is sentient).

You would, at best, be demonstrating that the sharing of knowledge across multiple disciplines and nations (which is a relatively new concept, at least at the scale of something like the internet) leads to novel ideas.

  • al_borland a day ago

    I've seen many futurists claim that human innovation is dead and all future discoveries will be the result of AI. If this is true, we should be able to see AI trained on the past figure its way to various things we have today. If it can't do this, I'd like said futurists to quiet down, as they are discouraging an entire generation of kids who may go on to discover some great things.

    • skissane a day ago

      > I've seen many futurists claim that human innovation is dead and all future discoveries will be the results of AI.

      I think there's a big difference between discoveries through AI-human synergy and discoveries through AI working in isolation.

      It probably will be true soon (if it isn't already) that most innovation features some degree of AI input, but still with a human to steer the AI in the right direction.

      I think an AI being able to discover something genuinely new all by itself, without any human steering, is a lot further off.

      If AIs start producing significant quantities of genuine and useful innovation with minimal human input, maybe the singularitarians are about to be proven right.

    • thinkingemote 21 hours ago

      I'm struggling to get a handle on this idea. Is the idea that today's data will be the data of the past, in the future?

      So if it can work with what's now the past, it should be able to work with today's data once that, too, is the past?

      • al_borland 18 hours ago

        Essentially, yes.

        The prediction is that AI will be able to invent the future. So if we give it data from our past, without knowledge of the present... what type of future will it invent, and what progress will it make, if any at all? And not just having the idea, but implementing it in a way that actually works with the technology of the day, and building on those things over time.

        For example, would an AI with 1850 data have figured out the idea of lift, taught us how to make working flying machines, and progressed them to the jets we have today, or something better? It wouldn't even be starting from 0, so this would be a generous example, as da Vinci was playing with these ideas in the 15th century.

        If it can't do it, or what it produces is worse than what humans have done, we shouldn't leave it to AI alone to invent our actual future. That would mean reevaluating the role these "thought leaders" say it will play, and how we're educating and communicating about AI to the younger generations.