Comment by somenameforme 13 hours ago

The phone in your pocket can perform arithmetic many orders of magnitude faster than any human, even the fringe autistic savant type. Yet it's still obviously not intelligent.

Excellence at any given task is not indicative of intelligence. I think we set these sorts of false goalposts because we want something that sounds achievable yet is just out of reach at a given moment in time. For instance, at one time it was believed that a computer playing chess at the level of a human would be proof of intelligence. Of course it sounds naive now, but it was genuinely believed. It ultimately not being so is not us moving the goalposts, so much as us setting artificially low goalposts to begin with.

So, for instance, what we're speaking of here is logical processing of natural language, yet human intelligence predates natural language. That poses a bit of a logical problem for then defining intelligence as the logical processing of natural language.

andy12_ 7 hours ago

The problem is that, so far, SOTA generalist models are not excellent at just one particular task. They are good at a very wide range of tasks, and a good score on one particular benchmark correlates very strongly with good scores on almost all other benchmarks, even esoteric benchmarks that AI labs certainly didn't train against.

I'm certain that any generalist model able to do what Einstein did would be AGI; that is, such a model would be able to perform any cognitive task that an intelligent human being could complete in a reasonable amount of time (where "reasonable" depends on the task at hand: it could be minutes, hours, days, years, etc.).

  • somenameforme 6 hours ago

    I see things rather differently. Here's a few points in no particular order:

    (1) - A major part of the challenge is in not being directed towards something. There was no external guidance for Einstein - he wasn't even a formal researcher at the time of his breakthroughs. An LLM might be hand-held towards relativity, though I doubt it, but given a prompt like 'hey, find something revolutionary' it's obviously never going to respond with anything relevant, even if the prompt specifies the field, subtopic, etc. with substantially greater precision.

    (2) - Logical processing of natural language remains one small aspect of intelligence. For example, humanity invented natural language from nothing. The concept of an LLM doing this is a nonstarter, since LLMs depend on token prediction, yet we're speaking of starting with zero tokens (a toy sketch after this list illustrates the point).

    (3) - LLMs are, in many ways, very much like calculators. They can indeed achieve some quite impressive feats in specific domains, yet they will also completely hallucinate nonsense on relatively trivial queries, particularly on topics where there isn't extensive data to drive their token prediction. I don't entirely understand your extreme optimism towards LLMs given this proclivity for hallucination. Their ability to produce compelling nonsense makes them particularly tedious to use for anything you don't already effectively know the answer to.
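
    As a toy illustration of point (2), here is a minimal sketch of next-token prediction (a bigram model in Python; the corpus, names, and sampling scheme are my own illustrative assumptions, nothing like a real LLM's internals):

        import random
        from collections import defaultdict

        # Hypothetical training corpus: token prediction is only defined
        # relative to tokens that already exist.
        corpus = "the cat sat on the mat the cat ran".split()

        # Count which token follows which (bigram transitions).
        transitions = defaultdict(list)
        for prev, nxt in zip(corpus, corpus[1:]):
            transitions[prev].append(nxt)

        def generate(start, length=5):
            # Sample successors one token at a time; with an empty corpus
            # there is nothing to condition on, so generation never starts.
            out = [start]
            for _ in range(length):
                options = transitions.get(out[-1])
                if not options:
                    break
                out.append(random.choice(options))
            return " ".join(out)

        print(generate("the"))  # e.g. "the cat sat on the mat the"

    Inventing a language from nothing would mean producing that first corpus with no prior tokens to condition on, which is precisely the step token prediction can't take.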

    • andy12_ 3 hours ago

      > I don't entirely understand your extreme optimism towards LLMs given this proclivity for hallucination

      Simply because I don't see hallucinations as a permanent problem. Models keep improving in this regard, and I don't see why the hallucination rate can't be arbitrarily reduced with further improvements to the architecture. When I ask Claude about obscure topics, it correctly replies "I don't know", where past models would have hallucinated an answer. When I use GPT 5.2-thinking for my ML research job, I pretty much never encounter hallucinations.

      • somenameforme 3 hours ago

        Hahah, well, you working in the field probably explains your optimism more than your words! If you pretty much never encounter hallucinations with GPT, then you're probably dealing with topics where there's less of a right or wrong answer. I encounter them literally every single time I start trying to work out a technical problem with it.

    • andai 2 hours ago

      Well, the "prompt" in this case would be Einstein's neurotype and all his life experiences. Might be a bit long for the current context windows though ;)