candiddevmike 4 days ago

> We have mathematically proven that transformers can solve any problem, provided they are allowed to generate as many intermediate reasoning tokens as needed.

That seems like a bit of a leap to make this sound more impressive than it is (IMO). You can say the same thing about humans, provided they are allowed to think across as many years/generations as needed.

Wake me up when a LLM figures out stable fusion or room temperature superconductors.

krackers 4 days ago

I think you're misrepresenting the study. It builds upon previous work that examines the computational power of the transformer architecture from a circuit-complexity perspective. Previous work showed that the class of problems a "naive" Transformer architecture can compute lies within TC0 [1, 2], and as a consequence it was fundamentally impossible for transformers to solve certain classes of mathematical problems. This study actually provides a more realistic bound of AC0 (by analyzing the finite-precision case), which rules out even more problems, including such 'simple' ones as modular parity.

There is also previous work hinting that part of the reason chain-of-thought works, from a theoretical perspective, is that it literally allows the model to perform types of computation it could not in the more limited setting (in the same way that jumping from FSMs to pushdown automata lets you solve new kinds of problems) [3].
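
As a toy illustration of that point (my own sketch, not from any of the cited papers): parity is the classic problem outside AC0, yet it becomes trivial once you are allowed to write out intermediate results step by step, which is roughly the extra room chain-of-thought tokens provide.

    def parity_with_scratchpad(bits):
        # Emitting a running parity after each input bit is the analogue of
        # chain-of-thought tokens: each step is a constant-size computation,
        # but the sequence of steps carries state that a fixed-depth,
        # one-shot circuit cannot.
        trace = []
        acc = 0
        for b in bits:
            acc ^= b
            trace.append(acc)
        return acc, trace

    final, steps = parity_with_scratchpad([1, 0, 1, 1, 0, 1])
    print(final, steps)  # 0 [1, 1, 0, 1, 1, 0]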

[1] https://news.ycombinator.com/item?id=35609652 [2] https://blog.computationalcomplexity.org/2023/02/why-cant-li... [3] https://arxiv.org/abs/2305.15408

  • shawntan 4 days ago

    Generally, literature on the computational power of the SAME neural architecture can differ on their conclusions based on their premises. Assuming finite precision will give a more restrictive result, and assuming arbitrary precision can give you Turing completeness.

    From a quick skim, this seems to be making finite-precision assumptions? In that case it doesn't actually tighten previous bounds; it just starts from different assumptions.

    Am author of [1].
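
    A toy illustration (my own, not from either paper) of why the precision premise matters: in finite precision, a counter carried through activations eventually saturates, while arbitrary-precision analyses never hit that ceiling, which is one reason the two sets of premises lead to different complexity-class conclusions.

        # A finite-precision "counter" stops counting; an exact one does not.
        print(2.0**53 + 1.0 == 2.0**53)   # True: the increment is silently lost in float64
        print(2**53 + 1 == 2**53)         # False: arbitrary-precision integers keep counting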

    • krackers 4 days ago

      Ah my bad, great catch! I've updated my comment accordingly.

      • shawntan 4 days ago

        You can't really be blamed, though; the language in the paper does seem to state what you originally said. Might be a matter of taste, but I don't think it's quite accurate.

        The prior work they referenced actually did consider the finite-precision case, and explained why they didn't think it was useful to prove the result under those premises.

        In this work they simply argued from their own perspective why finite precision made more sense.

        The whole sub-field is kinda messy, and I get differing results quoted at me all the time.

        Edit: Also, your original point stands, obviously. Sorry for nitpicking your post, but I just thought people should know more about the nuances of this stuff.

Horffupolde 4 days ago

It is actually impressive.

One could argue that writing enabled chain of thought across generations.

Veedrac 3 days ago

> Wake me up when a LLM figures out stable fusion or room temperature superconductors.

Man, the goalposts these days.

  • FeepingCreature 3 days ago

    "I love [goalposts]. I love the whooshing noise they make as they go by." --Douglas Adams, slightly adjusted

whimsicalism 3 days ago

it's a TCS result.

seems like many commenting don't know about computability

WalterSear 3 days ago

> You can say the same thing about humans

1. Holy shit.

2. You can't apply Moore's law to humans.

  • Tostino 3 days ago

    You can't apply it to chips anymore either.

    Density has continued to increase, but so have prices. The 'law' was tied to the price-to-density ratio, and it's been almost a decade now since it died.

  • gryn 3 days ago

    > 2. You can't apply Moore's law to humans.

    not with that attitude. /s

    if you take reproduction into account and ignore all the related externalities, you can definitely double your count of transistors (humans) every two years.

aurareturn 4 days ago

> You can say the same thing about humans, provided they are allowed to think across as many years/generations as needed.

Isn’t this a good thing since compute can be scaled so that the LLM can do generations of human thinking in a much shorter amount of time?

Say humans could solve quantum gravity with 100 years of thinking by 10,000 really smart people. If one AGI is equal to one really smart person, then scaling up enough compute for 1 million AGIs would let us solve quantum gravity in a year.
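
Back-of-envelope, under those (admittedly strong) assumptions plus the further assumption that the work parallelizes perfectly:

    # Illustrative numbers only, taken from the hypothetical above.
    person_years_needed = 10_000 * 100      # 1,000,000 person-years
    agi_instances = 1_000_000               # assumes 1 AGI == 1 really smart person
    wall_clock_years = person_years_needed / agi_instances
    print(wall_clock_years)                 # 1.0, assuming perfect parallelism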

The major assumption here is that transformers can indeed solve every problem humans can.

  • wizzwizz4 4 days ago

    > Isn’t this a good thing since compute can be scaled so that the LLM can do generations of human thinking in a much shorter amount of time?

    But it can't. There isn't enough planet.

    > The major assumption here is that transformers can indeed solve every problem humans can.

    No, the major assumptions are (a) that ChatGPT can, and (b) that we can reduce the resource requirements by many orders of magnitude. The former assumption is highly dubious, and the latter is plainly false.

    Transformers are capable of representing any algorithm, if they're allowed to be large enough and run long enough. That doesn't give them any special algorithm-finding ability, and finding the correct algorithms is the hard part of the problem!

    • aurareturn 4 days ago

      > But it can't. There isn't enough planet.

      How many resources are you assuming an AGI would consume?

      • wizzwizz4 3 days ago

        Are we talking about "an AGI", or are we talking about overfitting large transformer models with human-written corpora and scaling up the result?

        "An AGI"? I have no idea what that algorithm might look like. I do know that we can cover the majority of cases with not too much effort, so it all depends on the characteristics of that long tail.

        ChatGPT-like transformer models? We know what that looks like, despite the AI companies creatively misrepresenting the resource use (ref: https://www.bnnbloomberg.ca/business/technology/2024/08/21/h...). Look at https://arxiv.org/pdf/2404.06405:

        > Combining Wu’s method with the classic synthetic methods of deductive databases and angle, ratio, and distance chasing solves 21 out of 30 problems by just using a CPU-only laptop with a time limit of 5 minutes per problem.

        AlphaGeometry had an entire supercomputer cluster, and dozens of hours. GOFAI approaches have a laptop and five minutes. Scale that inconceivable inefficiency up to AGI, and the total power output of the sun may not be enough.

  • visarga 3 days ago

    > Scale enough compute for 1 million AGI and we can solve quantum gravity in a year.

    That is wrong; it misses the point. We learn from the environment, we don't secrete quantum gravity from our pure brains. It's an RL setting of exploration and exploitation, a search process in the space of ideas based on validation against reality. An LLM alone is like a human locked away in a cell, with no way to test ideas.

    If you take child Einstein and put him on a remote island, and come back 30 years later, do you think he would impress you with his deep insights? It's not the brain alone that made Einstein so smart; his environment also made a major contribution.

    • exe34 3 days ago

      if you told child Einstein that light travels at a constant speed in all inertial frames and taught him algebra, then yes, he would come up with special relativity.

      in general, an AGI might want to perform experiments to guide its exploration, but it's possible that the hypotheses that it would want to check have already been probed/constrained sufficiently. which is to say, a theoretical physicist might still stumble upon the right theory without further experiments.

      • westurner 3 days ago

        Labeling observations with something better than a list of column-label strings at the top would make it possible to mine for insights in, or produce, a universal theory that covers what has been observed rather than the presumed limits of theory.

        CSVW is CSV on the Web as Linked Data.

        With 7 metadata header rows at the top, a CSV could be converted to CSVW, with URIs for units like metre, meter, or feet.

        If a ScholarlyArticle publisher does not indicate that a given CSV (or better, a :Dataset that is :partOf an article) is a :premiseTo the presented argument, a human grad student or an LLM has to identify the links or textual citations to the dataset CSV(s) themselves.

        Easy: Identify all of the pandas.read_csv() calls in a notebook (see the sketch below);

        Expensive: Find the citation in a PDF, search for the text in "quotation marks", and try to guess which search result contains the dataset premise to an article;

        Or: Identify each premise in the article, pull the primary datasets, and run an unbiased AutoML report to identify linear and nonlinear variance relations and test the data-dredged causal chart before or after manually reading an abstract.
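
        A minimal sketch of the easy case, assuming a hypothetical notebook file named paper.ipynb shipped alongside the article:

            import json
            import re

            # Hypothetical filename: whichever .ipynb accompanies the article.
            with open("paper.ipynb") as f:
                nb = json.load(f)

            csv_refs = []
            for cell in nb.get("cells", []):
                if cell.get("cell_type") != "code":
                    continue
                source = "".join(cell.get("source", []))
                # Collect the first string-literal argument of each read_csv(...) call.
                csv_refs += re.findall(r"""read_csv\(\s*['"]([^'"]+)['"]""", source)

            print(csv_refs)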

    • aurareturn 3 days ago

      The assumption is that the AGI can solve any problem humans can, including learning from the environment if that is what is needed.

      But I think you're missing the point of my post. I don't want this topic to devolve into yet another argument centered around "but AI can't be AGI or can't do what humans can do because so and so".

      • visarga 3 days ago

        I often see the misconception that compute alone will let us surpass human level. No doubt it is inspired by the 'scaling laws' we hear so much about. People forget that imitation is not sufficient to surpass human level.