ec109685 4 days ago

Is this the infinite monkey Shakespeare trope?

  • throwup238 3 days ago

    More like the universal approximation theorem extended to computation rather than network complexity: https://en.wikipedia.org/wiki/Universal_approximation_theore...

    • immibis 3 days ago

      The universal approximation theorem is good to know because it says there's no theoretical upper bound to a function-approximating NN's accuracy. In practice it says nothing about what can be realistically achieved, though.
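
      For reference, the one-hidden-layer form of the statement (a sketch; sigma is any fixed non-polynomial activation): for every continuous f on a compact set K and every eps > 0 there exist N and weights a_i, w_i, b_i with

          \sup_{x \in K} \Big| f(x) - \sum_{i=1}^{N} a_i \, \sigma(w_i^\top x + b_i) \Big| < \varepsilon

      The catch is exactly the practical one: the theorem says nothing about how large N must be or how to find the weights.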

  • nopinsight 3 days ago

    A key difference is that the way LMMs (Large Multimodal Models) generate output is far from random. These models can imitate or blend existing information, or imitate (and probably blend) known reasoning methods found in the training data. The latter is a key distinguishing feature of the new OpenAI o1 models.

    Thus, the signal-to-noise ratio of their output is generally way better than infinite monkeys.

    Arguably, humans rely on similar modes of "thinking" most of the time as well.

  • CamperBob2 3 days ago

    Yeah. Monkeys. Monkeys that write useful C and Python code that needs a bit less revision every time there's a model update.

    Can we just give the "stochastic parrot" and "monkeys with typewriters" schtick a rest? It made for novel commentary three or four years ago, but at this point, these posts themselves read like the work of parrots. They are no longer interesting, insightful, or (for that matter) true.

    • visarga 3 days ago

      If you think about it, humans necessarily use abstractions, from the edge detectors in the retina to concepts like democracy. But do we really understand? All abstractions leak, and nobody knows the whole stack. For all the poorly grasped abstractions we are using, we are also just parroting. How often do we do things because "that is how they are done", never wondering why?

      Take ML itself, people are saying it's little more than alchemy (stir the pile). Are we just parroting approaches that have worked in practice without real understanding? Is it possible to have centralized understanding, even in principle, or is all understanding distributed among us? My conclusion is that we have a patchwork of partial understanding, stitched together functionally by abstractions. When I go to the doctor, I don't study medicine first, I trust the doctor. Trust takes the place of genuine understanding.

      So humans, like AI, use distributed and functional understanding, we don't have genuine understanding as meant by philosophers like Searle in the Chinese Room. No single neuron in the brain understands anything, but together they do. Similarly, no single human understands genuinely, but society together manages to function. There is no homunculus, no centralized understander anywhere. We humans are also stochastic parrots of abstractions we don't really grok to the full extent.

      • throwaway290 3 days ago

        > My conclusion

        Are you saying you understood something? Was it genuine? Do you think LLM feels the same thing?

      • kaechle 3 days ago

        Great points. We're pattern-matching shortcut machines, without a doubt. In most contexts, not even good ones.

        > When I go to the doctor, I don't study medicine first, I trust the doctor. Trust takes the place of genuine understanding.

        The ultimate abstraction! Trust is highly irrational by definition. But we do it all day every day, lest we be classified as psychologically unfit for society. Which is to say, mental health is predicated on a not-insignificant amount of rationalizations and self-deceptions. Hallucinations, even.

    • kaechle 3 days ago

      Every time I read "stochastic parrot," my always-deterministic human brain surfaces this quote:

      > “Most people are other people. Their thoughts are someone else's opinions, their lives a mimicry, their passions a quotation.”

      - Oscar Wilde, a great ape with a pen

      • OKRainbowKid 3 days ago

        Reading this quote makes me wonder why I should believe that I am somehow special or different, and not just another "other".

        • HeatrayEnjoyer 3 days ago

          That's just it. We're not unique. We've always been animals running on instinct in reaction to our environment. Our instincts are more complex than other animals but they are not special and they are replicable.

    • ffsm8 3 days ago

      > novel commentary three or four years ago,

      ChatGPT was released in November 2022. That's one year and 10 months ago. Their marketing started in the summer of the same year, still far off from 3-4 years.

      • Banou 3 days ago

        But ChatGPT wasn't the first; OpenAI had a coding playground with GPT-2, and you could already code with it even before that, around 2020, so I'd say it has been 3-4 years

      • killerstorm 3 days ago

        GPT-3 paper announcement got 200 comments on HN back in 2020.

        It doesn't matter when marketing started, people were already discussing it in 2019-2020.

        Stochastic parrot: The term was coined by Emily M. Bender in the 2021 artificial intelligence research paper "On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?" by Bender, Timnit Gebru, Angelina McMillan-Major, and Margaret Mitchell.

      • [removed] 3 days ago
        [deleted]
    • hegFdH 3 days ago

      The infinite monkey post was in response to this claim, which, like the universal approximation theorem, is useless in practice:

      "We have mathematically proven that transformers can solve any problem, provided they are allowed to generate as many intermediate reasoning tokens as needed. Remarkably, constant depth is sufficient."

      Like an LLM, you omit the context and browbeat people with the "truth" you want to propagate. Together with the many politically forbidden terms since 2020, let us now also ban "stochastic parrot" in order to have a goodbellyfeel newspeak.

      • chaosist 3 days ago

        There is also a problem of "stochastic parrot" being constantly used in a pejorative sense as opposed to a neutral term to keep grounded and skeptical.

        Of course, it is an overly broad stroke that doesn't quite capture all the nuance of the model but the alternative of "come on guys, just admit the model is thinking" is much worse and has much less to do with reality.

    • 93po 3 days ago

      AI news article comments bingo card:

      * Tired ClosedAI joke

      * Claiming it's a predictive text engine that isn't useful for anything

      * Safety regulations are either good or bad, depending on who's proposing them

      * Fear mongering about climate impact

      * Bringing up Elon for no reason

      * AI will never be able to [some pretty achievable task]

      * Tired arguments from pro-IP / copyright sympathizers

      • kmeisthax 3 days ago

        > Tired ClosedAI joke

        > Tired arguments from pro-IP / copyright sympathizers

        You forgot "Tired ClosedAI joke from anti-IP / copyleft sympathizers".

        Remember that the training data debate is orthogonal to the broader debate over copyright ownership and scope. The first people to start complaining about stolen training data were the Free Software people, who wanted a legal hook to compel OpenAI and GitHub to publish model weights sourced from GPL code. Freelance artists took that complaint and ran with it. And while this is technically an argument that rests on copyright for legitimacy, the people who actually own most of the copyrights - publishers - are strangely interested in these machines that steal vast amounts of their work.

      • larodi 3 days ago

        Interestingly, there's one missing that seems well warranted, unless everyone here is a super-smart, math-professor-level genius:

        These papers become increasingly difficult to properly comprehend.

        …and thus perhaps the plethora of arguably nonsensical follow ups.

        • CamperBob2 3 days ago

          > These papers become increasingly difficult to properly comprehend.

          Feed it to ChatGPT and ask for an explanation suited to your current level of understanding (5-year-old, high-school, undergrad, comp-sci grad student, and so on.)

          No, really. Try it.

      • aurareturn 3 days ago

        >* Claiming it's a predictive text engine that isn't useful for anything

        This one is very common on HN and it's baffling. Even if it is a predictive text engine, who the hell cares, as long as it achieves its goals? If an LLM were actually a bunch of dolphins typing on a keyboard made for dolphins, I couldn't care less, as long as it does what I need it to do. For people who continue to repeat this on HN: why? I just want to know, out of curiosity.

        >* AI will never be able to [some pretty achievable task]

        Also very common on HN.

        You forgot the "AI will never be able to do what a human can do in the exact way a human does it so AI will never achieve x".

tsimionescu 3 days ago

One question, if anyone knows the details: does this prove that there exists a single LLM that can approximate any function to arbitrary precision given enough CoT, or does it prove that for every function, there exists a Transformer that fits those criteria?

That is, does this prove that a single LLM can solve any problem, or that for any problem, we can find an LLM that solves it?
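
In symbols, the two readings are (M ranging over transformers, f over target functions):

    \exists M \; \forall f : \; M \text{ solves } f    (one universal model)

    \forall f \; \exists M : \; M \text{ solves } f    (a model per problem)

The first is strictly stronger; universal-approximation-style results usually only give the second.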

  • jstanley 3 days ago

    Doesn't the latter imply the former?

    If it's possible to find an LLM for any given problem, then find an LLM for the problem "find an LLM for the problem and then evaluate it" and then evaluate it, and then you have an LLM that can solve any problem.

    It's the "Universal Turing Machine" for LLMs.

    I wonder what's the LLM equivalent of the halting problem?

    • progval 3 days ago

      > It's the "Universal Turing Machine" for LLMs.

      A closer analogy is the Hutter Search (http://hutter1.net/ai/pfastprg.pdf), as it is also an algorithm that can solve any problem. And it is probably too inefficient to use in practice, like the Hutter Search.

    • detourdog 3 days ago

      In the late ‘80s they were called expert systems.

      Most demonstrations were regarding troubleshooting large systems, industrial processes, and education.

    • [removed] 3 days ago
      [deleted]
shawntan 4 days ago

Theoretical results exist that try to quantify the number of CoT tokens needed to reach different levels of computational expressibility: https://arxiv.org/pdf/2310.07923

TL;DR: Getting to Turing completeness can require polynomial CoT tokens, wrt the input problem size. For a field that constantly harps on parallelism and compute efficiency, this requirement seems prohibitive.

We really need to get away from constant depth architectures.

  • benkuykendall 3 days ago

    > Getting to Turing completeness can require polynomial CoT tokens, wrt the input problem size.

    So, as stated, this is impossible since it violates the Time Hierarchy Theorem.

    The actual result of the paper is that any poly-time computable function can be computed with poly-many tokens. Which is... not a particularly impressive bound? Any non-trivial fixed neural network can, for instance, compute the NAND of two inputs. And any poly-time computable function can be computed with a polynomial number of NAND gates.
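
    To make the NAND point concrete, here is a minimal sketch (hand-picked weights, purely illustrative, not taken from the paper): a single threshold unit already computes NAND, and a polynomial number of NAND gates computes any poly-time function.

        # Single threshold unit computing NAND(a, b) for a, b in {0, 1}.
        # Weights and bias are hand-picked: output = step(1.5 - a - b).
        def nand(a, b):
            w1, w2, bias = -1.0, -1.0, 1.5
            return 1 if w1 * a + w2 * b + bias > 0 else 0

        # Truth table check: NAND is 0 only on (1, 1).
        assert [nand(a, b) for a in (0, 1) for b in (0, 1)] == [1, 1, 1, 0]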

    • shawntan 3 days ago

      > The actual result of the paper is that any poly-time computable function can be computed with poly-many tokens.

      You're right.

      Re: NAND of two inputs. Isn't this doable even by a single layer (no hidden layers) neural network?

      Re: Polynomial computable function. I'm assuming this makes no assumption of constant-depth.

      Because my entire point was that the result of this paper is not actually impressive AND is covered by a previous paper. Hopefully that's clearer.

[removed] 4 days ago
[deleted]
__loam 4 days ago

> We have mathematically proven that transformers can solve any problem

We should require that you've passed an algorithms and a thermodynamics class before you can post.

  • nopinsight 3 days ago

    To be clear I think the tweet is a bit exaggerated (and the word ‘performance’ there doesn’t take into account efficiency, for example) but I don’t have the time to read the full paper (just skimmed the abstract and conclusion). I quoted the tweet by an author for people to discuss since it’s still a fairly remarkable result.

  • bonoboTP 3 days ago

    This is an accepted ICLR paper by authors from Stanford, Toyota and Google. That's not a guarantee for anything, of course, but they likely know basic algorithms and the second law. You can certainly argue against their claims, but you need to put in the legwork.

    • __loam 3 days ago

      I don't think I should need to argue with the absurd claim that these can solve any problem.

riku_iki 3 days ago

> Remarkably, constant depth is sufficient.

I think the article also says log(n) embedding size (width?) is required, where n is the size of the input.

candiddevmike 4 days ago

> We have mathematically proven that transformers can solve any problem, provided they are allowed to generate as many intermediate reasoning tokens as needed.

That seems like a bit of a leap here to make this seem more impressive than it is (IMO). You can say the same thing about humans, provided they are allowed to think across as many years/generations as needed.

Wake me up when an LLM figures out stable fusion or room-temperature superconductors.

  • krackers 4 days ago

    I think you're misrepresenting the study. It builds upon previous work that examines the computation power of the transformer architecture from a circuit-complexity perspective. Previous work showed that the class of problems that a "naive" Transformer architecture could compute was within TC0 [1, 2] and as a consequence it was fundamentally impossible for transformers to solve certain classes of mathematical problems. This study actually provides a more realistic bound of AC0 (by analyzing the finite-precision case) which rules out even more problems, including such 'simple' ones as modular parity.

    We also had previous work that hinted that part of the reason why chain-of-thought works from a theoretical perspective is that it literally allows the model to perform types of computations it could not under the more limited setting (in the same way jumping from FSMs to pushdown automata allows you to solve new types of problems) [3].

    [1] https://news.ycombinator.com/item?id=35609652 [2] https://blog.computationalcomplexity.org/2023/02/why-cant-li... [3] https://arxiv.org/abs/2305.15408
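
    For readers less familiar with these classes, the standard picture is (PARITY being the canonical function outside AC0, by Furst-Saxe-Sipser/Hastad):

        \mathsf{AC}^0 \subsetneq \mathsf{TC}^0 \subseteq \mathsf{NC}^1, \qquad \mathrm{PARITY} \in \mathsf{TC}^0 \setminus \mathsf{AC}^0

    so tightening the single-forward-pass bound from TC0 to AC0 newly rules out parity-like problems.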

    • shawntan 4 days ago

      Generally, literature on the computational power of the SAME neural architecture can differ on their conclusions based on their premises. Assuming finite precision will give a more restrictive result, and assuming arbitrary precision can give you Turing completeness.

      From a quick skim this seems like it's making finite precision assumptions? Which doesn't actually tighten previous bounds, it just makes different starting assumptions.

      Am author of [1].

      • krackers 4 days ago

        Ah my bad, great catch! I've updated my comment accordingly.

        • shawntan 3 days ago

          You can't really be blamed though, the language in the paper does seem to state what you originally said. Might be a matter of taste but I don't think it's quite accurate.

          The prior work they referenced actually did account for finite precision cases and why they didn't think it was useful to prove the result with those premises.

          In this work they simply argued from their own perspective why finite precision made more sense.

          The whole sub-field is kinda messy and I get quoted differing results all the time.

          Edit: Also, your original point stands, obviously. Sorry for nitpicking on your post, but I also just thought people should know more about the nuances of this stuff.

  • Horffupolde 4 days ago

    It is actually impressive.

    One could argue that writing enabled chain of thought across generations.

  • Veedrac 3 days ago

    > Wake me up when a LLM figures out stable fusion or room temperature superconductors.

    Man, the goalposts these days.

    • FeepingCreature 3 days ago

      "I love [goalposts]. I love the whooshing noise they make as they go by." --Douglas Adams, slightly adjusted

  • whimsicalism 3 days ago

    it's a TCS result.

    seems like many commenting don't know about computability

  • WalterSear 3 days ago

    > You can say the same thing about humans

    1. Holy shit.

    2. You can't apply Moore's law to humans.

    • Tostino 3 days ago

      You can't apply it to chips any more either.

      Density has continued to increase, but so have prices. The 'law' was tied to the price-to-density ratio, and it's been almost a decade now since it died.

    • gryn 3 days ago

      > 2. You can't apply Moore's law to humans.

      not with that attitude. /s

      if you take reproduction into account and ignore all the related externalities you can definitely double your count of transistors (humans) every two years.

  • aurareturn 3 days ago

    > You can say the same thing about humans, provided they are allowed to think across as many years/generations as needed.

    Isn’t this a good thing since compute can be scaled so that the LLM can do generations of human thinking in a much shorter amount of time?

    Say humans could solve quantum gravity with 100 years of thinking by 10,000 really smart people, and one AGI is equal to one really smart person. Scale up enough compute for 1 million AGIs and we could solve quantum gravity in a year.

    The major assumption here is that transformers can indeed solve every problem humans can.
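
    The implicit arithmetic, for what it's worth (assuming, heroically, that the work parallelizes perfectly across thinkers):

        10{,}000 \text{ people} \times 100 \text{ years} = 10^6 \text{ person-years} = 10^6 \text{ AGIs} \times 1 \text{ year}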

    • wizzwizz4 3 days ago

      > Isn’t this a good thing since compute can be scaled so that the LLM can do generations of human thinking in a much shorter amount of time?

      But it can't. There isn't enough planet.

      > The major assumption here is that transformers can indeed solve every problem humans can.

      No, the major assumptions are (a) that ChatGPT can, and (b) that we can reduce the resource requirements by many orders of magnitude. The former assumption is highly dubious, and the latter is plainly false.

      Transformers are capable of representing any algorithm, if they're allowed to be large enough and to run for long enough. That doesn't give them any special algorithm-finding ability, and finding the correct algorithms is the hard part of the problem!

      • aurareturn 3 days ago

        > But it can't. There isn't enough planet.

        How much resource are you assuming an AGI would consume?

    • visarga 3 days ago

      > Scale enough compute for 1 million AGI and we can solve quantum gravity in a year.

      That is wrong, it misses the point. We learn from the environment, we don't secrete quantum gravity from our pure brains. It's a RL setting of exploration and exploitation, a search process in the space of ideas based on validation in reality. A LLM alone is like a human locked away in a cell, with no access to test ideas.

      If you take child Einstein and put him on a remote island, and come back 30 years later, do you think he would impress you with his deep insights? It's not the brain alone that made Einstein so smart. His environment also made a major contribution.

      • exe34 3 days ago

        if you told child Einstein that light travels at a constant speed in all inertial frames and taught him algebra, then yes, he would come up with special relativity.

        in general, an AGI might want to perform experiments to guide its exploration, but it's possible that the hypotheses that it would want to check have already been probed/constrained sufficiently. which is to say, a theoretical physicist might still stumble upon the right theory without further experiments.

        • westurner 3 days ago

          Labeling observations with something richer than a list of column-label strings at the top would make it possible to mine them for insights, or to produce a universal theory that covers what has actually been observed rather than the presumed limits of theory.

          CSVW is CSV on the Web as Linked Data.

          With 7 metadata header rows at the top, a CSV could be converted to CSVW, with URIs for units like metre or meter or feet.

          If a ScholarlyArticle publisher does not indicate that a given CSV or better :Dataset that is :partOf an article is a :premiseTo the presented argument, a human grad student or an LLM needs to identify the links or textual citations to the dataset CSV(s).

          Easy: identify all of the pandas.read_csv() calls in a notebook (see the sketch below).

          Expensive: find the citation in a PDF, search for the text in "quotation marks", and try to guess which search result contains the dataset premise to an article;

          Or: identify each premise in the article, pull the primary datasets, run an unbiased AutoML report to identify linear and nonlinear variance relations, and test the data-dredged causal chart before or after manually reading the abstract.
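
          The "easy" case is a few lines. A rough sketch, assuming standard .ipynb JSON; find_read_csv_calls is a hypothetical helper name, not an existing tool:

              import json, re

              # Hypothetical helper: list the file/URL argument of every
              # pandas.read_csv(...) call in a Jupyter notebook, as a cheap way
              # to locate the datasets the notebook actually loads.
              def find_read_csv_calls(notebook_path):
                  with open(notebook_path) as f:
                      nb = json.load(f)
                  pattern = re.compile(r"""(?:pd|pandas)\.read_csv\(\s*['"]([^'"]+)['"]""")
                  paths = []
                  for cell in nb.get("cells", []):
                      if cell.get("cell_type") == "code":
                          paths += pattern.findall("".join(cell.get("source", [])))
                  return paths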

      • aurareturn 3 days ago

        The assumption is that the AGI can solve any problem humans can - including learning from the environment, if that is what is needed.

        But I think you're missing the point of my post. I don't want to devolve this topic into yet another argument centered around "but AI can't be AGI or can't do what humans can do because so and so".

        • visarga 3 days ago

          I often see this misconception that compute alone will lead us to surpass human level. No doubt it is inspired by the "scaling laws" we heard so much about. People forget that imitation is not sufficient to surpass human level.

DarkNova6 3 days ago

The more interesting question is whether the ability to reason and solve problems scales linearly or logarithmically.

[removed] 3 days ago
[deleted]
m3kw9 3 days ago

Sort of like a quantum superposition state? So here is an idea: use quantum computing to produce all possible inferences, and some not-yet-invented algorithm to collapse them to the final result.

[removed] 3 days ago
[deleted]
tooltower 3 days ago

Constant-depth circuits can solve everything? I feel like I missed some important part of circuit complexity. Or this is BS.

  • shawntan 3 days ago

    Using CoT implicitly increases the depth of the circuit. But yes, poorly worded.
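
    Roughly (a back-of-envelope sketch, not a claim from the paper): each generated token is another full forward pass, so T CoT tokens through a depth-d transformer unroll into a computation of serial depth about

        \mathcal{O}(T \cdot d)

    which is how a "constant depth" model buys extra sequential computation.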