lsy 3 days ago

Note that for the purposes of this paper a “problem” just means a formally decidable problem or a formal language, and the proof is that by creatively arranging transformers you can make individual transformer runs behave like individual Boolean circuits. However, this is a long way from any practical application of transformers: for one thing, most problems we care about are not stated as formal languages, and we already have a vastly more efficient way to implement Boolean circuits.

  • shawntan 3 days ago

    If a "problem we care about" is not stated as a formal language, does it mean it does not exist in the hierarchy of formal languages? Or is it just as yet unclassified?

    • tsimionescu 3 days ago

      It means that there are two problems: one, to formalize the problem as stated while capturing all relevant details, and two, to solve the resulting formal problem. Until you solve problem one, you can't use formal methods to say anything about the problem (it's not even clear a priori that a given problem is solvable at all).

      Unfortunately, the task of formalizing an informal problem is itself an informal problem that we don't know how to formalize, so we can't say much about it. So overall, we can't say much about how hard the general problem "given a problem statement from a human, solve that problem" is, whether any particular system (including a human!) can solve it, or how long that might take and with what resources.

      • viraptor 3 days ago

        > task of formalizing an informal problem is itself an informal problem

        I couldn't find details about this - do you know of a paper or some resource which digs into that idea?

    • wslh 3 days ago

      My 2 cents: Since LLMs (Large Language Models) operate as at least a subset of Turing machines (which recognize recursively enumerable languages), the chain of thought (CoT) approach could be equivalent to or even more expressive than that subset. In fact, CoT could perfectly be a Turing machine.

      If we leave CoT aside for a moment, it's worth exploring the work discussed in the paper "Neural Networks and the Chomsky Hierarchy"[1], which analyzes how neural networks (including LLMs) map onto different levels of the Chomsky hierarchy, with a particular focus on their ability to recognize formal languages across varying complexity.

      [1] https://ar5iv.labs.arxiv.org/html/2207.02098v1

      • flir 3 days ago

        > In fact, CoT could perfectly be a Turing machine.

        Are we going to need an infinite number of LLMs, arranged on a tape?

  • julienreszka 3 days ago

    > most problems we care about are not stated as formal languages

    then a way forward would be to translate them into a formal language

larodi 3 days ago

I'm waiting for the people of AI to discover syllogism and inference in its original PROLOG sense, which this CoT abomination basically tries to achieve. Interestingly, if all logical content were translated to rules, and only those rules were fed into the LLM training set, what would the result be? And can the probabilistic magic be made to actually follow reason, without all the dice?

  • trescenzi 3 days ago

    Right, we’ve now gotten to the stage of this AI cycle where we start using the new tool to solve problems old tools could already solve. Saying a transformer can solve any formally decidable problem if given enough tape isn’t saying much. It’s a cool proof, I don’t mean to deny that, but it doesn’t mean much practically, as we already have more efficient tools that can do the same.

    • marcosdumay 3 days ago

      What I don't get is... didn't people prove that in the 90s for any multi-layer neural network? Didn't people prove transformers are equivalent in the transformers paper?

  • sunir 3 days ago

    I was thinking about the graphrag paper and prolog. I’d like to extract predicates. The source material will be inconsistent and contradictory and incomplete.

    Using the clustering (community) model, an LLM can summarize the opinions as a set of predicates, which don’t have to agree, along with some general weight of how much people agree or disagree with them.

    The predicates won’t be suitable for symbolic logic because the language will be loose. However an embedding model may be able to connect different symbols together.

    Then you could attempt multiple runs through the database of predicates because there will be different opinions.

    Then one could attempt to reason using these loosely stitched predicates. I don’t know how good the outcome would be.

    I imagine this would be better in an interactive decision making tool where a human is evaluating the suggestions for the next step.

    This could be better for planning than problem solving.
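
    A rough sketch of what that pipeline could look like, assuming hypothetical llm_summarize and embed helpers standing in for the LLM call and the embedding model (illustrative only, not a working system):

        from dataclasses import dataclass
        import numpy as np

        @dataclass
        class Predicate:
            text: str            # e.g. "remote work improves productivity"
            weight: float        # rough agreement score from the community summary
            vector: np.ndarray   # embedding used to stitch similar predicates together

        def extract_predicates(community_docs, llm_summarize, embed):
            # Summarize each community's opinions into weighted, possibly
            # contradictory predicates.
            predicates = []
            for docs in community_docs:
                for text, weight in llm_summarize(docs):   # LLM proposes (claim, agreement) pairs
                    predicates.append(Predicate(text, weight, embed(text)))
            return predicates

        def related(p, q, threshold=0.8):
            # Loosely connect predicates whose embeddings are close (cosine similarity),
            # since the loose language rules out exact symbolic matching.
            sim = float(np.dot(p.vector, q.vector) /
                        (np.linalg.norm(p.vector) * np.linalg.norm(q.vector)))
            return sim >= threshold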

  • pkoird 3 days ago

    I've said this before and I'll say it again: Any sufficiently advanced LLM is indistinguishable from Prolog.

  • detourdog 3 days ago

    I’m surprised that understanding how a thought unfolds is being considered not relevant to the answer. I have done a lot of problem solving in groups and alone. How thoughts develop seems fundamental to understanding the solutions.

    The story regarding the banning of terms that can be used with a reasoning system is a big red flag to me.

    This sort of knee jerk reaction displays immature management and an immature technology product.

sigmoid10 3 days ago

>Remarkably, constant depth is sufficient.

How would that be remarkable, when it is exactly what the Universal Approximation Theorem already states? Since transformers also use fully connected layers, none of this should really come as a surprise. But from glancing at the paper, they don't even mention it.

  • nexustext 3 days ago

    It's 'remarkable' because (a) academic careers are as much about hype as science, (b) arxiv doesn't have peer review process to quash this, (c) people take arxiv seriously.

  • logicchains 3 days ago

    >How would that be remarkable, when it is exactly what he Universal Approximation Theorem already states

    Only with infinite precision, which is highly unrealistic. Under realistic assumptions, fixed-depth transformers without chain-of-thought are very limited in what they can express: https://arxiv.org/abs/2207.00729 . Chain of thought increases the class of problems which fixed-depth transformers can solve: https://arxiv.org/abs/2310.07923

  • IshKebab 3 days ago

    The universal approximation theorem has no practical relevance.

wodenokoto 3 days ago

But didn't we already know that NNs can solve any computable problem? The interesting thing is whether they can be trained to solve any (computable) problem.

  • imhoguy 3 days ago

    I don't know why I read that as "HN"; indeed, HN can solve any problem.

  • tossandthrow 3 days ago

    Feed forward NNs can approximate all functions f: X -> Y only for closed domains.

    But recurrent neural networks can solve any computational problem, given enough precision.

    • roboboffin 3 days ago

      Does that mean that when we reduce the precision of a NN, for example using bfloat16 instead of float32, we reduce the set of computational problems that can be solved?

      How would that compare with a biological neural network with presumably near-infinite precision?

    • wodenokoto 3 days ago

      On the first day of an introduction to NNs we were asked to create all the logic gates using artificial neurons, and then told "If you have all gates, you can do all computations".

      I've got to admit, I'm sorta sticking to that at face value, because I don't know enough computer science to a) discern if that is true and b) know what "f: X -> Y only for closed domains" means.
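
      For what it's worth, that exercise is easy to reproduce. A minimal sketch with threshold units (illustrative only; it says nothing about what trained networks actually learn):

          def neuron(weights, bias, inputs):
              # A single threshold (perceptron-style) unit: fires iff w.x + b > 0.
              return int(sum(w * x for w, x in zip(weights, inputs)) + bias > 0)

          def NAND(a, b):
              # NAND alone is universal: any Boolean circuit can be built from it.
              return neuron([-1, -1], 1.5, [a, b])

          def AND(a, b):
              return neuron([1, 1], -1.5, [a, b])

          def OR(a, b):
              return neuron([1, 1], -0.5, [a, b])

          def XOR(a, b):
              # XOR is not linearly separable, so it needs a second layer (gate composition).
              return AND(OR(a, b), NAND(a, b))

          assert [NAND(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]] == [1, 1, 1, 0]
          assert [XOR(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]] == [0, 1, 1, 0]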

      • tossandthrow 3 days ago

        I think the easiest way to think about this is in terms of natural numbers, ie. 1, 2, 3, 4.

        When you only have a fixed width, ie. a static feed forward network, you have an upper limit to the data you can represent and compute on.

        Eg. if the highest number you can represent is 1.000, then you will need a new NN if you want to do computations on 1.001.

        ... or use an inductive structure, like a recurrent neural network has.
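
        A toy illustration of the contrast, with plain Python standing in for the two shapes (nothing neural here, just fixed arity versus folding over a sequence of any length):

            def fixed_width_sum(x1, x2, x3):
                # Hard-wired for exactly three inputs; a fourth number has
                # nowhere to go without building a new "network".
                return x1 + x2 + x3

            def recurrent_sum(xs):
                # The same cell is reused at every step, so the input length is unbounded.
                state = 0
                for x in xs:
                    state = state + x   # one recurrent step
                return state

            print(recurrent_sum(range(1, 1002)))   # handles 1001 inputs, or any other length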

nopinsight 3 days ago

In the words of an author:

"What is the performance limit when scaling LLM inference? Sky's the limit.

We have mathematically proven that transformers can solve any problem, provided they are allowed to generate as many intermediate reasoning tokens as needed. Remarkably, constant depth is sufficient.

http://arxiv.org/abs/2402.12875 (ICLR 2024)"

https://x.com/denny_zhou/status/1835761801453306089

  • ec109685 3 days ago

    Is this the infinite monkey Shakespeare trope?

    • throwup238 3 days ago

      More like the universal approximation theorem extended to computation rather than network complexity: https://en.wikipedia.org/wiki/Universal_approximation_theore...

      • immibis 3 days ago

        The universal approximation theorem is good to know because it says there's no theoretical upper bound to a function-approximating NN's accuracy. In practice, though, it says nothing about what can be realistically achieved.

    • nopinsight 3 days ago

      A key difference is that the way LMMs (Large Multimodal Models) generate output is far from random. These models can imitate/blend existing information, or imitate and probably blend known reasoning methods in the training data. The latter is a key distinguishing feature of the new OpenAI o1 models.

      Thus, the signal-to-noise ratio of their output is generally way better than infinite monkeys.

      Arguably, humans rely on similar modes of "thinking" most of the time as well.

    • CamperBob2 3 days ago

      Yeah. Monkeys. Monkeys that write useful C and Python code that needs a bit less revision every time there's a model update.

      Can we just give the "stochastic parrot" and "monkeys with typewriters" schtick a rest? It made for novel commentary three or four years ago, but at this point, these posts themselves read like the work of parrots. They are no longer interesting, insightful, or (for that matter) true.

      • visarga 3 days ago

        If you think about it, humans necessarily use abstractions, from the edge detectors in the retina to concepts like democracy. But do we really understand? All abstractions leak, and nobody knows the whole stack. For all the poorly grasped abstractions we are using, we are also just parroting. How many times are we doing things because "that is how they are done", never wondering why?

        Take ML itself, people are saying it's little more than alchemy (stir the pile). Are we just parroting approaches that have worked in practice without real understanding? Is it possible to have centralized understanding, even in principle, or is all understanding distributed among us? My conclusion is that we have a patchwork of partial understanding, stitched together functionally by abstractions. When I go to the doctor, I don't study medicine first, I trust the doctor. Trust takes the place of genuine understanding.

        So humans, like AI, use distributed and functional understanding, we don't have genuine understanding as meant by philosophers like Searle in the Chinese Room. No single neuron in the brain understands anything, but together they do. Similarly, no single human understands genuinely, but society together manages to function. There is no homunculus, no centralized understander anywhere. We humans are also stochastic parrots of abstractions we don't really grok to the full extent.

      • kaechle 3 days ago

        Every time I read "stochastic parrot," my always-deterministic human brain surfaces this quote:

        > “Most people are other people. Their thoughts are someone else's opinions, their lives a mimicry, their passions a quotation.”

        - Oscar Wilde, a great ape with a pen

      • ffsm8 3 days ago

        > novel commentary three or four years ago,

        ChatGPT was released in November 2022. That's one year and 10 months ago. Their marketing started in the summer of the same year, still far off from 3-4 years.

      • hegFdH 3 days ago

        The infinite monkey post was in response to this claim, which, like the universal approximation theorem, is useless in practice:

        "We have mathematically proven that transformers can solve any problem, provided they are allowed to generate as many intermediate reasoning tokens as needed. Remarkably, constant depth is sufficient."

        Like an LLM, you omit the context and browbeat people with the "truth" you want to propagate. Together with the many politically forbidden terms since 2020, let us now also ban "stochastic parrot" in order to have a goodbellyfeel newspeak.

        • chaosist 3 days ago

          There is also a problem of "stochastic parrot" being constantly used in a pejorative sense as opposed to a neutral term to keep grounded and skeptical.

          Of course, it is an overly broad stroke that doesn't quite capture all the nuance of the model, but the alternative of "come on guys, just admit the model is thinking" is much worse and has much less to do with reality.

      • 93po 3 days ago

        AI news article comments bingo card:

        * Tired ClosedAI joke

        * Claiming it's predictive text engine that isn't useful for anything

        * Safety regulations are either good or bad, depending on who's proposing them

        * Fear mongering about climate impact

        * Bringing up Elon for no reason

        * AI will never be able to [some pretty achievable task]

        * Tired arguments from pro-IP / copyright sympathizers

  • tsimionescu 3 days ago

    One question, if anyone knows the details: does this prove that there exists a single LLM that can approximate any function to arbitrary precision given enough CoT, or does it prove that for every function, there exists a Transformer that fits those criteria?

    That is, does this prove that a single LLM can solve any problem, or that for any problem, we can find an LLM that solves it?

    • jstanley 3 days ago

      Doesn't the latter imply the former?

      If it's possible to find an LLM for any given problem, then find an LLM for the problem "find an LLM for the problem and then evaluate it" and then evaluate it, and then you have an LLM that can solve any problem.

      It's the "Universal Turing Machine" for LLMs.

      I wonder what's the LLM equivalent of the halting problem?

      • progval 3 days ago

        > It's the "Universal Turing Machine" for LLMs.

        A closer analogy is the Hutter Search (http://hutter1.net/ai/pfastprg.pdf), as it is also an algorithm that can solve any problem. And, like the Hutter Search, such a construction is probably too inefficient to use in practice.

      • detourdog 3 days ago

        In the late ‘80s they were called expert systems.

        Most demonstrations were regarding troubleshooting large systems, industrial processes, and education.

      • [removed] 3 days ago
        [deleted]
  • shawntan 3 days ago

    Theoretical results exist that try to quantify the number of CoT tokens needed to reach different levels of computational expressibility: https://arxiv.org/pdf/2310.07923

    TL;DR: Getting to Turing completeness can require polynomial CoT tokens, wrt the input problem size. For a field that constantly harps on parallelism and compute efficiency, this requirement seems prohibitive.

    We really need to get away from constant depth architectures.

    • benkuykendall 3 days ago

      > Getting to Turing completeness can require polynomial CoT tokens, wrt the input problem size.

      So, as stated, this is impossible since it violates the Time Hierarchy Theorem.

      The actual result of the paper is that any poly-time computable function can be computed with poly-many tokens. Which is... not a particularly impressive bound? Any non-trivial fixed neural network can, for instance, compute the NAND of two inputs. And any polynomial computable function can be computed with a polynomial number of NAND gates.

      • shawntan 3 days ago

        > The actual result of the paper is that any poly-time computable function can be computed with poly-many tokens.

        You're right.

        Re: NAND of two inputs. Isn't this doable even by a single layer (no hidden layers) neural network?

        Re: Polynomial computable function. I'm assuming this makes no assumption of constant-depth.

        Because my entire point was that the result of this paper is not actually impressive AND is covered by a previous paper. Hopefully that's clearer.

  • [removed] 3 days ago
    [deleted]
  • __loam 3 days ago

    > We have mathematically proven that transformers can solve any problem

    We should require that you've passed an algorithms and a thermodynamics class before you can post.

    • nopinsight 3 days ago

      To be clear I think the tweet is a bit exaggerated (and the word ‘performance’ there doesn’t take into account efficiency, for example) but I don’t have the time to read the full paper (just skimmed the abstract and conclusion). I quoted the tweet by an author for people to discuss since it’s still a fairly remarkable result.

    • bonoboTP 3 days ago

      This is an accepted ICLR paper by authors from Stanford, Toyota and Google. That's not a guarantee for anything, of course, but they likely know basic algorithms and the second law. You can certainly argue against their claims, but you need to put in the legwork.

      • __loam 3 days ago

        I don't think I should need to argue with the absurd claim that these can solve any problem.

  • riku_iki 3 days ago

    > Remarkably, constant depth is sufficient.

    I think the article also says a log(n) embedding size (width?) is required, where n is the size of the input.

  • DarkNova6 3 days ago

    The more interesting question is whether the ability to reason and solve problems scales linearly or logarithmically.

  • candiddevmike 3 days ago

    > We have mathematically proven that transformers can solve any problem, provided they are allowed to generate as many intermediate reasoning tokens as needed.

    That seems like a bit of a leap here to make this seem more impressive than it is (IMO). You can say the same thing about humans, provided they are allowed to think across as many years/generations as needed.

    Wake me up when an LLM figures out stable fusion or room temperature superconductors.

    • krackers 3 days ago

      I think you're misrepresenting the study. It builds upon previous work that examines the computation power of the transformer architecture from a circuit-complexity perspective. Previous work showed that the class of problems that a "naive" Transformer architecture could compute was within TC0 [1, 2] and as a consequence it was fundamentally impossible for transformers to solve certain classes of mathematical problems. This study actually provides a more realistic bound of AC0 (by analyzing the finite-precision case) which rules out even more problems, including such 'simple' ones as modular parity.

      We also had previous work that hinted that part of the reason why chain-of-thought works from a theoretical perspective is that it literally allows the model to perform types of computations it could not under the more limited setting (in the same way jumping from FSMs to pushdown automata allows you to solve new types of problems) [3].

      [1] https://news.ycombinator.com/item?id=35609652

      [2] https://blog.computationalcomplexity.org/2023/02/why-cant-li...

      [3] https://arxiv.org/abs/2305.15408

      • shawntan 3 days ago

        Generally, literature on the computational power of the SAME neural architecture can differ on their conclusions based on their premises. Assuming finite precision will give a more restrictive result, and assuming arbitrary precision can give you Turing completeness.

        From a quick skim this seems like it's making finite precision assumptions? Which doesn't actually tighten previous bounds, it just makes different starting assumptions.

        Am author of [1].

    • Horffupolde 3 days ago

      It is actually impressive.

      One could argue that writing enabled chain of thought across generations.

    • Veedrac 3 days ago

      > Wake me up when a LLM figures out stable fusion or room temperature superconductors.

      Man, the goalposts these days.

      • FeepingCreature 3 days ago

        "I love [goalposts]. I love the whooshing noise they make as they go by." --Douglas Adams, slightly adjusted

    • whimsicalism 3 days ago

      it's a TCS result.

      seems like many commenting don't know about computability

    • WalterSear 3 days ago

      > You can say the same thing about humans

      1. Holy shit.

      2. You can't apply Moore's law to humans.

      • Tostino 3 days ago

        You can't apply it to chips any more either.

        Density has continued to increase, but so have prices. The 'law' was tied to the price to density ratio, and it's been almost a decade now since it died.

      • gryn 3 days ago

        > 2. You can't apply Moore's law to humans.

        not with that attitude. /s

        if you take reproduction into account and ignore all the related externalities you can definitely double your count of transistors (humans) every two years.

    • aurareturn 3 days ago

      > You can say the same thing about humans, provided they are allowed to think across as many years/generations as needed.

      Isn’t this a good thing since compute can be scaled so that the LLM can do generations of human thinking in a much shorter amount of time?

      Say humans can solve quantum gravity in 100 years of thinking by 10,000 really smart people. If one AGI is equal to 1 really smart person, scale enough compute for 1 million AGIs and we can solve quantum gravity in a year.
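
      (The implicit arithmetic: 10,000 people × 100 years = 1,000,000 person-years, so 1,000,000 AGI-equivalents running for one year would cover the same budget, assuming the work parallelizes perfectly.)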

      The major assumption here is that transformers can indeed solve every problem humans can.

      • wizzwizz4 3 days ago

        > Isn’t this a good thing since compute can be scaled so that the LLM can do generations of human thinking in a much shorter amount of time?

        But it can't. There isn't enough planet.

        > The major assumption here is that transformers can indeed solve every problem humans can.

        No, the major assumptions are (a) that ChatGPT can, and (b) that we can reduce the resource requirements by many orders of magnitude. The former assumption is highly dubious, and the latter is plainly false.

        Transformers are capable of representing any algorithm, if they're allowed to be large enough and run long enough. That doesn't give them any special algorithm-finding ability, and finding the correct algorithms is the hard part of the problem!

      • visarga 3 days ago

        > Scale enough compute for 1 million AGI and we can solve quantum gravity in a year.

        That is wrong, it misses the point. We learn from the environment, we don't secrete quantum gravity from our pure brains. It's a RL setting of exploration and exploitation, a search process in the space of ideas based on validation in reality. A LLM alone is like a human locked away in a cell, with no access to test ideas.

        If you take child Einstein and put him on a remote island, and come back 30 years later, do you think he would impress you with his deep insights? It's not the brain alone that made Einstein so smart. It's also his environment that made a major contribution.

  • [removed] 3 days ago
    [deleted]
  • m3kw9 3 days ago

    Sort of like a quantum superposition state? So here is an idea: use quantum computing to produce all possible inferences, and some not-yet-invented algorithm to collapse them to the final result.

  • [removed] 3 days ago
    [deleted]
  • tooltower 3 days ago

    Constant depth circuits can solve everything? I feel like I missed some important part of circuit complexity. Or this is BS.

    • shawntan 3 days ago

      Using CoT implicitly increases the depth of the circuit. But yes, poorly worded.

JSDevOps 3 days ago

So given infinite time and resources it can solve any problem? Hardly groundbreaking, is it?

HarHarVeryFunny 3 days ago

Sure, in the same sense that an infinitely long tape lets a Turing machine solve arbitrary problems. In theory at least, if one had the right program.

  • falcor84 3 days ago

    It's not clear to me what you're saying; isn't the whole deal here that by performing RL on the CoT (given sufficient size and compute) it would converge to the right program?

    • HarHarVeryFunny 3 days ago

      I was really saying two things:

      1) The theoretical notion that a fixed-depth transformer + COT can solve arbitrary problems involving sequential computation is rather like similar theoretical notions of a Turing machine as a universal computer, or of an ANN with a hidden layer being able to represent arbitrary functions... it may be true, but at the same time not useful

      2) The Turing machine, just like the LLM+COT, is only as useful as the program it is running. If the LLM+COT is incapable of runtime learning and is just trying to mimic some reasoning heuristics, then that is going to limit its function, even if theoretically such an "architecture" could do more if only it were running a universal AGI program

      Using RL to encourage the LLM to predict continuations according to some set of reasoning heuristics is what it is. It's not going to make the model follow any specific reasoning logic, but is presumably hoped to generate a variety of continuations that the COT "search" will be able to utilize to arrive at a better response than it otherwise would have done. More of an incremental improvement (as reflected in the benchmark scores it achieves) than "converging to the right program".

    • __loam 3 days ago

      Sometimes reading hackernews makes me want to slam my head on the table repeatedly. Given sufficient size and compute is one of the most load bearing phrases I've ever seen.

      • falcor84 3 days ago

        But it is load bearing. I mean, I personally can't stop being amazed at how with each year that passes, things that were unimaginable with all the world's technology a decade ago are becoming straightforward to run on a reasonably priced laptop. And at this stage, I wouldn't bet even $100 against any particular computational problem being solved in some FAANG datacenter by the end of the decade.

mg 3 days ago

Has it been publicly benchmarked yet, if this approach:

    Hello LLM, please solve this task: <task>
Can be improved by performing this afterwards?

    for iteration in range(10):
        Hello LLM, please solve this task: <task>
        Here is a possible solution: <last_reply>
        Please look at it and see if you can improve it.
        Then tell me your improved solution.
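
A minimal sketch of that loop in code, assuming a hypothetical ask_llm(prompt) wrapper around whatever chat API is in use (no particular provider or signature implied):

    def iterative_refine(task, ask_llm, rounds=10):
        # Naive self-refinement: feed the model's own answer back as a candidate solution.
        reply = ask_llm(f"Hello LLM, please solve this task: {task}")
        for _ in range(rounds):
            reply = ask_llm(
                f"Hello LLM, please solve this task: {task}\n"
                f"Here is a possible solution: {reply}\n"
                "Please look at it and see if you can improve it.\n"
                "Then tell me your improved solution."
            )
        return reply
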
  • lorepieri 2 days ago

    Not sure if it has been benchmarked, but I've called this technique the "for-loop of thought". :)

  • Kiro 3 days ago

    Isn't that the whole reason that o1 works?

    • ben_w 3 days ago

      I think o1 is more like "pretend you're doing a job interview, think step by step and show your working".

      I tried something similar to the suggested iterative loop on a blog post I'd authored but wanted help copy editing; first few were good enough, but then it got very confused and decided the blog post wasn't actually a blog post to be edited and instead that what I really wanted to know was the implications of Florida something something Republican Party.

      Benchmark would be neat, because all I have is an anecdote.

tossandthrow 3 days ago

> We have mathematically proven that transformers can solve any problem, provided they are allowed to generate as many intermediate reasoning tokens as needed.

This is also the case with plain and regular RNNs

  • baq 3 days ago

    Now just need an autoregressive transformer <==> RNN isomorphism paper and we're golden

    • logicchains 3 days ago

      Plain RNNs are theoretically weaker than transformers with COT: https://arxiv.org/abs/2402.18510 .

      • tossandthrow 3 days ago

        The paper says transformers perform better than RNNs, which is not surprising.

        However, they are both, theoretically, Turing complete computers. So they are equally expressive.

seydor 3 days ago

'can'

But will they? I believe the frontier has moved to making them make sense instead of just making infinite language.

The infinite monkey problem is not solved yet

scotty79 3 days ago

Chain of thought GPT is sort of a Turing machine with a tape that it's allowed to write to for purposes other than outputting the answer.

cpldcpu 3 days ago

Can any of these tools do anything that GitHub Copilot cannot do? (Apart from using other models?) I tried Continue.dev and cursor.ai, but it was not immediately obvious to me. Maybe I am missing something workflow specific?

floppiplopp 3 days ago

They have also mathematically proven that transformers are great randomness generators.

empath75 3 days ago

Is this more general than LLMs? Is it possible to do something Chain-of-Thought-like in a transformer model that _isn't_ trained on language?

glial 3 days ago

Apologies if this is a dumb question, but aren't all computations inherently serial? In that a Turing machine performs operations serially?

  • joe_the_user 3 days ago

    Aren't all computations inherently serial?

    No. "inherently serial" refers to problems that are specified serially and can't be spend up by parallel processing. The sum of a set of N numbers is an example of a problem that is not inherently serial. You can use parallel reduction to perform the computation in O(log(N)) time on an idealized parallel computer but it takes O(N) time on an idealized serial computer.

    And, it turns, exactly which problems are really are inherently serial is somewhat challenging problem.
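
    For illustration, here is that pairwise (tree) reduction written serially in Python; each pass over the pairs is one round that could run entirely in parallel, which is where the O(log(N)) depth comes from:

        def tree_sum(xs):
            # Each while-iteration is one "parallel round": all pairs could be added
            # simultaneously, so an idealized parallel machine needs only O(log N)
            # rounds instead of the O(N) steps of a serial left-to-right sum.
            xs = list(xs)
            while len(xs) > 1:
                reduced = [a + b for a, b in zip(xs[::2], xs[1::2])]
                if len(xs) % 2:            # carry the unpaired element to the next round
                    reduced.append(xs[-1])
                xs = reduced
            return xs[0] if xs else 0

        assert tree_sum(range(1, 101)) == 5050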

    • visarga 3 days ago

      > The sum of a set of N numbers is an example of a problem that is not inherently serial.

      But addition with floats (not reals) is non-associative.

      • immibis 3 days ago

        They didn't say floats, and the sum of a set of floats is not uniquely defined as a float for the reason you stated, at least not without specifying a rounding mode. Most people use "round to whatever my naïve code happens to do", which has many correct answers. Adding up a set of floats with only the usual 0.5 ULP imprecision is, yes, non-trivial.
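
        A quick demonstration of the non-associativity being discussed:

            a, b, c = 0.1, 0.2, 0.3
            print((a + b) + c)                  # 0.6000000000000001
            print(a + (b + c))                  # 0.6
            print((a + b) + c == a + (b + c))   # False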

  • tromp 3 days ago

    Turing Machines are just one of many computational models. Others offer more parallelism. Two examples:

    In lambda calculus, disjoint redexes can be reduced in parallel.

    And in interaction nets, all active pairs can be reduced in parallel [1].

    [1] https://en.wikipedia.org/wiki/Interaction_nets

  • ants_everywhere 3 days ago

    You can model parallel computation by an arbitrary finite product of Turing machines. And then, yes, you can simulate that product on a single Turing machine. I think that's the sort of thing you have in mind?

    But I'm not aware of what "inherently serial" means. The right idea likely involves talking about complexity classes. E.g. how efficiently does a single Turing machine simulate a product of Turing machines? An inherently serial computation would then be something like a problem where the simulation is significantly slower than running the machines in parallel.

  • ninetyninenine 3 days ago

    Yeah, it's talking about an approach for LLMs where the output of an LLM is fed back in as input, again and again and again, and this produces way more accurate output.

tonii141 3 days ago

A random generator of tokens can also solve any problem if you give it enough time and memory.

qmatch 3 days ago

Is this similar to the Universal Approximation Theorem?

CarRamrod 3 days ago

Damn, we just used our entire Round A acquiring an infinite amount of bananas and typewriter ink. The boss is not going to like this.

  • nopinsight 3 days ago

    No worries! With the magic bananas and ink you've acquired, those monkeys will surely produce output with a signal-to-noise ratio rivaling the best LLMs.

    I’m sure your startup will achieve the coveted Apeicorn status soon!

  • dotancohen 3 days ago

    Naturally.

    It's the printer ink that is forbiddingly expensive. And the bananas are carbon neutral.

  • imjonse 3 days ago

    Hopefully not Cavendish, as those are too sugary for monkeys and you'll just get hallucinations.

  • bryanrasmussen 3 days ago

    did you get both infinite bananas and infinite typewriter ink, or was there a limited supply of typewriter ink? If the first, it was worth it.

    • [removed] 3 days ago
      [deleted]
theshrike79 3 days ago

Are we getting to a point where the LLM will just answer "42" and we need to figure out the question? =)