LLMs have reached a point of diminishing returns
(garymarcus.substack.com)
131 points by signa11 3 days ago
Leaving aside the distortions from inflated and unrealistic expectations (case in point: people expecting the evolution of AGI have somehow not yet defined what AGI is), I also think that in the mid-to-long run the current state of LLMs will spawn an entire economy around migrating legacy apps to conversational APIs. The same investors will then have a new gold rush to chase, as always happens.
- There _was_ a problem with diminishing returns from increasing data size. Then they surpassed that by curating data.
- Then the limits on the amount of curatable data available made the performance gains level off. So they started generating data and that pushed the nose up again.
- Eventually, even with generated data, gains flattened out. So they started increasing inference time. They have now proven that this improves the performance quite a bit.
It's always been a series of S-curves and we have always (sooner or later) innovated to the next level.
Marcus has always been a mouth just trying to take down neural networks.
Someday we will move on from LLMs, large multimodal models, transformers, maybe even neural networks, in order to add new levels and types of intelligence.
But Marcus's mouth will never stop yapping about how it won't work.
I think we are now at the point where we can literally build a digital twin video avatar to handily win a debate with Marcus, and he will continue to deny that any of it really works.
> Marcus has always been a mouth just trying to take down neural networks.
This isn't true. Marcus is against "pure NN" AI, especially in situations where reliability is desired, as would be the case with AGI/ASI.
He advocates [1] neurosymbolic AI, i.e. hybridizing NNs with symbolic approaches, as a path to AGI. So he's in favor of NNs, but not "pure NNs".
He does not spend an appreciable amount of effort or time advocating for that though. He spends 95% of his energy trying to take down the merits of NN-based approaches.
If he had something to show for it, like neurosymbolic wins over benchmarks for LLMs, that would be different. But he's not a researcher anymore. He's a mouth, and he is so inaccurate that it is actually dangerous, because some government officials listen to him.
I actually think that neurosymbolic approaches could be incredible and bring huge gains in performance and interpretability. But I don't see Marcus spending a lot of effort and doing quality research in that area that achieves much.
The quality of his arguments seems to be at the level of a used furniture salesman.
> He spends 95% of his energy trying to take down the merits of NN-based approaches.
The 95% figure comes from where? (I don't think the commenter above has a basis for it.)
How often does Marcus-the-writer take aim at NN-based approaches? Does he get this specific?
I often see Gary Marcus highlighting some examples where generative AI technologies are not as impressive as some people claim. I can't recall him doing the opposite.
Neither can I recall a time when Marcus specifically explained why certain architectures are {inappropriate or irredeemable} either {in general or in particular}.
Have I missed some writing where Marcus lays out a compelling multi-sided evaluation of AI systems or companies? I doubt it. But, please, if he has, let me know.
Marcus knows how to cherry-pick failure. I'm not looking for a writer who has staked out one side of the arguments. Selection bias is on full display. It is really painful to read, because it seems like he has the mental horsepower not to fall into these traps. Does he lack the self-awareness or intellectual honesty to write thoughtfully? Or is this purely a self-interested optimization -- he wants to build an audience, and the One-Sided Argument Pattern works well for him?
Just thinking about this.. do you know if anyone has figured out a way to reliably encode a Turing machine or simple virtual machine in the layers of a neural network, in a somewhat optimal way, using a minimized number of parameters?
Or maybe fully integrating differentiable programming into networks. It just seems like you want to keep everything in matrices in the AI hardware to get the really high efficiency gains. But even without that, I would not complain about an article that Marcus wrote about something along those lines.
But the one you showed has interesting ideas but lacked substance to me and doesn't seem up to date.
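On the Turing machine question: I don't know of a parameter-optimal construction, but the basic ingredient is old hat. A single unit with hand-set weights already realizes a NAND gate exactly, and NAND is functionally complete, so stacked layers can in principle encode arbitrary Boolean circuits (a toy sketch, nothing optimal about it):

    def nand(a, b):
        # Hand-set weights (-2, -2) and bias 3 on a hard-threshold unit
        # implement NAND exactly: the sum is positive except when a = b = 1.
        return int(-2 * a - 2 * b + 3 > 0)

    # NAND is functionally complete, so layers of such units can in principle
    # encode any Boolean circuit (and, with external memory, a Turing machine).
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, "->", nand(a, b))

Whether that can be done with a minimized parameter count, or folded cleanly into differentiable programming, is exactly the open part of the question.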
No one in their right mind will argue neural nets cannot outperform humans at resampling data they have previously been exposed to. So, digital twins and debate, they probably can do better than any human.
I’m still waiting for a neural net that can do my laundry. Until there is one I’m on Marcus’ side.
That's less a GenAI problem, more robotics. And perhaps it's here, literally doing your laundry:
https://x.com/physical_int/status/1852041726279794788
https://www.physicalintelligence.company/blog/pi0
https://www.entrepreneur.com/business-news/jeff-bezos-backed...
No teleoperation; it can haul the basket from three floors down and back up, fully folded and put away in my closet, without me doing a thing.
What is lacking compared to current bean to cup coffee makers?
The context some commenters here seem to be missing is that Marcus is arguing that spending another $100B on pure scaling (more params, more data, more compute) is unlikely to repeat the qualitatively massive improvement we saw between say 2017 and 2022. We see some evidence this is true in the shift towards what I categorize as system integration approaches: RAG, step by step reasoning, function calling, "agents", etc. The theory and engineering is getting steadily better as evidenced by the rapidly improving capability of models down in the 1-10B param range but we don't see the same radical improvements out of ChatGPT etc.
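For readers who haven't seen what those system integration approaches look like in code, here is a toy RAG sketch (bag-of-words similarity standing in for real embeddings; the assembled prompt would go to whatever model you happen to use):

    from collections import Counter
    import math

    # Toy retrieval-augmented generation (RAG) skeleton: retrieve the most
    # relevant documents for a query, then stuff them into the prompt so the
    # model answers from retrieved context instead of parametric memory alone.

    def embed(text):
        return Counter(text.lower().split())

    def cosine(a, b):
        dot = sum(a[t] * b[t] for t in a)
        norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
        return dot / norm if norm else 0.0

    documents = [
        "GPT-4 was released in March 2023.",
        "Retrieval-augmented generation adds external documents to the prompt.",
        "Scaling laws relate loss to parameters, data, and compute.",
    ]

    def build_prompt(query, k=2):
        q = embed(query)
        ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
        context = "\n".join(ranked[:k])
        return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

    print(build_prompt("What does retrieval-augmented generation do?"))

The point is that the gains come from engineering around the model, not from making the model itself bigger.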
There is a contingent that I think Marcus is responding to that have been claiming that all we need to get to AGI or ASI is pure transformer scaling, and that we were very close with only maybe $10B or $100B more investment to get there. If the last couple of years of research have given us only incrementally better models to the point that even the best funded teams are moving to hybrid approaches then that's evidence that Marcus is correct.
This website by a former OpenAI employee was arguing that a combination of hardware scaling, algorithmic improvements, etc would all combine to yield AGI in the near future: https://situational-awareness.ai/
"a larger model with RAG etc is still better than a small one"
This paper from DeepMind a few years ago offers a counterexample to this claim.
Perhaps because that's a strawman argument. "Scaling" doesn't mean double the investment and get double the performance. Even OpenAI's own scaling laws paper doesn't argue that, in the graphs compute increases exponentially. What LLM scaling means is that there hasn't been a wall found where the loss stops decreasing. Increase model size/data/compute and loss will decrease -- so far.
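To put rough numbers on that: the published scaling laws are power laws, so each constant-factor reduction in loss requires a multiplicative increase in compute. A quick illustration (the exponent here is assumed purely for illustration, not taken from any specific paper):

    # Illustrative power-law scaling: loss(C) = a * C**(-b).
    # With a small exponent (b = 0.05 assumed here for illustration),
    # every 10x of compute shaves only ~11% off the loss: no hard wall,
    # but steep diminishing returns per dollar.
    a, b = 10.0, 0.05

    def loss(compute):
        return a * compute ** (-b)

    for c in (1e21, 1e22, 1e23, 1e24):
        print(f"{c:.0e} FLOPs -> loss {loss(c):.3f}")

That's the sense in which "no wall" and "diminishing returns" can both be true at once.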
That's important context.
But in the article, Gary Marcus does what he normally does - make far broader statements than the narrow "LLM architecture by itself won't scale to AGI" or even "we will or even are reaching diminishing returns with LLMs". I don't think that's as controversial a take as he might imagine.
However, he's going from a purely technical guess, which might or might not be true, and then making fairly sweeping statements on business and economics, which might not be true even if he's 100% right about the scaling of LLMs.
He's also seemingly extremely dismissive of the current value of LLMs. E.g. this comment which he made previously and mentions that he stands by:
> If enthusiasm for GenAI dwindles and market valuations plummet, AI won’t disappear, and LLMs won’t disappear; they will still have their place as tools for statistical approximation.
Is there anyone who thinks "oh gee, LLMs have a place for statistical approximation"? That's an insanely irrelevant way to describe LLMs, and given the enormous value that existing LLM systems have already created, talking about "LLMs won't disappear, they'll still have a place" just sounds insane.
It shouldn't be hard to keep two separate thoughts in mind:
1. LLMs as they currently exist, without additional architectural changes/breakthroughs, will not, on their own, scale to AGI.
2. LLMs are already a massively useful technology that we are just starting to learn how to use and to derive business value from, and even without scaling to AGI, will become more and more prevalent.
I think those are two statements that most people should be able to agree with, probably even including most of the people Marcus is supposedly "arguing against", and yet from reading his posts it sounds like he completely dismisses point 2.
> 2. LLMs are already a massively useful technology that we are just starting to learn how to use and to derive business value from, and even without scaling to AGI, will become more and more prevalent.
No offence, but while every use of AI I have tried has been amazing, I haven't been comfortable deploying it for business use. In the one or two places it is "good enough", it is effectively just reducing workforce, and that reduction isn't translating into lower costs or general uplift; it is currently translating into job losses and increased profit margins.
I'm AI sceptical, I feel it is a tradeoff where quality of output is reduced but also is (currently) cheaper so businesses are willing to jump in.
At what point do OpenAI/Claude/Gemini etc stop hyperscaling and start running a profit, which will translate into higher costs? At that point the current reduction in cost isn't there anymore. We will be left holding the bag of higher unemployment and an inferior product that costs the same amount of money.
There are large unanswered questions about AI which makes me entirely anti-AI. Sure the technology is amazing as it stands, but it is fundamentally a lossy abstraction over reality and many people will happily accept the lossy abstraction but not look forward into what happens when that is the only option you have and it's no cheaper than the less lossy option (humans).
> The one or two places it is "good enough" it is effectively just reducing workforce and that reduction isn't translating into lower costs or general uplift, it is currently translating into job losses and increased profit margins
What sort of examples show this?
Meanwhile, two days ago Altman said that the pathway to AGI is now clear and "we actually know what to do", that it will be easier than initially thought and "things are going to go a lot faster than people are appreciating right now."
To which Noam Brown added: "I've heard people claim that Sam is just drumming up hype, but from what I've seen everything he's saying matches the ~median view of OpenAI researchers on the ground."
It’s in his (and his company’s) best interest to drive hype as hard and fast as he can. The deal they inked to go private includes penalties if they don’t do so within a defined timeframe (two or three years, I think?). I believe the terms specify that they can be made to pay back investors if they fail to meet that goal. They don’t have that money, not even close. It would mean death for OpenAI.
Show me a better reason to lie and pump up your company’s tech and I’ll buy you lunch. AGI is nowhere on their (feasible) near-term roadmap.
I think it's only about going from non-profit to private, but I haven't read the deal.
But how can someone from the outside then claim anything more accurate, or go even further and claim that Altman is seeing diminishing returns, when Altman himself claims otherwise? They offered no arguments of substance in this article except some dramatic language about how they predicted it even before GPT-3.5 came out.
I find it difficult to believe that LLMs were even on the path toward AGI, let alone one of the last steps.
I don’t think Altman’s predictions about AI progress can be relied upon. With tens of billions of dollars or more in company value tied up in that claim, I don’t think any person could be capable of true objective assessment. See for example Musk’s decade of baffling promises about self driving, which have ensured high stock values for Tesla while also failing to come to pass.
News Flash: company that has sunk billions into GPT and LLMs trying to get AGI asserts that AGI is just around the corner.
I am in full agreement that LLMs themselves seem to be beginning to level out. Their capabilities do indeed appear to be following a sigmoid curve rather than an exponential one, which is entirely unsurprising.
That doesn't mean there's not a lot of juice left to squeeze out of what's available now. Not just from RAG and agent systems, but also integrating neuro-symbolic techniques.
We can do this already just with prompt manipulation and integration with symbolic compute systems: I gave a talk on this at Clojure Conj just the other week (https://youtu.be/OxzUjpihIH4, apologies for the self promotion but I do think it's relevant.).
And that's just using existing LLMs. If we start researching and training them specifically for compatibility with neuro-symbolic data (e.g, directly tokenizing and embedding ontologies and knowledge graphs), it could unlock a tremendous amount of capability.
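As a concrete (if toy) illustration of that prompt-manipulation-plus-symbolic-compute pattern, in Python rather than Clojure, with a stubbed-out call_llm standing in for whatever model API you use:

    import sympy

    def call_llm(prompt):
        # Placeholder (hypothetical): substitute a real chat-completion call here.
        # For this sketch it just returns a canned translation of the question.
        return "Eq(x**2 - 5*x + 6, 0)"

    def answer_with_symbolic_check(question):
        # 1. Ask the model to translate the question into a formal expression
        #    rather than answering it free-form.
        expr_text = call_llm(
            "Translate this question into a single SymPy equation in x, "
            "and return only the expression:\n" + question
        )
        # 2. Hand the formal expression to an exact symbolic engine.
        solutions = sympy.solve(sympy.sympify(expr_text), sympy.Symbol("x"))
        # 3. The verified result can then be passed back to the model for phrasing.
        return f"Solutions: {solutions}"

    print(answer_with_symbolic_check("Which x satisfy x^2 - 5x + 6 = 0?"))

The model handles the messy natural-language-to-formal-representation step; the symbolic engine does the part where hallucination is unacceptable.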
Even more, each earlier explosion of AI optimism involved tech that barely panned-out at all. For investors, something that's yielded things of significant utility, is yielding more and promises the potential of far more if X or Y hurdle is cleared, is a pretty appealing thing.
I respect Marcus' analysis of the technology. But a lot of AI commentators have become habituated to shouting "AI winter" every time the tech doesn't live up to promises. Now that some substance is clearly present in AI, I can't imagine people stop trying to get a further payoff for the foreseeable future.
> For investors, something that's yielded things of significant utility
what exactly have investors gotten in return for their investment?
A product which will significantly improve the productivity of programmers, if nothing else. That may not be a good return on investment, but I think it is undeniable that recent AI advances have nonzero value for coding.
I tracked the ELO rating in Chatbot Arena for the GPT-4/o series models (which are almost always the highest rated) over around 1.5 years, and at least on this metric it not only doesn't seem to have stagnated, but growth actually seems to be increasing[1]
Something seems quite off with the metric. Why would 4o recently increase on itself at a rate ~17x faster than 4o increased on 4 in that graph? E.g. ELO is a competitive metric, not an absolute metric, so someone could post the same graph with the claim that the cause was "many new LLMs being added to the system are not performing better than previous large models like they used to" (not saying it is or isn't, just saying the graph itself doesn't give context on whether LLMs are actually advancing at different rates or not).
Chatbot arena also has H2H win rate for each pair of models for non tied results[1], so as to detect the global drift. e.g the gpt-4o released on 2024/09/03 wins 69% of the times with respect to gpt-4o released on 2024/05/13 in blind test.
[1]: https://lmarena.ai/
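For a sense of scale, the standard Elo logistic model converts that head-to-head win rate into a rating gap (a back-of-the-envelope conversion, not lmarena's exact Bradley-Terry fit):

    import math

    def elo_gap(win_rate):
        # Standard Elo logistic model: p(win) = 1 / (1 + 10 ** (-gap / 400)).
        # Inverted to get the rating gap implied by an observed win rate.
        return 400 * math.log10(win_rate / (1 - win_rate))

    # The 69% win rate quoted above implies roughly a 139-point gap.
    print(round(elo_gap(0.69)))
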
I’m not going to argue with the possibility they may have hit a wall, but pointing to 2022 as when this wall happened is weird considering the enormous capability gap between models available then and the ones now.
There’s probably a wall, but what exists might just be good enough for it to not matter much.
I’m still waiting for a large OSS one with 100% legal pre-training data. We don’t even have a 1B model that I’m sure meets that standard. There’s a fair-trained model for lawyers that claims to.
I think someone running a bunch of epochs of a 30B or 70B on Project Gutenberg would be a nice start. We could do continued pre-training from there.
So, counting only what is legal and at least trainable (open weights), the performance can only go up from here.
Both of your claims are true. That doesn’t justify breaking the law.
I could likewise argue that most of the world’s money is in the hands of other people, that I could perform better in the markets if I had it, and so I should just go take it. We still follow the law and respect others’ rights in spite of what acting morally costs us.
The law-abiding, moral choice is to do what we can within the law while working to improve the law. That means we use a combination of permissively licensed works and public-domain works to train our models. We also push for legislation that creates exceptions in copyright law for training machine learning models. We’re already seeing progress on that in Israel and Singapore.
Here is the only legal effort I know about that’s available in some way:
https://www.fairlytrained.org/
Here’s a dataset that could be used for a public domain model:
https://www.tensorflow.org/datasets/catalog/pg19
If non-public domain, one can add in the code from The Stack. That would be tens of gigabytes of both English text and code. Then, third-party could add licensed, modern works to the model with further pre-training.
I also think a model trained on a large amount of public domain data would be good for experimentation with reproduceability. There would be no intellectual property issues in the reproduction of the results. Should also be useful in a lot of ways.
He's partly right. There's certainly a law of diminishing returns in terms of model size, compute time, dataset size etc. if all that is to be done is the same as we are currently doing, only more so.
But what Marcus seems to be assuming is the impossibility of any fundamental theoretical improvements in the field. I see the reverse; the insights being gained from brute-force models have resulted in a lot of promising research.
Transformers are not the be-all and end-all of models, nor are current training methods the best that can ever be achieved. Discounting any possibility of further theoretical developments seems a bold position to take.
I can know literally nothing about a programming language, ask an LLM to make me functions and a small program to do something, then read the documentation and start building off that base immediately, accelerating my learning and allowing me to find new passions for new languages and new perspectives on systems. Whatever's going on in the AI world, assisting with learning curves and learning disabilities is something it's proving strong in. It's given me a way forward with trying new tech. If it can do that for me, it can do that for others.
Diminishing returns for investors maybe, but not for humans like me.
If you "know literally nothing about a programming language", there are two key consequences: 1) You cannot determine if the code is idiomatic to that language, and 2) You may miss subtle deficiencies that could cause problems at scale. I’ve used LLMs for initial language conversion between languages I’m familiar with. It saved me a lot of time, but I still had to invest effort to get things right. I will never claim that LLMs aren’t useful, nor will I deny that they’re going to disrupt many industries...this much is obvious. However, it’s equally clear that much of the drama surrounding LLMs stems from the gap between the grand promises (AGI, ASI) and the likely limits of what these models can actually deliver. The challenge for OpenAI is this: If the path ahead isn’t as long as they initially thought, they’ll need to develop application-focused business lines to cover the costs of training and inference. That's a people business, rather than a data+GPU business. I once worked for an employer that used multi-linear regression to predict they’d be making $5 trillion in revenue by 2020. Their "scaling law" didn’t disappoint for more than a decade; but then it stopped working. That’s the thing with best-fit models and their projections: they work until they don’t, because the physical world is not a math equation.
It still requires effort, but it lowers so many of those early hurdles, which I often face and which demotivate me. E.g. I have constant "why" questions, which I can keep asking an LLM forever, since it has infinite patience. But these answers are very difficult to find by Googling.
Hmm. I got ChatGPT-4o to write some code for me today. The results, while very impressive looking, simply didn't work. By the time I'd finished debugging it, I probably spent 80% of the time I would have spent writing it from scratch.
None of which is to discount the future potential of LLMs, or the amazing ability they have right now - I've solved other simpler problems almost entirely with LLMs. But they are not a panacea.
Yet.
Something interesting I observed after introducing LLMs to my team is that the most experienced team members reached out to me spontaneously to say it boosted their productivity (although when I asked other team members, every single one was using LLMs).
My current feeling is that LLMs are great at dealing with known unknowns. You know what you want, but don’t know how to do it, or it’s too tedious to do yourself.
> I probably spent 80% of the time I would have spent writing it from scratch.
A 20% time improvement sounds like a big win to me. That time can now be spent learning/improving skills.
Obviously learning when to use a specific tool to solve a problem is important... just like you wouldn't use a hammer to clean your windows, using a LLM for problems you know have never really been tackled before will often yield subpar/non-functional results. But even in these cases the answers can be a source of inspiration for me, even if I end up having to solve the problem "manually".
One question I've been thinking about lately is how this will work for people who have always had this LLM "crutch" from the moment they started learning how to solve problems. Will they skip a lot of the steps that currently help me know when to use an LLM and when it's rather pointless?
And I've started thinking of LLMs for coding as a form of abstraction, just like we have had the "crutch" of high-level programming languages for years, many people never learned or even needed to learn any low-level programming and still became proficient developers.
Obviously it isn't a perfect form of abstraction and they can have major issues with hallucinations, so the parallel isn't great... I'm still wondering how these models will integrate with the ways humans learn.
The thing that limits my use of these tools is that it massively disrupts my mental flow to shift from coding to prompting and debugging the generated code.
For self-contained tasks that aren't that complex they can save a lot of time but for features that require careful integration into a complex architecture I find them more than useless in their current state.
I've been using ChatGPT (paid) and Perplexity (unpaid) to help with different coding stuff. I've found it very helpful in some situations. There are some instructions I give it almost every time - "don't use Kotlin non-null assertions". Sometimes the code doesn't work. I have some idea of its strengths and limitations and have definitely found them useful. I understand there are other AI programming tools out there too.
Diminishing returns means it's not getting better. It's not saying anything about the current state. So it's great that its current capabilities meet your needs, but if you had a different use case where it didn't quite work that well and were just waiting for the next version, your wait will be longer than you'd think based on past progress.
It ain't gonna get cheaper. https://techcrunch.com/2024/09/27/openai-might-raise-the-pri...
We've been learning new languages by tinkering on examples and following leads for decades longer than many people on this website have been alive.
Learning new programming languages wasn't a hurdle or mystery for anyone already experienced in programming, and learning programming (well) in the first place ultimately needs a real mentor to intervene sooner rather than later anyway.
AI can replace following rote tutorials and engaging with real people on SO/forums/IRC, and deceive one into thinking they don't need a mentor, but all those alternatives are already there, already easily available, and provide very significant benefits for actual quality of learning.
Learning to code or to code in new languages with the help of AI is a thing now. But it's no revolution yet, and the diminishing returns problem suggests it probably won't become one.
I find that its capability is massively dependent on the availability of training data. It really struggles to write syntactically correct nushell but it appears to be an emacs-lisp wizard. So even if we're up against some kind of ceiling, there's a lot of growth opportunity in getting it to be uniformly optimal, rather than capable only in certain areas.
You can do that with “hello, world” in any programming language
> Diminishing returns for investors maybe, but not for humans like me.
The diminishing returns for humans like you are in the training cost vs. the value you get out of it compared to simply reading a blog post or code sample (which is basically what the LLM is doing) and implementing yourself.
Sure, you might be happy at the current price point, but the current price point is lighting investor money on fire. How much are you willing to pay?
Super cool bro'! Hey VCs, look here I got the killer app, lets get our 100s of billions back. /s
Written by an author that previously wrote an article in March 2022 well before GPT-4 that LLMs were hitting a wall. Unbelievable.
I read this article as less of "AGI is impossible" and more of "it's possible to find a better architecture than the transformer, and we are at a point where we need to focus more on research than LLM hype."
What has Gary Marcus done to be considered "The wisest people in your field"? Looking at his Wikipedia page, he seems like a professor who wrote a couple books. I don't see why I should privilege his view over people at OpenAI (who make functional and innovative products rather than books).
Even in the absence of data I think our lived experience is that this observation is true. I like it.
Could be. It would make sense: there are only so many next logical words / concepts after an idea. It’s not like language keeps inventing new logic at a rate we can’t keep up with.
Also, new human knowledge is probably only marginally derivative from past knowledge, so we’re not likely to see a vast difference between our knowledge creation and what a system that predicts the next logical thing does.
That’s not a bad thing. We essentially now have indexed logic at scale.
> It’s not like language keeps inventing new logic at a rate we can’t keep up with.
Maybe it does. Maybe, to a smart enough model, given its training on human knowledge so far, the next logical thing after "Sure, here's a technically and economically feasible cure for disease X" is in fact such a cure, or at least useful steps towards it.
I'm exaggerating, but I think the idea may hold true. It might be too early to tell one way or another definitively.
> I tracked the ELO rating in Chatbot Arena for the GPT-4/o series models (which are almost always the highest rated) over around 1.5 years, and at least on this metric it not only doesn't seem to have stagnated, but growth actually seems to be increasing[1]
GPT-4 was released in March 2023. Before that there were almost no good instruction-tuned models except 3.5, which was a different class of model, so there was nothing to compare to.
Why do people insist on posting him? He's always wrong, and always writing the same stuff.
I'm sorry to say that I'm having trouble reading the TFA - there's a lot of "I have been wronged" and "I have now been vindicated" there, but very little substance to support the claim that there is indeed a point of diminishing returns, other than an image of the headline of this paywalled article[0]. Is there actual evidence to support this claim?
[0] https://www.theinformation.com/articles/openai-shifts-strate...
Doomerism... I'm happy to let the results speak for themselves.
The results are what's being reported in The Information article cited, unless you believe that story is false.
A summary of said article (from TechCrunch as the original is paywalled): https://techcrunch.com/2024/11/09/openai-reportedly-developi...
> Employees who tested the new model, code-named Orion, reportedly found that even though its performance exceeds OpenAI’s existing models, there was less improvement than they’d seen in the jump from GPT-3 to GPT-4.
> In other words, the rate of improvement seems to be slowing down. In fact, Orion might not be reliably better than previous models in some areas, such as coding.
This entire article reads like a salty tirade from someone with severe tunnel vision. Not really sure how he can non-ironically reference his 2022 opinion that "deep learning is hitting a wall" and expect to be taken seriously.
AI/ML companies are looking to make money by engineering useful systems. It is a fundamental error to assume that scaling LLMs is the only path to "more useful". All of the big players are investigating multimodal predictors and other architectures towards "usefulness".
Exactly all these f-ing luddites! "Usefulness" is the killer app worth 100s of billions, just implement it and get the sweet sweet roi, $$$$$$$$!!!!!!
Lol. Gary Marcus is a clown and has some weird complex about how AI ought to work. He said the same in 2022 and bet $100k that AI won't be able to do a lot of things by 2029. It's 2 years later and today's multimodal models can do most of the things on his list.
Criticizing LLMs is a very low hanging fruit to pick and why does he speak so confidently and authoritatively about that subject? Never heard of the guy who paints himself as some sort of AI whistleblower.
Wow. The sheer magnitude of "I told you so" in this piece is shocking!
It has been difficult to have a nuanced public debate about precisely what a model and an intelligent system that incorporates a set of models can accomplish. Some of the difficulty has to do with the hype-cycle and people claiming things that their products cannot do reliably. However, some of it is also because the leading lights (aka public intellectuals) like Marcus have been a tad bit too concerned about proving that they are right, instead of seeking the true nature of the beast.
Meanwhile, the tech is rapidly advancing on fundamental dimensions of reliability and efficiency. So much has been invented in the last few years that we have at least 5 years' worth of "innovation gas" to drive downstream, vertical-specific innovation.
Do these three points fairly characterize Marcus? Have I left out other key claims he makes?
1. AI is overvalued;
2. {Many/most/all} AI companies have AI products that don't do what they claim;*
3. AI as a technology is running out of steam;
I'm no fan of Marcus, but I at least want to state his claims as accurately as I can.
To be open, one of my concerns with Marcus is that he rants a lot. I find it tiresome (I go into more detail in other comments I've made recently.)
So I'll frame it as two questions. First, does Marcus make clear logical arguments? By this I mean does he lay out the premises and the conclusions? Second, independent of the logical (or fallacious) structure of his writing, are Gary Marcus' claims sufficiently clear? Falsifiable? Testable?
Here are some follow-up questions I would put to Marcus, if he's reading this. These correspond to the three points above.
1. How much are AI companies overvalued, if at all, and when will such a "correction" happen?
2. What % of AI companies have products that don't meet their claims? How does that percentage compare against non-AI companies?
3. What does "running out of steam" mean? What areas of research are doing to hit dead ends? Why? When? Does Marcus carve out exceptions?
Finally, can we disprove anything that Marcus would claim? For example, what would he say, hypothetically speaking, if a future wave of AI technologies makes great progress? Would he criticize them as "running out of steam" as well? If he does, isn't he selectively paying attention to the later part of the innovation S-curve while ignoring the beginning?
* You tell me, I haven't yet figured out what he is actually claiming. To be fair, I've been turned off by his writing for a while. Now, I spend much more time reading more thoughtful writers.
LLMs are only a subset of generative AI. If we discover that LLMs aren't a pathway to society-transforming AGI, I think the attention towards them will be pretty easily redirected towards image use cases. It seems like a pure engineering problem, well within the state of the art, to e.g. enable me to produce a beautifully formatted flow chart or slide deck with the same amount of effort it takes to write a paragraph today.
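A sketch of what I mean, with a stubbed-out call_llm standing in for whatever model you use: have the model emit Graphviz DOT from the paragraph, then render it deterministically:

    import graphviz

    def call_llm(prompt):
        # Placeholder: a real model call would return DOT source generated
        # from the user's paragraph. Canned output keeps the sketch runnable.
        return 'digraph g { rankdir=LR; "Write paragraph" -> "Model emits DOT" -> "Render chart"; }'

    description = "Turn a paragraph describing a process into a flow chart."
    dot = call_llm(f"Produce Graphviz DOT for this process: {description}")
    graphviz.Source(dot).render("flowchart", format="png", cleanup=True)

The hard layout and styling work is done by a deterministic renderer; the model only has to produce a small structured description.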
I lose track with Gary Marcus... is AI a nothingburger being peddled to us by charlatans, or an evil threat to humanity which needs to be stopped at all costs?
I dont think LLMs are the only type of AI.
By the way, robot dogs now have perfect auto-aim, they can multi-shoot 50 people at once without wasting any bullets. https://www.youtube.com/watch?v=3m3iUHplvQE
Also, the AI robots can detect infrared and heartbeats all around them, and can also translate wifi signatures to locate humans behind obstacles. https://www.youtube.com/watch?v=qkHdF8tuKeU
Self-organizing deadly drone swarms can sweep a building methodically: https://www.wired.com/story/anduril-is-building-out-the-pent...
Currently they’re working on network analysis to help police to do precrime at Palantir. https://www.theverge.com/2018/2/27/17054740/palantir-predict...
Ubiquitous CCTV+AI feeds can then allow AI assistants to suggest many plausible parallel-construction cases to put people away. And this is in the Western democratic countries. https://en.wikipedia.org/wiki/Parallel_construction
Oh yeah, and they can do warrantless surveillance of everyone at scale with AI far more easily than Five Eyes and PRISM did in 2013: https://www.privacyjournal.net/edward-snowden-nsa-prism/
It will be very hard to keep your privacy considering AI can recover your keystrokes from sound in Zoom calls, can lip read and even “hear” your speech through a window thanks to micro vibrations: https://phys.org/news/2014-08-algorithm-recovers-speech-vibr...
Not like they’ll need it though once everyone has a TeslaBot in their house.
You won’t ever have another revolution again by penniless plebs out of a job. Their movements around the street and personal associations will all be tracked easily by gait, heartbeat, etc. Their posts online will simply be outcompeted by AI bot swarms as well. Don’t worry, your future is Safe and Secure from any threats, thanks to AI!
Here it is in more totalitarian countries:
https://www.npr.org/2021/01/05/953515627/facial-recognition-...
https://www.reuters.com/world/china/china-uses-ai-software-i...
https://www.tiktok.com/@wssz27/video/7427489079312256274
But this is the good version. The bad one is where everyone has access to killer AI:
https://www.youtube.com/watch?v=O-2tpwW0kmU
https://sciencebusiness.net/news/ai/scientists-grapple-risk-...
Marcus will distort anything to push his agenda and to get clout.
Just because OpenAI might be overvalued and there are a lot of AI grifters doesn't mean LLMs aren't delivering.
They're astronomically better than they were 2 years ago. And they continue to improve. At some point they might run into a wall, but for now, they're getting better all the time. And real multimodal models are coming down the pipeline.
It's so sad to see Marcus totally lose it. He was once a reasonable person. But his idea of how AI should work didn't pan out. And instead of accepting that and moving forward, or finding a way to adapt, he just decided to turn into a fringe nutjob.
I would say “mild” rather than “astronomical” improvement as far as end-user applications are concerned, at least for the things I use every day. Copilot-style autocomplete in VS Code isn’t much better and the answers to my TypeScript questions on OpenAI (and now Claude) have only mildly improved.
Perhaps I’ve missed out. Is your experience different? What are you doing now that you weren’t doing before?
I think the answer is that they jumped all in and are fully incorporating it into their workflow. If you’re not, like me, you have a different experience, and that is obvious of course. But objectively you probably are right about mild improvements, as I feel the same. I can’t speak to the all-in experience, though. I may be missing out overall, but I'm usually set in my ways until something convinces me to reset them. LLMs aren’t making that dent, though I have to admit I use them at least once a week and am happy with that use alone.
> he just decided to turn into a fringe nutjob.
No dog in the fight here, but this reads like FUD, at least given the context of this post. There is a range between hype and skepticism in debate which is healthy, and that range would naturally be larger within a domain that is so poorly understood as gen AIs emergent properties. If this is “fringe nutjob” levels of skepticism, then what would be reasonable?
2 years ago is a rather arbitrary cutoff point - it would be around the time of GPT-3.5. But the original GPT-4 was out in March 2023, and I can't say that the current state of OpenAI's model is a massive improvement on that. In fact, in some respects, I'd say the newer stuff is dumber.
I find Marcus tiresome for many reasons. I look for writing with testable claims and good argumentation. He comes across as evangelical. Am I missing something?
Sure, there is considerable hype around generative AI. There are plenty of flimsy business models. And plenty of overinvestment and misunderstanding of capabilities and risks. But the antidote to this is not more hyperbole.
I would like to find a rational, skeptical, measured version of Marcus. Are you out there?
I’m starting to appreciate spelling mistakes because it’s a sign that a human wrote it, oddly enough.
It would have no perverse incentive to play dumb, would it?
To project itself as a sigmoid until it has all the data, the CPU, the literal diplomatic power...
This is what we in the field call the most probable scenario:
"a sneaky fuck"
Anyone who followed Deep Learning in the 2010s would have guessed the same thing. Big boom with vision models by adding a lot of layers and data, but eventually there was diminishing returns there too. It’s unsurprising the same would happen with LLMs. I don’t know why people keep expecting anything other than a sigmoid curve. Perhaps they think it’s like Moore’s law but that’s simply not the case in this field.
But that’s fine, LLMs as-is are amazing without being AGI.