Comment by bkettle
This free tradition in software is, I think, one of the things I love most about it, but I don't see how it can continue with LLMs given the extremely high training costs and the powerful hardware required for inference. It just seems like writing software will necessarily require paying rent to the LLM hosts to keep up. I suppose it's possible we'll figure out how to make local inference accessible to everyone, the way most other modern software tools are, but the high training costs make that seem unlikely to me.
I also worry that as we rely on LLMs more and more, we will stop producing the kind of tutorials and other beginner-focused content that makes it so easy to pick up programming the manual way.
There's a Stephen Boyd quote that goes something like: "if your optimization problem is too computationally expensive, just go on vacation to Greece for a few weeks, and by the time you get back, computers might be fast enough to solve it." With LLMs there's sort of an equivalent situation with cost: how mind-blowing would it have been to be able to train this kind of LLM at all even just four years ago? And today you can get a kindergartener-level chat model for about $100. It's not hard to imagine the same model costing $10 of compute in a few years.
There's also a reasonable way to "leapfrog" the training cost with a pre-trained model. If you were doing nanochat as a learning exercise and had no money, the idea would be to code it up, run one or two very slow gradient descent iterations on your slow machine to make sure it works, then download pre-trained weights from someone who could spare the compute. A minimal sketch of that workflow is below.
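Concretely, something like the following PyTorch sketch. The model here is a toy stand-in and the checkpoint filename is hypothetical; nanochat's real model and training loop differ, but the workflow (verify the loss moves, then swap in shared weights) is the same:

```python
# Sketch: sanity-check the training loop locally, then "leapfrog" with
# someone else's pre-trained weights instead of training to convergence.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyLM(nn.Module):
    """Toy stand-in for the real model: embedding -> next-token head."""
    def __init__(self, vocab_size=256, dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.head = nn.Linear(dim, vocab_size)

    def forward(self, idx):
        return self.head(self.embed(idx))  # (batch, seq, vocab)

model = TinyLM()
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)

# Step 1: one or two slow gradient steps on random tokens, just to confirm
# the forward/backward pass runs and the loss actually moves.
x = torch.randint(0, 256, (4, 32))  # (batch, seq) of token ids
y = torch.randint(0, 256, (4, 32))  # next-token targets
losses = []
for _ in range(2):
    logits = model(x)
    loss = F.cross_entropy(logits.view(-1, 256), y.view(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()
    losses.append(loss.item())
print(f"loss went {losses[0]:.3f} -> {losses[1]:.3f}")  # should decrease

# Step 2: rather than paying for the full training run, load weights someone
# else trained. "pretrained.pt" is a placeholder for a shared checkpoint
# whose architecture matches yours.
# model.load_state_dict(torch.load("pretrained.pt", map_location="cpu"))
```

You learn almost everything from writing and debugging the code; the expensive part you skip is just the wall-clock of the big training run.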