rahen 2 days ago

No need for an RPi 5. Back in 1982, a dual or quad-CPU X-MP could have run a small LLM, say one with 200–300K weights, without trouble. The Crays were, ironically, very well suited for neural networks; we just didn't know it yet. Such an LLM could have handled grammar and code autocompletion, basic linting, or documentation queries and summarization. By the late 80s, a Y-MP might even have been enough to support a small conversational agent.
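A rough back-of-envelope for that claim (a sketch; the ~2 FLOPs per weight per token rule of thumb and the X-MP throughput figure are assumptions for illustration, not measurements):

    # Rough estimate: a ~300K-weight model on an early-80s, dual-CPU Cray X-MP.
    # All figures are assumptions for illustration, not measurements.
    params = 300_000                  # assumed model size (weights)
    model_bytes = params * 8          # 64-bit words: ~2.4 MB, fits easily in main memory
    flops_per_token = 2 * params      # ~2 FLOPs per weight per token (multiply + add)

    xmp_sustained_flops = 400e6       # assumed sustained matmul throughput, dual-CPU X-MP

    tokens_per_second = xmp_sustained_flops / flops_per_token
    print(f"~{tokens_per_second:,.0f} tokens/s")   # on the order of hundreds of tokens/s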

A modest PDP-11/34 cluster with AP-120 vector coprocessors might even have served as a cheaper pathfinder in the late 70s for labs and companies that couldn't afford a Cray-1 and its infrastructure.

But we lacked both the data and the concepts. Massive, curated datasets (and backpropagation!) weren't even a thing until the late 80s or 90s. And even then, those early networks ran on far less powerful hardware than the Crays. Ideas and concepts were the limiting factor, not the hardware.

fentonc 19 hours ago

I think a quad-CPU X-MP is probably the first computer that could have run (not trained!) a reasonably impressive LLM if you could magically transport one back in time. It supported a 4 GB (512 MWord) SRAM-based "Solid State Drive" with a transfer bandwidth of 2 GB/s, and delivered about 800 MFLOPS on something like a big matmul. You could probably run a 7B-parameter model with 4-bit quantization on it with careful programming, and get a token every couple of seconds.
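A quick check of the memory side of that estimate (a sketch using only the figures quoted above; it assumes the weights are streamed from the SSD once per token and ignores activations and the compute cost):

    # Memory-side check of the 7B / 4-bit estimate, using the figures above.
    params = 7e9                      # 7B-parameter model
    bytes_per_param = 0.5             # 4-bit quantization
    model_bytes = params * bytes_per_param        # 3.5 GB -- fits in the 4 GB SSD

    ssd_bandwidth = 2e9               # 2 GB/s transfer bandwidth
    seconds_per_token = model_bytes / ssd_bandwidth

    print(f"model: {model_bytes / 1e9:.1f} GB, "
          f"weight streaming alone: {seconds_per_token:.2f} s/token")   # ~1.75 s/token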

adwn a day ago

> a small LLM, say, with 200–300K weights

A "small Large Language Model", you say? So a "Language Model"? ;-)

> Such an LLM could have handled grammar and code autocompletion, basic linting, or documentation queries and summarization.

No, not even close. You're off by 3 orders of magnitude if you want even the most basic text understanding, 4 OOM if you want anything slightly more complex (like code autocompletion), and 5–6 OOM for good speech recognition and generation. Hardware was very much a limiting factor.

  • rahen a day ago

    I would have thought the same, but EXO Labs showed otherwise by getting a 300K-parameter LLM to run on a Pentium II with only 128 MB of RAM at about 50 tokens per second. The X-MP was in the same ballpark, with the added benefit of native vector processing (not just an extension bolted onto a scalar CPU), which performs very well on matmuls.

    https://www.tomshardware.com/tech-industry/artificial-intell...
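    As a rough consistency check on those numbers (a sketch; the ~2 FLOPs per weight per token figure is an assumption, not something from the article):

        # Consistency check for "50 tokens/s on a ~300K-parameter model".
        params = 300_000
        flops_per_token = 2 * params          # ~2 FLOPs per weight per token (assumption)
        tokens_per_second = 50

        required_mflops = flops_per_token * tokens_per_second / 1e6
        print(f"~{required_mflops:.0f} MFLOPS needed")   # ~30 MFLOPS: well within a Pentium II, and within a 1982 X-MP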

    John Carmack has also hinted at this: we might have had AI decades earlier. Obviously not large GPT-4-class models, but useful language reasoning at a small scale was possible. The hardware wasn't that far off; the software and incentives were.

    https://x.com/ID_AA_Carmack/status/1911872001507016826

    • adwn a day ago

      > EXO Labs showed otherwise by getting a 300K-parameter LLM to run on a Pentium II with only 128 MB of RAM at about 50 tokens per second

      50 tokens/s is completely useless if the tokens themselves are useless. Just look at the "story" generated by the model presented in your link: each individual sentence is somewhat grammatically correct, but the sentences have next to nothing to do with each other and make absolutely no sense. Take this, for example:

      "I lost my broken broke in my cold rock. It is okay, you can't."

      Good luck tuning this for turn-based conversations, let alone for solving any practical task. This model is so restricted that you couldn't even benchmark its performance, because it wouldn't be able to follow the simplest of instructions.

      • rahen a day ago

        You're missing the point. No one is claiming that a 300K-param model on a Pentium II matches GPT-4. The point is that it works: it parses input, generates plausible syntax, and does so using algorithms and compute budgets that were entirely feasible decades ago. The claim is that we could have explored and deployed narrow AI use cases decades earlier, had the conceptual focus been there.

        Even at that small scale, you can already do useful things like basic code or text autocompletion, and with a few million parameters on a machine like a Cray Y-MP, you could reasonably attempt tasks like summarizing structured or technical documentation. It's constrained in scope, granted, but it's a solid proof of concept.

        The fact that a functioning language model runs at all on a Pentium II, with resources not far off from a 1982 Cray X-MP, is the whole point: we weren't held back by hardware; we were held back by ideas.