Comment by rahen
I would have thought the same, but EXO Labs showed otherwise by getting a 300K-parameter LLM to run on a Pentium II with only 128 MB of RAM at about 50 tokens per second. The Cray X-MP was in the same ballpark, with the added benefit of native vector processing (not an extension bolted onto a scalar CPU), which performs very well on matmul.
https://www.tomshardware.com/tech-industry/artificial-intell...
John Carmack has also hinted at this: we might have had AI decades earlier. Obviously not large GPT-4-class models, but useful language reasoning at a small scale was possible. The hardware wasn't that far off; the software and incentives were.
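To make the matmul point concrete, here is a minimal sketch (my own illustration, not EXO's code; the names matvec, w, x, out are mine) of the matrix-vector kernel that dominates inference in a model this small. The inner loop is a stream of independent multiply-accumulates, which is exactly the shape a vector machine like the X-MP pipelines at full rate:

    /* Illustrative matrix-vector product, the kernel that accounts
       for nearly all the FLOPs in tiny-LLM inference. The inner
       loop has no cross-iteration dependencies apart from the
       accumulation, so it maps directly onto vector registers
       (or auto-vectorizes with SIMD on a scalar CPU). */
    void matvec(const float *w, const float *x, float *out,
                int rows, int cols) {
        for (int r = 0; r < rows; r++) {
            float acc = 0.0f;
            for (int c = 0; c < cols; c++)  /* vectorizable */
                acc += w[r * cols + c] * x[c];
            out[r] = acc;
        }
    }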
> EXO Labs showed otherwise by getting a 300K-parameter LLM to run on a Pentium II with only 128 MB of RAM at about 50 tokens per second
50 tokens/s is completely useless if the tokens themselves are useless. Just look at the "story" generated by the model presented in your link: each individual sentence is somewhat grammatically correct, but the sentences have next to nothing to do with one another and make no sense as a whole. Take this, for example:
"I lost my broken broke in my cold rock. It is okay, you can't."
Good luck tuning this for turn-based conversations, let alone for solving any practical task. The model is so limited that you couldn't even benchmark its performance, because it can't follow even the simplest instructions.