Comment by adwn
> a small LLM, say, with 200–300K weights
A "small Large Language Model", you say? So a "Language Model"? ;-)
> Such an LLM could have handled grammar and code autocompletion, basic linting, or documentation queries and summarization.
No, not even close. You're off by 3 orders of magnitude if you want even the most basic text understanding, 4 OOM if you want anything slightly more complex (like code autocompletion), and 5–6 OOM for good speech recognition and generation. Hardware was very much a limiting factor.
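To put numbers on those gaps, here is a quick sketch. It assumes 300K weights (the upper end of the quoted range) as the baseline; the function name and labels are just for illustration.

```python
# Scale the quoted model size by the orders of magnitude (OOM) named above.
BASE_WEIGHTS = 300_000  # assumption: upper end of the quoted 200-300K range


def scaled_weights(base: int, oom: int) -> int:
    """Return the weight count after scaling `base` up by `oom` orders of magnitude."""
    return base * 10 ** oom


for label, oom in [
    ("basic text understanding", 3),
    ("code autocompletion", 4),
    ("speech recognition/generation (low)", 5),
    ("speech recognition/generation (high)", 6),
]:
    print(f"{label}: ~{scaled_weights(BASE_WEIGHTS, oom):,} weights")
```

So "3 OOM" already means hundreds of millions of weights, and "5–6 OOM" means tens to hundreds of billions.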
I would have thought the same, but EXO Labs showed otherwise by getting a 300K-parameter LLM to run on a Pentium II with only 128 MB of RAM at about 50 tokens per second. The X-MP was in the same ballpark, with the added benefit of native vector processing (not just some extension bolted onto a scalar CPU), which performs very well on matmul.
https://www.tomshardware.com/tech-industry/artificial-intell...
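As a rough back-of-envelope for what that throughput implies in compute: the sketch below assumes the common rule of thumb of about 2 floating-point operations per weight per generated token (one multiply and one add). That factor is my assumption, not a figure from the linked article, and it ignores memory bandwidth and attention overhead.

```python
# Estimate the sustained compute needed for small-model inference.
FLOPS_PER_WEIGHT_PER_TOKEN = 2  # assumption: one multiply + one add per weight


def required_flops_per_second(n_weights: int, tokens_per_second: float) -> float:
    """Approximate FLOP/s needed to generate tokens at the given rate."""
    return n_weights * FLOPS_PER_WEIGHT_PER_TOKEN * tokens_per_second


needed = required_flops_per_second(300_000, 50)
print(f"{needed / 1e6:.0f} MFLOP/s")  # prints "30 MFLOP/s"
```

Under that assumption, a 300K-weight model at 50 tokens per second needs on the order of tens of MFLOP/s sustained.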
John Carmack has also hinted at this: we might have had AI decades earlier. Obviously not large GPT-4-class models, but useful language reasoning at a small scale was possible. The hardware wasn't that far off. The software and incentives were.
https://x.com/ID_AA_Carmack/status/1911872001507016826