Comment by giancarlostoro 2 days ago

This is something I've been wondering about myself. What's the "Minimally Viable LLM" that can have simple conversations? My next question is: how far can we push it to learn by looking up data externally? Can we build a tiny model with an insanely larger context window? I have to assume I'm not the only one who has asked or thought about these things.

Ultimately, if you can build an ultra tiny model that can talk and learn on the fly, you've just fully localized a personal assistant like Siri.

fho 2 days ago

You might be interested in RWKV: https://www.rwkv.com/

Not exactly "minimally viable", but a "what if RNNs were good for LLMs" case study.

-> insanely fast on CPUs
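The CPU-speed claim follows from the RNN-style design: the model carries a fixed-size state, so each new token costs the same amount of work no matter how long the context is, whereas a transformer attends over all previous tokens. A toy sketch of that recurrence (not RWKV's actual equations, just the general shape of an RNN update):

```python
import numpy as np

# Toy recurrent update: the only memory kept between tokens is a
# fixed-size state vector, so per-token cost is O(d^2) regardless of
# context length. (Illustrative only; RWKV's real update is different.)

d = 8  # hidden size (toy value)
rng = np.random.default_rng(0)
W_in = rng.standard_normal((d, d)) * 0.1
W_state = rng.standard_normal((d, d)) * 0.1

def step(state, token_embedding):
    """Consume one token; returns the new fixed-size state."""
    return np.tanh(W_state @ state + W_in @ token_embedding)

state = np.zeros(d)
for _ in range(1000):  # 1000 tokens: constant work and memory per token
    state = step(state, rng.standard_normal(d))

print(state.shape)  # state stays (d,) no matter how many tokens we feed
```

This is also why such models suit CPUs: the hot loop is a couple of small matrix-vector products with no growing attention cache.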

  • giancarlostoro 2 days ago

    My personal idea revolves around "can I run it on a basic smartphone, with whatever the 'floor' for basic smartphones under, let's say, $300 is for memory (let's pretend RAM prices are normal)."

    Edit: The fact this runs on a smartphone means it is highly relevant. My only question is: how do we give such a model an "unlimited" context window, so it can digest as much as it needs? I know some models know multiple languages; I wouldn't be surprised if sticking to only English would reduce the model size / need for more hardware and make it even smaller / tighter.

qingcharles 2 days ago

I think what's amazing to speculate about is how we could have had some very basic LLMs in at least the '90s if we'd invented the tech earlier. I wonder what the world would be like now if we had.

Dylan16807 2 days ago

For your first question, the LLM someone built in Minecraft can handle simple conversations with 5 million weights, mostly 8 bits.

I doubt it would be able to make good use of a large context window, though.
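Back-of-envelope arithmetic on that figure (the Minecraft build's actual layout isn't known here; these numbers are purely illustrative): 5 million weights stored mostly at 8 bits each is on the order of 5 MB, which easily fits in any phone's RAM.

```python
# 5 million weights at 8 bits per weight, converted to megabytes.
n_weights = 5_000_000
bits_per_weight = 8
size_mb = n_weights * bits_per_weight / 8 / 1_000_000
print(size_mb)  # 5.0 (MB)
```

The bottleneck for a tiny model on a phone is thus less about raw weight storage and more about context: the KV cache of a transformer grows with context length, which is part of why the large-context question above is the hard one.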