Comment by giancarlostoro 2 days ago

This is something I've been wondering about myself. What's the "Minimally Viable LLM" that can have simple conversations? My next question is: how far can we push it to learn by looking up data externally? Can we build a tiny model with an insanely larger context window? I have to assume I'm not the only one who has asked or thought about these things.

Ultimately, if you can build an ultra tiny model that can talk and learn on the fly, you've just fully localized a personal assistant like Siri.

fho 2 days ago

You might be interested in RWKV: https://www.rwkv.com/

Not exactly "minimally viable", but a "what if RNNs were good for LLMs" case study.

-> insanely fast on CPUs
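The CPU-speed claim follows from the RNN-style design: the model carries a fixed-size state, so each new token costs the same amount of work no matter how long the context is, whereas a transformer attends over all previous tokens. A toy sketch of that recurrence (not RWKV's actual equations, just the general shape of an RNN update):

```python
import numpy as np

# Toy recurrent update: the only memory kept between tokens is a
# fixed-size state vector, so per-token cost is O(d^2) regardless of
# context length. (Illustrative only; RWKV's real update is different.)

d = 8  # hidden size (toy value)
rng = np.random.default_rng(0)
W_in = rng.standard_normal((d, d)) * 0.1
W_state = rng.standard_normal((d, d)) * 0.1

def step(state, token_embedding):
    """Consume one token; returns the new fixed-size state."""
    return np.tanh(W_state @ state + W_in @ token_embedding)

state = np.zeros(d)
for _ in range(1000):  # 1000 tokens: constant work and memory per token
    state = step(state, rng.standard_normal(d))

print(state.shape)  # state stays (d,) no matter how many tokens we feed
```

This is also why such models suit CPUs: the hot loop is a couple of small matrix-vector products with no growing attention cache.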

  • giancarlostoro 2 days ago

    My personal idea revolves around "can I run it on a basic smartphone, with whatever the 'floor' for basic smartphones under, let's say, $300 is for memory (let's pretend RAM prices are normal)."

    Edit: The fact this runs on a smartphone means it is highly relevant. My only question is: how do we give such a model an "unlimited" context window, so it can digest as much as it needs? I know some models know multiple languages; I wouldn't be surprised if sticking to only English would reduce the model size / need for more hardware and make it even smaller / tighter.

qingcharles 2 days ago

I think what's amazing to speculate about is how we could have had some very basic LLMs in at least the '90s if we'd invented the tech earlier. I wonder what the world would be like now if we had.

Dylan16807 2 days ago

For your first question, the LLM someone built in Minecraft can handle simple conversations with 5 million weights, mostly 8 bits.

I doubt it would be able to make good use of a large context window, though.
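Back-of-envelope arithmetic on that figure (the Minecraft build's actual layout isn't known here; these numbers are purely illustrative): 5 million weights stored mostly at 8 bits each is on the order of 5 MB, which easily fits in any phone's RAM.

```python
# 5 million weights at 8 bits per weight, converted to megabytes.
n_weights = 5_000_000
bits_per_weight = 8
size_mb = n_weights * bits_per_weight / 8 / 1_000_000
print(size_mb)  # 5.0 (MB)
```

The bottleneck for a tiny model on a phone is thus less about raw weight storage and more about context: the KV cache of a transformer grows with context length, which is part of why the large-context question above is the hard one.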