Comment by eru
You should start a company and try your strategy. I hope it works! (Though I am doubtful.)
In any case, models are useful, even when they don't hit these efficiency targets you are projecting. Just like cars are useful, even when they are bigger than a pack of cards.
If someone wants to fund me, I'll gladly work on this. There's no money in it, though, because selling cloud services is much more profitable.
It's also not a matter of whether it works or not. It already works. Take a small model that fits on a GPU and has a large context window, like Gemma 27B or smaller, give it a whole bunch of context on the topic, and ask it questions; it will generate very accurate answers grounded in that context.
So instead of encoding everything into the model weights themselves, you can take the training data, store it in vector DBs, and train a model to retrieve the relevant pieces for a given query; the rest is just training for context extraction.
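A rough sketch of what I mean, with assumptions I'm making explicit: sentence-transformers for embeddings, FAISS as the vector DB, and a small local model served through Ollama (the model tag, chunking, and prompt are all illustrative, not a prescription):

```python
# Minimal retrieval-augmented sketch: store "training data" as retrievable
# chunks in a vector index instead of baking it into model weights.
import faiss
import numpy as np
import requests
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # small embedding model (assumed choice)

# The corpus you would otherwise try to memorize in the weights.
chunks = [
    "Doc 1: ...",
    "Doc 2: ...",
]
vecs = embedder.encode(chunks, normalize_embeddings=True)
index = faiss.IndexFlatIP(vecs.shape[1])  # inner product == cosine on normalized vectors
index.add(np.asarray(vecs, dtype="float32"))

def answer(question: str, k: int = 3) -> str:
    # Retrieve the k most relevant chunks for the query.
    q = embedder.encode([question], normalize_embeddings=True)
    _, ids = index.search(np.asarray(q, dtype="float32"), k)
    context = "\n\n".join(chunks[i] for i in ids[0])
    # Hand the retrieved context to a small local model (Ollama assumed here).
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "gemma2:27b", "prompt": prompt, "stream": False},
        timeout=120,
    )
    return resp.json()["response"]
```

The point isn't this particular stack; any embedder, vector store, and small model with a decent context window gives you the same shape of system.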