Comment by brookst
But that’s not accurate. There are all sorts of caching tricks: the KV cache can be reused when different users’ prompts start with the same first X tokens (for example, because they share a system prompt), entire inputs/outputs can be cached when the context and user data are identical, and more.
Not sure if you were just joking or really believe that, but for other people’s sake: it’s wildly wrong.
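A toy Python sketch of the two tricks mentioned above. All names are hypothetical. It assumes the cache is keyed on whole token prefixes and uses placeholder values in place of real per-layer key/value tensors. Real inference servers typically cache fixed-size blocks of tokens in GPU memory, which this does not model. The second class covers the much narrower exact-match response cache, which only hits when the entire prompt repeats.

```python
from typing import Callable, Dict, List, Tuple

KVEntry = Tuple[str, int]  # placeholder for one token's per-layer key/value tensors


class PrefixKVCache:
    """Toy prefix cache (hypothetical class, not any engine's API).

    In causal attention, the KV entry at position i depends only on tokens[: i + 1],
    so it can be reused by any later request that starts with the same tokens.
    """

    def __init__(self) -> None:
        self._store: Dict[Tuple[int, ...], KVEntry] = {}
        self.tokens_computed = 0  # how many positions actually had to be computed

    def _compute_kv(self, prefix: Tuple[int, ...]) -> KVEntry:
        # Stand-in for the expensive forward pass over one new token.
        self.tokens_computed += 1
        return ("kv", hash(prefix))

    def prefill(self, tokens: List[int]) -> List[KVEntry]:
        kv: List[KVEntry] = []
        for pos in range(len(tokens)):
            key = tuple(tokens[: pos + 1])
            entry = self._store.get(key)
            if entry is None:  # cache miss: compute it and remember it
                entry = self._compute_kv(key)
                self._store[key] = entry
            kv.append(entry)
        return kv


class ResponseCache:
    """Exact-match cache (hypothetical): only hits when the whole prompt is identical."""

    def __init__(self) -> None:
        self._answers: Dict[str, str] = {}

    def get_or_generate(self, prompt: str, generate: Callable[[str], str]) -> str:
        if prompt not in self._answers:
            self._answers[prompt] = generate(prompt)
        return self._answers[prompt]


if __name__ == "__main__":
    system_prompt = list(range(100))  # hypothetical 100-token shared system prompt
    cache = PrefixKVCache()
    cache.prefill(system_prompt + [500, 501])       # user A: all 102 positions computed
    cache.prefill(system_prompt + [900, 901, 902])  # user B: only its 3 new tokens computed
    print(cache.tokens_computed)  # 105
```

The two users ask different questions, yet the second request skips 100 of its 103 positions because the system prompt is shared. A prefix hit does not need anyone to repeat a question. Only the response cache depends on exact repeats.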
Really? So the system recognises someone asked the same question and serves the same answer? And who on earth shares the exact same context?
I mean, I get the idea, but it sounds so incredibly rare that it would mean absolutely nothing optimisation-wise.