Comment by kmacdough
> Will it just allow me to run let’s say a model with a 2048 token context window with a 4-6k context window
It reduces the memory footprint of a particular model. You can do what you like with that. Extending the context window post-training isn't trivial, so unless you know what you're doing, you'd be better off finding a model trained on a larger context window.
There are many uses for local models, like working offline or privacy/security. Most folks, though, are using it to experiment with tweaking models.
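As a rough illustration of what "reduces the memory footprint" means in practice, here's some back-of-the-envelope math (my own approximation, ignoring KV cache and runtime overhead) for the weight memory of a model at different quantization bit widths:

```python
# Approximate weight memory: parameter_count * bits_per_weight / 8 bytes.
# This is a rough sketch and ignores KV cache, activations, and overhead.

def weight_memory_gib(params_billions: float, bits: int) -> float:
    """Approximate weight memory in GiB for a given bit width."""
    bytes_total = params_billions * 1e9 * bits / 8
    return bytes_total / 2**30

for bits in (16, 8, 4):
    print(f"30B model at {bits}-bit: ~{weight_memory_gib(30, bits):.1f} GiB")
```

So a 30B model that won't fit in 16 GB of RAM at 16-bit can fit comfortably at 4-bit, which is why quantization matters so much for running larger models locally.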
Will that make the model run/feel faster?
I can run models with 30-40b parameters on my computer, but they feel a lot slower than the 1-7b ones
So would this make the 30-40b parameter models run faster? Or at least “feel” faster?