Comment by menaerus
~20 tokens/second is actually pretty good. I see he's using the q5 quantized version of the model. I wonder how it scales with larger contexts. The same guy also published a video today with the new 3.2 version: https://www.youtube.com/watch?v=b6RgBIROK5o