Comment by EagnaIonat 6 hours ago
Tried out the Ollama version and it's insanely fast, with really good results for its 1.9GB size. It's supposed to have a 1M context window; I'd be interested to see where the speed goes at that length.
No Mamba in the Ollama version though.
(I've only just started running local LLMs, so excuse the dumb question.)
Would Granite run with llama.cpp and use Mamba?
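For anyone wanting to test this themselves, here's a minimal sketch using the llama-cpp-python bindings. The GGUF filename is hypothetical (use whichever Granite build you've downloaded), and whether the Mamba/hybrid layers are actually exercised depends on the llama.cpp version your bindings were built against:

```python
# Minimal sketch, assuming: pip install llama-cpp-python
# and a locally downloaded Granite GGUF (filename below is hypothetical).
from llama_cpp import Llama

llm = Llama(
    model_path="./granite-4.0-h-tiny.Q4_K_M.gguf",  # hypothetical filename
    n_ctx=8192,  # raise this toward the advertised window to test speed
)

out = llm("Explain what a hybrid Mamba/transformer model is.", max_tokens=128)
print(out["choices"][0]["text"])
```

Cranking `n_ctx` up in steps would give a rough feel for how throughput degrades as the context grows.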