Comment by Flere-Imsaho 13 hours ago
(I've only just started running local LLMs, so excuse the dumb question.)
Would Granite run with llama.cpp and use Mamba?
Last I checked Ollama inference is based on llama.cpp, so either Ollama has not caught up yet, or the answer is no.
EDIT: Looks like Granite 4 hybrid architecture support was added to llama.cpp back in May: https://github.com/ggml-org/llama.cpp/pull/13550
> Last I checked Ollama inference is based on llama.cpp
Yes and no. They've written their own "engine" using the GGML libraries directly, but fall back to llama.cpp for models the new engine doesn't yet support.
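For the practical side of the original question: since llama.cpp itself has the hybrid (Mamba-style) architecture support from that PR, a Granite 4 GGUF should load with the stock CLI on a recent build. A minimal sketch; the model filename below is just a placeholder for whichever quant you actually download:

  llama-cli -m ./granite-4-example-Q4_K_M.gguf -ngl 99 -cnv -p "You are a helpful assistant."

Here -ngl offloads layers to the GPU if you have one, and -cnv drops you into an interactive chat with -p used as the system prompt.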