onli 18 hours ago

The fact of the matter is that I have a Radeon RX 6600, which I can't use with ollama. First, there is no ROCm at all in my distro's repository - it doesn't compile reliably and needs too many resources. Then, when I compiled it manually, it turned out that ROCm doesn't even support the card in the first place.

I'm aware that 8GB of VRAM is not enough for most such workloads. But no support at all? That's ridiculous. Let me use the card and fall back to system memory for all I care.

Nvidia, as much as I hate their usually awful Linux support, has no such restrictions for any of their modern cards, as far as I'm aware.

JonChesterfield 9 hours ago

I know it's in Debian (and thus Ubuntu), Arch, and Gentoo. Pretty sure Red Hat, SUSE, and Nix have it too. What distro are you using?

ROCm is a train wreck to compile from source, but it can be done with sufficient bloody-mindedness.

The RX 6600 is a gfx1032. I used a gfx1010 for ages with this stuff. It seems likely it'll run for you if you ignore the "supported cards" list, which really should be renamed to something that antagonises people less.
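
Concretely, the usual trick is an environment override (a sketch, assuming the gfx1032 can be passed off as the officially supported gfx1030 - worth double-checking for your setup):

    # tell the ROCm runtime to treat the gfx1032 as a gfx1030
    export HSA_OVERRIDE_GFX_VERSION=10.3.0
    ollama serve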

  • onli 9 hours ago

    I'm using Void. https://github.com/void-linux/void-packages/issues/26415 gives some insight, though it doesn't explain the whole problem, if I remember correctly what the maintainers wrote elsewhere.

    > ROCm is a train wreck to compile from source, but it can be done with sufficient bloody-mindedness.

    Yeah, I did that myself. Not impossible, just a bit annoying and time-consuming. The issue I ran into then was exactly that: picking the GPU model (incredible that this is even necessary) and not having gfx1032 available; see https://github.com/void-linux/void-packages/issues/26415#iss... for what I was following back then. I tried to edit the configuration for gfx1032 anyway, but that did not work.

    Side note: already having to know which card corresponds to which code is annoying, and completely unnecessary. They could just as well map the consumer-facing name. But that would be too easy, I assume.
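
    (For what it's worth, once ROCm is actually installed, you can at least ask the runtime which code your card is instead of looking it up - a sketch:)

        # print the gfx ISA name ROCm reports for the installed GPU
        rocminfo | grep -i gfx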

smerrill 17 hours ago

You should be able to use ollama’s Vulkan backend and in my experience the speed will be the same. (I just spent a bunch of time putting Linux on my 2025 ASUS ROG Flow Z13 to use ROCm, only to see the exact same performance as Vulkan.)

pja 13 hours ago

My recent experience has been that the Vulkan support in llama.cpp is pretty good. It may lag behind CUDA / Metal for bleeding-edge models if they need a new operator.

Try it out! Benchmarks here: https://github.com/ggml-org/llama.cpp/discussions/10879
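
(If you want to try it, a rough sketch of a Vulkan build - the model path and layer count are placeholders, and you need the Vulkan SDK/drivers installed:)

    # build llama.cpp with the Vulkan backend
    cmake -B build -DGGML_VULKAN=ON
    cmake --build build --config Release
    # offload as many layers as fit onto the GPU
    ./build/bin/llama-cli -m ./model.gguf -ngl 99 -p "Hello"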

(ollama doesn’t support Vulkan for some weird reason. I guess they never pulled the code from llama.cpp.)

  • onli 13 hours ago

    Thanks, I might indeed give this a test!

yjftsjthsd-h 14 hours ago

> I'm aware that 8GB of VRAM is not enough for most such workloads. But no support at all? That's ridiculous. Let me use the card and fall back to system memory for all I care.

> Nvidia, as much as I hate their usually awful Linux support, has no such restrictions for any of their modern cards, as far as I'm aware.

In fact, I regularly run llamafile (and sometimes ollama) on an Nvidia dGPU in a laptop with 4GB of VRAM, and it works fine (ish... I mostly do the thing where some layers are on the GPU and some are on the CPU; it's still faster than pure CPU, so whatever).
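
(If anyone wants to reproduce that split, a sketch with llamafile - the file name and layer count are placeholders; raise or lower -ngl until the model fits in VRAM:)

    # put roughly 20 of the model's layers on the 4GB GPU, keep the rest on the CPU
    ./mistral-7b-instruct.llamafile -ngl 20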