Comment by ondra
Is this any different from using --cache-type-k and --cache-type-v?
I'm guessing it's a bit different, since MLX/MPS doesn't have native 4-bit support (or even 8-bit, if I remember correctly?). It didn't even launch with bf16 support. So I think the lowest you could go with the old type_k/v flags on Apple GPUs was 16-bit f16/bf16, but I'm not a llama.cpp internals expert, so I may be wrong.
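For context, this is roughly how those flags are passed to the llama.cpp CLI (just a sketch; the binary name, model path, and the exact set of supported cache types depend on your build and backend):

```bash
# Illustrative only: check your build's --help for the cache types it
# actually supports (f16 / q8_0 / q4_0 / ...).
./llama-cli -m model.gguf \
  --cache-type-k q8_0 \
  --cache-type-v q8_0 \
  -p "Hello"
```

(IIRC a quantized V cache also needs flash attention enabled on recent builds, so you may need to pass the flash-attention flag as well; double-check against the docs.)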
No, it appears to be an LLM-generated attempt to gain GitHub stars.
See my other comment for a sampling of the other oddities in the repo.