Comment by segmondy 2 days ago

You can already do this with -ctk and -ctv, so why would anyone need this?

-ctk, --cache-type-k TYPE               KV cache data type for K
                                        allowed values: f32, f16, bf16, q8_0, q4_0, q4_1, iq4_nl, q5_0, q5_1
                                        (default: f16)     (env: LLAMA_ARG_CACHE_TYPE_K)
-ctv, --cache-type-v TYPE               KV cache data type for V
                                        allowed values: f32, f16, bf16, q8_0, q4_0, q4_1, iq4_nl, q5_0, q5_1
                                        (default: f16)
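To illustrate that K and V can already be set independently with the existing flags, here is a minimal Python sketch that builds the -ctk/-ctv arguments for a launch command, validating against the allowed values and the f16 default from the help text above. The binary name `llama-server`, the model path `model.gguf`, and the helper `kv_cache_args` are hypothetical placeholders, not something from the original comment.

```python
# Allowed values and default copied from the -ctk / -ctv help text above.
ALLOWED_CACHE_TYPES = {"f32", "f16", "bf16", "q8_0", "q4_0", "q4_1", "iq4_nl", "q5_0", "q5_1"}
DEFAULT_CACHE_TYPE = "f16"


def kv_cache_args(k_type: str = DEFAULT_CACHE_TYPE, v_type: str = DEFAULT_CACHE_TYPE) -> list[str]:
    """Hypothetical helper: return the -ctk/-ctv flags for the given K and V cache types.

    A flag is only emitted when its type differs from the f16 default,
    since omitting it gives the same behaviour.
    """
    for name, value in (("K", k_type), ("V", v_type)):
        if value not in ALLOWED_CACHE_TYPES:
            raise ValueError(f"unsupported {name} cache type: {value!r}")
    args: list[str] = []
    if k_type != DEFAULT_CACHE_TYPE:
        args += ["-ctk", k_type]
    if v_type != DEFAULT_CACHE_TYPE:
        args += ["-ctv", v_type]
    return args


if __name__ == "__main__":
    # "llama-server" and "model.gguf" are placeholders; substitute your own binary and model.
    cmd = ["llama-server", "-m", "model.gguf", *kv_cache_args("q8_0", "q4_0")]
    print(" ".join(cmd))
```

Running it prints `llama-server -m model.gguf -ctk q8_0 -ctv q4_0`, i.e. a different quantization for K and V using nothing but the two flags that already exist.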