Comment by Aurornis
> Using `--flash-attn --cache-type-k q8_0 --cache-type-v q8_0`
I think you meant `--cache-type-v q4_0`.
I would also like an explanation of what's different in this patch compared to the standard command-line arguments.