Comment by oceanplexian

Comment by oceanplexian 15 days ago

1 reply

I can’t speak to the Tesla stuff but I run an Epyc 7713 with a single 3090 and creatively splitting the model between GPU/8 channels of DDR4 I can do about 9 tokens per second on a q4 quant.