Comment by oceanplexian

Comment by oceanplexian 15 days ago

I can’t speak to the Tesla stuff, but I run an Epyc 7713 with a single 3090, and by creatively splitting the model between the GPU and 8 channels of DDR4 I can do about 9 tokens per second on a q4 quant.
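
For a rough sense of what that figure implies, here is a back-of-envelope sketch rather than a measurement. It assumes the DDR4 runs at 3200 MT/s (the comment does not give the memory speed) and that decoding is limited by how fast weights stream out of system RAM. The function names are made up for illustration.

```python
# Back-of-envelope sketch: all constants are assumptions, not measurements.
CHANNELS = 8                  # DDR4 channels, as stated in the comment
TRANSFERS_PER_SEC = 3200e6    # ASSUMPTION: DDR4-3200; the comment gives no speed
BYTES_PER_TRANSFER = 8        # one 64-bit channel moves 8 bytes per transfer


def peak_bandwidth_gb_s(channels=CHANNELS,
                        transfers_per_sec=TRANSFERS_PER_SEC,
                        bytes_per_transfer=BYTES_PER_TRANSFER):
    """Theoretical peak memory bandwidth in GB/s (1 GB = 1e9 bytes)."""
    return channels * transfers_per_sec * bytes_per_transfer / 1e9


def max_gb_read_per_token(bandwidth_gb_s, tokens_per_sec):
    """Upper bound on data streamed from RAM per generated token."""
    return bandwidth_gb_s / tokens_per_sec


bw = peak_bandwidth_gb_s()
budget = max_gb_read_per_token(bw, tokens_per_sec=9)
print(f"peak bandwidth: {bw:.1f} GB/s")
print(f"RAM read budget at 9 tok/s: {budget:.1f} GB per token")
```

Under those assumptions the theoretical peak is about 205 GB/s, so 9 tokens per second leaves at most roughly 23 GB of weights read from system RAM per token. That is a ceiling: sustained bandwidth is lower than peak, and whatever sits in the 3090's VRAM does not count against it.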

CamperBob2 15 days ago

Impressive. Is that a distillation, or the real thing?
