aliljet 3 days ago

If the SWE-bench results are to be believed... this looks best in class right now for a local LLM. To be fair, show me the guy who is running this locally...

selfhoster11 3 days ago

It's challenging, but not impossible. With 2-bit quantisation, only around 250 gigabytes of RAM is required. It doesn't have to be VRAM either, and you can mix and match GPU+CPU inference.

In addition, some people on /r/localLlama are having success with streaming the weights off SSD storage at 1 token/second, which is about the rate I get for DeepSeek R1.
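For anyone curious what the mix-and-match setup described above might look like in practice, here is a minimal sketch using the llama-cpp-python bindings. The GGUF filename, layer count, and thread count are placeholders rather than values from the thread; the point is that partial GPU offload (n_gpu_layers) plus memory-mapped weights (use_mmap) is the usual way both the GPU+CPU split and the stream-from-SSD approach are set up.

    # Sketch: run a 2-bit GGUF quant with some layers on GPU and the rest on CPU,
    # letting the OS page weights in from SSD via mmap when RAM is tight.
    from llama_cpp import Llama

    llm = Llama(
        model_path="model-IQ2_XXS.gguf",  # placeholder: any 2-bit quantised GGUF
        n_gpu_layers=20,   # offload as many layers as fit in VRAM; the rest run on CPU
        use_mmap=True,     # memory-map the file so cold weights can stay on disk
        n_ctx=4096,
        n_threads=16,
    )

    out = llm("Explain 2-bit quantisation in one sentence.", max_tokens=64)
    print(out["choices"][0]["text"])

Token rate then depends mostly on how much of the model fits in VRAM plus RAM versus how often layers have to be paged back in from the SSD.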