
Comment by ThatPlayer 3 days ago


With MoE models like gpt-oss, you can run some layers on the CPU and the rest on the GPU: https://github.com/ggml-org/llama.cpp/discussions/15396

The discussion also mentions that the 120B model is runnable on 8GB of VRAM: "Note that even with just 8GB of VRAM, we can adjust the CPU layers so that we can run the large 120B model too"
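The idea above can be sketched as a llama.cpp invocation. This is illustrative, not taken from the linked discussion: the model filename is a placeholder, and the exact flags available depend on your llama.cpp build, so check `llama-server --help` before copying.

```shell
# Sketch of a CPU/GPU split for an MoE model with llama.cpp.
# Model filename is a placeholder; flags assume a recent build.
# --n-gpu-layers 999 asks to offload all layers to the GPU;
# --n-cpu-moe 30 then keeps the MoE expert tensors of the first 30
# layers on the CPU. Expert tensors hold most of an MoE model's
# weights but only a few experts fire per token, so they are the
# cheapest part to leave in system RAM.
llama-server -m gpt-oss-120b.gguf --n-gpu-layers 999 --n-cpu-moe 30 -c 8192

# Tensor-override form of the same idea: route every MoE expert
# tensor to the CPU by regex, keeping attention and dense layers on GPU.
llama-server -m gpt-oss-120b.gguf --n-gpu-layers 999 -ot ".ffn_.*_exps.=CPU"
```

Raising `--n-cpu-moe` trades generation speed for VRAM headroom, which is how the 120B model can be squeezed under an 8GB budget.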