Comment by syntaxing 9 days ago

I use it directly with Claude Code [1]. Honestly, IMO it just makes sense to host your own model when you have your own company. You can try something like OpenRouter for now and then set up your own hardware. Since most of these models are MoE, you don't have to load everything into VRAM. A combination of a 5090 + EPYC CPU + 256GB of DDR5 RAM can go a very long way: you can offload most of the expert layers onto the CPU and leave the rest on the GPU. As usual, Unsloth has a great page about it [2].

[1] https://docs.z.ai/scenario-example/develop-tools/claude [2] https://docs.unsloth.ai/models/glm-4.6-how-to-run-locally
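To make the expert-offload idea concrete, here's a rough sketch with llama.cpp's server (the model filename, context size, and layer count are placeholders; the `--override-tensor` regex is the pattern the Unsloth guide uses to push MoE expert FFN tensors to system RAM while everything else stays on the GPU):

```shell
# Hypothetical invocation: keep attention + shared layers on the 5090,
# route the per-layer MoE expert tensors (ffn_*_exps) to CPU/DDR5.
./llama-server \
  --model GLM-4.6-Q4_K_XL.gguf \
  --n-gpu-layers 99 \
  --override-tensor ".ffn_.*_exps.=CPU" \
  --ctx-size 32768
```

The key trick is the `--override-tensor` (`-ot`) flag: the experts are the bulk of an MoE model's weights but only a few fire per token, so keeping them in RAM costs far less throughput than their size would suggest.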