Comment by devnen

This is really impressive work and the dialogue quality is fantastic.

For anyone wanting a quick way to spin this up locally with a web UI and API access, I put together a FastAPI server wrapper around the model: https://github.com/devnen/Dia-TTS-Server

The setup is just a standard pip install -r requirements.txt (works on Linux/Windows). It pulls the model from HF automatically – defaulting to the faster BF16 safetensors (ttj/dia-1.6b-safetensors), but that's configurable in the .env. You get an OpenAI-compatible API endpoint (/v1/audio/speech) for easy integration, plus a custom one (/tts) to control all the Dia parameters. The web UI gives you a simple way to type text, adjust sliders, and test voice cloning. It'll use your CUDA GPU if you have one configured, otherwise, it runs on the CPU.

Might be a useful starting point or testing tool for someone. Feedback is welcome!