Comment by Havoc
Sounds really good & human! Got a fair bit of unexpected artifacts though. e.g. 3 seconds hissing noise before dialogue. And music in background when I added (happy) in an attempt to control tone. Also don't understand how to control the S1 and S2 speakers...is it just random based on temp?
> TODO Docker support
Got this adapted pretty easily. Just latest nvidia cuda container, throw python and modules on it and change server to serve on 0.0.0.0. Does mean it pulls the model every time on startup though which isn't ideal
Thank you for the kind words! Dia wasn’t fine tuned on certain speaker, so you will get random voices every time you run it, unless you add a prompt / fix the seed.
The outputs are a bit unstable, might need to add cleaner training data and run longer training sessions. Hopefully we can do something like OAI Whisper and update with better performing checkpoints!