Comment by Havoc
Sounds really good & human! Got a fair bit of unexpected artifacts though. e.g. 3 seconds hissing noise before dialogue. And music in background when I added (happy) in an attempt to control tone. Also don't understand how to control the S1 and S2 speakers...is it just random based on temp?
> TODO Docker support
Got this adapted pretty easily. Just latest nvidia cuda container, throw python and modules on it and change server to serve on 0.0.0.0. Does mean it pulls the model every time on startup though which isn't ideal
> Also don't understand how to control the S1 and S2 speakers...
Do a clip with the speakers you want as the audio prompt, add the text of that clip (with speaker tags) of the clip at the beginning of your text prompt, and it clones the voices from your audio prompt for the output.