Comment by jokethrowaway 2 days ago
Looking forward to trying it. My current go-to solution is E5-F2 (great cloning, decent delivery, OK audio quality, but a lot of incoherence here and there, which forces you to do multiple generations).
I've just been massively disappointed by Sesame's CSM: in the Gradio demo on their website it was generating flawless dialogs with amazing voice cloning. When running it locally, the voice cloning performance is awful.
Thanks for the interest! We also enjoyed using E5-F2 :) You can try it now on HF Spaces: https://huggingface.co/spaces/nari-labs/Dia-1.6B