Comment by swores 6 months ago

Can anyone recommend an open-source option for training on a custom voice and then using it for TTS generation, all without the voice data leaving my machine? It's my own voice, so I can record as many snippets as the training needs.

Edit: I'll wait to see if any recommendations come in here. If not, I might give this one a go: https://github.com/coqui-ai/TTS

hm64 6 months ago

Coqui is great, but in practice I found Piper easier to set up, train, and deploy as an ONNX file. Big thanks to the Sherpa development team for their helpful resources (https://k2-fsa.github.io/sherpa/onnx/tts/piper.html) and to the Rhasspy team for their training guide (https://github.com/rhasspy/piper/blob/master/TRAINING.md).

I also found Demucs + Whisper + pydub to be a super helpful combo for creating quality datasets.
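For anyone wondering what the last step of that dataset prep can look like, here is a rough sketch, not the exact workflow described above. It assumes the clips have already been cleaned and sliced (e.g. with Demucs and pydub) into a `wav/` folder and transcribed (e.g. with Whisper) into a dict of clip ID to text. It then writes an LJSpeech-style `metadata.csv` with one `id|text` line per clip, which is the layout I understand the Piper training guide to expect. The function names and the 1 to 15 second duration limits are made up for illustration.

```python
import wave
from pathlib import Path


def clip_seconds(path: Path) -> float:
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def write_metadata(dataset_dir: Path, transcripts: dict[str, str],
                   min_s: float = 1.0, max_s: float = 15.0) -> list[str]:
    """Write dataset_dir/metadata.csv with one `id|text` line per usable clip.

    Hypothetical helper: `transcripts` maps a clip ID (the WAV file name
    without extension) to its text. The duration limits are arbitrary
    illustrative defaults, not values taken from any training guide.
    """
    kept = []
    lines = []
    for wav_path in sorted((dataset_dir / "wav").glob("*.wav")):
        clip_id = wav_path.stem
        # Collapse whitespace and drop the "|" delimiter from the text.
        text = " ".join(transcripts.get(clip_id, "").replace("|", " ").split())
        if not text:
            continue  # no transcript, skip the clip
        if not (min_s <= clip_seconds(wav_path) <= max_s):
            continue  # too short or too long to be useful
        lines.append(f"{clip_id}|{text}")
        kept.append(clip_id)
    (dataset_dir / "metadata.csv").write_text(
        "".join(line + "\n" for line in lines), encoding="utf-8"
    )
    return kept
```

Calling `write_metadata(Path("my_voice"), transcripts)` would then leave `my_voice/metadata.csv` next to `my_voice/wav/` and return the IDs of the clips it kept.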

phrotoma 6 months ago

https://github.com/DrewThomasson/ebook2audiobook

drewbitt 6 months ago

There is a fork, 'coqui-tts', here: https://github.com/idiap/coqui-ai-TTS

That said, according to the TTS leaderboard, Fish Speech (https://github.com/fishaudio/fish-speech) and Kokoro rank higher.

https://huggingface.co/hexgrad/Kokoro-82M

https://huggingface.co/fishaudio/fish-speech-1.5

xnx 6 months ago

AFAIK Kokoro can't be fine-tuned.

numpad0 6 months ago

I think you can probably generate TTS audio by classical means, then voice2voice that audio through RVC or Beatrice V2. I haven't looked into it in a while, but Beatrice is apparently super fast and CPU-only.

eamag 6 months ago

F5-TTS. I wrote a post about it: https://eamag.me/2025/Voice-Cloning

jsemrau 6 months ago

I wrote this a while ago about XTTS-v2 mixed with NVIDIA's NeMo. Maybe it kicks off your journey.