Comment by CMay
Haven't really been following the latest in TTS ML, but I expected this to be better, or at least as good-bad as the stuff you hear on YouTube. Somehow it sounds worse. It really is jarring to listen to any of these ML voices, and I can't really stand it. I nope out of every video that uses them, and I can't tell if YouTube never recommends them to me for that reason, or just because the recommendations around what I watch are so rarely going to be from some low-reputation channel.
Take a moment here though and think about it. Even if these voices got to be really good, almost indistinguishable... would I want to listen to them even then? If it was an NPC's generated voice and generated dialogue in a game to help enrich the world building, maybe in that context. On YouTube or with newscasters? Probably not. Audiobooks? Think I would still rather have it be a real person, because it's like they're reading a story to me and it feels better if it's coming from someone. There's also the unknown factor: if it's ML generated, it's so sterile that the unknowns are kind of gone.
Think about it like this: in the movie industry we had practical effects that were charming in a way. You could think about the physical things that had to occur to make that happen. Movie magic. Now everything is so CG that it's like the magic is gone. Even though you know people put serious hard work into it, there's a kind of inauthenticity and a lack of relevance to the real world that takes something away from it.
It's like a real magician has interesting tricks, while an artificial magician is most likely just a liar.
Still, I grant that it makes some cool things possible and there is potential if things are done right. Some positive mixture of real humans and machine-generated stuff, so it isn't devoid of anything connected to real-life effort.