Comment by xdennis
Among other things, what I don't like is the hallucinated stress. Take the classic example of:
> I never said she stole my money
It can have 7 different meanings depending on which word you stress.
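To make the seven readings concrete, here is a rough sketch that wraps each word in turn in SSML's `<emphasis>` element. `stress_variants` is a made-up helper name, and I'm only assuming that a given TTS engine accepts SSML and actually honors `<emphasis>`; many do not, or do so inconsistently.

```python
from xml.sax.saxutils import escape


def stress_variants(sentence):
    """Return one SSML string per word, with that word marked for strong emphasis.

    Hypothetical helper for illustration only; it is not part of any TTS library.
    """
    words = sentence.split()
    variants = []
    for i in range(len(words)):
        parts = [
            f'<emphasis level="strong">{escape(w)}</emphasis>' if j == i else escape(w)
            for j, w in enumerate(words)
        ]
        variants.append("<speak>" + " ".join(parts) + "</speak>")
    return variants


variants = stress_variants("I never said she stole my money")
for v in variants:
    print(v)
```

Each of the seven strings moves the stress to a different word, which is the part a listener uses to pick the intended meaning.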
The new AI voices sound very natural at a shallow level, but overall pronounce things in odd ways. Not quite wrong, but subtly unnatural, which introduces some cognitive load.
Old TTS systems with their monotone voices are less confusing, but sound very robotic.
erroneous or inappropriate ≠ hallucinated