Comment by cess11 4 days ago

10 replies

I would for sure not want this for fiction; it's too obvious that the voice has no understanding whatsoever of the text. But it's probably pretty nice for converting short news texts or notifications to audio.

vanderZwan 4 days ago

Your point is a valid one, but I want to add to it that it is also a matter of expectations and how one listens.

Years ago, when I was dating someone who spoke Russian as one of her native languages, we had to make a funny compromise when watching films together with her parents: they didn't speak a word of English, so we'd use the Russian dub with English subtitles.

I noticed that the Russian dub was just one man reading a translation in a flat voice over what was happening on the screen, with no attempt at voice acting or matching the emotions. Usually the dub would lag a split second behind the actual lines, so you'd still hear the original voices for a moment (and also a little bit in the background).

At first I found it very jarring, but they explained that this flatness was a feature. You'll quickly learn to "filter out" the voice while still hearing the translation, and the faint presence of the original voices was enough to bring the emotional flavor back. The lack of voice acting helped with the filtering.

This turned out to apply to me as well, even though I don't speak Russian! My brain would subconsciously filter out the dub, and extract most of the original performance through the subtitles and the faint presence of the original voices. Obviously the original version would have been a better experience for me, but it was still very enjoyable.

Of course a generated audiobook is not a dub, as there is no "original voice" to extract an emotional performance from. But some listeners might still be able to do something similar. The lack of understanding in the generated voice and its predictable monotony might allow them to filter out everything but the literal text, and then fill it in with their own emotional interpretations. Still not as great as having a proper storyteller who does understand the text and knows how to deliver dramatic lines, but perhaps not as bad as expected either.

  • arafalov 4 days ago

    Here is the rest of that story.

    When foreign movies started to filter into the Soviet Union's illegal movie theatres, you would get 3 or 4 movies playing at once in one room. There would be a TV in each corner of the room and 4 or 5 rows of plastic chairs in front of each one, arranged in an arc.

    ALL of the movies were being revoiced by the same person. So, if you were sitting in the back of the 5th row, you were potentially getting the sound from an action movie, a comedy, a horror movie and a romance at the same time. In the same voice.

    You learned to filter really well. So, if that's what they were trained on, watching a single movie must have been very relaxing.

    • vanderZwan 3 days ago

      Looking at the modern internet experience, it sounds like the Soviet Union's illegal movie theatres were ahead of their time!

  • aleksiy123 4 days ago

    Watching these as a Russian/English bilingual is very painful, tho I grew up in the Western world so maybe I'm just not used to it.

    To add a slight tangent: many books/audiobooks just don't exist in other languages at all. So even getting some monotone is a lot better than getting nothing.

    I think this is where these models really shine. Cheaply creating cross-language media and unlocking the knowledge/media for underprivileged parts of the world.

    • vanderZwan 4 days ago

      > Watching these as a Russian/English bilingual is very painful, tho I grew up in the Western world so maybe I'm just not used to it.

      I figured that their opinion probably wasn't universal, hahaha.

      And yes, it's at the very least a win for accessibility.

  • em-bee 4 days ago

    indeed, audio books come in many forms, some are rather flat, and some include different voices, even by different speakers, or include a few voiced sound effects, laughing, crying, singing, etc. TTS is extra flat, but if the quality is good otherwise then it is like reading with my ears, and i add the emotions myself.

  • cess11 4 days ago

    It's not a "point", I didn't make an argument.

    I dislike German- and Russian-style dubs as well; I'd rather learn a bit of the original language.

calgoo 4 days ago

Audible has thousands of books available "for free" with their membership that are all AI-generated. I felt the same at the start, but after listening to a few, it really comes down to the voice used. I spent 8h on a plane listening to 1 book, and there were maybe 5 occasions where I had an issue with the voice; and I think all were just "AI weirdness", similar to chat LLMs messing up simple sentence structure or image-generating models adding an extra finger.

  • arafalov 4 days ago

    The one I tried had a lot of issues. It was a music theory book and it did not know how to pronounce C# (it kept saying C 'hash'). It also referred to, but did not read out, the diagrams and tables.

    So, it was not just the voice, but the quality control pipeline that was missing as well.

    Maybe it mostly works for old plain text books, but if nobody is checking.....
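
    A minimal sketch of the kind of pre-processing check that could catch the C# case before the text reaches a TTS engine. Everything here is an assumption for illustration: the names `normalize_note_names` and `unreviewed_symbols` are made up, the symbol table is tiny, and nothing is known about how Audible's actual pipeline works.

```python
import re

# Hypothetical spoken forms; a real lexicon would need to be far larger.
ACCIDENTALS = {"#": "sharp", "♯": "sharp", "♭": "flat"}

# A standalone note letter A-G followed directly by an accidental.
# The lookahead skips cases like "F#m7", where more text is glued on.
NOTE_PATTERN = re.compile(r"\b([A-G])([#♯♭])(?!\w)")


def normalize_note_names(text: str) -> str:
    """Spell out accidentals so '#' is not read aloud as 'hash'."""
    return NOTE_PATTERN.sub(
        lambda m: f"{m.group(1)} {ACCIDENTALS[m.group(2)]}", text
    )


def unreviewed_symbols(text: str) -> list[str]:
    """Return leftover symbols that a human should check before narration."""
    return re.findall(r"[#♯♭]", text)
```

    For example, `normalize_note_names("The piece is in C# minor.")` gives "The piece is in C sharp minor.", while something like "F#m7" is left untouched and still shows up in `unreviewed_symbols`, so a person can decide how it should be read.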

  • cess11 4 days ago

    I don't think dominant suppliers like Audible should exist, so that matters little to me.