Comment by genewitch
I've anecdotally tested translations by ripping a video along with its subtitles, having Whisper subtitle it, and also asking several AIs to translate the .srt or .vtt file (subtotext, I think, does this conversion if you don't wanna waste tokens on the metadata).
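For the "don't waste tokens on the metadata" part, here is a minimal sketch of stripping cue numbers and timing lines out of .srt/.vtt content before handing it to an LLM. This is not what subtotext does internally (I haven't checked); `subtitle_text` and the `TIMING` regex are names I made up for illustration, and it only handles the common cue layouts.

```python
import re

# Timing lines look like "00:00:01,000 --> 00:00:04,000" (SRT)
# or "00:01.000 --> 00:04.000" (VTT, hours optional).
TIMING = re.compile(
    r"^\s*(?:\d+:)?\d{2}:\d{2}[.,]\d{3}\s+-->\s+(?:\d+:)?\d{2}:\d{2}[.,]\d{3}"
)

def subtitle_text(raw: str) -> str:
    """Return only the spoken lines from SRT/VTT content, one per line."""
    kept = []
    # Cues are separated by blank lines in both formats.
    blocks = re.split(r"\n\s*\n", raw.replace("\r\n", "\n").strip())
    for block in blocks:
        lines = block.split("\n")
        for i, line in enumerate(lines):
            if TIMING.match(line):
                # Everything after the timing line is cue text.
                for text in lines[i + 1:]:
                    text = re.sub(r"<[^>]+>", "", text).strip()  # drop <i>, <b>, etc.
                    if text:
                        kept.append(text)
                break
        # Blocks with no timing line (the "WEBVTT" header, NOTE blocks) are skipped.
    return "\n".join(kept)
```

If you need the translation lined back up with the original timings you'd want to keep the cue numbers around, so treat this as the lossy, cheap version.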
Whisper large-v3, the largest model I have, is pretty good, getting nearly identical translations to ChatGPT or whatever, or Google's default speech-to-text. The fun stuff is when you ask for text-to-text translations from LLMs.
I did a real small writeup with an example but I don't have a place to publish nor am I really looking for one.
I used whisper to transcribe nearly every "episode" of the Love Line syndicated radio show from 1997-2007 or so. It took, iirc, several days. I use it to grep the audio, as it were. I intend to do the same with my DVDs and such, just so I never have to Google "what movie / tv show is that line from?" I also have a lot of Art Bell shows, and a few others to transcribe.
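Roughly what "grep the audio" can look like, as a sketch rather than my exact setup: it assumes each transcript is a list of segments shaped like `{"start": seconds, "end": seconds, "text": "..."}`, which is the shape of the `segments` list in the openai-whisper Python package's transcribe result. `fmt` and `grep_segments` are hypothetical helper names.

```python
import re

def fmt(seconds: float) -> str:
    """Format a second offset as HH:MM:SS."""
    s = int(seconds)
    return f"{s // 3600:02d}:{s % 3600 // 60:02d}:{s % 60:02d}"

def grep_segments(segments, pattern):
    """Return (timestamp, text) for every segment whose text matches pattern."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [
        (fmt(seg["start"]), seg["text"].strip())
        for seg in segments
        if rx.search(seg["text"])
    ]

# Made-up segments, just to show the call.
segments = [
    {"start": 0.0, "end": 4.2, "text": " Welcome to the show."},
    {"start": 3725.5, "end": 3730.0, "text": " What movie is that line from?"},
]
print(grep_segments(segments, r"movie"))
```

The timestamp is the useful bit: it tells you where to seek in the recording once you've found the line.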
> I used whisper to transcribe nearly every "episode" of the Love Line syndicated radio show from 1997-2007 or so.
Yes - second this. I found 'Whisper' great for that type of scenario as well.
A local monastery had about 200 audio talks (mp3). Whisper converted them all to text and GPT did a small 'smoothing' of the output to make it readable. It was about half a million words and only took a few hours.
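A batch job like that can be sketched as a loop over the mp3 folder. This is a guess at the shape, not the actual script: `transcribe_folder` is a hypothetical helper, and the speech-to-text step is passed in as a plain function so the loop doesn't depend on any particular model or API. The GPT "smoothing" pass is left out.

```python
from pathlib import Path
from typing import Callable

def transcribe_folder(folder, transcribe: Callable[[Path], str]) -> int:
    """Write a .txt next to every .mp3 in folder; return how many were written.

    Files that already have a transcript are skipped, so an interrupted
    run can simply be started again.
    """
    written = 0
    for audio in sorted(Path(folder).glob("*.mp3")):
        out = audio.with_suffix(".txt")
        if out.exists():
            continue
        out.write_text(transcribe(audio), encoding="utf-8")
        written += 1
    return written

# With the openai-whisper package installed, the callable could be something like:
#   model = whisper.load_model("large-v3")
#   transcribe_folder("talks", lambda p: model.transcribe(str(p))["text"])
```

Skipping existing outputs matters more than it looks when a run takes hours.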
The monks were delighted - they can distribute their talks in small pamphlets / PDFs now, which is extra income for the community.
Years ago as a student I did some audio transcription manually and something similar would have taken ages...