Comment by jnmandal 5 days ago

Looks like a really cool project. Do you have any opinions on which transcription models are the best, from a quality perspective? I have heard a lot of mixed opinions on this. Curious what you've found in your development process?

braden-w 5 days ago

I'm a huge fan of using Whisper hosted on Groq since the transcription is near instantaneous. ElevenLabs' Scribe model is also particularly accurate, and I use it for high-quality transcriptions, or I manually upload files to their API to get diarization and timestamps (https://elevenlabs.io/app/speech-to-text). That being said, I'm not the biggest expert on models. In my day-to-day workflow, I usually use Groq when I want the best balance of speed and performance, and switch to whisper.cpp for local transcription when I'm working on something particularly sensitive.

jnmandal 4 days ago

Nice. Yeah, we are dogfooding some systems I built in my household. We use whisper.cpp and I haven't had any issues. I frequently get told I should be using ElevenLabs, but I've just been too lazy to build a benchmark that would help me decide.
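
For anyone wanting a starting point for that kind of benchmark, here is a minimal sketch in Python. It assumes you already have a hand-checked reference transcript plus the text each engine produced for the same audio, and it scores them by word error rate (WER), a common transcription metric. The names `word_error_rate` and `rank_engines` are hypothetical helpers, not part of whisper.cpp, Groq, or ElevenLabs, and the normalization (lowercasing and whitespace splitting only) is a simplifying assumption.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    # Assumption: lowercase + whitespace split is enough normalization.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def rank_engines(reference: str, outputs: dict[str, str]) -> list[tuple[str, float]]:
    """Return (engine name, WER) pairs sorted from best (lowest WER) to worst."""
    scores = {name: word_error_rate(reference, text) for name, text in outputs.items()}
    return sorted(scores.items(), key=lambda item: item[1])
```

Feeding it a handful of your own household recordings, with transcripts you have corrected by hand, would probably say more than any public leaderboard, since accents, microphones, and vocabulary differ a lot between setups.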
