Comment by pjc50

According to the Common Voice 15 graph on OpenAI's github repository, Albanian is the single worst performance you could have had: https://github.com/openai/whisper

But for what it's worth, I tried putting the YouTube video of Tom Scott presenting at the Royal Institute into the model, and even then the results were only "OK" rather than "good". When even a professional presenter and professional sound recording in a quiet environment has errors, the model is not really good enough to bother with.