Comment by sgt

How do you actually go about training specialized speech models? Let's say you want to specialize on a particular dialect, or a pidgin English from West Africa, or a regular language but with highly specialized terminology.

Just curious - would you need insane HW infrastructure to begin with, or would a hosted/managed service be enough? And what tooling does the industry prefer for the "training"?