Comment by akshayp29

Comment by akshayp29 6 days ago

Pretty cool! Do you think the model would be good at other under-served languages as well? Or is it hypertuned to just these?

zaidqureshi 6 days ago

The model itself can work well for new languages, it's just the process of data gathering and maintaining high quality of data is what we have to figure out as we scale across languages.

Currently the model is only given data for these languages so it doesn't know anything else.

mandeepj 6 days ago

> just the process of data gathering and maintaining high quality of data is what we have to figure out as we scale across languages.
A crawler and data ingestion pipeline will not help with that?

- zaidqureshi 6 days ago
  
  Gathering audio data online is not that hard, but getting it accurately labelled is challenging. The speech understanding systems for those languages aren't there either, so we can't label it automatically.
  
akshayp29 6 days ago

Cool - makes sense!
