Comment by kouteiheika 6 days ago
I'm pretty sure it's just a finetuned LLM.
I have some experience experimenting in this space; it's not actually that hard to build a model which surpasses DeepL, and the wide language support is just a consequence of using an LLM trained on the whole Internet, so the model picks up the ability to use a bunch of languages.
I'm almost sure they did not fine-tune an LLM. They are using existing LLMs, because fine-tuning to beat the SOTA models at translation is impractical unless you target very niche languages, and even then it would be very hard to get a better dataset than what those models are already trained on.
Probably all they are doing is switching between some Qwen model (for Chinese) and a large Llama, or maybe OpenAI or Gemini.
So they just have a step (maybe also an LLM) that guesses which model is best or needed for the input. Maybe anything really short and simple just goes to a smaller, simpler, less expensive model.
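A minimal sketch of what such a routing step might look like, assuming a simple heuristic router; the model names, thresholds, and language check here are invented for illustration, not anything confirmed about their actual system:

```python
# Hypothetical router: pick a model based on the input text.
# Model names and thresholds are made up for illustration only.

def contains_cjk(text: str) -> bool:
    """Rough check for Chinese characters (CJK Unified Ideographs)."""
    return any("\u4e00" <= ch <= "\u9fff" for ch in text)

def route(text: str) -> str:
    """Return the name of the model this input should be sent to."""
    if len(text) < 200 and "\n" not in text:
        # Short, simple input: send it to a smaller, cheaper model.
        return "small-cheap-model"
    if contains_cjk(text):
        # Chinese-heavy input: prefer a Qwen model.
        return "qwen-large"
    # Everything else goes to a large general model (Llama, GPT, Gemini, ...).
    return "large-general-model"

if __name__ == "__main__":
    print(route("Hello, how are you?"))                 # small-cheap-model
    print(route("请把这段很长的技术文档翻译成英文。" * 20))  # qwen-large
```

In practice the "guess which model" step could itself be a small classifier or LLM call rather than heuristics like these, but the dispatch structure would look roughly the same.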