Comment by kouteiheika 6 days ago
I'm pretty sure it's just a finetuned LLM.
I have some experience experimenting in this space; it's not actually that hard to build a model which surpasses DeepL, and the wide language support is just a consequence of using an LLM trained on the whole Internet, so the model picks up the ability to use a bunch of languages.
I'm almost sure they did not fine-tune an LLM. They are using existing LLMs, because fine-tuning to beat the SOTA models at translation is impractical unless you target very niche languages, and even then it would be very hard to get a better dataset than what those models are already trained on.
Probably all they are doing is switching between some Qwen model (for Chinese) and a large Llama, or maybe OpenAI or Gemini.
So they just have a step (maybe also an LLM) that guesses which model is best or needed for the input. Maybe anything really short and simple just goes to a smaller, simpler, less expensive model.
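A minimal sketch of what such a routing step might look like, assuming a simple heuristic router; the model names, thresholds, and language check here are invented for illustration, not anything confirmed about their actual system:

```python
# Hypothetical router: pick a model based on the input text.
# Model names and thresholds are made up for illustration only.

def contains_cjk(text: str) -> bool:
    """Rough check for Chinese characters (CJK Unified Ideographs)."""
    return any("\u4e00" <= ch <= "\u9fff" for ch in text)

def route(text: str) -> str:
    """Return the name of the model this input should be sent to."""
    if len(text) < 200 and "\n" not in text:
        # Short, simple input: send it to a smaller, cheaper model.
        return "small-cheap-model"
    if contains_cjk(text):
        # Chinese-heavy input: prefer a Qwen model.
        return "qwen-large"
    # Everything else goes to a large general model (Llama, GPT, Gemini, ...).
    return "large-general-model"

if __name__ == "__main__":
    print(route("Hello, how are you?"))                 # small-cheap-model
    print(route("请把这段很长的技术文档翻译成英文。" * 20))  # qwen-large
```

In practice the "guess which model" step could itself be a small classifier or LLM call rather than heuristics like these, but the dispatch structure would look roughly the same.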