Comment by gen3
Has anyone seen info on how this works? "It’s not revolutionary" seems like an understatement when you can do better than DeepL and support more languages than Google?
I'm pretty sure it's just a finetuned LLM.
I have some experience experimenting in this space; it's not actually that hard to build a model that surpasses DeepL, and the wide language support is just a consequence of starting from an LLM trained on the whole Internet, which already picks up a bunch of languages.
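For a rough sense of what that looks like, here is a minimal sketch of fine-tuning a small open LLM on parallel sentence pairs with Hugging Face transformers. The base model, the toy data, and the hyperparameters are placeholders I picked for illustration, not anything the service has disclosed.

```python
# Hedged sketch: one way to fine-tune a small open LLM on parallel translation
# pairs. Base model, data, and hyperparameters are illustrative placeholders.
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

base_model = "Qwen/Qwen2.5-0.5B"  # placeholder base model
tokenizer = AutoTokenizer.from_pretrained(base_model)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base_model)

# Toy parallel corpus; a real run would use millions of sentence pairs.
pairs = [
    {"src": "Das Wetter ist heute schön.", "tgt": "The weather is nice today."},
    {"src": "Wo ist der Bahnhof?", "tgt": "Where is the train station?"},
]

def format_example(example):
    # Plain instruction format: source sentence in, reference translation out.
    text = (f"Translate to English:\n{example['src']}\n"
            f"Translation: {example['tgt']}{tokenizer.eos_token}")
    return tokenizer(text, truncation=True, max_length=512)

train_ds = Dataset.from_list(pairs).map(format_example,
                                        remove_columns=["src", "tgt"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="mt-finetune",
                           num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=train_ds,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```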
I'm almost sure they did not fine-tune an LLM. They are using existing LLMs, because fine-tuning to beat the SOTA models at translation is impractical unless you target very niche languages, and even then it would be very hard to get a better dataset than what is already used for those models.
Probably all they are doing is switching between some Qwen model (for Chinese) and a large Llama, or maybe OpenAI or Gemini.
So they just have a step (maybe also an LLM) that guesses which model is best suited to the input. Maybe something really short and simple just goes to a smaller, simpler, less expensive model; roughly the kind of routing sketched below.
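Purely as a guess at what such a routing step could look like: the model names, the Unicode range check, and the length threshold below are all made up for illustration.

```python
# Hypothetical router: pick a backend model per request.
# Model names and thresholds are illustrative, not the service's actual setup.
import re

def pick_model(text: str) -> str:
    # CJK characters present -> route to a Chinese-strong model (e.g. a Qwen variant).
    if re.search(r"[\u4e00-\u9fff]", text):
        return "qwen-large"
    # Very short, simple inputs -> cheaper, smaller model.
    if len(text.split()) < 10:
        return "small-cheap-model"
    # Everything else -> a large general-purpose model (Llama-class, OpenAI, Gemini, ...).
    return "large-general-model"

print(pick_model("你好，世界"))    # -> qwen-large
print(pick_model("Good morning"))  # -> small-cheap-model
print(pick_model("Please translate this longer paragraph about train schedules into French."))  # -> large-general-model
```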
It just uses LLMs; I've had it output a refusal in the target language by entering stuff about nukes in the input.