Comment by andai

Comment by andai 3 days ago

1 reply

>the similarity check doesn't appear to do translation

This surprises me. The system is based on embeddings. AFAIK embeddings cluster the same concept in different languages in roughly the same place? Maybe it depends on the model (or maybe it's not exact and the clustering cutoff loses it).

antiochIst 2 days ago

I'm basically throwing away non english articles for now... I'll pry get them in later, but I want to get english right first before trying to move to other languages...

The embeddings themselves will (pry) cluster ok in different languages (but I have not tested this yet)