Comment by YmiYugy 3 days ago

10 replies

The idea is pretty cool, but it doesn't work super well. 1. I imagine most major news outlets don't have RSS feeds these days. 2. A lot of stuff originates from news agencies, so stories don't spread from website to website, but radiate out from the agency. 3. Most of the included sources are pretty small. To draw meaningful conclusions we would need information like popularity, political leaning, nation of origin, etc. 4. The similarity check doesn't appear to do translation. So when news spreads from one country to another we lose the thread.

Animats 3 days ago

Yes. For example, this story about Ukraine [1] is credited to WNYT as first, but the story itself credits the Associated Press. This problem is worth solving, because it's something search engines should be doing.

[1] https://wnyt.com/ap-top-news/rubio-says-us-ukraine-talks-on-...

  • antiochIst 2 days ago

    Yeah, what I'm currently doing is a pretty simple check on the published-at date from the RSS feed (with some small validation checks)... but it's causing issues because the date can be wrong and mess up everything...

    I think checking the source credited in the story is the next step...

    • justin66 5 hours ago

      Treating the Associated Press as a special case might be worthwhile. Its stories will appear in hundreds of places, some with a little alteration and some fully intact.
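A minimal sketch of what "checking the source in the story" with a wire-agency special case could look like. This is not the project's actual code: the pattern table and the `credited_agency` helper are hypothetical, and the URL hint is only modeled on the `/ap-top-news/` path in the WNYT link above.

```python
import re

# Hypothetical marker table; a real one would need many more agencies and variants.
WIRE_PATTERNS = {
    "Associated Press": re.compile(r"Associated Press|\(AP\)"),
    "Reuters": re.compile(r"\(Reuters\)|Thomson Reuters"),
    "AFP": re.compile(r"Agence France-Presse|\(AFP\)"),
}


def credited_agency(text, url=""):
    """Return the first wire agency credited in the text or URL, else None."""
    for name, pattern in WIRE_PATTERNS.items():
        if pattern.search(text):
            return name
    # Assumption: some outlets file wire copy under a path such as /ap-top-news/.
    if re.search(r"/ap-[a-z-]+/", url):
        return "Associated Press"
    return None
```

If a story credits an agency, the outlet that published it would be treated as a republisher rather than the origin, whatever its feed timestamp says.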

antiochIst 2 days ago

Yeah, not all major outlets have RSS feeds, but it seems like the majority still do.

No translation yet.

I think the biggest problem is that I'm relying too much on the published date from the news source itself, and it's wrong sometimes... not super often, but if 1 out of 100 sources gets it wrong, it can steal credit for being the source article when it's not.
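One way to limit the damage from a bad published date, sketched under assumptions: the crawler records its own `first_seen` time for each article and polls feeds often enough that a large gap between the two timestamps is suspicious. The `effective_time` and `pick_origin` names, the dictionary keys, and the 6-hour `max_lag` are all hypothetical.

```python
from datetime import datetime, timedelta, timezone


def effective_time(published, first_seen, max_lag=timedelta(hours=6)):
    """Trust the feed's published date only when it is plausible relative to first_seen."""
    if published is None or published > first_seen:
        # Missing or future-dated: fall back to when the crawler saw the article.
        return first_seen
    if first_seen - published > max_lag:
        # Claimed to be far older than when it showed up in the feed: distrust it.
        return first_seen
    return published


def pick_origin(articles):
    """Pick the article with the earliest plausible timestamp in a cluster."""
    return min(
        articles,
        key=lambda a: effective_time(a["published"], a["first_seen"]),
    )
```

With this rule a source cannot claim a story by reporting a date days before the crawler first saw it. The trade-off is that articles found during an initial backfill all fall back to `first_seen`.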

dleeftink 3 days ago

Also, not all information spreads through public channels, and might not even be/become publicly known. But that doesn't mean news refraction based on textual similarity isn't worthwhile to pursue, as it can reveal a lot about the self-organising principles by which the media operate.

andai 3 days ago

>the similarity check doesn't appear to do translation

This surprises me. The system is based on embeddings. AFAIK embeddings cluster the same concept in different languages in roughly the same place? Maybe it depends on the model (or maybe it's not exact and the clustering cutoff loses it).

  • antiochIst 2 days ago

    I'm basically throwing away non-English articles for now... I'll probably get them in later, but I want to get English right first before trying to move to other languages...

    The embeddings themselves will (probably) cluster OK in different languages (but I have not tested this yet)
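To illustrate the cutoff concern raised above, here is a small sketch with made-up three-dimensional vectors standing in for the embeddings of one story in two languages. The vectors, the `same_story` helper, and both cutoff values are assumptions for illustration, not output from any real model.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def same_story(a, b, cutoff=0.85):
    """Treat two articles as the same story when similarity reaches the cutoff."""
    return cosine(a, b) >= cutoff


# Toy vectors: close in direction, but not identical.
english = [0.9, 0.1, 0.3]
german = [0.8, 0.3, 0.4]

print(same_story(english, german, cutoff=0.85))
print(same_story(english, german, cutoff=0.98))
```

If translated pairs land slightly further apart than same-language pairs, a cutoff tuned on English alone could drop them even when the model places them in roughly the same region.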

fcarraldo 3 days ago

> I imagine most major news outlets don't have RSS feeds these days

I’m not aware of any that don’t. RSS is alive and well.