Comment by YmiYugy 3 days ago

The idea is pretty cool, but it doesn't work super well. 1. I imagine most major news outlets don't have RSS feeds these days. 2. A lot of stories originate from news agencies, so they don't spread from website to website but radiate out from the agency. 3. Most of the included sources are pretty small. To draw meaningful conclusions we would need information like popularity, political leaning, nation of origin, etc. 4. The similarity check doesn't appear to do translation, so when news spreads from one country to another we lose the thread.

Animats 3 days ago

Yes. For example, this story about Ukraine [1] is credited to WNYT as first, but the story itself credits the Associated Press. This problem is worth solving, because it's something search engines should be doing.

[1] https://wnyt.com/ap-top-news/rubio-says-us-ukraine-talks-on-...

antiochIst 2 days ago

Yeah, what I'm currently doing is a pretty simple check on the published-at date from the RSS feed (with some small validation checks), but it's causing issues because the date can be wrong and mess up everything.
I think checking the source credited in the story is the next step.
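
A minimal sketch of what "checking the source in the story" could look like, assuming wire stories carry a dateline credit such as "(AP)" or name the agency near the top of the text. The agency list, the `head_chars` window, and the function names are hypothetical and not taken from the project; a plain mention of an agency in the opening lines would also match, so this is a heuristic only.

```python
import re

# Hypothetical, incomplete list of wire agencies and the credit
# patterns they might appear under; extend as needed.
WIRE_PATTERNS = {
    "Associated Press": re.compile(r"\(AP\)|\bAssociated Press\b"),
    "Reuters": re.compile(r"\(Reuters\)|\bReuters\b"),
    "AFP": re.compile(r"\(AFP\)|\bAgence France-Presse\b"),
}


def credited_agency(article_text, head_chars=400):
    """Return the first agency credited near the top of the text, or None."""
    head = article_text[:head_chars]
    for agency, pattern in WIRE_PATTERNS.items():
        if pattern.search(head):
            return agency
    return None


def attribute_origin(feed_source, article_text):
    """Prefer an in-story wire credit over the feed the article came from."""
    return credited_agency(article_text) or feed_source
```

With this, an article fetched from a local station's feed whose text opens with an "(AP)" dateline would be attributed to the agency rather than to the station.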

- justin66 5 hours ago
  
  Treating the Associated Press as a special case might be worthwhile. Its stories will appear in hundreds of places, some with a little alteration and some fully intact.
  

badestrand 3 days ago

The devil really is always in the details.

Joel_Mckay 3 days ago

Being consistent in message framing even when it's not in the best interest of the public should not reasonably be considered "news" =3
https://en.wikipedia.org/wiki/Sinclair_Broadcast_Group
https://www.youtube.com/watch?v=GvtNyOzGogc

antiochIst 2 days ago

Yeah, not all major outlets have RSS feeds, but it seems like the majority still do.

No translation yet.

I think the biggest problem is that I'm relying too much on the published date from the news source itself, and it's wrong sometimes. Not super often, but if 1 out of 100 sources gets it wrong, that source can steal credit for being the original article when it isn't.
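
One way to limit the damage from a bad published date, sketched under assumptions that are not from the project: the crawler records when it first fetched each item (`first_seen`), and a claimed date is trusted only if it falls within a plausible window before that fetch. The `Sighting` type, the `max_lag` window of six hours, and the function names are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Sighting:
    source: str
    published: datetime   # date claimed by the feed
    first_seen: datetime  # when the crawler first fetched the item


def effective_time(s, max_lag=timedelta(hours=6)):
    """Use the claimed date only if it is plausible given the fetch time.

    A date in the future, or one far older than the first fetch, is
    replaced by the fetch time, so it cannot win on a bogus timestamp.
    """
    if s.first_seen - max_lag <= s.published <= s.first_seen:
        return s.published
    return s.first_seen


def pick_origin(sightings):
    """Return the sighting with the earliest effective time."""
    return min(sightings, key=effective_time)
```

The window has to be at least as long as the polling interval, otherwise honest dates from rarely polled feeds get discarded too.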

dleeftink 3 days ago

Also, not all information spreads through public channels, and might not even be/become publicly known. But that doesn't mean news refraction based on textual similarity isn't worthwhile to pursue, as it can reveal a lot about the self-organising principles by which the media operate.

andai 3 days ago

>the similarity check doesn't appear to do translation

This surprises me. The system is based on embeddings. AFAIK embeddings cluster the same concept in different languages in roughly the same place? Maybe it depends on the model (or maybe it's not exact and the clustering cutoff loses it).
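
The "clustering cutoff loses it" case can be illustrated with plain cosine similarity. This is a toy sketch: the two-dimensional vectors are made up for illustration and are not output from any embedding model, and the cutoff values are arbitrary. Whether a given model actually places translations this close together would have to be measured.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def same_story(u, v, cutoff=0.85):
    """Treat two articles as the same story if similarity meets the cutoff."""
    return cosine(u, v) >= cutoff


# Made-up vectors: an English article and a hypothetical translation
# that lands nearby but not on top of it (similarity of about 0.8).
english = [1.0, 0.0]
translated = [0.8, 0.6]

print(same_story(english, translated))               # strict cutoff
print(same_story(english, translated, cutoff=0.75))  # looser cutoff
```

A cutoff tuned on same-language pairs can therefore drop cross-language matches even when the embeddings are roughly aligned.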

antiochIst 2 days ago

I'm basically throwing away non-English articles for now. I'll probably get them in later, but I want to get English right first before trying to move to other languages.
The embeddings themselves will probably cluster OK across different languages, but I have not tested this yet.

fcarraldo 3 days ago

> I imagine most major news outlets don't have RSS feeds these days

I’m not aware of any that don’t. RSS is alive and well.
