Comment by elorant
More important for me is how you identify news sites, let alone 200k of them. Is there any online source that lists them? Or do you cherry pick them one by one?
And to add to the above, is there a list of the websites you use and any information on sampling methodology? Is it perfectly random or weighted? Do you trust the timestamp from an RSS feed?
It's a whole thing... I run a project called websitelaunches, so I have an index of basically the whole internet (500M+ sites). I took the top ~200k news-related sites from there that had an RSS feed.
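For anyone wondering what "had an RSS feed" could look like as a filter, here is a rough sketch. It is not the actual websitelaunches pipeline. It assumes the check is standard feed autodiscovery, meaning a `<link rel="alternate">` tag in the page's HTML. The names `FeedLinkFinder` and `find_feeds` are made up for illustration.

```python
from html.parser import HTMLParser

# MIME types commonly used for feed autodiscovery links.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkFinder(HTMLParser):
    """Collects href values of <link rel="alternate"> tags that advertise a feed."""

    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        # Attribute values can be None (bare attributes), so normalise to "".
        a = {name: (value or "") for name, value in attrs}
        rel_tokens = a.get("rel", "").lower().split()
        is_feed = a.get("type", "").lower() in FEED_TYPES
        if "alternate" in rel_tokens and is_feed and a.get("href"):
            self.feeds.append(a["href"])


def find_feeds(html):
    """Return the feed URLs advertised in an HTML document (hypothetical helper)."""
    finder = FeedLinkFinder()
    finder.feed(html)
    return finder.feeds
```

A site would pass the filter if `find_feeds(homepage_html)` returns a non-empty list. Sites that only link to their feed in the page body, or not at all, would be missed by this check.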