Comment by bangaladore 2 days ago

11 replies

Here's a thought: someone "trustworthy" should maintain a Chrome extension or Tampermonkey script that automatically scrapes data from various social media sites in a fully anonymized fashion. As people browse Twitter, Reddit, or XYZ, the posts/comments they see are sent to some aggregation system. It might be against the TOS, but certainly far less so than scraping, and the site couldn't tell, since it's the user driving what gets scraped.

I don't use Twitter often, but I'd run something like that if there were strong anonymity guarantees. Seems like a win-win for everyone.

Does anything like this exist today?
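For illustration, here is a minimal sketch of the "fully anonymized" step such a script might run before sending anything to an aggregator. Everything in it is an assumption rather than an existing tool: the `ObservedPost` and `Submission` shapes, the `toSubmission` helper, and the choice to round the observation time to the hour are all hypothetical. The idea is just that nothing identifying the person browsing leaves the browser.

```typescript
// Hypothetical shape of what a userscript might collect from a page.
interface ObservedPost {
  site: string;
  permalink: string;      // public URL of the post
  text: string;           // public post/comment text
  viewerHandle?: string;  // the logged-in person browsing (must never be sent)
  observedAtMs: number;   // when the viewer saw it, epoch milliseconds
}

// Hypothetical payload sent to the aggregation system.
interface Submission {
  site: string;
  permalink: string;
  text: string;
  observedHour: string;   // coarse timestamp, ISO 8601
}

// Drop query string and fragment, which often carry tracking or session hints.
function stripQueryAndFragment(url: string): string {
  return url.split("#")[0].split("?")[0];
}

// Build the payload from scratch so viewer-specific fields cannot leak through.
function toSubmission(post: ObservedPost): Submission {
  const hourMs = 60 * 60 * 1000;
  const roundedMs = Math.floor(post.observedAtMs / hourMs) * hourMs;
  return {
    site: post.site,
    permalink: stripQueryAndFragment(post.permalink),
    text: post.text,
    observedHour: new Date(roundedMs).toISOString(),
  };
}
```

Copying fields into a new object, instead of deleting fields from the observed one, means a field added later is excluded by default. This only covers the payload itself; the aggregator would still see the sender's IP address unless submissions go through some relay.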

yojo 2 days ago

Reminds me a little of RECAP (https://free.law/recap), an automated scraper/saver/sharer for PACER (the US court electronic records system).

Obviously the content is very different, but the technology is basically doing what you’re talking about, minus anonymizing the data.

nicbou 2 days ago

I was thinking of that exact strategy for two scraping problems I have. It would be a good way to gently scrape Berlin.de for appointments, and Immobilienscout24 for new flats.

From what I can tell, it would work fine so long as the user is actively looking at those pages.

  • bangaladore 2 days ago

    Yeah, something generic that works for any use case would be nice, but privacy becomes more difficult, as you need to tailor the extraction to each site to maintain privacy (i.e. only pull the information on the apartment/flat listing, or the public tweet or Reddit comment, etc.)
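One way to picture that per-site tailoring is an allowlist: for each site, a fixed set of fields that may be sent, with everything else dropped. The sketch below is hypothetical throughout; the hostnames, field names, `ALLOWED_FIELDS`, and `extractAllowed` are made up for illustration and do not describe any real site's markup.

```typescript
// Hypothetical per-site allowlists: only these fields ever leave the browser.
const ALLOWED_FIELDS: Record<string, string[]> = {
  "listings.example": ["title", "price", "address"],
  "social.example": ["postText", "postUrl"],
};

// Return only the allowlisted fields for a known site, or null for unknown sites.
function extractAllowed(
  host: string,
  scraped: Record<string, string>,
): Record<string, string> | null {
  const allowed = ALLOWED_FIELDS[host];
  if (allowed === undefined) {
    return null; // unknown site: send nothing rather than guess
  }
  const out: Record<string, string> = {};
  for (const field of allowed) {
    const value = scraped[field];
    if (value !== undefined) {
      out[field] = value;
    }
  }
  return out;
}
```

Sending nothing for unrecognized sites is the conservative default here: a generic fallback extractor is exactly where private page content would be most likely to slip through.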

    • mewpmewp2 2 days ago

      I wonder if those websites might respond by adding fingerprinting code to the page source to indicate whose user session it was.

      • nicbou a day ago

        The city of Berlin wouldn't care. The housing website would probably start with lawyers.