Comment by bangaladore a year ago

Here's a thought: someone "trustworthy" should maintain a Chrome extension or Tampermonkey script that automatically scrapes data from various social media sites in a fully anonymized fashion. As people browse Twitter, Reddit, or XYZ, the posts/comments are sent to some aggregation system. It might be against TOS, but certainly far less so than scraping, and you couldn't tell, as it's the user driving what gets scraped.

I don't use Twitter often, but I'd run something like that if there were strong anonymity guarantees. Seems like a win-win for everyone.

Does anything like this exist today?
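
A rough sketch of what the client might send, purely as an assumption about how "fully anonymized" could work: keep only the public post fields, drop anything about the viewer, and key each record by a content hash so the aggregator can deduplicate without knowing who submitted it. All names here (`build_submission`, `Aggregator`, the field names) are made up for illustration, and it's in Python only for brevity; a real extension would be JavaScript.

```python
import hashlib
import json

# Hypothetical field names; which fields count as "public" is an assumption.
PUBLIC_FIELDS = ("site", "post_url", "author_handle", "text", "posted_at")


def build_submission(observed):
    """Keep only public post fields and attach a content hash for dedup."""
    record = {k: observed[k] for k in PUBLIC_FIELDS if k in observed}
    # Hash a canonical JSON form so identical posts get identical IDs.
    canonical = json.dumps(record, sort_keys=True, ensure_ascii=False)
    record["content_id"] = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return record


class Aggregator:
    """Toy in-memory aggregator that stores each distinct post once."""

    def __init__(self):
        self.posts = {}

    def submit(self, record):
        """Return True if the record was new, False if already seen."""
        if record["content_id"] in self.posts:
            return False
        self.posts[record["content_id"]] = record
        return True
```

Two users who see the same post would submit the same `content_id`, so the aggregator stores it once. This says nothing about network-level anonymity (IP addresses, timing), which would need separate handling.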

yojo a year ago

Reminds me a little of RECAP (https://free.law/recap), an automated scraper/saver/sharer for PACER (the US court electronic records system).

Obviously the content is very different, but the technology is basically doing what you’re talking about, minus anonymizing the data.

qingcharles a year ago

I use RECAP constantly and try to always upload everything I grab from PACER.
I think this administration finally passed a law to make PACER free, which is good.

- toomuchtodo a year ago
  
  https://www.reuters.com/legal/government/fed-judiciary-says-... | https://archive.today/2wddY
  https://fingfx.thomsonreuters.com/gfx/legaldocs/egvbkwemjpq/...
  
bangaladore a year ago

Good find. Yeah, pretty much exactly like this.

kllrnohj a year ago

Not quite the same, but https://returnyoutubedislike.com/ is in a similar vein of an extension crowd-sourcing a return of data.

michaelscrypt a year ago

George Hotz aka geohot proposed this some time ago [1] and he called it a vampire attack because it syphons off users from Twitter.

[1] https://geohot.github.io/blog/jekyll/update/2022/04/16/vampi...

nicbou a year ago

I was thinking of that exact strategy for two scraping problems I have. It would be a good way to gently scrape Berlin.de for appointments, and Immobilienscout24 for new flats.

From what I can tell, it would work fine so long as the user is actively looking at those pages.

bangaladore a year ago

Yeah, something generic that works for any use case would be nice, but privacy becomes more difficult, as you need to tailor the extraction to each site to maintain privacy (i.e. only pull the information on the apartment/flat listing, or the public tweet or Reddit comment, etc.)
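
To make the per-site tailoring concrete, here's a minimal sketch, assuming each supported site gets an explicit allowlist of fields and anything from an unlisted site is dropped. The site keys and field names are hypothetical, not taken from any real extension.

```python
# Hypothetical per-site allowlists; site keys and field names are illustrative.
SITE_ALLOWLISTS = {
    "twitter.com": {"tweet_url", "author_handle", "text"},
    "reddit.com": {"comment_url", "author", "body"},
    "immobilienscout24.de": {"listing_url", "title", "price", "rooms"},
}


def extract_public_fields(site, scraped):
    """Return only allowlisted fields for a known site, or None otherwise."""
    allowed = SITE_ALLOWLISTS.get(site)
    if allowed is None:
        return None  # unknown site: default to sending nothing
    return {k: v for k, v in scraped.items() if k in allowed}
```

Defaulting to "send nothing" for unknown sites is the design choice that matters here: a generic scraper would have to guess what is private, while an allowlist only ever leaks what someone explicitly reviewed.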

- mewpmewp2 a year ago
  
  I wonder if those websites might respond by adding fingerprinting code to the page source to identify which user's session it came from.
  
  nicbou a year ago
  
  The city of Berlin wouldn't care. The housing website would probably start with lawyers.
  