Comment by krick 2 days ago
Does anyone know of a solid (not SaaS, obviously) solution for scraping these days? It's getting pretty hard to handle even harmless cases (like bulk-downloading MY OWN gpx tracks from some fucking fitness-watch servers), with all these JS tricks, countless redirects, Cloudflare and so on. Even if you already have the cookies, getting a non-403 response to any request is far from trivial. I feel like it's time to upgrade my usual approach of Python requests+libxml, but I don't know if there's a library/tool that solves some of these problems for you.
- launch Chrome with a specified user data dir
- connect to it remotely
- ghost cursor and friends
- save cookies and friends to the data dir
- run from a residential IP
- if you get served a captcha or a Cloudflare challenge, direct it to a solver and then route back
- mobile IP if possible
…can’t go into any more specifics than that
…I forget the site right now, but there's a guy who gives a good rundown of this stuff. I’ll see if I can find it.
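For what it's worth, here is a minimal sketch of just the first two bullets plus the cookie part: start Chrome on a persistent data dir with a debugging port open, then attach to it from Python. It assumes Playwright for Python as the client (the thread doesn't name a tool), and the binary name, port, and profile path are placeholders I made up. It does nothing about ghost cursors, residential/mobile IPs, or captcha solvers.

```python
import subprocess
from pathlib import Path

# Assumptions (none of these come from the thread): adjust for your machine.
CHROME_BINARY = "google-chrome"
DEBUG_PORT = 9222
PROFILE_DIR = Path.home() / ".scrape-profile"


def chrome_command(binary=CHROME_BINARY, profile_dir=PROFILE_DIR, port=DEBUG_PORT):
    """Build the argv that starts Chrome with a persistent profile and a CDP port."""
    return [
        str(binary),
        f"--user-data-dir={profile_dir}",      # cookies etc. persist in this dir
        f"--remote-debugging-port={port}",     # lets a client attach over CDP
    ]


def cookies_from_running_chrome(port=DEBUG_PORT):
    """Attach to the already-running Chrome and return its cookies as dicts."""
    # Third-party dependency, imported lazily: pip install playwright
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.connect_over_cdp(f"http://127.0.0.1:{port}")
        context = browser.contexts[0]  # default context, backed by the profile dir
        return context.cookies()


if __name__ == "__main__":
    subprocess.Popen(chrome_command())
    input("Log in by hand in the Chrome window, then press Enter... ")
    for cookie in cookies_from_running_chrome():
        print(cookie["domain"], cookie["name"])
```

Because Chrome itself owns the data dir, the session survives restarts without any extra saving step; the script only reads what the real browser already has.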