Comment by krick 10 months ago
Does anyone know a solid (not SaaS, obviously) solution for scraping these days? It's getting pretty hard to get around even harmless cases (like bulk-downloading MY OWN GPX tracks from some fucking fitness-watch servers), with all these JS tricks, countless redirects, Cloudflare and so on. Even if you already have the cookies, getting a non-403 response to any request is very much not trivial. I feel like it's time to upgrade my usual approach of Python requests + libxml, but I don't know if there is a library/tool that solves some of these problems for you.
- launch Chrome with a specified user data dir.
- connect to it remotely.
- ghost cursor and friends.
- save cookies and friends to the data dir.
- run from a residential IP.
- if you get served a CAPTCHA or a Cloudflare challenge, hand it off to a solver and then route back.
- mobile IP if possible.
…can't go into any more specifics than that.
…I forget the site right now, but there's a guy who gives a good rundown of this stuff. I'll see if I can find it.
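A minimal sketch of steps 1, 2 and 4 from the list above, using only the Python standard library and assuming a local Chrome/Chromium binary. The port 9222, the `cookies.json` file name and all function names are hypothetical choices for illustration, not something the commenter specified; ghost cursors, proxies and CAPTCHA solvers are left out entirely.

```python
import json
import subprocess
import time
import urllib.request
from pathlib import Path

DEBUG_PORT = 9222              # hypothetical choice; any free local port works
COOKIE_FILE = "cookies.json"   # hypothetical name for the exported cookie jar


def chrome_command(chrome_binary: str, data_dir: Path, port: int = DEBUG_PORT) -> list[str]:
    """Command line for a Chrome that reuses one persistent profile directory."""
    return [
        chrome_binary,
        f"--user-data-dir={data_dir}",      # cookies, local storage etc. persist here across runs
        f"--remote-debugging-port={port}",  # expose the DevTools protocol on localhost
        "--no-first-run",                   # skip first-run dialogs on a fresh profile
    ]


def launch_chrome(chrome_binary: str, data_dir: Path, port: int = DEBUG_PORT) -> subprocess.Popen:
    """Start Chrome in the background; the caller is responsible for terminating it."""
    data_dir.mkdir(parents=True, exist_ok=True)
    return subprocess.Popen(chrome_command(chrome_binary, data_dir, port))


def devtools_ws_url(port: int = DEBUG_PORT, timeout: float = 10.0) -> str:
    """Poll /json/version until Chrome reports its browser-level WebSocket URL."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            url = f"http://127.0.0.1:{port}/json/version"
            with urllib.request.urlopen(url, timeout=1) as resp:
                return json.load(resp)["webSocketDebuggerUrl"]
        except OSError:
            # URLError (e.g. connection refused while Chrome starts) subclasses OSError.
            if time.monotonic() >= deadline:
                raise
            time.sleep(0.25)


def save_cookies(data_dir: Path, cookies: list[dict]) -> Path:
    """Write cookies (e.g. as returned by a CDP client) to a JSON file in data_dir."""
    data_dir.mkdir(parents=True, exist_ok=True)
    path = data_dir / COOKIE_FILE
    path.write_text(json.dumps(cookies, indent=2))
    return path


def load_cookies(data_dir: Path) -> list[dict]:
    """Read cookies written by save_cookies(); returns [] if nothing was saved yet."""
    path = data_dir / COOKIE_FILE
    if not path.exists():
        return []
    return json.loads(path.read_text())
```

Typical use would be something like `launch_chrome("google-chrome", Path("~/scrape-profile").expanduser())` (binary name and path are just examples), logging in by hand once, then passing `devtools_ws_url()` (or simply `http://127.0.0.1:9222`) to a CDP client such as Playwright's `chromium.connect_over_cdp()`. Since Chrome already keeps cookies in the profile itself, the JSON export only matters if you want to replay the session from plain requests afterwards.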