Comment by renegat0x0 10 hours ago

From what I have seen, it is hard to tell what "serious scrapers" use. They use many different things: some use this, some do not. That is what I have learned from reading the webscraping subreddit. Nobody says these things out loud.

There are many tools; see the links below.

Personally, I think running Selenium can be a bottleneck. It does not play nice: processes sometimes break, the system sometimes even needs a restart because things get blocked, it can be a memory hog, and so on. That is my experience.

To be able to scale, I think you have to have your own implementation. Serious scrapers complain that people using Selenium or its derivatives are noobs who will come back asking why page X does not work with their scraping setup.

https://github.com/lexiforest/curl_cffi

https://github.com/encode/httpx

https://github.com/scrapy/scrapy

https://github.com/apify/crawlee
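
To make the "own implementation" point a bit more concrete, here is a rough sketch of a browserless fetch loop with retries and a bounded thread pool. It is only an illustration, not how any particular scraper does it. The names `stdlib_fetch`, `fetch_with_retry`, `crawl` and the `example-crawler/0.1` User-Agent are made up for this example, and the default fetcher uses only the standard library; in practice you would probably plug in a client from one of the links above as the `fetch` callable.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, Optional
from urllib.request import Request, urlopen


def stdlib_fetch(url: str, timeout: float = 10.0) -> str:
    """Plain HTTP GET without a browser (hypothetical default fetcher)."""
    # The User-Agent string is an assumption made for this sketch.
    req = Request(url, headers={"User-Agent": "example-crawler/0.1"})
    with urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def fetch_with_retry(
    url: str,
    fetch: Callable[[str], str] = stdlib_fetch,
    retries: int = 3,
    backoff: float = 0.5,
) -> Optional[str]:
    """Call fetch(url) up to `retries` times; return None if all attempts fail."""
    for attempt in range(retries):
        try:
            return fetch(url)
        except Exception:
            # Exponential backoff between attempts, none after the last one.
            if attempt < retries - 1:
                time.sleep(backoff * (2 ** attempt))
    return None


def crawl(
    urls: Iterable[str],
    fetch: Callable[[str], str] = stdlib_fetch,
    workers: int = 8,
    retries: int = 3,
    backoff: float = 0.5,
) -> Dict[str, Optional[str]]:
    """Fetch all URLs with a bounded thread pool; map each URL to its body or None."""
    url_list = list(urls)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        bodies = pool.map(
            lambda u: fetch_with_retry(u, fetch, retries, backoff), url_list
        )
        return dict(zip(url_list, bodies))
```

Because the fetcher is just a callable, a failing page costs one thread a few retries instead of a stuck browser process, and swapping the HTTP client does not touch the retry or concurrency logic.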