Comment by supriyo-biswas
Comment by supriyo-biswas 18 hours ago
A major player in this space is apparently looking for people experienced in scraping without using browser automation. My guess is that not running a browser results in using far fewer resources, thus reducing their costs heavily.
Running a headless browser also means that any differences in the headless environment vs. a "headed" one can be discovered, as well as any of your Javascript executing within the page, which significantly makes it difficult to scale your operation.
My experience is that headless browsers use about 100x more RAM, and at least 10x more bandwidth and 10x more processing power, and page loads take about 10x as long time to finish (vs curl). Though these numbers may be a bit low, there are instances you need to add another zero to one or more of them.
There's also considerably more jank with headless browsers, since you typically want to re-use instances to avoid incurring the cost of spawning a new browser for each retrieval.