Comment by marginalia_nu 17 hours ago

My experience is that headless browsers use about 100x more RAM, at least 10x more bandwidth and 10x more processing power, and page loads take about 10x as long to finish (vs curl). These numbers may even be a bit low; there are instances where you need to add another zero to one or more of them.

There's also considerably more jank with headless browsers, since you typically want to re-use instances to avoid incurring the cost of spawning a new browser for each retrieval.

lozenge 17 hours ago

Is it possible to pause a VM just after the browser has started up? Then map it as copy-on-write memory and spin up many VMs from that "image".

  • supriyo-biswas 17 hours ago

    Your comment is interesting, and some people are working on this, although not specifically for browser automation. For example, AWS Lambda SnapStart boots your Java Lambda code, freezes a snapshot of the Firecracker microVM, and then starts other Lambda functions from that snapshot.

    However, even with a VM approach, you lose the ability to make hundreds or thousands of requests every second on a small box (~512 MB), which is possible when you're restricted to plain HTTP(S). Once you're booting up a headless browser, you're probably limited to loading no more than 3-4 pages per second.

  • marginalia_nu 14 hours ago

    ... but then you have even larger overhead, as well as the added layer of complexity from managing VMs on top of headless browsers.
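
To put the throughput figures from this sub-thread in perspective, here is a rough back-of-the-envelope sketch. The one-million-page crawl size is an assumption chosen purely for illustration, and the two rates are just the ballpark numbers quoted above (about 1000 requests/s for plain HTTP, about 4 pages/s for a headless browser), not measurements.

```python
def crawl_hours(pages: int, pages_per_second: float) -> float:
    """Hours needed to fetch `pages` documents at a steady rate."""
    return pages / pages_per_second / 3600


# Assumed crawl size, for illustration only.
PAGES = 1_000_000

# Ballpark rates taken from the comment above, not benchmarks.
plain_http_hours = crawl_hours(PAGES, 1000)  # plain HTTP(S) client
headless_hours = crawl_hours(PAGES, 4)       # headless browser

print(f"plain HTTP: {plain_http_hours * 60:.1f} minutes")
print(f"headless:   {headless_hours:.1f} hours")
```

Under those assumptions the same crawl goes from well under an hour to several days on one box, which is the gap a snapshot trick would have to close.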

palmfacehn 17 hours ago

On the other hand, you need to be able to handle basics like matching the headers, sometimes requesting irrelevant resources, handling malformed documents, catching changing form parameters, and other gotchas. Many would just copy the request from the browser console.
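
A minimal sketch of the "copy the request from the browser console" approach, using only Python's standard library. The header values and the `build_request` helper are hypothetical placeholders; in practice you would paste whatever your own browser actually sends, and the sketch only builds the request rather than sending it.

```python
import urllib.request

# Hypothetical header set, standing in for values copied from the
# browser's network tab. Replace with what your browser really sends.
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:126.0) Gecko/20100101 Firefox/126.0",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
}


def build_request(url, referer=None):
    """Build (but do not send) a request carrying browser-like headers."""
    headers = dict(BROWSER_HEADERS)
    if referer:
        # Some sites check that the Referer matches the previous page.
        headers["Referer"] = referer
    return urllib.request.Request(url, headers=headers)


req = build_request("https://example.com/", referer="https://example.com/index")
# urllib.request.urlopen(req) would perform the actual fetch.
```

This covers only the header-matching part; changing form parameters and malformed documents still need their own handling.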