Comment by rob_c
The argument doesn't quite hold. The mass scraping (for training) is almost never done by a GPU system; it's almost always done by a dedicated crawler running a full Chrome fork in some automated way (not just the request signatures but some of the bugs give that away).
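For the avoidance of doubt, by "dedicated system" I mean something with roughly this shape. A minimal sketch using Playwright (purely an assumption about the stack on my part, not a claim about what any particular lab runs):

```python
# Minimal sketch of a headless-browser scraper (assumed shape; real
# crawler stacks vary). Runs on plain CPU boxes, no GPU involved.
from playwright.sync_api import sync_playwright

def scrape_page(url: str) -> str:
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)  # full Chromium, executes JS
        page = browser.new_page()
        page.goto(url, wait_until="networkidle")    # let rendering/JS settle
        text = page.inner_text("body")              # extract the rendered text
        browser.close()
        return text

print(scrape_page("https://example.com")[:200])
```

Because it's a full browser it happily executes whatever JavaScript the site throws at it, and its fingerprint (and its bugs) are what give it away.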
And frankly, processing a single page of text fits within a single token window, so it's likely processed for a blink (milliseconds) before the pipeline moves on to the next data entry. The kicker is that each page is potentially run through thousands of times, depending on your training strategy.
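To put rough numbers on "thousands of times" (every figure below is an illustrative assumption, not a measurement):

```python
# Back-of-envelope: a page is scraped once but consumed many times.
pages = 1_000_000      # assumed corpus size
gpu_ms_per_pass = 5    # assumed time to push one page through a token window
passes = 3_000         # assumed total reuses across runs/epochs/ablations

gpu_hours = pages * gpu_ms_per_pass * passes / 3_600_000  # ms -> hours
print(f"~{gpu_hours:,.0f} GPU-hours spent consuming pages scraped once")
```

The downstream GPU bill dwarfs the one-off acquisition cost.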
At inference there's now typically a dedicated tool that may perform a "live" request to scrape the site contents, but that is just pushed into a massive context window to produce the next token anyway.
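Something like this, shape-wise (hypothetical tool and names; real agent frameworks differ, and plenty use a full browser here too):

```python
# Hypothetical inference-time "live fetch" tool: grab the page, truncate,
# and splice it straight into the context window.
import requests  # assumes the target serves usable HTML to a plain client

def fetch_for_context(url: str, max_chars: int = 100_000) -> str:
    resp = requests.get(url, timeout=10)
    resp.raise_for_status()
    return resp.text[:max_chars]  # trimmed to fit the context window

prompt = "Summarise this page:\n\n" + fetch_for_context("https://example.com")
```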
The point is that scraping is already inherently cost-intensive, so a small additional cost from having to solve a challenge is not going to make a dent in the equation. It doesn't matter which server is doing what for that.
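Rough ratio, again with assumed numbers, to show why the challenge overhead washes out:

```python
# Illustrative, assumed per-page costs in milliseconds of CPU time.
render_ms = 500       # assumed full headless-Chrome fetch + render
challenge_ms = 1_000  # assumed one-off cost of solving a PoW/JS challenge
passes = 3_000        # assumed reuses of the page during training (as above)

amortised_ms = (render_ms + challenge_ms) / passes
print(f"acquisition incl. challenge: ~{amortised_ms:.1f} ms per training use")
```

Once you amortise the one-off scrape over every training pass, the challenge is a rounding error.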