Comment by pixl97
The biggest problem here is keeping said scraping from shutting sites down through sheer cost.
Over the years, most of the problems I've had with sites getting overloaded came from legitimate 'scrapers' like Google. Quite often providers were adding more memory/CPU just to make sure Google could index the site.
While hosting is cheaper than ever, being on the internet can still get very expensive very fast.
In theory it could be solved by websites charging a very small fee (maybe crypto) for incoming requests, to cover operating costs. The fees from human browsing (even excessive browsing, like 1 site per second) would be negligible. APIs that rely on scraping would pass the fee through to their users. Training or search (index) data would cost a lot to gather, but probably still an insignificant amount compared to training the ML model or operating the search engine.
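A minimal sketch of what that could look like on the server side, assuming a hypothetical receipt scheme (the X-Payment-Receipt header, the verify_receipt() check, and the fee amount are all made up for illustration, not an existing standard); HTTP does already reserve status 402 "Payment Required" for this kind of thing:

```python
# Sketch of a per-request "payment required" gate.
# Everything payment-specific is hypothetical: the X-Payment-Receipt header,
# verify_receipt(), and the fee amount are placeholders for illustration.
from http.server import BaseHTTPRequestHandler, HTTPServer

FEE_PER_REQUEST = "0.0001"  # hypothetical per-request fee, unit left unspecified


def verify_receipt(token):
    # Placeholder check: a real system would validate the token against a
    # payment processor or ledger. Here it only requires a recognizable prefix.
    return token is not None and token.startswith("paid:")


class PaywalledHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        token = self.headers.get("X-Payment-Receipt")
        if not verify_receipt(token):
            # 402 Payment Required: tell the client to attach a receipt and retry.
            self.send_response(402)
            self.send_header("Content-Type", "text/plain")
            self.end_headers()
            self.wfile.write(
                ("Payment of %s per request required\n" % FEE_PER_REQUEST).encode()
            )
            return
        # Paying clients (humans at ~1 request/second, or scrapers at much
        # higher rates) get the normal response.
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(b"Hello, paying visitor\n")


if __name__ == "__main__":
    HTTPServer(("localhost", 8000), PaywalledHandler).serve_forever()
```

The browser or client library would attach the receipt automatically, and an API built on scraping would simply pass the per-request cost through to its own users.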
Clients already spend a small amount of electricity to send requests, so paying for the server to handle them might not be such a big change, but I don't know much about networking.
In practice, though, similar things have been tried and haven't worked out, e.g. Coil. It would require adoption by most browsers or ISPs; otherwise participating sites would be mostly ignored (most clients don't pay the fee, so they don't get access, and they don't care because it's probably a small site and/or there are alternatives).