Comment by jsheard
For the "good" bots, which at least respect robots.txt, you can use this list to get ahead of them before they pummel your site:
https://github.com/ai-robots-txt/ai.robots.txt
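Here is a minimal sketch of turning a list of crawler user-agent tokens into robots.txt rules that block the whole site. The three tokens are only a small hypothetical sample. The maintained list is the one in the repo linked above. The helper name build_robots_txt is made up for illustration. It relies on robots.txt letting several User-agent lines share one group of rules.

```python
# Hypothetical sample of AI crawler user-agent tokens; the maintained list
# lives in the ai.robots.txt repository, not here.
AI_USER_AGENTS = ["GPTBot", "ClaudeBot", "CCBot"]


def build_robots_txt(user_agents: list[str]) -> str:
    """Return robots.txt text that disallows the entire site for each agent.

    build_robots_txt is a hypothetical helper, not part of any library.
    """
    # Several User-agent lines may share a single group of rules,
    # so one "Disallow: /" covers every listed agent.
    lines = [f"User-agent: {ua}" for ua in user_agents]
    lines.append("Disallow: /")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Write this output to /robots.txt at the root of the site.
    print(build_robots_txt(AI_USER_AGENTS), end="")
```

This only works on crawlers that choose to read and honor robots.txt. It does nothing against bots that ignore it.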
There's no easy solution, though, for bad bots that ignore robots.txt and spoof their user agent (UA).
Apparently OpenAI is one of them: it ignores robots.txt and changes its user agent to evade blocks.[1]
1: https://www.reddit.com/r/selfhosted/comments/1i154h7/openai_...