Comment by chaosharmonic 9 days ago
I've been on a similar thread with my own crawler project -- conceptually at least, since I'm intentionally building as much of it by hand as possible... Anyway, after a lot of browser automation, I've realized that it's more flexible and easier to maintain to just use a DOM polyfill server-side and then use the client to get raw HTML responses wherever possible (roughly the shape sketched below). (And, from conversations about similar LLM-focused tools, I've realized that if you generate parsing functions you can reuse, you don't necessarily need an LLM to process your results.)
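A minimal sketch of what I mean, assuming `linkedom` as the polyfill and Node's built-in `fetch` -- the library choice and the link-extraction example are just illustrative, not what my crawler actually does:

```ts
// Fetch raw HTML over plain HTTP, then parse it with a server-side DOM
// polyfill instead of driving a real browser.
import { parseHTML } from 'linkedom';

async function extractLinks(url: string): Promise<string[]> {
  // No browser involved -- just the raw HTML response.
  const html = await (await fetch(url)).text();

  // parseHTML returns a window-like object with a standards-ish document.
  const { document } = parseHTML(html);

  // Reusable, deterministic parsing logic -- no LLM needed to post-process.
  return Array.from(document.querySelectorAll('a'))
    .map((a) => a.getAttribute('href') ?? '')
    .filter(Boolean);
}
```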
I'm still trying to figure out the boundaries of where and how I want to scale that out into other stuff -- things like when to use `page` methods directly, vs passing a function into `page.evaluate` (compared in the sketch after this), vs other alternatives like a browser extension or a CLI tool. And I still need to work around smaller issues with the polyfill and its spec coverage (leaving me to use things like `getAttribute` more than I would otherwise). But in the meantime it's simplified a lot of ancillary problems, like handling failures on my existing workflows and scaling out to new targets, while I work on other bot detection issues.
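For the `page`-methods-vs-`page.evaluate` question, here's the same extraction written both ways, assuming Playwright (the automation library is an assumption on my part):

```ts
import { chromium } from 'playwright';

async function extractLinksInBrowser(url: string): Promise<string[]> {
  const browser = await chromium.launch();
  const page = await browser.newPage();
  await page.goto(url);

  // Option 1: locator/page methods -- each call is a separate round trip
  // to the browser process.
  const anchors = await page.locator('a').all();
  const viaLocators: string[] = [];
  for (const anchor of anchors) {
    const href = await anchor.getAttribute('href');
    if (href) viaLocators.push(href);
  }

  // Option 2: a single page.evaluate -- one round trip, and the body is
  // plain DOM code that can later be reused against a server-side polyfill.
  const viaEvaluate = await page.evaluate(() =>
    Array.from(document.querySelectorAll('a'))
      .map((a) => a.getAttribute('href') ?? '')
      .filter(Boolean),
  );

  await browser.close();
  // Both produce the same list; returning the evaluate version here.
  return viaEvaluate;
}
```

The second form is what makes the polyfill approach attractive to me: the function passed to `page.evaluate` only depends on standard DOM APIs, so the same logic can run against raw HTML parsed server-side.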