Comment by mpalmer
Comment by mpalmer 9 days ago
This looks very cool and makes a lot of sense, except for the idea that it should take the place of Playwright et al.
Personally I'd love to use this as an intermediate workflow for producing deterministic playwright code, but it looks like this is intended for running directly.
I don't think I could plausibly argue for using LLMs at runtime in our test suite at work...
It's funny you mentioned "deterministic Playwright code," because in my experience, that’s one of the most frustrating challenges of writing integration tests with browser automation tools. Authoring tests is relatively easy, but creating reliable, deterministic tests is much harder.
Most of my test failures come down to timing issues—CPU load subtly affects execution, leading to random timeouts. This makes it difficult to run tests both quickly and consistently. While proactive load-testing of the test environment and introducing artificial random delays during test authoring can help, these steps often end up taking more time than writing the tests themselves.
It would be amazing if tools were smart enough to detect these false positives automatically. After all, if a human can spot them, shouldn’t AI be able to as well?