Comment by Klaster_1
It's funny you mentioned "deterministic Playwright code," because in my experience, that’s one of the most frustrating challenges of writing integration tests with browser automation tools. Authoring tests is relatively easy, but creating reliable, deterministic tests is much harder.
Most of my test failures come down to timing issues: CPU load subtly changes execution timing, leading to random timeouts. This makes it difficult to run tests both quickly and consistently. Proactive load-testing of the test environment and introducing artificial random delays during test authoring can help, but these steps often end up taking more time than writing the tests themselves.
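Here is a minimal sketch of the "artificial random delays" idea in Python, independent of any browser tool. It assumes, hypothetically, that a test can be expressed as a list of zero-argument step callables that raise on failure. `run_with_jitter` and `classify` are illustrative names, not part of Playwright. The test is replayed several times with seeded random pauses, and mixed outcomes are labeled flaky. This is a heuristic for spotting timing dependencies, not proof of one.

```python
import random
import time
from typing import Callable, Sequence

# Hypothetical structure: a test is a list of zero-argument "steps", each of
# which would wrap one browser action or assertion and raise on failure.
Step = Callable[[], None]


def run_with_jitter(
    steps: Sequence[Step],
    max_delay_s: float,
    rng: random.Random,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Run the steps in order with a random 0..max_delay_s pause before each.

    Returns True if every step completed, False as soon as one raises.
    """
    for step in steps:
        sleep(rng.uniform(0.0, max_delay_s))
        try:
            step()
        except Exception:
            return False
    return True


def classify(
    steps: Sequence[Step],
    runs: int = 10,
    max_delay_s: float = 0.01,
    seed: int = 0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Replay a test several times under jitter and label the outcome.

    Mixed pass/fail results suggest a timing dependency rather than a real
    bug in the application under test (a heuristic, not proof).
    """
    rng = random.Random(seed)  # seeded so the jitter sequence is repeatable
    outcomes = {run_with_jitter(steps, max_delay_s, rng, sleep) for _ in range(runs)}
    if outcomes == {True}:
        return "stable-pass"
    if outcomes == {False}:
        return "stable-fail"
    return "flaky"
```

The `sleep` parameter lets a fake clock drive the jitter in unit tests. Playwright's built-in `retries` option gives a related, coarser signal: a test that fails and then passes on retry is reported as flaky.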
It would be amazing if tools were smart enough to detect these false positives automatically. After all, if a human can spot them, shouldn’t AI be able to as well?
I was working on a side project over the holidays with (I think) the same idea mpalmer imagined there (though my project probably wouldn't interest him either, because my goal wasn't automating tests).
Basically, the goal would be to do it like screenshot regression tests, where you have two distinct execution phases:
- generate
- verify
And when verify fails in CI, you can automatically run generate and open an MR/PR with the new script.
This lets you audit the script and do a plausibility check: you're notified of changes, but keeping the tests running takes minimal effort.
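Here is a rough sketch of the generate/verify split. It assumes hypothetical `generate` and `run` hooks, for example an AI agent that writes a fresh test script and a runner that executes one and reports pass/fail. Verify only replays the committed script. When it fails in CI, the script is regenerated and a unified diff is produced for the MR/PR description. Actually opening the MR/PR is left out, since that depends on the forge's API.

```python
import difflib
from pathlib import Path
from typing import Callable, Optional

# Hypothetical hooks: `generate` produces a fresh test script (e.g. written by
# an AI agent exploring the app); `run` executes a script, True on pass.
Generate = Callable[[], str]
Run = Callable[[str], bool]


def generate_phase(path: Path, generate: Generate) -> str:
    """Generate a script and store it, like recording a baseline screenshot."""
    script = generate()
    path.write_text(script)
    return script


def verify_phase(path: Path, run: Run) -> bool:
    """Replay the stored script as-is; no generation happens here."""
    return run(path.read_text())


def ci_step(path: Path, generate: Generate, run: Run) -> Optional[str]:
    """Return None if verify passes; otherwise regenerate the script and
    return a unified diff for a human to review in the MR/PR."""
    if verify_phase(path, run):
        return None
    old = path.read_text()
    new = generate_phase(path, generate)
    diff = difflib.unified_diff(
        old.splitlines(keepends=True),
        new.splitlines(keepends=True),
        fromfile=f"{path} (committed)",
        tofile=f"{path} (regenerated)",
    )
    return "".join(diff)
```

Keeping generation out of the verify phase is what makes the committed script the reviewable artifact, much as a baseline screenshot is in screenshot regression testing.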