Comment by gregpr07

1. Oh yes right. I remember trying it out thinking it was going to be brittle because of analytics etc but it filters for those surprisingly well.

2. We are working on https://www.launchskylight.com/ , agentic QA. For the self onboarding version we are using pure CUA without caching. (We wanted to avoid playwright to make it more flexible for canvas+iframe based apps,where we found HTML based approaches like browser-use limited, and to support desktop apps in the future).

We are betaing caching internally for customers, and releasing it for the self-onboarding soon. We use CUA actions for caching instead of playwright. Caching with pixel native models is def a bit more brittle for clicks and we focus on purely vision based analysis to decide to proceed or not. I think for scaling though you are 100% right, screenshots every step for validating are okay/worth it, but running an agent non-deterministically for actions is def an overkill for enterprise, that was what we found as well.

Geminis video understanding is also an interesting way to analyze what went wrong in more interactive apps. Apart from that i think we share quite a bit of the core thinking, would be interested to chat, will DM!