Comment by anerli
So the architecture is built with determinism in mind. The plan-caching system is still a work in progress, but especially once it's fully implemented it should be very consistent. As long as your interface doesn't change (or changes only in trivial ways), Moondream alone can execute the exact same web actions as in previous test runs without relying on any DOM selectors. When the interface does eventually change, that's where it becomes non-deterministic again by necessity, since the planner will need to generatively update the test and continue building the new cache from there. However, once it's been adapted, it can once again be executed that way every time until the interface changes again.
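To make the replay-then-replan flow above concrete, here is a rough Python sketch. It is not the project's actual code: `Action`, `PlanCache`, `run_step`, and the `execute` and `plan` callables are hypothetical names made up for illustration. `execute` stands in for the small vision model acting on a described target, and `plan` stands in for the larger planner model.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Action:
    """One cached web action (hypothetical shape, for illustration only)."""
    kind: str    # e.g. "click" or "type"
    target: str  # natural-language description of the element, not a DOM selector
    text: str = ""


# Hypothetical stand-ins for the two models:
# Executor: the small vision model; returns False if it cannot find the target.
# Planner: the large model; generates a fresh list of actions for a test step.
Executor = Callable[[Action], bool]
Planner = Callable[[str], List[Action]]


@dataclass
class PlanCache:
    """Maps a test step description to the actions that last satisfied it."""
    plans: Dict[str, List[Action]] = field(default_factory=dict)


def run_step(step: str, cache: PlanCache, execute: Executor, plan: Planner) -> str:
    """Replay the cached plan if it still works, otherwise replan and re-cache."""
    cached = cache.plans.get(step)
    # Deterministic path: the same actions as the previous run, no planner call.
    if cached is not None and all(execute(action) for action in cached):
        return "replayed"

    # Cache miss or failed replay: the interface has probably changed, so fall
    # back to the planner. This is the non-deterministic part. Simplification:
    # a real runner would also have to undo or account for any cached actions
    # that already ran before the replay failed.
    fresh = plan(step)
    for action in fresh:
        if not execute(action):
            raise RuntimeError(f"could not execute {action} for step {step!r}")
    cache.plans[step] = fresh
    return "replanned"
```

In this sketch the planner runs only on the first execution and again after a replay fails. Every run in between reuses the cached actions, which is where the consistency would come from.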
Anerli wrote: “When the interface does eventually change, that's where it becomes non-deterministic again by necessity, since the planner will need to generatively update the test and continue building the new cache from there.”
But what determines that the UI has changed for a specific URL? Is it your software, independent of the planner LLM, or do you require the visual LLM to determine that something has changed?
You should also stop saying 100% open source when test plan generation and execution depend on non-open-source AI components. It just doesn’t make sense.