Comment by badmonster

Comment by badmonster a day ago

How does Magnitude differentiate between the planner and executor LLM roles, and how customizable are these components for specific test flows?

anerli a day ago

So the prompts that are sent to the planner vs executor are completely distinct. We allow complete customization of the planner LLM with all major providers (Anthropic, OpenAI, Google AI Studio, Google Vertex AI, AWS Bedrock, OpenAI compatible). The executor LLM on the other hand has to fit very specific criteria, so we only support the Moondream model right now. For a model to act as the executor it needs to be able to specific specific pixel coordinates (only a few models support this, for example OpenAI/Anthropic computer use, Molmo, Moondream, and some others). We like Moondream because its super tiny and fast (2B). This means as long as we still have a "smart" planner LLM we can have very fast/cheap execution and precise UI interaction.

Reply View 2 replies

badmonster a day ago

does Moondream handle multi-step UI tasks reliably (like opening a menu, waiting for render, then clicking), or do you have to scaffold that logic separately in the planner?

Reply View | 1 reply
- anerli a day ago
  
  The planner can plan out multiple web actions at once, which Moondream can then execute in sequence on its own. So Moondream is never deciding how to execute more than one web action in a single prompt.
  What this really means for developers writing the tests is you don't really have to worry about it. A "step" in Magnitude can map to any number of web actions dynamically based on the description, and the agents will figure out how to do it repeatably.
  
  Reply View | 0 replies