Comment by rahimnathwani 7 days ago

4 replies

If an agent gets a copy of the screen using browser_screenshot and then wants to click somewhere on that screen, how is it meant to find the right CSS selector to pass to browser_click?

There's a browser_find method, but that assumes you already know what type of element it is. But I can't always tell what type of element something is just by looking at a screenshot.

What have I missed or misunderstood?

coty 7 days ago

For right now, the MCP server doesn’t expose quite enough to navigate on its own.

I’ve added a browser_evaluate tool in my fork—though I haven’t committed or pushed a PR yet. With that, the agent can call JavaScript to get the accessibility tree and then use that to navigate via browser_find.
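To make that idea concrete, here is a rough sketch of the kind of script an agent might run through a tool like browser_evaluate. It is not the code from the fork: `collectTargets`, `ElementLike`, `Target`, and the tag-to-role table are all hypothetical names invented for illustration, and the result is only an approximation of the accessibility tree (role, label, and a CSS selector per interactive element), not the browser's real one.

```typescript
// Minimal structural type (hypothetical). Real DOM Elements satisfy it, so in
// a page you could pass document.body; it is declared here so the sketch also
// runs outside a browser.
interface ElementLike {
  tagName: string;
  id: string;
  textContent: string | null;
  children: ArrayLike<ElementLike>;
  getAttribute(name: string): string | null;
}

interface Target {
  role: string;     // explicit role attribute, or a guess based on the tag
  name: string;     // aria-label if present, otherwise trimmed text content
  selector: string; // CSS selector an agent could hand to a click tool
}

// Rough tag-to-role guesses. This is a simplification, not the full ARIA
// mapping (e.g. it ignores input types and anchors without href).
const IMPLICIT_ROLES: Record<string, string> = {
  a: "link",
  button: "button",
  input: "textbox",
  select: "combobox",
  textarea: "textbox",
};

function collectTargets(root: ElementLike, rootSelector = "body"): Target[] {
  const targets: Target[] = [];

  const visit = (el: ElementLike, selector: string): void => {
    const tag = el.tagName.toLowerCase();
    const role = el.getAttribute("role") ?? IMPLICIT_ROLES[tag];
    if (role) {
      const label = el.getAttribute("aria-label") ?? el.textContent ?? "";
      targets.push({ role, name: label.trim(), selector });
    }
    for (let i = 0; i < el.children.length; i++) {
      const child = el.children[i];
      // Prefer an id (assumed not to need escaping here); otherwise address
      // the child by its position under the parent.
      const childSelector = child.id
        ? `#${child.id}`
        : `${selector} > ${child.tagName.toLowerCase()}:nth-child(${i + 1})`;
      visit(child, childSelector);
    }
  };

  visit(root, rootSelector);
  return targets;
}
```

In a page, the agent would call something like `collectTargets(document.body)` and get back a JSON-friendly list it can match against what it sees in the screenshot, then use the matching `selector`. How browser_evaluate takes a script and returns its result isn't spelled out in this thread, so that wiring is left out.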

This and much more will be coming soon. See the V2 roadmap for more insight: https://github.com/VibiumDev/vibium/blob/main/V2-ROADMAP.md

  • hugs 7 days ago

    one of the wild things about vibe coding is... i want to add that feature, but i'm slightly more interested in using the prompt/spec you might have used to create it, not the patch itself.

    • coty 3 days ago

      Yeah. Let me see if I can find or reconstitute that prompt. Ultimately I wanted to have a system for automagically keeping Java up-to-date with JavaScript.

    • rahimnathwani 6 days ago

      Sometimes an AI-written spec based on the code is better than the spec/prompts used to create the patch.