Comment by frabonacci
Comment by frabonacci 21 hours ago
UI detection’s a big focus - we use visual grounding + structured observations (like icons, OCR, app metadata, window state), so the agent can reason more like a user would. It’s surprisingly robust even with layout shifts or new themes