Comment by languse

Hi HN, I'm the creator of Android Use.

I built this tool to enable natural-language-driven automation on Android devices.

While many existing agents rely heavily on vision (taking screenshots and analyzing pixels), I took a different approach: XML parsing.
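
Here is a minimal sketch of the XML idea. It assumes a UI dump in the format produced by Android's `uiautomator dump` (`node` elements with `text`, `content-desc`, `class`, `clickable` and `bounds="[x1,y1][x2,y2]"` attributes). The function names, the sequential indexing scheme and the clickable-only filter are illustrative assumptions, not Android Use's actual implementation. Clickable nodes become a numbered text list that a text-only model can read. An index chosen by the model is then resolved to the center of that element's bounds, which becomes the tap point.

```python
from __future__ import annotations

import re
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Matches uiautomator's bounds format, e.g. "[0,63][1080,210]"
BOUNDS_RE = re.compile(r"\[(\d+),(\d+)\]\[(\d+),(\d+)\]")


@dataclass
class Element:
    idx: int                # hypothetical sequential index shown to the LLM
    label: str              # text, or content-desc if text is empty
    cls: str                # Android widget class name
    center: tuple[int, int] # center of the bounds rectangle, used as tap point


def parse_hierarchy(xml_text: str) -> list[Element]:
    """Flatten clickable nodes of a uiautomator-style dump into indexed elements."""
    root = ET.fromstring(xml_text)
    elements: list[Element] = []
    for node in root.iter("node"):
        if node.get("clickable") != "true":
            continue  # assumption: only clickable nodes are offered to the model
        m = BOUNDS_RE.fullmatch(node.get("bounds", ""))
        if not m:
            continue
        x1, y1, x2, y2 = map(int, m.groups())
        label = node.get("text") or node.get("content-desc") or ""
        center = ((x1 + x2) // 2, (y1 + y2) // 2)
        elements.append(Element(len(elements), label, node.get("class", ""), center))
    return elements


def to_prompt(elements: list[Element]) -> str:
    """Render elements as plain text lines a text-only LLM can choose from."""
    return "\n".join(f'[{e.idx}] {e.cls} "{e.label}"' for e in elements)


def tap_command(elements: list[Element], idx: int) -> list[str]:
    """Build (but do not run) an adb argv that taps the chosen element."""
    x, y = elements[idx].center
    return ["adb", "shell", "input", "tap", str(x), str(y)]


SAMPLE = """<hierarchy rotation="0">
  <node index="0" text="" class="android.widget.FrameLayout" clickable="false" bounds="[0,0][1080,2340]">
    <node index="0" text="Search" class="android.widget.EditText" content-desc="" clickable="true" bounds="[40,100][1040,220]" />
    <node index="1" text="" class="android.widget.ImageButton" content-desc="Settings" clickable="true" bounds="[900,2200][1060,2320]" />
  </node>
</hierarchy>"""

if __name__ == "__main__":
    els = parse_hierarchy(SAMPLE)
    print(to_prompt(els))
    print(tap_command(els, 1))
```

With the sample dump, the model sees two lines, `[0] android.widget.EditText "Search"` and `[1] android.widget.ImageButton "Settings"`. Answering `1` yields `adb shell input tap 980 2260`. The argv list can be handed to `subprocess.run` when a device is attached.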

By analyzing the UI hierarchy directly via XML, the agent can:

- Achieve precise positioning and interaction (clicking via index).
- Run faster and more efficiently.
- Work effectively with LLMs that don't have vision capabilities (or use cheaper text-only models like DeepSeek/Kimi/Qwen).

It's open source, and I'd love to hear your feedback or answer any questions about the implementation!