Comment by electroly

Comment by electroly 11 hours ago

0 replies

Looks awesome. I've attempted my own implementation, but I never got it to work particularly well. "Open Notepad and type Hello World" was a triumph for me. I landed on the UIA tree + annotated screenshot combination, too, but mine was too primitive, and I tried to use GPT which isn't as good at image tasks as Gemini as used here. Great job!