Comment by firefoxd
Comment by firefoxd 4 days ago
I must be using web browsers completely wrong. Like browsing a page isn't a problem for me. I can do it at the speed of my needs.
I'm having a hard time understanding why I will tell gemini to create an account on some website for me or send an email. Those are usually just a tab away. That's why I feel like I'm missing something here.
Basically none of their examples are just "browse a page"? They're multi-step tasks combining data from multiple pages.
Like the first example in the demo carousel (the Y2K party) starts from a photo and a prompt of roughly "buy the props needed for replicating this photo from Etsy". It first analyzes the image in the current tab, identifies a bunch of things to buy, searches for them on Etsy, customizes the orders, adds them to the shopping basket, and then asks for a confirmation to actually send an order.
The second one auto-fills a form with a couple of dozen fields from the data that's in a pdf in another tab. (And in the fiction of a demo, presumably a pdf that's you already had around, not one that you made just for the purposes of using it to auto-fill the form.)
I'm not the target market for this: automating a browser with my credentials is just too scary, but I can certainly see the utility. There's a huge amount of tasks taking a minute or two are not worth creating bespoke automation for but that are also pretty mechanical processes.