Comment by ygouzerh 2 days ago

The rate of progress on multimodal agents is impressive. OpenVLA was released in June 2024 and was state of the art at the time... Eight months later, on tasks like "Pick Place Hotdog Sausage", the success rate has gone from 2/10 to 6/10.

paulluuk 2 days ago

"Pick Place Hotdog Sausage" is such a bizarre name, though. Is it meant to be human readable? AI-readable? Just a label for the researchers? Same with "Put Mushroom Place Pot". As far as I can see both labels are only used in this Magma paper, nowhere else that Google can find.

  • ekidd 2 days ago

    "Pick & place" is a term for a kind of robot that can pick up scattered items from a conveyor belt and arrange them in a regular fashion.

    The really fast multi-arm versions can be hypnotic to watch. You can see an example at 1:00 in this video: https://youtu.be/aPTd8XDZOEk

    The limitation of industrial pick & place robots is that they're configured for a single task, and reconfiguring them for a new product is notoriously expensive.

    Magma's "pick & place" demo is much slower and shakier than a specialized industrial robot. But Magma can apparently be adapted to a new task by providing plain English instructions.