Comment by godelski

Comment by godelski 9 hours ago

You are confidently wrong

  > Powered by Gemini, a multimodal large language model developed by Google, EMMA employs a unified, end-to-end trained model to generate future trajectories for autonomous vehicles directly from sensor data. Trained and fine-tuned specifically for autonomous driving, EMMA leverages Gemini’s extensive world knowledge to better understand complex scenarios on the road.

https://waymo.com/blog/2024/10/introducing-emma/

written-beyond 9 hours ago

You were confidently wrong for judging them to be confidently wrong

> While EMMA shows great promise, we recognize several of its challenges. EMMA's current limitations in processing long-term video sequences restricts its ability to reason about real-time driving scenarios — long-term memory would be crucial in enabling EMMA to anticipate and respond in complex evolving situations...

They're still in the process of researching it, noting in that post implies VLM are actively being used by those companies for anything in production.

Reply View 2 replies

godelski 7 hours ago
> They're still in the process of researching it
I should have taken more care to link a article, but I was trying you link something more clear.
But mind you, everything Waymo does is under research.
So let's look at something newer to see if it's been incorporated
> We will unpack our holistic AI approach, centered around the Waymo Foundation Model, which powers a unified demonstrably safe AI ecosystem that, in turn, drives accelerated, continuous learning and improvement. > Driving VLM for complex semantic reasoning. This component of our foundation model uses rich camera data and is fine-tuned on Waymo’s driving data and tasks. Trained using Gemini, it leverages Gemini’s extensive world knowledge to better understand rare, novel, and complex semantic scenarios on the road. > Both encoders feed into Waymo’s World Decoder, which uses these inputs to predict other road users behaviors, produce high-definition maps, generate trajectories for the vehicle, and signals for trajectory validation.
They also go on to explain model distillation. Read the whole thing, it's not long
https://waymo.com/blog/2025/12/demonstrably-safe-ai-for-auto...
But you could also read the actual research paper... or any of their papers. All of them in the last year are focused on multimodality and a generalist model for a reason which I think is not hard do figure since they spell it out
Reply View | 1 reply
- theamk 4 hours ago
  
  Note this is not end-to-end... All that VLM can do is to "contribute a semantic signal".
  So put a fake "detour" sign, so the vehicle thinks it's a detour and starts to follow? Possible. But humans can be fooled like this too.
  Put a "proceed" sign so the car runs over the pedestrian, like that article proposes? Get car to hit a wall? Not going to happen.
  
  Reply View | 0 replies

fsckboy 9 hours ago

>to generate future trajectories for autonomous vehicles directly from sensor data

we will not have achieved true AGI till we start seeing bumper stickers (especially Saturday mornings) that say "This Waymo Brakes for Yard Sales"

Reply View 0 replies