Comment by godelski
> They're still in the process of researching it
I should have taken more care to link an article, but I was trying to link something clearer. But mind you, everything Waymo does is under research.
So let's look at something newer to see if it's been incorporated:
> We will unpack our holistic AI approach, centered around the Waymo Foundation Model, which powers a unified demonstrably safe AI ecosystem that, in turn, drives accelerated, continuous learning and improvement.
> Driving VLM for complex semantic reasoning. This component of our foundation model uses rich camera data and is fine-tuned on Waymo’s driving data and tasks. Trained using Gemini, it leverages Gemini’s extensive world knowledge to better understand rare, novel, and complex semantic scenarios on the road.
> Both encoders feed into Waymo’s World Decoder, which uses these inputs to predict other road users behaviors, produce high-definition maps, generate trajectories for the vehicle, and signals for trajectory validation.
They also go on to explain model distillation. Read the whole thing, it's not long: https://waymo.com/blog/2025/12/demonstrably-safe-ai-for-auto...
But you could also read the actual research paper... or any of their papers. All of them in the last year have focused on multimodality and a generalist model for a reason, which I think is not hard to figure out since they spell it out.
Note this is not end-to-end... All the VLM can do is "contribute a semantic signal".
So put up a fake "detour" sign, so the vehicle thinks there's a detour and starts to follow it? Possible. But humans can be fooled like this too.
Put up a "proceed" sign so the car runs over a pedestrian, like that article proposes? Get the car to hit a wall? Not going to happen.