Comment by theamk
Note that this is not end-to-end... All the VLM can do is "contribute a semantic signal".
So put up a fake "detour" sign, so the vehicle thinks it's a detour and starts to follow it? Possible. But humans can be fooled like this too.
Put up a "proceed" sign so the car runs over a pedestrian, like that article proposes? Get the car to hit a wall? Not going to happen.