Comment by nasreddin 11 hours ago

3 replies

It's an incorrect assumption. Inference speed, particularly that of the on-device LLMs AVs would need to use, is not compatible with the structural requirements of driving.

[removed] 10 hours ago
[deleted]
nharada 9 hours ago

I think the assumption is valid. Most of the reasoning components of next-gen (and some current-gen) robotics will use VLMs to some extent. Deciding whether a temporary construction sign is valid seems to fall under this use case.

  • theamk 5 hours ago

    But unless you are using a single, end-to-end model for the entire driving stack, that "proceed" command will never influence the accelerator pedal.

    Sure, there will be a VLM for reading the signs, but the worst it would be able to output is something like "there is a 'detour' sign at (123, 456) pointing to road #987", and some other, likely non-LLM, mechanism will ensure that following that road is actually safe.
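
    A rough sketch of that separation, purely as illustration: every name below (SignObservation, road_is_safe, plan_route) is made up, and the "safety check" is a stand-in set lookup, not how any real driving stack works. The point is only that the VLM's output is a structured claim about the scene, and a separate non-LLM check decides whether to act on it.

```python
from dataclasses import dataclass


# Hypothetical types and functions, for illustration only.
@dataclass(frozen=True)
class SignObservation:
    """Structured output of the sign-reading VLM: a claim, not a command."""
    kind: str            # e.g. "detour"
    position: tuple      # image coordinates, e.g. (123, 456)
    target_road: int     # road the sign points to, e.g. 987


def road_is_safe(road_id: int, drivable_roads: set, blocked_roads: set) -> bool:
    """Stand-in for the non-LLM check: road must be known and not blocked."""
    return road_id in drivable_roads and road_id not in blocked_roads


def plan_route(current_road: int, obs: SignObservation,
               drivable_roads: set, blocked_roads: set) -> int:
    """Follow the detour hint only if the independent check passes."""
    if obs.kind == "detour" and road_is_safe(
        obs.target_road, drivable_roads, blocked_roads
    ):
        return obs.target_road
    # Otherwise ignore the hint and stay on the current road.
    return current_road
```

    With something shaped like this, a spoofed or misread sign can at worst produce a bad SignObservation; whether the car actually takes road #987 is still decided by road_is_safe, which never sees any text from the sign.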