Comment by sarchertech 2 days ago
The average mechanic won’t do something completely different to your car because you added some extra filler words to your request though.
The average user may not care exactly what the mechanic does to fix your car, but they do expect things to be repeatable. If car repair LLMs function anything like coding LLMs, one request could result in an oil change, while a similar request could end up with an engine replacement.
I think we're making similar points, but I phrased mine weirdly. I agree that current LLMs are sensitive to phrasing and are highly unpredictable, and therefore aren't useful as the basis for AI-driven backends. The point I'm making is that these issues are potentially solvable with better AI and don't philosophically invalidate the idea of a non-programmatic backend.
One could imagine a hypothetical AI model that does a pretty good job of understanding vague requests, properly refuses irrelevant ones (if you ask a mechanic to bake you a cake, he'll likely tell you to go away), and behaves more or less consistently.

It is acceptable for an AI-based backend to have a non-zero failure rate. If a mechanic were distracted, or misheard you, or was just feeling spiteful, it's not inconceivable that he would replace your engine instead of changing your oil. The critical point is that this happens very, very rarely: 99.99% of the time he changes your oil correctly. Current LLMs fail far too often to be useful, but having a failure rate at all is not a non-starter.
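For concreteness, here's a minimal sketch (Python, with entirely hypothetical names and a stubbed-out model call) of the guardrails such a backend would need: out-of-scope requests get refused rather than guessed at, low-confidence readings are treated as failures, and expensive actions require explicit approval, so a misheard oil change can't silently become an engine swap.

```python
from dataclasses import dataclass

# Actions the shop actually offers; anything else is refused outright.
ALLOWED_ACTIONS = {"oil_change", "tire_rotation", "brake_inspection"}

# Expensive actions a model should never trigger on its own, no matter
# how it read the request.
CONFIRMATION_REQUIRED = {"engine_replacement"}

@dataclass
class Proposal:
    action: str
    confidence: float

def call_model(request: str) -> Proposal:
    """Stand-in for a real LLM call; returns the model's proposed action."""
    # Hypothetical: a real implementation would prompt a model here.
    return Proposal(action="oil_change", confidence=0.97)

def handle(request: str) -> str:
    proposal = call_model(request)
    # Refuse irrelevant requests ("bake me a cake") instead of guessing.
    if proposal.action not in ALLOWED_ACTIONS | CONFIRMATION_REQUIRED:
        return "refused: not a service we offer"
    # A low-confidence reading is treated as a failure, not acted on.
    if proposal.confidence < 0.9:
        return "refused: please rephrase the request"
    # An oil-change request can never silently become an engine swap.
    if proposal.action in CONFIRMATION_REQUIRED:
        return "pending: requires explicit customer approval"
    return f"scheduled: {proposal.action}"

print(handle("can you change my oil sometime this week?"))
```

The specific checks are stand-ins; the design point is that the failure rate lives in the model, while the worst case a failure can produce is bounded by ordinary code.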