Comment by hyko 2 days ago

The fatal problem with LLM-as-runtime-club isn’t performance. It’s ops (especially security).

When the god rectangle fails, there is literally nobody on earth who can even diagnose the problem, let alone fix it. Reasoning about the system is effectively impossible. And the vulnerability of the system is almost limitless, since it’s possible to coax LLMs into approximations of anything you like: from an admin dashboard to a sentient potato.

“zero UI consistency” is probably the least of your worries, but object permanence is kind of fundamental to how humans perceive the world. Being able to maintain that illusion is table stakes.

Despite all that, it’s a fun experiment.

cheema33 2 days ago

> The fatal problem with LLM-as-runtime-club isn’t performance. It’s ops (especially security).

For me it is predictability. I am a big proponent of AI tools. But even the biggest proponents admit that LLMs are non-deterministic. When you ask a question, you are not entirely sure what kind of answers you will get.

This behavior is acceptable in a developer-assistance tool, where a human is in the loop to review the output and the end goal is to produce deterministic code.

  • hyko 2 days ago

    Non-deterministic behaviour doesn’t help when trying to reason about the system. But you could in theory eliminate the non-determinism for a given input and still be stuck with something unpredictable, in the sense that you can’t predict what a new input will cause the system to do.

    Whereas that sort of evaluation is trivial with code (even if program execution is at times non-deterministic), because its mechanics are explainable. Techniques like testing only boundary conditions hinge on this property, but fall apart completely if it’s all probabilistic.
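
    To make that concrete, here’s a minimal sketch (the clamp function and the values are purely illustrative, not from any real system): the boundary tests only tell you anything because everything between the boundaries is mechanically determined by the code.

        import unittest

        def clamp(x, lo, hi):
            """Deterministic helper: pin x into the range [lo, hi]."""
            return max(lo, min(x, hi))

        class ClampBoundaries(unittest.TestCase):
            # Boundary-value testing: checking the edges is enough *because*
            # the behaviour between the edges follows deterministically from the code.
            def test_edges(self):
                self.assertEqual(clamp(-1, 0, 10), 0)   # just below the lower bound
                self.assertEqual(clamp(0, 0, 10), 0)    # at the lower bound
                self.assertEqual(clamp(10, 0, 10), 10)  # at the upper bound
                self.assertEqual(clamp(11, 0, 10), 10)  # just above the upper bound
            # Against a probabilistic "implementation" (an LLM asked to clamp a
            # number), passing these four cases says almost nothing about the
            # rest of the input space.

        if __name__ == "__main__":
            unittest.main()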

    Maybe explainable AI can help here, but to be honest I have no idea what the state of the art is for that.

finnborge 2 days ago

At this extreme, I think we'd end up relying on backup snapshots. Faulty outcomes are not debugged. They, and the ecosystem that produced them, are just erased. The ecosystem is then returned to its previous state.

Kind of like saving a game before taking on a boss. If things go haywire, just reload. Or maybe like cooking? If something went catastrophically wrong, just throw it out and start from the beginning (with the same tools!)
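
A rough sketch of that save/reload loop (the state, the step function and the sanity check are placeholders, not a real system):

    import copy

    def run_with_rollback(state, step, looks_sane, max_retries=3):
        """Checkpoint the state, let one LLM-driven step mutate a copy,
        and reload the save if the outcome fails a sanity check."""
        for _ in range(max_retries):
            snapshot = copy.deepcopy(state)   # save the game
            candidate = step(snapshot)        # the model mutates the copy
            if looks_sane(candidate):
                return candidate              # keep the new ecosystem
            # otherwise throw the whole thing out and retry from the save
        return state                          # give up: stay on the last known-good state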

And I think the only way to even halfway mitigate the vulnerability concern is to accept that this hypothetical system can only serve a single user. Exactly one intent. Totally partitioned/sharded/isolated.

  • hyko 2 days ago

    Backup snapshots of what though? The defects aren’t being introduced through code changes, they are inherent in the model and its tooling. If you’re using general models, there’s very little you can do beyond prompt engineering (which won’t be able to fix all the bugs).

    If you were using your own model you could maybe try to retrain/finetune the issues away given a new dataset and different techniques? But at that point you’re just transmuting a difficult problem into a damn near impossible one?

    LLMs can be miraculous and inappropriate at the same time. They are not the terminal technology for all computation.

indigodaddy 2 days ago

What if they are extremely narrow and targeted LLMs running locally on the endpoint system itself (llamafile or whatever)? Would that at least partially address this concern?
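
Something like the sketch below is what I have in mind: the model never leaves the box and is only ever asked to do one narrow thing. This assumes a llamafile already running locally and serving its OpenAI-compatible endpoint; the port, model name and task are just illustrative.

    import json, urllib.request

    # Assumes a llamafile is already running locally and exposing an
    # OpenAI-compatible API; port, model name and task are placeholders.
    URL = "http://localhost:8080/v1/chat/completions"

    def classify_ticket(text):
        """Narrow, single-purpose local inference: label a support ticket
        as 'billing', 'bug' or 'other'. Nothing leaves the machine."""
        body = {
            "model": "local",
            "temperature": 0,  # reduces (but doesn't eliminate) output variance
            "messages": [
                {"role": "system",
                 "content": "Reply with exactly one word: billing, bug, or other."},
                {"role": "user", "content": text},
            ],
        }
        req = urllib.request.Request(
            URL,
            data=json.dumps(body).encode("utf-8"),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            out = json.load(resp)
        return out["choices"][0]["message"]["content"].strip().lower()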