Comment by andai
>But why aren’t LLM inference engines deterministic? One common hypothesis is that some combination of floating-point non-associativity and concurrent execution leads to nondeterminism based on which concurrent core finishes first. We will call this the “concurrency + floating point” hypothesis for LLM inference nondeterminism.
Dang, so we don't even know why it's not deterministic, or how to make it so? That's quite surprising! So if I'm reading this right, it doesn't just have to do with LLM providers cutting costs or making changes or whatever. You can't even get determinism locally. That's wild.
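(If it helps anyone, the floating-point non-associativity half of that hypothesis is easy to see in isolation. Here's a toy Python sketch — nothing to do with any real inference kernel, just the arithmetic property: adding the same numbers in a different order can round differently, so if a parallel reduction's order depends on which core finishes first, the same inputs can give slightly different results run to run.)

```python
# Toy illustration of floating-point non-associativity (not any engine's
# actual kernel): summing the same values in a different order can produce
# a slightly different result because each addition rounds.
import random

random.seed(0)
xs = [random.uniform(-1e6, 1e6) for _ in range(100_000)]

in_order = sum(xs)                  # one accumulation order
shuffled = list(xs)
random.shuffle(shuffled)            # simulate a different "finish order"
out_of_order = sum(shuffled)

print(in_order == out_of_order)     # usually False
print(in_order - out_of_order)      # tiny, but nonzero
```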
But I did read something just the other day about LLMs being invertible. It goes over my head but it sounds like they got a pretty reliable mapping from inputs to outputs, at least?
https://news.ycombinator.com/item?id=45758093
> Transformer components such as non-linear activations and normalization are inherently non-injective, suggesting that different inputs could map to the same output and prevent exact recovery of the input from a model's representations. In this paper, we challenge this view. First, we prove mathematically that transformer language models mapping discrete input sequences to their corresponding sequence of continuous representations are injective and therefore lossless, a property established at initialization and preserved during training. Second, we confirm this result empirically through billions of collision tests on six state-of-the-art language models, and observe no collisions.
The distinction here appears to be between the output tokens and some sort of internal state?
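If I'm reading the abstract right, the injectivity claim is about the map from the input token sequence to the model's continuous internal representations (the hidden states), while the emitted token comes from an argmax/sampling step that throws most of that information away. A rough sketch of what I mean — gpt2 is just a stand-in model here, and this is not the paper's own collision-test code:

```python
# Sketch of the distinction: two different prompts produce visibly different
# internal representations (hidden states), even if they happen to agree on
# the argmax output token, which discards most of that information.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "gpt2"  # assumption: any small Hugging Face causal LM works for this
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name).eval()

prompts = ["The capital of France is", "The capital city of France is"]

with torch.no_grad():
    for p in prompts:
        ids = tok(p, return_tensors="pt")
        out = model(**ids, output_hidden_states=True)
        hidden = out.hidden_states[-1][0, -1]          # final-layer state at the last position
        token_id = out.logits[0, -1].argmax().item()   # greedy next token
        print(f"{p!r} -> next token {tok.decode([token_id])!r}, "
              f"hidden-state norm {hidden.norm().item():.4f}")
```

So on that reading, the "billions of collision tests" in the abstract are about those continuous representations, not about the tokens the model ends up printing.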