Comment by semiquaver 2 days ago
LLMs are completely deterministic. Their fundamental output is a vector representing a probability distribution over the next token, given the model weights and context. Given the same inputs, an identical output vector will be produced 100% of the time.
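A minimal sketch of that point (a fixed random matrix standing in for real transformer weights, so this is a toy, not an actual model): the forward pass is just arithmetic, and the same context always yields the same distribution.

    import numpy as np

    def softmax(logits):
        # Subtract the max for numerical stability before exponentiating.
        z = np.exp(logits - logits.max())
        return z / z.sum()

    # Toy stand-ins for trained weights and a tokenized context.
    rng = np.random.default_rng(0)
    W = rng.normal(size=(8, 8))          # "weights", fixed once trained
    context = np.array([0.5, -1.2, 0.3, 2.0, -0.7, 1.1, 0.0, -0.4])

    dist1 = softmax(W @ context)         # probability distribution over 8 "tokens"
    dist2 = softmax(W @ context)         # identical inputs...
    assert np.array_equal(dist1, dist2)  # ...identical output vector, every time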
This fact is relied upon by, for example, https://bellard.org/ts_zip/, a lossless compression system that would not work if LLMs were nondeterministic.
In practice, most LLM systems use this distribution (along with a “temperature” multiplier) to make a weighted random choice among the tokens, giving the illusion of nondeterminism. But there’s no fundamental reason you couldn’t, for example, always choose the most likely token, yielding totally deterministic output. A sketch of both decoding strategies follows.
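Here is that sampling step sketched with toy logits and a hypothetical helper (in real systems the temperature divides the logits before the softmax, as below):

    import numpy as np

    def sample_next_token(logits, temperature=1.0, rng=None):
        if temperature == 0.0:
            # Greedy decoding: always take the most likely token -> deterministic.
            return int(np.argmax(logits))
        # Scale logits by temperature, renormalize, then draw a weighted sample.
        scaled = logits / temperature
        z = np.exp(scaled - np.max(scaled))
        probs = z / z.sum()
        rng = rng or np.random.default_rng()
        return int(rng.choice(len(probs), p=probs))

    logits = np.array([2.0, 1.0, 0.5, -1.0])
    print(sample_next_token(logits, temperature=0.0))  # always token 0
    print(sample_next_token(logits, temperature=1.0))  # varies run to run

The temperature=0.0 branch is roughly what setting temperature to 0 in a hosted LLM API asks for: greedy, repeatable decoding.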
If you want to learn more, this is an excellent and accessible series on how transformer systems work: https://youtu.be/wjZofJX0v4M
>In practice most LLM systems use this distribution (along with a “temperature” multiplier) to make a weighted random choice among the tokens
In other words, LLMs are not deterministic in just about any real setting. And the nondeterminism you describe only compounds with MoE architectures, variable test-time compute allocation, and o3-like sampling.