Comment by tsimionescu 7 months ago
Depends on what you mean specifically by the output. The actual neural network will produce deterministic outputs that could be interpreted as probability values for various tokens. But the interface you'll commonly see used in front of these models will then non-deterministically choose a single next token to output based on those probabilities. Then, this single randomly chosen token is fed back into the network to produce another token, and this process repeats.
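To make the split concrete, here is a minimal sketch of that loop. `toy_model` is a made-up stand-in for the network (it is not any real model or library API), and the 4-token vocabulary is an assumption for illustration only. The point is just that the model call is deterministic, and the randomness enters only at the selection step.

```python
import random

def toy_model(tokens):
    """Hypothetical stand-in for the network: same input, same probabilities."""
    last = tokens[-1] if tokens else 0
    weights = [1 + ((last + i) % 4) for i in range(4)]  # toy 4-token vocabulary
    total = sum(weights)
    return [w / total for w in weights]

def generate(prompt, steps, rng):
    """Sample `steps` tokens, feeding each chosen token back into the model."""
    tokens = list(prompt)
    for _ in range(steps):
        probs = toy_model(tokens)  # deterministic part
        # Non-deterministic part: weighted random choice of the next token.
        nxt = rng.choices(range(len(probs)), weights=probs, k=1)[0]
        tokens.append(nxt)  # the chosen token becomes part of the next input
    return tokens

print(generate([0], 5, random.Random()))
```

With an unseeded `random.Random()` the output can differ from run to run, even though `toy_model` itself never does.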
I would ultimately call the result non-deterministic. You could make it deterministic relatively easily by having a deterministic process for choosing a single token from all of the outputs of the NN (say, always pick the one with the highest weight, and if there are multiple with the same weight, pick the first one in token index order), but no one normally does this, because the results aren't that great per my understanding.
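A rough sketch of that deterministic rule, assuming the model output is just a plain list of per-token probabilities (the function name `greedy_pick` is mine, not from any library):

```python
def greedy_pick(probs):
    """Return the index of the highest value; ties go to the lowest index."""
    best = 0
    for i, p in enumerate(probs):
        # Strict '>' means an equal value later in the list never replaces
        # an earlier one, which gives the "first in token index order" rule.
        if p > probs[best]:
            best = i
    return best

print(greedy_pick([0.1, 0.4, 0.4, 0.1]))  # → 1
```

Given identical model outputs, this always returns the same token, so the whole generation becomes repeatable.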
You can have the best of both worlds with something like weighted_selection( output, hash( output ) ) using the hash as the PRNG seed. (If you're paranoid about statistical issues due to identical outputs (extremely unlikely) then add a nonce to the hash.)
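One way this could look, as a sketch only: the choice of SHA-256, the byte serialization, and the name `seeded_pick` are all my assumptions, not something specified above. The selection is still a weighted draw, but the PRNG seed is derived from the model output itself, so the same output always yields the same pick.

```python
import hashlib
import random
import struct

def seeded_pick(probs, nonce=b""):
    """Weighted selection whose PRNG seed is a hash of the probabilities."""
    # Serialize the floats to bytes so the hash depends only on the output
    # values (plus the optional nonce), not on anything else in the process.
    payload = struct.pack(f"<{len(probs)}d", *probs) + nonce
    seed = int.from_bytes(hashlib.sha256(payload).digest()[:8], "big")
    rng = random.Random(seed)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

probs = [0.1, 0.2, 0.3, 0.4]
print(seeded_pick(probs) == seeded_pick(probs))  # → True
```

The optional `nonce` plays the role mentioned above: mixing in, say, the position in the sequence changes the seed even when two steps happen to produce identical probabilities.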