Comment by addaon

Comment by addaon a day ago

4 replies

Suppose we have two models with similar parameters, trained the same way, one on 1800-1875 data and one on 1800-2025 data. Running both models on the same input, we get two probability distributions across tokens; let's call them 1875' and 2025'. We also get a finite difference between those distributions (2025' - 1875'). What would we get if we sampled from 1.1*(2025' - 1875') + 1875'? I don't think this would actually be a decent approximation of 2040', but it would be a fun experiment to see. (Interpolation rather than extrapolation seems just as unlikely to be useful and less likely to be amusing, but what do I know.)
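A minimal sketch of the sampling rule above, using made-up four-token distributions rather than real model outputs. One practical wrinkle: with a factor above 1 the result can go negative for tokens the newer model has moved away from, so it is not automatically a valid distribution. Clipping at zero and renormalizing is one way to repair that, and it is an assumption of this sketch, not part of the original idea. The function name `extrapolate_distribution` and the `alpha` parameter are hypothetical.

```python
import random


def extrapolate_distribution(p_old, p_new, alpha=1.1):
    """Return p_old + alpha * (p_new - p_old), repaired into a valid distribution.

    alpha = 0 gives p_old, alpha = 1 gives p_new, alpha > 1 extrapolates.
    Negative entries are clipped to zero and the result is renormalized
    (an assumption of this sketch; other repairs are possible).
    """
    if len(p_old) != len(p_new):
        raise ValueError("distributions must cover the same vocabulary")
    mixed = [max(0.0, o + alpha * (n - o)) for o, n in zip(p_old, p_new)]
    total = sum(mixed)
    if total <= 0.0:
        raise ValueError("extrapolated distribution has no probability mass")
    return [m / total for m in mixed]


# Toy next-token distributions over a 4-token vocabulary (made-up numbers).
p_1875 = [0.50, 0.30, 0.20, 0.00]
p_2025 = [0.02, 0.30, 0.20, 0.48]

p_ext = extrapolate_distribution(p_1875, p_2025, alpha=1.1)

# Sample one token index from the extrapolated distribution.
rng = random.Random(0)
token = rng.choices(range(len(p_ext)), weights=p_ext, k=1)[0]
```

Here the first token ends up at -0.028 before clipping, so it gets zero probability and the remaining mass is rescaled. In a real experiment both models would need the same tokenizer so that the two distributions line up index by index.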

sigmoid10 7 hours ago

These probability shifts would only account for the final output layer (which may also have some shift), but I expect the largest shift to be in the activations in the intermediate latent space. There are a bunch of papers out there that try to get some offset vector using PCA or similar to tune certain model behaviours like vulgarity or friendliness. You don't even need much data for this as long as your examples capture the essence of the difference well. I'm pretty certain you could do this with "historicalness" too, but projecting it into the future by turning the "contemporaryness" knob way up probably won't yield an accurate result. There are too many outside influences on language that won't be captured in historical trends.
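A rough sketch of the offset-vector idea, under stated assumptions: it uses a plain difference of mean activations rather than PCA, the activations are tiny made-up 2-dimensional lists instead of hidden states captured from a real model, and the names `steering_vector` and `steer` are hypothetical.

```python
def mean_vector(rows):
    """Element-wise mean of a list of equal-length activation vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def steering_vector(acts_from, acts_to):
    """Offset pointing from the mean of acts_from to the mean of acts_to."""
    mean_from = mean_vector(acts_from)
    mean_to = mean_vector(acts_to)
    return [t - f for f, t in zip(mean_from, mean_to)]


def steer(hidden, direction, strength):
    """Shift one hidden state along the direction; strength is the 'knob'."""
    return [h + strength * d for h, d in zip(hidden, direction)]


# Made-up activations for "historical" and "contemporary" example texts.
historical = [[1.0, 0.0], [3.0, 0.0]]
contemporary = [[2.0, 2.0], [4.0, 4.0]]

direction = steering_vector(historical, contemporary)  # [1.0, 3.0]

# strength > 1 is the "knob turned way up" case from the comment.
steered = steer([0.0, 0.0], direction, strength=2.0)  # [2.0, 6.0]
```

With a real model the vectors would come from a chosen intermediate layer, and the shifted hidden state would be written back into the forward pass at that layer.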

  • lopuhin 5 hours ago

    On whether this accounts only for the final output layer -- once the first token is generated (i.e. selected according to the modified sampling procedure), and assuming a different token is selected compared to standard sampling, then all layers of the model would be affected during generation of subsequent tokens.