Comment by pizza
They do have a weak relationship, in that tokens with earlier indexes were encountered earlier during the formation of the vocabulary, so tokens with nearby indexes are similar in typicality.
No, if you check the diagram (page 2), these are literally indexes into the KV vectors, not positional indexes in the text. If they were positions in the text, I would agree with you.