Comment by montebicyclelo a day ago
> if the weights are taken from HF then what's the implementation for
The weights are essentially just a bunch of floating-point numbers, grouped into tensors. The code says which operations to perform with those weights. E.g. say you load a matrix `W` from the weights: you could compute `y = W @ x`, or `y = W.T @ x`, or `y = W @ W @ x`, etc. The weights file alone doesn't tell you which one is right; that's what the implementation is for.
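
To make that concrete, here's a minimal NumPy sketch. The values of `W` and `x` are made up for illustration (a real checkpoint would supply `W`), and the three function names are hypothetical; the point is only that the same numbers give different outputs depending on the code that uses them.

```python
import numpy as np

# Stand-in for a weight matrix loaded from a checkpoint (values are made up).
W = np.array([[1.0, 2.0],
              [3.0, 4.0]])
x = np.array([1.0, 1.0])  # example input vector


def forward_plain(W, x):
    # y = W @ x
    return W @ x


def forward_transposed(W, x):
    # y = W.T @ x
    return W.T @ x


def forward_twice(W, x):
    # y = W @ W @ x (only valid when W is square)
    return W @ W @ x


y_plain = forward_plain(W, x)
y_transposed = forward_transposed(W, x)
y_twice = forward_twice(W, x)

print(y_plain, y_transposed, y_twice)
```

Same `W`, three different results, so having the weights is not enough to reproduce a model's output without code that applies them the way the original model does.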