

Comment by ironbound 3 months ago


The DeepSeek-V3 paper details a quantisation method that applies the scaling factors after each partial matmul but before accumulation, to improve precision. This is different from a normal quantised GEMM, where the scaling is left until the end. You can read more in Section 3.3 of the paper below.

https://arxiv.org/html/2412.19437v2#S3
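
To make the difference concrete, here is a rough sketch of the idea for a single dot product. It is not the paper's kernel: an integer grid (`QMAX`) stands in for FP8, the block size is tiny, and the function names and example vectors are made up for illustration.

```python
QMAX = 127  # assumption: integer grid standing in for a low-precision format
BLOCK = 4   # hypothetical block size, kept tiny for readability


def quantize(values):
    """Return (integer codes, scale) so that code * scale approximates each value."""
    amax = max(abs(v) for v in values)
    scale = amax / QMAX if amax > 0 else 1.0
    return [round(v / scale) for v in values], scale


def dot_scale_at_end(a, b):
    """One scale per whole vector; scaling is applied once, after accumulation."""
    qa, sa = quantize(a)
    qb, sb = quantize(b)
    acc = sum(x * y for x, y in zip(qa, qb))
    return acc * sa * sb


def dot_scale_per_block(a, b, block=BLOCK):
    """One scale per block; each partial sum is rescaled before it is accumulated."""
    total = 0.0  # high-precision accumulator
    for i in range(0, len(a), block):
        qa, sa = quantize(a[i:i + block])
        qb, sb = quantize(b[i:i + block])
        partial = sum(x * y for x, y in zip(qa, qb))
        total += partial * sa * sb  # scale first, then accumulate
    return total


# Illustrative data: one outlier in each vector dominates a per-vector scale.
a = [0.01, 0.02, -0.015, 0.03, 100.0, 0.02, 0.01, -0.03]
b = [50.0, -40.0, 30.0, 20.0, 0.001, 0.002, -0.001, 0.003]
exact = sum(x * y for x, y in zip(a, b))

print("exact          :", exact)
print("scale at end   :", dot_scale_at_end(a, b))
print("scale per block:", dot_scale_per_block(a, b))
```

With a single scale per vector the outlier forces the small entries to round to zero, so their contribution is lost. With per-block scales each block keeps its own dynamic range, and the partial results are already back in real units when they are summed.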