Comment by ajb
Comment by ajb 3 days ago
So they massively reduce the area lost to defects per wafer, from 361 to 2.2 square mm. But from the figures in this blog, this is massively outweighed by the fact that they only get 46222 sq mm useable area out of the wafer, as opposed to 56247 that the H100 gets - because they are using a single square die instead of filling the circular wafer with smaller square dies, they lose 10,025 sq mm!
Not sure how that's a win.
Unless the rest of the wafer is useable for some other customer?
It's a win because they have to test one chip, and don't have to spend resources on connecting the chiplets. The latter costs a lot (though it has other advantages). I suspect that a chiplet-based device with total 900k cores would just be not viable due to the size constraints.
If their routing around the defects is automated enough (given the highly regular structure), it may be a massive economy of efforts on testing and packaging the chip.