Comment by ajb 9 months ago

So they massively reduce the area lost to defects per wafer, from 361 to 2.2 square mm. But going by the figures in this blog, that is massively outweighed by the fact that they only get 46,222 sq mm of usable area out of the wafer, as opposed to the 56,247 that the H100 gets. Because they use a single square die instead of filling the circular wafer with smaller square dies, they lose 10,025 sq mm!

Not sure how that's a win.

Unless the rest of the wafer is useable for some other customer?
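
The arithmetic in this comment can be checked with a quick sketch. It uses only the four figures quoted above; the "net area" model (usable area minus area lost to defects) is a deliberately crude assumption for illustration, not something the blog defines.

```python
# Figures quoted in the comment above, in square millimetres per wafer.
H100_USABLE_AREA = 56_247   # many small dies tiled across the wafer
WSE_USABLE_AREA = 46_222    # one large square die
H100_DEFECT_LOSS = 361      # area lost to defects with small dies
WSE_DEFECT_LOSS = 2.2       # area lost to defects with the wafer-scale die


def net_area(usable: float, defect_loss: float) -> float:
    """Usable area minus area lost to defects (assumed, simplified model)."""
    return usable - defect_loss


# Area given up by using one square instead of tiling the circle.
shape_loss = H100_USABLE_AREA - WSE_USABLE_AREA
# Area recovered thanks to the lower defect loss.
defect_saving = H100_DEFECT_LOSS - WSE_DEFECT_LOSS

h100_net = net_area(H100_USABLE_AREA, H100_DEFECT_LOSS)
wse_net = net_area(WSE_USABLE_AREA, WSE_DEFECT_LOSS)

print(f"shape loss:    {shape_loss} sq mm")
print(f"defect saving: {defect_saving:.1f} sq mm")
print(f"net gap:       {h100_net - wse_net:.1f} sq mm in favour of small dies")
```

On area alone the square die comes out behind, which is the point of the question; the replies argue the win lies elsewhere (testing, packaging, interconnect).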

nine_k 9 months ago

It's a win because they only have to test one chip, and don't have to spend resources on connecting chiplets. The latter costs a lot (though it has other advantages). I suspect that a chiplet-based device with a total of 900k cores would just not be viable due to size constraints.

If their routing around the defects is automated enough (given the highly regular structure), it may be a massive saving of effort on testing and packaging the chip.

ungreased0675 9 months ago

Why does it have to be a square? There’s no need to worry about interchangeable third-party heat sink compatibility. Is it possible to make it an irregular polygon instead of square?

kristjansson 9 months ago

Additional wafer area would be a marginal increase in performance (+~20% core count, best case), but it increases the complexity of their design and requires that they figure out how to package/connect/house/etc. a non-standard shape. A wafer-scale chip is already a huge tech risk; why spend more novelty budget on nonessential weirdness?

Scaevolus 9 months ago

Why does their chip have to be rectangular, anyways? Couldn't they cut out a (blocky) circle too?

Qwertious 9 months ago

You need a rectilinear polygon that tessellates, and has the fewest sides possible to minimize the number of cuts necessary. And it would probably help the cutting if the shape is entirely convex, so that cuts can overshoot a bit without damaging anything.
That suggests a rectangle is the only possible shape.

- CorrectHorseBat 9 months ago
  
  If it's just one chip per wafer, why even bother cutting?
  
- timerol 9 months ago
  
  Why does it need to tessellate if there's only one chip per wafer?
  
nine_k 9 months ago

Rather, I wonder why they even need to cut off the extra space instead of putting something there. I suppose the structure of the device is highly rectangular from the logical PoV, so there's nothing useful to put there. I suspect smaller unrelated chips could be produced in these areas along the way.

guyzero 9 months ago

I've never cut a wafer, but I assume cutting is hard and single straight lines are the easiest.

- sroussey 9 months ago
  
  I wonder if you could… just not cut the wafer at all??
  
  
  ryao 9 months ago
  
  I suspect this would cause alignment issues since you could literally rotate it into the wrong position when doing soldering. That said, perhaps they could get away with cutting less and using more.
  
  
  daedrdev 9 months ago
  
  That's the idea in the article. Just one big chip. But the reason cutting is normally done is that there is a pretty high defect rate, so if every wafer has 1-2 defects you still get (X-1.5) devices per wafer. In the article they go into how they avoid this problem (I think it's better fault tolerance, at a cost).
  
  
  gpm 9 months ago
  
  The article shows them using a single maximally sized square portion of a circular wafer.
  I think the proposal you're responding to is "just use the whole circular wafer without cutting out a square".
  
  
  axus 9 months ago
  
  Might be jumping in without reading, but the chips you cut out of the wafer have to be delivered to physically different locations.
  
  
  ajb 9 months ago
  
  Normally yes. But they're using a whole wafer for a single chip! So it's actually a good idea.
  I guess the issue is how do you design your routing fabric to work in the edge regions.
  Actually I wonder how they are exposing this wafer. Normal chips are exposed in a rectangular batch called a reticle. The reticle mask has repeated patterns across it, and it is then exposed repeatedly across the wafer. So either they have to make a reticle mask the full size of the wafer, which sounds expensive, or they somehow have to precisely align reticle exposures so that the joined edges form valid circuits.
  
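
For the square-versus-whole-wafer question in this subthread, a quick geometry sketch shows how much of a circular wafer the largest inscribed square can cover. The 300 mm diameter is an assumption (the thread never states the wafer size), and edge exclusion, the notch, and scribe lanes are all ignored.

```python
import math

# Assumption: a standard 300 mm wafer; the thread does not state the size.
WAFER_DIAMETER_MM = 300.0


def circle_area(diameter: float) -> float:
    """Area of a full circular wafer."""
    return math.pi * (diameter / 2) ** 2


def inscribed_square_area(diameter: float) -> float:
    """Area of the largest square that fits inside the circle.

    The square's diagonal equals the diameter, so side = d / sqrt(2)
    and area = d**2 / 2.
    """
    return diameter ** 2 / 2


wafer = circle_area(WAFER_DIAMETER_MM)
square = inscribed_square_area(WAFER_DIAMETER_MM)
fraction = square / wafer  # always 2 / pi, independent of the diameter

print(f"wafer area:       {wafer:,.0f} sq mm")
print(f"inscribed square: {square:,.0f} sq mm")
print(f"fraction covered: {fraction:.1%}")
```

Under these assumptions the ideal inscribed square covers 2/π of the circle, or 45,000 sq mm, which is in the same ballpark as the 46,222 sq mm quoted at the top of the thread.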
yannyu 9 months ago

The cost driver for fabbing out wafers is the number of layers and the number of usable devices per wafer. Higher layer count increases cost and tends to decrease yield, and more robust designs with higher yields increase usable devices per wafer. If circles or other shapes could help with either of those, they would likely be used. Generally the end goal is to have the most usable devices per wafer, so they'll be packed as tightly as possible on the wafer so as to have the highest potential output.

- Scaevolus 9 months ago
  
  Right, but they're making just one usable device per wafer already.
  

olejorgenb 9 months ago

Is the wafer itself so expensive? I assume they don't pattern the unused area, so the process should be quicker?

addaon 9 months ago

> I assume they don't pattern the unused area
I’m out of date on this stuff, so it’s possible things have changed, but I wouldn’t make that assumption. It is (used to be?) standard to pattern the entire wafer, with partially-off-the-wafer dice around the edges of the circle. The reason for this is that etching behavior depends heavily on the surrounding area: the amount of silicon or copper or whatever etched in your neighborhood affects the speed of etching for you, which affects line width, and (for a single mask used for the whole wafer) thus either means you need to have more margin on your parameters (equivalent to running on an old process) or have a higher defect rate near the edge of the die (which you do anyway, since you can only take “similar neighborhood” so far). This goes as far as, for hyper-optimized things like SRAM arrays, leaving an unused row and column at each border of the array.

- kurthr 9 months ago
  
  All the process steps are limited by wafers per hour. Lithography (esp. EUV) might be slightly faster, but that's not 30% of total steps, since you generally have a deposit and etch/implant for every lithography step.
  It's close to a dead loss in process cost.
  
yannyu 9 months ago

> I assume they don't pattern the unused area, so the process should be quicker?
The primary driver of time and cost in the fabrication process is the number of layers for the wafers, not the surface area, since all wafers going through a given process are the same size. So you generally want to maximize the number of devices per wafer, because a large part of your costs will be calculated at the per-wafer level, not a per-device level.
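
The per-wafer versus per-device point can be sketched as a one-line division. The wafer cost below is a hypothetical round number chosen only for illustration, and the model assumes the whole processing cost is fixed per wafer, which is a simplification of the argument above.

```python
def cost_per_device(wafer_cost: float, good_devices: int) -> float:
    """Spread a fixed per-wafer cost over the usable devices on that wafer."""
    if good_devices <= 0:
        raise ValueError("need at least one usable device per wafer")
    return wafer_cost / good_devices


# Hypothetical figure for illustration only, not a quoted price.
WAFER_COST_USD = 20_000.0

for devices in (1, 10, 100):
    per_device = cost_per_device(WAFER_COST_USD, devices)
    print(f"{devices:>3} devices/wafer -> ${per_device:,.0f} per device")
```

With one device per wafer, the device carries the entire wafer cost, which is why the usual goal is to pack in as many devices as possible.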

- mattashii 9 months ago
  
  Yes, but isn't a big driver of layer costs the cost of the machines to build those layers?
  For patterning, a single iteration could be (example values, not actual values, probably only ballpark accuracy) on a $300M EUV machine with a 5-year write-off cycle that patterns on average 180 full wafers/hour. Excluding energy usage and service time, each wafer that needs full patterning would cost ~$38. If each wafer only needed half the area patterned, the lithography machine might only spend half its usual time on such a wafer, and that could double the throughput of the EUV machine, halving the write-off-based cost component of such a patterning step.
  Given that each layer generally consists of multiple patterning steps, a 10-20% reduction in those steps could give a meaningful reduction in time spent in the machines whose time spent on the wafer depends on the used wafer area.
  This of course doesn't help reduce time in polishing or etching (and other steps that happen with whole wafers at a time), so it won't be as straightforward as % reduction in wafer area usage == % reduction in cost, but I wouldn't be surprised if it was a meaningful percentage.
  
  
  yannyu 9 months ago
  
  > Yes, but isn't a big driver of layer costs the cost of the machines to build those layers?
  Let's say the time spent in lithography step is linear the way you're describing. Even with that, the deposition step beforehand is surface area independent and would be applied across the entire wafer, and takes just as long if not longer than the lithography.
  Additionally, if you were going to build a fab from the ground up for some specific purpose, then you might optimize the fab for those specific devices as you lay it out. But most of these companies are not doing that and are simply going through TSMC or a similar subcontractor. So you've got an additional question of how far TSMC will go to accommodate customers who only want to use half a wafer, and whether that's the kind of project they could profitably cater to.
  
- olejorgenb 9 months ago
  
  Yes, but my understanding is that the wafer is exposed in multiple steps, so there would still be fewer exposure steps? Probably insignificant compared to all the rest though. (Etching, moving the wafer, etc.)
  EDIT: to clarify - I mean the exposure of one single pattern/layer is done in multiple steps. (https://en.wikipedia.org/wiki/Photolithography#Projection)
  
  
  yannyu 9 months ago
  
  The number of exposure steps would be unrelated to the (surface area) size of die/device that you're making. In fact, in semiconductor manufacturing you're typically trying to maximize the number of devices per wafer because it costs the same to manufacture 1 device with 10 layers vs 100 devices with 10 layers on the same wafer. This goes so far as to have companies or business units share wafers for prototyping runs so as to minimize cost per device (by maximizing output per wafer).
  Also, etching, moving, etc is all done on the entire wafer at the same time generally, via masks and baths. It's less of a pencil/stylus process, and more of a t-shirt silk-screening process.
  
  
  gpm 9 months ago
  
  > This goes so far as to have companies or business units share wafers for prototyping runs so as to minimize cost per device
  Can this be done in production? Is there a chance that the portion of the wafer cerebras.ai can't fit their giant square in is being used for production of some other company's chips?
  
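
mattashii's back-of-the-envelope EUV figure in this subthread can be reproduced with a short sketch. It uses the example values from that comment ($300M machine, 5-year write-off, 180 wafers/hour), assumes the machine runs around the clock, and assumes exposure time scales linearly with the patterned fraction of the wafer; `patterned_fraction` is a hypothetical parameter introduced here for illustration.

```python
HOURS_PER_YEAR = 24 * 365  # assumes the machine runs continuously


def litho_cost_per_wafer(
    machine_cost: float,
    writeoff_years: float,
    wafers_per_hour: float,
    patterned_fraction: float = 1.0,
) -> float:
    """Write-off cost of one patterning pass, per wafer.

    Assumes exposure time is proportional to the patterned fraction,
    so patterning half the wafer doubles the effective throughput.
    """
    if not 0.0 < patterned_fraction <= 1.0:
        raise ValueError("patterned_fraction must be in (0, 1]")
    effective_throughput = wafers_per_hour / patterned_fraction
    wafers_over_lifetime = writeoff_years * HOURS_PER_YEAR * effective_throughput
    return machine_cost / wafers_over_lifetime


full = litho_cost_per_wafer(300e6, 5, 180)
half = litho_cost_per_wafer(300e6, 5, 180, patterned_fraction=0.5)

print(f"full wafer patterned: ${full:.2f} per pass")
print(f"half wafer patterned: ${half:.2f} per pass")
```

This only covers the lithography write-off. As yannyu points out in the reply, deposition and other whole-wafer steps do not get cheaper when less area is patterned.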
pulvinar 9 months ago

There's also no reason they couldn't pattern that area with some other suitable commodity chips. Like how sawmills and butchers put all cuts to use.

- sitkack 9 months ago
  
  Often those areas are used for test chips and structures for the next version. They are effectively free, so you can use them to test out ideas.
  
ajb 9 months ago

Good question. I think the wafer has a cost per area which is fairly significant, but I don't have any figures. There has historically been a push to utilise them more efficiently, e.g. by building fabs that can process larger wafers. Although mask exposure would be per processed area, I think that there is also some proportion of processing time which is per wafer, so the unprocessed area would have an opportunity cost relating to that.

kristjansson 9 months ago

AIUI Wafer marginal cost is lower than you'd expect. I had $50k in my head, quick google indicates[1] maybe <$20k at AAPL volumes? Regardless seems like the economics for Cerebras would strongly favor yield over wafer area utilization.
[1] https://www.tomshardware.com/tech-industry/tsmcs-wafer-prici...

georgeburdell 9 months ago

They probably pattern at least next-nearest neighbors for local uniformity. That’s just litho though. The rest of the process is done all at once on the wafer.


sroussey 9 months ago

It’s a win if you can use the wafer as opposed to throwing it away.

kristjansson 9 months ago

A win is a manufacturing process that results in a functioning product. Wafers, etc. aren't so scarce as to demand every mm² be used on every one every time.
