Comment by dahart 2 days ago

6 replies

My former boss (Steve Parker, RIP) shared a story of Turner Whitted making predictions about how much compute would be needed to achieve real-time ray tracing, some time around when his seminal paper was published (~1980). As the story goes, Turner went through some calculations and came to the conclusion that it’d take 1 Cray per pixel. Because of the space each Cray takes, they’d be too far apart, and he thought you wouldn’t be able to link them to a monitor and get the results in real time. So instead you’d probably have to put the array of Crays in the desert, each one attached to an RGB light, and fly over it in an airplane to see the image.

Another comparison, just as astonishing as the RPi one, is that modern GPUs have exceeded Whitted’s prediction. Turner’s paper used 640x480 images. At that resolution, taking the 160 Mflops number for a single Cray, 1 Cray per pixel would be 49 Tera flops. A 4080 GPU has just shy of 50 Tflops peak performance, so it has surpassed what Turner thought we’d need.

Think about that - not just faster than a Cray for a lot less money, but one cheap consumer device is faster than 300,000 Crays(!). Faster than a whole Cray per pixel. We really have come a long, long way.

The 5090 has over 300 Tflops of ray tracing perf, and the Tensor cores are now in the Petaflops range (with lower precision math), so we’re now exceeding the compute needed for 1 Cray per pixel at 1080p. 1 GPU faster than 2M Crays. Mind blowing.
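For anyone who wants to check the arithmetic, here is a quick sketch in Python. It assumes the 160 Mflops per Cray figure used above and exactly one Cray per pixel; the function name is just for illustration.

```python
# Assumption: 160 Mflops per Cray, the figure used in the comment above.
CRAY_FLOPS = 160e6


def crays_per_pixel(width, height, cray_flops=CRAY_FLOPS):
    """Return (number of Crays, total flops) for one Cray per pixel."""
    pixels = width * height
    return pixels, pixels * cray_flops


for name, (w, h) in {"640x480": (640, 480), "1080p": (1920, 1080)}.items():
    crays, flops = crays_per_pixel(w, h)
    print(f"{name}: {crays:,} Crays, {flops / 1e12:.1f} Tflops")
```

That works out to about 307k Crays and roughly 49 Tflops at 640x480, and about 2.07M Crays and roughly 332 Tflops at 1080p.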

phendrenad2 18 hours ago

Whitted mentioned! Cofounder of the first 3D game engine company.

magicalhippo 2 days ago

> 1 Cray per pixel would be 49 Tera flops. A 4080 GPU has just shy of 50 Tflops peak performance

Interesting, wonder how it compares in terms of transistors. How many transistors combined did one Cray have in compute and cache chips?

  • dahart 2 days ago

    The Wikipedia article says the Cray-1 has 200k gates. I assume that means the transistor count would be something slightly north of 2x that number? https://en.wikipedia.org/wiki/Cray-1#Description

    200k * 300k Cray-1s would be 60B gates, whereas the 4080 actually has 46B transistors. Seems like we’re totally in the right ballpark.
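The same back-of-envelope comparison as a small Python sketch. The inputs are the round numbers from this thread (200k gates per Cray-1, 300k Cray-1s, 46B transistors for the 4080), and it compares gates to transistors directly, so treat the ratio as a ballpark only.

```python
# Round numbers taken from the thread; not measured values.
CRAYS = 300_000              # roughly one Cray-1 per pixel at 640x480
GATES_PER_CRAY = 200_000     # gate count quoted from the Wikipedia article
GPU_TRANSISTORS = 46e9       # transistor count quoted for the 4080

total_gates = CRAYS * GATES_PER_CRAY
ratio = total_gates / GPU_TRANSISTORS

print(f"Total Cray gates: {total_gates / 1e9:.0f}B")
print(f"Cray gates per GPU transistor: {ratio:.2f}")
```

If each gate really is a bit more than 2 transistors, the Cray array's transistor total would be correspondingly higher, but still within the same order of magnitude as the GPU.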

nottorp a day ago

But the Cray had a general purpose CPU while the GPUs have specialized hardware. Not exactly apples to apples.

  • monocasa a day ago

    The main part of the Cray was a compute offload engine that asynchronously executed job lists submitted by front end general purpose computers that ran OSes like Unix.

    It was actually pretty close to the model of a GPU.