Comment by phire

Both use a unified memory architecture, where the GPU and CPU share the same pool of memory.

On the N64, the CPU always ends up bottlenecked by memory latency. The RAM latency is quite high to start with, your CPU is sitting idle for ~40 cycles if it ever misses the cache, assuming RCP is idle. If RCP is not idle, contention with can sometimes push that well over 150 cycles.

Kaze Emanuar has a bunch of videos (like this one https://www.youtube.com/watch?v=t_rzYnXEQlE) going into detail about this flaw.

The gamecube fixed this flaw in multiple ways. They picked a CPU with a much better cache subsystem. The PowerPC 750 had Multi-way caches instead of a direct mapped, and a quite large L2 cache. Their customisations added special instructions to stream graphics commands without polluting the caches, resulting in way less cache misses.

And when it did cache miss, the latency to main memory is under 20 cycles (despite the Gamecube's CPU running at 5x the clock speed). The engineers picked main memory that was super low latency.

To fix the issue of bus contention, they created a complex bus arbitration scheme and gave CPU reads the highest priority. The gamecube also has much less traffic on the bus to start with, because many components were moved out of the unified memory.

---------------------------

The N64 famously had only 4KB of TMEM (texture memory). Textures had to fit in just 4KB, and to enable mipmapping, they had to fit in half that. This lead to most games on the N64 using very small textures stretched over very large surfaces with bilinear filtering, and kind of gave N64 games a distinctive design language.

Once again, the engineers fixed this flaw in two ways. First, they made TMEM work as a cache, so textures didn't have to fit inside it. Second, they bumped the size of TMEM from 4KB all the way 1MB, which was massive overkill, way bigger than any other GPU of the era. Even today's GPUs only have ~64KB of cache for textures.

---------------------------

The fillrate of the N64 was quite low, especially when using the depth buffer and/or doing blending.

So the Gamecube got a dedicated 2MB of memory (embedded DRAM) for its framebuffer. Now rendering doesn't touch main memory at all. Depthbuffer is now free, no reason to not enable, and blending is more or less free too.

Rasterisation was one of the major causes of bus contention on the N64, so this embedded framebuffer has a side-effect of solving bus contention issues too problem too.

---------------------------

On the N64, the RSP was used for both vertex processing and sound processing. Not exactly a flaw, it saved on hardware. But it did mean any time spent processing sound was time that couldn't be spend rendering graphics.

The gamecube got a dedicated DSP for audio processing. The audio DSP also got its own pool of memory (once again reducing bus contention).

As for vertex processing, that was all moved into fixed function hardware. (There aren't that many GPUs that did transform and lighting in hardware. Earlier GPUs often implemented transform and lighting in DSPs (like the N64's RSP), and the industry were very quickly switching to vertex shaders)