Comment by zozbot234
> The point remains that it's very useful to have a memory management approach that can use the RAM you've already paid for to reduce CPU consumption.
That approach is specifically arenas: if you can put useful bounds on the maximum size of your "dead" data, it can pay to allocate everything in an arena and free it all in one go. This saves you the memory traffic of both manual management and tracing GC. But coming up with such bounds involves manual choices, of course.
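To make the arena point concrete, here's a minimal sketch of the pattern (hand-rolled and Go-flavoured; the names and the capacity bound are made up for illustration, and in Go the backing slice still lives on the GC'd heap - the point is the bump-allocate/free-all-at-once pattern, not a GC bypass):

```go
package main

import "fmt"

// Node stands in for whatever short-lived objects the workload produces.
type Node struct {
	Value int
	Next  *Node
}

// Arena is a trivial bump allocator: one backing slice, "freed" all at once.
// Its capacity is the manual bound on per-phase allocation mentioned above.
type Arena struct {
	buf []Node
}

func NewArena(capacity int) *Arena {
	return &Arena{buf: make([]Node, 0, capacity)}
}

// Alloc hands out the next slot; no per-object bookkeeping, no per-object free.
// (If the bound is exceeded, append reallocates and the arena degrades gracefully.)
func (a *Arena) Alloc() *Node {
	a.buf = append(a.buf, Node{})
	return &a.buf[len(a.buf)-1]
}

// Reset frees everything from this phase in one go by forgetting it.
func (a *Arena) Reset() {
	a.buf = a.buf[:0]
}

func main() {
	arena := NewArena(1 << 20) // bound chosen manually, as noted above
	for i := 0; i < 1000; i++ {
		n := arena.Alloc()
		n.Value = i
	}
	fmt.Println("nodes allocated this phase:", len(arena.buf))
	arena.Reset() // end of phase: drop all of it at once
}
```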
It goes without saying that memory compaction involves a whole lot of extra traffic on the memory subsystem, so it's unlikely to help when memory bandwidth is the key bottleneck. I also have trouble understanding your claim that a 200MB working set is affected by the memory bottleneck in much the same way as a 100GB working set (or, for that matter, a 500MB or 1GB working set, which is more in the ballpark of real-world comparisons) - especially since you've been arguing for using up more memory for the exact same workload.
Your broader claim wrt. memory makes a whole lot of sense in the context of how to tune an existing tracing GC when that's a forced choice anyway (which, AIUI, is also what the talk is about!) but it just doesn't seem all that relevant to the merits of tracing GC vs. manual memory management.
> we're not seeing any kind of abandonment of tracing GC at a rate that is even close to suggesting some significant economic value in abandoning them
We're certainly seeing a lot of "economic value" being put on modern concurrent GCs that can at least perform tolerably well even without a lot of memory headroom. That's how the Golang GC works, after all.
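To put a number on "without a lot of memory headroom": Go exposes that trade directly through GOGC and its programmatic equivalents in runtime/debug. A rough sketch (the specific values are arbitrary):

```go
package main

import (
	"fmt"
	"runtime/debug"
)

func main() {
	// GOGC=100 (the default) lets the heap grow to roughly 2x the live set
	// between cycles; lowering it trims the headroom at the cost of running
	// the collector more often, i.e. trading RAM back for CPU time.
	old := debug.SetGCPercent(25)
	fmt.Println("previous GOGC:", old)

	// Since Go 1.19 you can also cap the heap directly with a soft limit (bytes).
	debug.SetMemoryLimit(512 << 20)
}
```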
> It goes without saying that memory compaction involves a whole lot of extra traffic on the memory subsystem
It doesn't go without saying that compaction involves a lot of memory traffic, because memory is utilised to reduce the frequency of GC cycles and only live objects are copied. The whole point of tracing collection is that extra RAM is used to reduce the total amount of memory management work. If we ignore the old generation (which the talk covers separately), the idea is that you allocate more and more in the young gen, and when it's exhausted you compact only the remaining live objects (which is a constant for the app); the more memory you assign to the young gen, the less frequently you need to do even that work. There is no work for dead objects.
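Back-of-the-envelope, under the usual young-gen assumptions (copy cost proportional only to survivors, survivors roughly constant per cycle), the copy traffic per second falls as you give the young gen more memory. A tiny sketch of that arithmetic, with made-up numbers:

```go
package main

import "fmt"

func main() {
	const (
		allocRateMBs = 1000.0 // MB/s of short-lived allocation (made-up number)
		survivorsMB  = 50.0   // live bytes copied out of the young gen each cycle
	)
	for _, youngGenMB := range []float64{256, 1024, 4096} {
		// A young-gen cycle happens roughly when the free space fills up.
		cycleSec := (youngGenMB - survivorsMB) / allocRateMBs
		// Dead objects cost nothing; only survivors are copied.
		copyMBPerSec := survivorsMB / cycleSec
		fmt.Printf("young gen %5.0f MB -> cycle every %.2fs, copy traffic %.1f MB/s\n",
			youngGenMB, cycleSec, copyMBPerSec)
	}
}
```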
> I also have trouble understanding your claim that a 200MB working set is affected by the memory bottleneck in much the same way as a 100GB working set - especially since you've been arguing for using up more memory for the exact same workload.
Memory bandwidth - at least as far as latency is concerned - is consumed when you have a cache miss. Once your live set is much bigger than your L3 cache, you get cache misses even just reading it. If you have good temporal locality (few cache misses), it doesn't matter how big your live set is, and the same is true if you have bad temporal locality (many cache misses).
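If it helps, this is the kind of distinction I mean - a crude sketch, not a careful benchmark: the live set is the same 1GB in every run, and only the size of the hot region (and hence the cache-miss rate) changes:

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	const totalMB = 1024                 // live set: 1 GB, identical in every run
	data := make([]int64, totalMB<<20/8) // allocated and kept reachable throughout
	const accesses = 1 << 26

	// Only the first hotMB megabytes are actually touched: same live set,
	// very different temporal locality.
	for _, hotMB := range []int{4, 64, 1024} {
		hotLen := hotMB << 20 / 8
		var sum int64
		idx := 0
		start := time.Now()
		for i := 0; i < accesses; i++ {
			sum += data[idx]
			idx = (idx + 9973) % hotLen // large stride to work against prefetching
		}
		fmt.Printf("hot region %4d MB of %d MB live: %v (checksum %d)\n",
			hotMB, totalMB, time.Since(start), sum)
	}
}
```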
> which, AIUI, is also what the talk is about
The talk focuses on tracing GCs, but it applies equally to manual memory management (as discussed in the Q&A: using less memory for the same algorithm requires CPU work regardless of whether it's manual or automatic).
> when that's a forced choice
I don't think tracing GCs are ever a forced choice. They keep getting chosen over and over for heavy workloads on machines with >= 1GB/core because they offer a more attractive tradeoff than other approaches for some of the most popular application domains. There's little reason for that to change unless the economics of DRAM/CPU change significantly.