Comment by zozbot234

Comment by zozbot234 a day ago

3 replies

> The point remains that it's very useful to have a memory management approach that can turn the RAM you've already paid for to reduce CPU consumption.

That approach is specifically arenas: if you can put useful bounds on the maximum size of your "dead" data, it can pay to allocate everything in an arena and free it all in one go. This saves you the memory traffic of both manual management and tracing GC. But coming up with such bounds involves manual choices, of course.

It goes without saying that memory compaction involves a whole lot of extra traffic on the memory subsystem, so it's unlikely to help when memory bandwidth is the key bottleneck. Your claim that a 200MB working set is probably the same as a 100GB working set (or, for that matter, a 500MB or 1GB working set, which is more in the ballpark of real-world comparisons) when it comes to how it's impacted by the memory bottleneck is one that I have some trouble understanding also - especially since you've been arguing for using up more memory for the exact same workload.

Your broader claim wrt. memory makes a whole lot of sense in the context of how to tune an existing tracing GC when that's a forced choice anyway (which, AIUI, is also what the talk is about!) but it just doesn't seem all that relevant to the merits of tracing GC vs. manual memory management.

> we're not seeing any kind of abandonment of tracing GC at a rate that is even close to suggesting some significant economic value in abandoning them

We're certainly seeing a lot of "economic value" being put on modern concurrent GC's that can at least perform tolerably well even without a lot of memory headroom. That's how the Golang GC works, after all.

pron a day ago

> It goes without saying that memory compaction involves a whole lot of extra traffic on the memory subsystem

It doesn't go without saying that compaction involves a lot of memory traffic, because memory is utilised to reduce the frequency of GC cycles and only live objects are copied. The whole point of tracing collection is that extra RAM is used to reduce the total amount of memory management work. If we ignore the old generation (which the talk covers separately), the idea is that you allocate more and more in the young gen, and when it's exhausted you compact only the remaining live objects (which is a constant for the app); the more memory you assign to the young gen, the less frequently you need to do even that work. There is no work for dead objects.

> when it comes to how it's impacted by the memory bottleneck is one that I have some trouble understanding also - especially since you've been arguing for using up more memory for the exact same workload.

Memory bandwidth - at least as far as latency is concerned - is used when you have a cache miss. Once your live set is much bigger than your L3 cache, you get cache misses even when you want to read it. If you have good temporal locality (few cache misses), it doesn't matter how big your live set is, but the same is if you have bad temporal locality (many cache misses).

> which, AIUI, is also what the talk is about

The talk focuses on tracing GCs, but it applies equally to manual memory management (as discussed in the Q&A; using less memory for the same algorithm requires CPU work regardless if it's manual or automatic)

> when that's a forced choice

I don't think tracing GCs are ever a forced choice. They keep getting chosen over and over for heavy workloads on machines with >= 1GB/core because they offer a more attractive tradeoff than other approaches for some of the most popular application domains. There's little reason for that to change unless the economics of DRAM/CPU change significantly.

  • Ygg2 19 hours ago

    > It doesn't go without saying that compaction involves a lot of memory traffic

    It definitely tracks with my experience. Did you see Chrome on AMD EPYC with 2TB of memory? It reached like 10% of Mem utility but over 46% of CPU around 6000 tabs. Mem usage climbed steeply at first but got overtaken by CPU usage.

    • pron 18 hours ago

      I have no idea what it's using its CPU on, whether it has anything to do with memory management, or what memory management algorithm is in use. Obviously, the GC doesn't need to do any compaction if the program isn't allocating, and the program can only allocate if it's actually doing some computation. Also, I don't know the ratio of live set to total heap. A tracing GC needs to do very little work if most of the heap is garbage (i.e. the ration of live set to the total memory is low), but any form of memory management - tracing or manual - needs to do a lot of work if the ratio is low. Remember, a tracing-moving GC doesn't spend any cycles on garbage; it spends cycles on live objects only. The more heap you give it (assuming the same allocation rate and live set), means more garbage and less CPU consumption (as GC cycles are less frequent).

      All you know is that CPU is exhausted before the RAM is, which, if anything, means that it may have been useful for Chrome to use more RAM (and reduce the liveset-to-heap ratio) to reduce CPU utilisation, assuming this CPU consumption has anything to do with memory management.

      There is no situation in which, given the same allocation rate and live set, adding more heap to a tracing GC makes it work more. That's why in the talk he says that a DIMM is a hardware accelerator for memory management if you have a tracing-moving collector: increase the heap and voila, less CPU is spent on memory management.

      That's why tracing-moving garbage collection is a great choice for any program that spends a significant amount of CPU on memory management, because then you can reduce that work by adding more RAM, which is cheaper than adding more CPU (assuming you're running on a machine that isn't RAM-constrained, like small embedded devices).