Comment by tialaramex 16 hours ago
> a tracing GC is acceptable much more frequently than you may think
Two things here, the easy one first: The RAM trade off is excellent for normal sizes but if you scale enormously the trade off eventually reverses. $100 per month for the outfit's cloud servers to have more RAM is a bargain compared to the eye-watering price of a Rust developer. But $1M per month when you own whole data centres in multiple countries makes hiring a Rust dev look attractive after all.
As I understand it this one is why Microsoft are rewriting Office backend stuff in Rust after originally writing it in C#. They need a safe language, so C++ was never an option, but at their scale higher resource costs add up quickly, so a rewrite to a safe non-GC language, though not free, makes economic sense.
Second thing though: Unpredictability. GC means you can't be sure when reclamation happens. That means both latency spikes (nasty in some applications) and weird bugs around delayed reclamation of non-memory resources: things like file handles running out even though you aren't holding many open at once. If your only resource is RAM you can just buy more, but otherwise you either have to compromise the language (e.g. a defer-type mechanism, Python's "with" construct) or just suck it up. Java tried to fix this and gave up.
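As a concrete (hypothetical) illustration in Python, with file descriptors as the scarce resource: relying on the GC to close files can exhaust the process's descriptor limit even though only one file is logically in use at a time, while the "with" construct releases each one deterministically. (CPython's reference counting usually masks this; a GC runtime like PyPy won't necessarily close them promptly.)

    def leaky(paths):
        for p in paths:
            f = open(p)            # descriptor stays open until the object is collected
            header = f.readline()
            # f is garbage after this iteration, but not necessarily closed yet,
            # so a long enough list of paths can run out of file descriptors

    def prompt(paths):
        for p in paths:
            with open(p) as f:     # closed at the end of the block, GC or not
                header = f.readline()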
I agree that often you can pick GC. Personally I don't much like GC but it is effective and I can't disagree there. However you might have overestimated just how often, or missed more edge cases where it isn't a good choice.
> The RAM trade off is excellent for normal sizes but if you scale enormously the trade off eventually reverses
I don't think you've watched the talk. The minimum RAM-per-core is quite high, and that RAM often sits there unused even though it could be spent to reduce the utilisation of the more expensive CPU: you pay for RAM that could lower your CPU bill and then don't use it. What you want to aim for is a RAM/CPU usage ratio that matches the RAM/CPU ratio of the machine, as that's what you pay for. Doubling the CPU often doubles your cost, but doubling the RAM costs much less than that (5-15%).
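A back-of-the-envelope illustration of that ratio in Python, with made-up prices (the dollar figures are assumptions for the sake of arithmetic, not numbers from the talk):

    # Hypothetical 4-core / 4 GB machine where cores dominate the price.
    CORE_PRICE = 25.0   # $/core/month (assumed)
    GB_PRICE   = 2.0    # $/GB/month   (assumed)

    base       = 4 * CORE_PRICE + 4 * GB_PRICE   # $108/month
    double_cpu = 8 * CORE_PRICE + 4 * GB_PRICE   # $208/month, roughly double
    double_ram = 4 * CORE_PRICE + 8 * GB_PRICE   # $116/month, about 7% more

    print(base, double_cpu, double_ram)          # 108.0 208.0 116.0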
If two implementations of an algorithm use different amounts of memory (assuming they're reasonable implementations), then the one using less memory has to use more CPU (e.g. it could be compressing the memory or freeing and reusing it more frequently). Using more CPU to save on memory that you've already paid for is just wasteful.
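Conversely, the simplest way to put already-paid-for RAM to work is plain caching/memoization; a minimal Python sketch (the function and its cost model are just stand-ins):

    from functools import lru_cache

    @lru_cache(maxsize=None)       # unbounded cache: trades memory for CPU
    def shipping_cost(route):      # stand-in for some expensive, pure computation
        return sum(ord(c) for c in route) * 0.01

    # Repeated calls with the same argument hit the cache instead of recomputing;
    # a memory-frugal version would drop the cache and burn CPU on every call.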
Another way to think about it is to consider the extreme case (though the argument works for any intermediate value): a program, say a short-running one, uses 100% of the CPU. While that program runs, no other program can use the machine anyway, so if you don't use up to 100% of the machine's RAM to reduce the program's duration, you're wasting that RAM.
As the talk says, it's hard to find less than 1GB per core, so if a program uses computational resources that correspond to a full core yet uses less than 1GB, it's wasteful in the sense that it's spending more of a more expensive resource to save on a less expensive one. The same applies if it uses 50% of a core and less than 500MB of RAM.
Of course, if you're looking at kernels or drivers or VMs or some sorts of agents - things that are effectively pure overhead (rather than direct business value) - then their economics could be different.
> Second thing though: Unpredictability. GC means you can't be sure when reclamation happens.
What you say may have been true with older generations of GCs (or even something like Go's GC, which is basically Java's old CMS, recently removed after two newer GC generations). OpenJDK's current GCs, like ZGC, do zero work in stop-the-world pauses. Their work is more evenly spread out and predictable, and even their latency is more predictable than what you'd get with something like Rust's reference-counting GC. C#'s GC isn't that stellar either, but most important server-side software uses Java, anyway.
The one area where manual memory management still beats the efficiency of a modern tracing GC (although maybe not for long) is when there's a very regular memory usage pattern through the use of arenas, which is another reason why I find Zig particularly interesting - it's most powerful where modern GCs are weakest.
By design or happy accident, Zig is very focused on where the problems are: the biggest security issue for low-level languages is out-of-bounds access, and Zig focuses on that; the biggest shortcoming of modern tracing GCs is arena-like memory usage, and Zig focuses on that. When it comes to the importance of UAF (use-after-free), compilation times, and language complexity, I think the jury is still out, and Rust and Zig obviously make very different tradeoffs here. Zig's bottom-line impact, like that of Rust, may still be too low for widespread adoption, but at least I find it more interesting.
> As I understand it this one is why Microsoft are rewriting Office backend stuff in Rust after writing it originally in C#
The rate at which MS is doing that is nowhere near where it would be if there were some significant economic value. You can compare that to the rate of adoption of other programming languages or even techniques like unit testing or code review. With any new product, you can expect some noise and experimentation, but the adoption of products that offer a big economic value is usually very, very fast, even in programming.