Comment by favorited 5 days ago

20 replies

Disclaimer: I'm not an allocator engineer, this is just an anecdote.

A while back, I had a conversation with an engineer who maintained an OS allocator, and their claim was that custom allocators tend to make one process's memory allocation faster at the expense of the rest of the system. The system allocator is less able to keep allocation fair across the whole system when one process isn't following the same patterns as the rest.

Which is why you see it recommended so frequently with services, where there is generally one process that you want to get preferential treatment over everything else.

mort96 4 days ago

The only way I can see that this would be true is if a custom allocator is worse about unmapping unused memory than the system allocator. After all, processes aren't sharing one heap, it's not like fragmentation in one process's address space is visible outside of that process... The only aspect of one process's memory allocation that's visible to other processes is, "that process uses N pages worth of resident memory so there's less available for me". But one of the common criticisms against glibc is that it's often really bad at unmapping its pages, so I'd think that most custom allocators are nicer to the system?

I would be interested in hearing their thoughts directly, I'm also not an allocator engineer and someone who maintains an OS allocator probably knows wayyy more about this stuff than me. I'm sure there's some missing nuance or context which would've made it make sense.

jeffbee 5 days ago

I don't think that's really a position that can be defended. Both jemalloc and tcmalloc evolved and were refined in antagonistic multitenant environments without one overwhelming application. They are optimal for that exact thing.

  • lmm 4 days ago

    > Both jemalloc and tcmalloc evolved and were refined in antagonistic multitenant environments without one overwhelming application. They are optimal for that exact thing.

    They were mostly optimised on Facebook/Google server-side systems, which were likely one application per VM, no? (Unlike desktop usage where users want several applications to run cooperatively). Firefox is a different case but apparently mainline jemalloc never matched Firefox jemalloc, and even then it's entirely plausible that Firefox benefitted from a "selfish" allocator.

    • jeffbee 4 days ago

      Google runs dozens to hundreds of unrelated workloads in lightweight containers on a single machine, in "borg". Facebook has a thing called "tupperware" with the same property.

      • nixgeek 2 days ago

        I think Tupperware was rebranded to Twine sometime about 6-7 years ago.

  • favorited 5 days ago

    It's possible that they were referring to something specific about their platform and its system allocator, but like I said it was an anecdote about one engineer's statement. I just remember thinking it sounded fair at the time.

    • vlovich123 4 days ago

      The “system” allocator is managing memory within a process boundary. The kernel is responsible for managing it across processes. Claiming that a user space allocator is greedily inefficient is voodoo reasoning that suggests the person making the claim has a poor grasp of architecture.

      • jeffbee 4 days ago

        There are shared resources involved though, for example one process can cause a lot of traffic in khugepaged. However I would point out that is an endemic risk of Linux's overall architecture. Any process can cause chaos by dirtying pages, or otherwise triggering reclaim.

        • vlovich123 3 days ago

          That’s generally true of any allocator, and critically, the idea that glibc’s behavior helps mitigate it is neither something kernel engineers design around nor a design goal of the glibc allocator.

      • favorited 4 days ago

        For context, the "allocator engineer" I was talking to was a kernel engineer - they have an extremely solid grasp of their platform's architecture.

        The whole advantage of being the platform's system allocator is that you can have a tighter relationship between the library function and the kernel implementation.

        • vlovich123 3 days ago

          I’m not generally aware of any system allocator that’s written hand in glove with the kernel’s allocator or somehow interops better for overall system efficiency at the cost of behavior in-app. Care to provide an example?

      • jdsully 4 days ago

        The "greedy" part is likely not releasing pages back to the OS in a timely manner.
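
To make the "not releasing pages" point concrete, here is a minimal toy model in Python. It is only a sketch under stated assumptions: `ToyAllocator`, `release_on_free`, and `resident_after_burst` are hypothetical names invented for illustration, and the model ignores fragmentation, size classes, decay timers, and everything else a real allocator such as glibc malloc, jemalloc, or tcmalloc does. It only shows how the same workload can leave very different resident page counts, which is the one thing other processes can observe.

```python
from dataclasses import dataclass


@dataclass
class ToyAllocator:
    """Hypothetical toy model, not any real allocator's behavior."""

    release_on_free: bool  # True: return freed pages to the OS at once
    in_use: int = 0        # pages backing live allocations
    cached: int = 0        # freed pages the process still holds

    def alloc(self, pages: int) -> None:
        # Reuse cached pages first; the remainder would come from the OS.
        reused = min(pages, self.cached)
        self.cached -= reused
        self.in_use += pages

    def free(self, pages: int) -> None:
        if pages > self.in_use:
            raise ValueError("freeing more pages than are in use")
        self.in_use -= pages
        if not self.release_on_free:
            # A retaining allocator keeps the pages for later reuse.
            self.cached += pages

    @property
    def resident(self) -> int:
        # What the rest of the system sees: every page still held.
        return self.in_use + self.cached


def resident_after_burst(allocator: ToyAllocator, burst: int, steady: int) -> int:
    """Allocate `burst` pages, free down to `steady`, report resident pages."""
    allocator.alloc(burst)
    allocator.free(burst - steady)
    return allocator.resident


if __name__ == "__main__":
    retaining = ToyAllocator(release_on_free=False)
    releasing = ToyAllocator(release_on_free=True)
    print("retaining:", resident_after_burst(retaining, burst=1000, steady=10))
    print("releasing:", resident_after_burst(releasing, burst=1000, steady=10))
```

In this model the retaining allocator makes the next burst cheaper for its own process, since it already holds the pages, while the releasing one leaves more memory for everyone else. Real allocators sit somewhere between these two extremes.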