Comment by kstrauser 5 days ago

I’ve wondered about this before but never when around people who might know. From my outsider view, jemalloc looked like a strict improvement over glibc’s malloc, according to all the benchmarks I’d seen when the subject came up. So, why isn’t it the default allocator?

toast0 4 days ago

It is on FreeBSD. :P Change your malloc, change your life? May as well change your libc while you're there and use FreeBSD libc too, and that'll be easier if you also adopt the FreeBSD kernel.

I will say, the Facebook people were very excited to share jemalloc with us when they acquired my employer, but we were using FreeBSD so we already had it and thought it was normal. :)

favorited 5 days ago

Disclaimer: I'm not an allocator engineer, this is just an anecdote.

A while back, I had a conversation with an engineer who maintained an OS allocator, and their claim was that custom allocators tend to make one process's memory allocation faster at the expense of the rest of the system. System allocators are less able to make allocation fair holistically, because one process isn't following the same patterns as the rest.

Which is why you see it recommended so frequently with services, where there is generally one process that you want to get preferential treatment over everything else.

  • mort96 4 days ago

    The only way I can see that this would be true is if a custom allocator is worse about unmapping unused memory than the system allocator. After all, processes aren't sharing one heap, it's not like fragmentation in one process's address space is visible outside of that process... The only aspects of one process's memory allocation that's visible to other processes is, "that process uses N pages worth of resident memory so there's less available for me". But one of the common criticisms against glibc is that it's often really bad at unmapping its pages, so I'd think that most custom allocators are nicer to the system?

    I would be interested in hearing their thoughts directly; I'm also not an allocator engineer, and someone who maintains an OS allocator probably knows wayyy more about this stuff than me. I'm sure there's some missing nuance or context which would've made it make sense.

  • jeffbee 5 days ago

    I don't think that's really a position that can be defended. Both jemalloc and tcmalloc evolved and were refined in antagonistic multitenant environments without one overwhelming application. They are optimal for that exact thing.

    • lmm 4 days ago

      > Both jemalloc and tcmalloc evolved and were refined in antagonistic multitenant environments without one overwhelming application. They are optimal for that exact thing.

      They were mostly optimised on Facebook/Google server-side systems, which were likely one application per VM, no? (Unlike desktop usage where users want several applications to run cooperatively). Firefox is a different case but apparently mainline jemalloc never matched Firefox jemalloc, and even then it's entirely plausible that Firefox benefitted from a "selfish" allocator.

      • jeffbee 4 days ago

        Google runs dozens to hundreds of unrelated workloads in lightweight containers on a single machine, in "borg". Facebook has a thing called "tupperware" with the same property.

        • nixgeek 2 days ago

          I think Tupperware was rebranded to Twine sometime about 6-7 years ago.

    • favorited 5 days ago

      It's possible that they were referring to something specific about their platform and its system allocator, but like I said it was an anecdote about one engineer's statement. I just remember thinking it sounded fair at the time.

      • vlovich123 5 days ago

        The “system” allocator is managing memory within a process boundary. The kernel is responsible for managing it across processes. Claiming that a user space allocator is greedily inefficient is voodoo reasoning that suggests the person making the claim has a poor grasp of architecture.

jeffbee 5 days ago

These allocators often have higher startup cost. They are designed for high performance in the steady state, but they can be worse in workloads that start a million short-lived processes in the unix style.

  • kstrauser 5 days ago

    Oh, interesting. If that's the case, I can see why that'd be a bummer for short-lived command line tools. "Makes ls run 10x slower" would not be well received. OTOH, FreeBSD uses it by default, and it's not known for being a sluggish OS.

o11c 5 days ago

For a long time, one of the major problems with alternate allocators is that they would never return free memory back to the OS, just keep the dirty pages in the process. This did eventually change, but it remains a strong indicator of different priorities.

There's also the fact that ... a lot of processes only ever have a single thread, or at most have a few background threads that do very little of interest. So all these "multi-threading-first allocators" aren't actually buying anything of value, and they do have a lot of overhead.

Semi-related, one thing that most people never think about: it is exactly the same amount of work for the kernel to zero a page of memory (in preparation for a future mmap) as for a userland process to zero it out (for its own internal reuse).

  • senderista 4 days ago

    > Semi-related: one thing that most people never think about: it is exactly the same amount of work for the kernel to zero a page of memory (in preparation for a future mmap) as for a userland process to zero it out (for its own internal reuse)

    Possibly more work since the kernel can't use SIMD

    • LtdJorge 4 days ago

      Why is that? Doesn't Linux use SIMD for the crypto operations?

      • dwattttt 4 days ago

        Allowing SIMD instructions to be used arbitrarily in the kernel actually carries a fair penalty. I'm not sure what Linux does specifically, but:

        When a syscall is made, the kernel has to backup the user mode state of the thread, so it can restore it later.

        If any kernel code could use SIMD registers, you'll have to backup and restore that too, and those registers get big. You could easily be looking at adding a 1kb copy to every syscall, and most of the time it wouldn't be needed.

  • vlovich123 5 days ago

    That’s not actually particular to alternate allocators; if I recall correctly, glibc is much worse at returning memory.

sanxiyn 5 days ago

As far as I know there is no technical reason why jemalloc shouldn't be the default allocator. In fact, as pointed out in the article, it IS the default allocator on FreeBSD. My understanding is it is largely political.

  • kstrauser 5 days ago

    Now that I think about it, I could easily imagine it being left out of glibc because it doesn't build on Hurd or something.

    • lloeki 4 days ago

      > I could easily imagine it being left out of glibc because [...]

      ... its license is BSD-2-Clause ;)

      hence "political"

      • vkazanov 4 days ago

        Huh? BSD-style licenses are fully compatible with the GPL.

        The problem is exactly this: Facebook becomes the upstream of a key part of your system.

        And Facebook can just walk away from the project. Like it did just now.

b0a04gl 4 days ago

jemalloc’s been battle tested in prod at scale, its license is permissive, and performance wins are known. so what exactly are we protecting by clinging to glibc malloc? ideological purity? legacy inertia? who’s actually benefiting from this status quo, and why do we still pretend it’s about “compatibility”?