Comment by bluetomcat

Comment by bluetomcat 2 days ago

35 replies

This isn't proper usage of realloc:

    lines = realloc(lines, (num_lines + 1) * sizeof(char *));
In case it cannot service the reallocation and returns NULL, it will overwrite "lines" with NULL, but the memory that "lines" referred to is still there and needs to be either freed or used.

The proper way to call it would be:

    tmp = realloc(lines, (num_lines + 1) * sizeof(char *));

    if (tmp == NULL) {
        free(lines);
        lines = NULL;
        // ... possibly exit the program (without a memory leak)
    } else {
        lines = tmp;
    }
returningfory2 2 days ago

I feel like this comment is misleading because it gives the impression that the code in the article is wrong or unsafe, whereas I think it's actually fine? In the article, in the case when `tmp == NULL` (in your notation) the author aborts the program. This means there's no memory leak or unsafety. I agree that one can do better of course.

  • dataflow 2 days ago

    You're confusing the code with the program it compiles to. The program is fine, okay. But the code is only "fine" or "safe" if you view it as the final snapshot of whatever it's going to be. If you understand that the code also influences how it's going to evolve in the future (and which code doesn't?) then no, it's not fine or safe. It's brittle and making future changes more dangerous.

    Really, there's no excuse whatsoever for not having a separate function that takes the pointer by reference & performs the reallocation and potential termination inside itself, and using that instead of calling realloc directly.

    • returningfory2 a day ago

      This is an article introducing people to memory management, targeted at beginners. The code snippets are there to illustrate the ideas. The author made the correct pedagogical decision to prioritize readability over optimal handling of an OOM edge case that would be confusing to introduce to beginner readers at this early stage.

      Talking about "making future changes" seems to be missing the point of what the author is doing. They're not committing code to the Linux kernel. They're writing a beginner's article about memory management.

      • dataflow a day ago

        > This is an article introducing people to memory management, targeted at beginners

        I realize, and that's what makes it even worse. First impressions have a heck of a stronger effect than 10th impressions. Beginners need to learn the right way in the beginning, not the wrong way.

        Whenever did "safety first" stop being a thing? This is like like skipping any mention of goggles when teaching chemistry or woodworking for "pedagogical reasons". You're supposed to first you teach your students the the best way to do things, then you can teach them how to play fast and loose if it's warranted. Not the other way around!

tptacek 2 days ago

I was looking for a place to hang this comment and here's as good as any: the right way to handle this problem in most C code is to rig malloc, realloc, and strdup up to explode when they'd return NULL. Proper error handling of a true out-of-memory condition is pretty treacherous, so most of the manual error handling stuff you see on things like realloc and malloc are really just performative. In an application setting like this --- not, like, the world's most popular TLS library or something --- aborting automatically on an allocation failure is totally reasonable.

Since that's essentially what EKR is doing here (albeit manually), I don't think this observation about losing the original `lines` pointer is all that meaningful.

  • dundarious a day ago

    After using this malloc-auto-abort() style for many many years, I've come to believe that if only for the better error handling properties, manual memory management should primarily be done via explicit up front arena allocation using OS API's like mmap/VirtualAlloc, then a bump allocator within the arena.

    It helps in the vast amount of cases where sensible memory bounds are known or can be inferred, and it means that all system memory allocation errors* can be dealt with up front with proper error handling (including perhaps running in a more restrictive mode with less memory), and then all application memory allocation errors (running out of space in the arena) can be auto-abort() as before (and be treated as bugs). The other huge benefit is that there is no free() logic for incremental allocations within the arena, you just munmap/VirtualFree the arena in its entirety when done.

    Of course, there are cases where there are no sensible memory bounds (in space or perhaps in time) and where this method is not appropriate without significant modification.

    *modulo Linux's overcommit... which is a huge caveat

    • tptacek a day ago

      I feel like the prospect of using arenas and pools is further evidence that malloc and realloc should abort on failure, because you're right: if you're using an arena, you've not only taken application-layer control over allocation, but you've also implicitly segregated out a range of allocations for which you presumably have a strategy for exhaustion. The problem with malloc is that it's effectively the system allocator, which means the whole runtime is compromised when it fails. Yes: if you want to manually manage allocation failures, do it by using a pool or arena allocator on top of malloc.

      • dundarious 19 hours ago

        Yes, fundamentally my point is that it's pretty much always useful to separate OS allocation from application-level "allocation" (more like consumption of allocated memory than true allocation), and, that application-level "allocation" should always auto-abort() or at least provide a trivially easy way to auto-abort().

        So I agree, given malloc and friends are a combination of OS and application-level allocators, they should auto-abort(). I don't focus on malloc and friends though, because I'm not a fan of using the Rube Goldberg machine of "general purpose" allocators in most non-trivial situations. They're complicated hierarchies of size-based pools, and free lists, and locks, and on and on.

ekr____ 2 days ago

Author here.

Thanks for the flag. As you have probably noticed, I just abort the program a few lines below on realloc failure, so this doesn't leak so much as crash. However, this is a nice example of how fiddly C memory management is.

  • witrak a day ago

    Taking into account how thoroughly you explain all the intricate details of memory handling it's strange that in the example you haven't clearly commented on the fact of oversimplification of handling unsuccessful allocation (leading to the potentially risky situation).

    To say that "this is a nice example of how fiddly C memory management is" in the discussion is a bit too little - perhaps intended readers of the article would prefer an explicit warning there, just to be aware that they shouldn't forget to abort the program as you do.

lionkor 2 days ago

Very odd that an article trying to teach memory management would miss this, this should be common knowledge to anyone who used realloc, just like checking the return of any allocation call.

  • bluetomcat 2 days ago

    They treat an OOM situation as exceptional and immediately call abort() in case any allocation function returns NULL. The specification of these functions allows you to handle OOM situations gracefully.

    • josephg 2 days ago

      > The specification of these functions allows you to handle OOM situations gracefully.

      In theory, sure. But vanishingly little software actually deals with OOM gracefully. What do you do? Almost any interaction with the user may result in more memory allocations in turn - which presumably may also fail. It’s hard to even test OOM on modern systems because of OS disk page caching.

      Honestly, panicking on OOM is a totally reasonable default for most modern application software. In languages like rust, this behaviour is baked in.

      • tptacek 2 days ago

        I agree. The fact that Rust and Go will panic by default in this situation is pretty close to dispositive on what the right thing to do in (most) C code is.

  • PhilipRoman 2 days ago

    >checking the return of any allocation call

    I would say this is pointless on many modern systems unless you also disable overcommit, since otherwise any memory access can result in a crash, which is impossible to check for explicitly.

    • lionkor a day ago

      Most code correctness is pointless until it isn't, yes

    • kevin_thibedeau 2 days ago

      abort() isn't an option on all modern systems.

      • josephg 2 days ago

        It’s an option on most systems.

        Maybe not in embedded work - but in that case you might want to preallocate memory anyway.

o11c 2 days ago

There's another bug, related to performance - this involves a quadratic amount of memory copying unless your environment can arrange for zero-copy.

  • ekr____ 2 days ago

    Author here. Quite so. See footnote 3:https://educatedguesswork.org/posts/memory-management-1/#fn3

    "If you know you're going to be doing a lot of reallocation like this, many people will themselves overallocate, for instance by doubling the size of the buffer every time they are asked for more space than is available, thus reducing the number of times they need to actually reallocate. I've avoided this kind of trickery to keep this example simple."

  • Karellen 2 days ago

    Surely that's only the case if realloc() actually resizes and copies on every call? Which it normally doesn't?

    I thought that most implementations of realloc() would often "round up" internally to a larger size allocation, maybe power-of-two, maybe page size, or something? So if you ask for 20 bytes, the internal bookkeeping sets aside 32, or 4096, or whatever. And then if you realloc to 24 bytes, realloc will just note that the new allocation fits in the amount its reserved for you and return the same buffer again with no copying?

    • o11c 2 days ago

      Some implementations might round up to encourage reuse:

      * memory-checking allocators never do.

      * purely-size-based allocators always do.

      * extent-based allocators try to, but this easily fails if you're doing two interleaving allocations.

      * the mmap fallback does only if allowing the kernel to choose addresses rather than keeping virtual addresses together, unless you happen to be on a kernel that allows not leaving a hole

      Given that there's approximately zero overhead to do it right, just do it right (you don't need to store capacity, just compute it deterministically from the size).

tliltocatl 2 days ago

If you exit the program (as in the process) you don't need to free anything.

astrobe_ 2 days ago

The program abort()s if the reallocation fails. But indeed, for an educational example, it's not good to be too smart.

I believe the test if(!num_lines) is unnecessary, because reallocating a NULL pointer is equivalent to malloc(). This is also a bit "smart", but I think it is also more correct because you don't use the value of one variable (num_lines is 0) to infer the value of another (lines is NULL).

To go further, an opened-ended structure like:

    struct
    {
      unsigned count;
      char* lines[];
    };
... could also be preferable in practice. But actually writing good C is not the topic of TFA.
  • atiedebee a day ago

    > I believe the test if(!num_lines) is unnecessary, because reallocating a NULL pointer is equivalent to malloc().

    I thought that this behaviour was deprecated in C23, but according to cop reference it is still there[0].

    An I thinking of realloc with 0 size or was this actually a thing that was discussed?

    [0] https://en.cppreference.com/w/c/memory/realloc

    • pjmlp a day ago

      Section 7.24.3.7 The realloc function

      https://open-std.org/jtc1/sc22/wg14/www/docs/n3096.pdf

      > If ptr is a null pointer, the realloc function behaves like the malloc function for the specified size. Otherwise, if ptr does not match a pointer earlier returned by a memory management function, or if the space has been deallocated by a call to the free or realloc function, or if the size is zero, the behavior is undefined. If memory for the new object is not allocated, the old object is not deallocated and its value is unchanged.

pjmlp 2 days ago

It is even flagged as such on Visual Studio analyser.

aa-jv 2 days ago

Actually, no. You've just committed one of the cardinal sins of the *alloc()'s, which is: NULL is an acceptable return, so errno != 0 is the only way to tell if things have gone awry.

The proper use of realloc is to check errno always ... because in fact it can return NULL in a case which is not considered an error: lines is not NULL but requested size is zero. This is not considered an error case.

So, in your fix, please replace all checking of tmp == NULL, instead with checking errno != 0. Only then will you have actually fixed the OP's unsafe, incorrect code.

  • spiffyk 2 days ago

    From `malloc(3)`:

       Nonportable behavior
           The  behavior of these functions when the requested size is zero is glibc specific; other implementations may return NULL without setting errno, and portable POSIX programs should tolerate such behavior.  See realloc(3p).
    
           POSIX requires memory allocators to set errno upon failure.  However, the C standard does not require this, and applications portable to non-POSIX platforms should not assume this.
    • anymouse123456 2 days ago

      As someone writing C for POSIX and embedded environments, this clarification is a super helpful.

  • cozzyd 2 days ago

    In this case if (num_lines+1)(sizeof (char)) is zero that is certainly unintended