nh2 10 months ago

Why not a simple solution:

1. Programs should call close() on stdout and report errors.

2. It's the job of whoever creates the open file description to fsync() it afterwards, if desired.

3. If somebody runs a file system or hardware that ignores fsync() or hides close() errors, it's their own fault.

If you run `hello > out.txt`, then it's not `hello` that creates and opens `out.txt`; the calling shell does. So if you use `>` redirection, you should fsync in the calling shell.

Is there a drawback to this approach?
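
To make point 1 concrete, here's a minimal sketch (my illustration, not from the post): flush stdio, close the underlying descriptor, and report any failure on stderr with a non-zero exit.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void) {
    printf("hello\n");

    /* Flush stdio buffers so buffered write errors surface here, then
       close the descriptor so close-time errors surface too. */
    if (fflush(stdout) != 0 || close(STDOUT_FILENO) != 0) {
        /* Reporting on stderr of course pushes the same question one
           level up, as discussed below. */
        fprintf(stderr, "hello: write error on stdout: %s\n", strerror(errno));
        return 1;
    }
    return 0;
}
```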

> LLVM tools were made to close stdout [..] and it caused so many problems that this code was eventually reverted

It would be good to know what those problems were.

bluetomcat 10 months ago

> Programs should call close() on stdout and report errors.

Programs have never called open() to obtain stdin, stdout and stderr; they are inherited from the parent process, typically the shell. What would be a meaningful way to report errors if the basic output streams are unreliable? If close(stdout) fails, we would need to write to stderr, and then we have exactly the same error-handling issue with closing stderr.

It's a flaw in the design of Unix where polymorphic behaviour is achieved through file descriptors. Worse is better...

  • marcosdumay 10 months ago

    > It's a flaw in the design of Unix where polymorphic behaviour is achieved through file descriptors. Worse is better...

    Looks to me like it's a flaw in the signature of `write`. There should be a way to recover the error status without changing the descriptor's state, and there should be a way to ensure you get the final status, blocking if necessary.

    This can even be fixed in a backwards-compatible way, by adding a new pair of functions.
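
    Purely to illustrate that idea -- hypothetical names, no such calls exist -- the new pair might look like:

    ```c
    /* Hypothetical: query any pending write-back error on fd without
       clearing it or otherwise changing the descriptor's state. */
    int fd_peek_error(int fd);

    /* Hypothetical: block until every write through fd has reached its
       final status, and return that status. */
    int fd_final_status(int fd);
    ```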

112233 10 months ago

> Is there a drawback to this approach?

You mean, apart from no existing code working like that? It is not possible for the process that creates a descriptor to fsync it, because in many very important cases that descriptor outlives the process.

What do you propose the shell command "exec cat a.txt > b.txt" should do?

  • nh2 10 months ago

    > no existing code working like that

    That doesn't really matter for discussing how correct code _should_ be written.

    Also, a good amount of existing code works like that. For example, if you `with open(..) as f:` a file in Python and pass it as an FD to a `subprocess` call, you can fsync and close it fine afterwards, and Python code bases that care about durability and correct error reporting do that.

    > What do you propose the shell command "exec cat a.txt > b.txt" should do?

    That code would be wrong according to my proposed approach of who should be responsible for what (which is what the blog post discusses).

    If you create the `b.txt` FD and you want it fsync'ed, then you can't `exec`.

    It's equivalent to "if you call malloc(), you should call free()" -- you shouldn't demand that functions you invoke will call free() on your pointer. Same for open files.
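
    A minimal C sketch of that division of responsibility (my illustration, reusing the filenames from the example above): the parent creates `b.txt`, the child inherits it as stdout and runs `cat`, and the parent fsyncs and closes after waiting -- which is exactly why the creator can't `exec`.

    ```c
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void) {
        /* We create the descriptor, so we fsync() and close() it. */
        int fd = open("b.txt", O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0) { perror("open"); return 1; }

        pid_t pid = fork();
        if (pid < 0) { perror("fork"); return 1; }
        if (pid == 0) {                     /* child: cat a.txt > b.txt */
            if (dup2(fd, STDOUT_FILENO) < 0) _exit(127);
            close(fd);
            execlp("cat", "cat", "a.txt", (char *)NULL);
            _exit(127);                     /* exec failed */
        }

        int status;
        if (waitpid(pid, &status, 0) < 0) { perror("waitpid"); return 1; }

        if (fsync(fd) != 0) { perror("fsync"); return 1; }
        if (close(fd) != 0) { perror("close"); return 1; }
        return WIFEXITED(status) ? WEXITSTATUS(status) : 1;
    }
    ```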

    • 112233 10 months ago

      > you can fsync and close it fine afterwards

      No you cannot. Once you pass a descriptor to another process, that process can pass it to yet another process, fork and detach, send it via SCM_RIGHTS, hand a "/proc/PID/fd/N" path to something else, etc.

      Never assume descriptor cleanup will happen, unless you have complete control over everything.

      • nh2 10 months ago

        I don't understand your point.

        If you construct scenarios where the subprocess you call daemonizes and outlives your program, then of course there isn't any convention your code should follow, because your code isn't in charge -- it could follow whatever logic and it wouldn't matter. So there's no possibly correct solution /anyway/.

        The question of the original post is "What convention should programmers use for fsyncing files in standard scenarios?", for example, "Should cat fsync?". As the post says: "Who should be responsible?"

        I'm suggesting an answer to that.

        I don't understand the point of "but what if `cat` double-forks". It doesn't, and surely, if you're calling a program that daemonizes, you know that it does, and the rules about who needs to fsync file descriptors will necessarily change then.

    • duped 10 months ago

      > That doesn't really matter for discussing how correct code _should_ be written.

      It absolutely does when you're talking about the semantics of virtually every program on earth.

      > It's equivalent to "if you call malloc(), you should call free()" -- you shouldn't demand that functions you invoke will call free() on your pointer. Same for open files.

      There are many cases where the one calling malloc cannot be the one calling free and must explicitly document to callers/callees who is responsible for memory deallocation. This is a good example of where no convention exists and it's contextual.

      But open files aren't memory and one cannot rely on file descriptors being closed without errors in practice, so people don't, and you can't just repave decades of infrastructure for no benefit out of ideological purity.

      • nh2 10 months ago

        > There are many cases where the one calling malloc cannot be the one calling free and must explicitly document

        That's fine. Special cases, documented deviation from the default convention.

        > one cannot rely on file descriptors being closed without errors in practice, so people don't

        You mean "so people don't call close(), and the error gets swallowed" (like the article points out for `cat`)? How's that good? Why is improving that "no benefit"?

        > you can't just repave decades of infrastructure

        Of course you can.

        There are also lots of projects that were written without checking the return value of malloc(), and crash as a result. People make PRs, and those get fixed.

        Similarly people can come to the conclusion that LLVM and cat should call close() to not swallow errors, and then it will be done.

kreetx 10 months ago

I fully agree.

The blog post is essentially a long-winded way of saying that there isn't a compatible way to safely call `close` given all programs ever written. Yet I think we already knew that.

wruza 10 months ago

> It would be good to know what those problems were.

I don't know which problems LLVM had, but closing stdout (or stderr) long before exiting may make the next open() return 1 (or 2), and voila, some stray printf() now writes right into your database.

If you have to close std*, at least dup2() the null device into it; that was common advice.
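
A minimal sketch of that advice (my illustration; `db.bin` is a made-up filename): instead of leaving fd 1 free after a bare close(1), dup2() the null device over it, so a later open() can never be handed descriptor 1.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    fflush(stdout);                     /* done with stdout: flush first */

    /* Park /dev/null in fd 1; dup2() replaces it atomically, so fd 1 is
       never left free for a later open() to claim. */
    int nullfd = open("/dev/null", O_WRONLY);
    if (nullfd >= 0) {
        dup2(nullfd, STDOUT_FILENO);
        if (nullfd != STDOUT_FILENO)
            close(nullfd);
    }

    /* After a bare close(1), this open() could have returned 1, and the
       stray printf() below would have written into db.bin. */
    int db = open("db.bin", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    printf("stray debug output\n");     /* harmlessly goes to /dev/null */

    if (db >= 0) close(db);
    return 0;
}
```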