MrResearcher a day ago

Why is he wrong?

Here's an excerpt from the close(2) syscall description:

RETURN VALUE
       close() returns zero on success.  On error, -1 is returned, and errno is set to indicate the error.

ERRORS
       EBADF  fd isn't a valid open file descriptor.

       EINTR  The close() call was interrupted by a signal; see signal(7).

       EIO    An I/O error occurred.

       ENOSPC
       EDQUOT On NFS, these errors are not normally reported against the first write which exceeds the available storage space, but instead against a subsequent write(2), fsync(2), or close().

       See NOTES for a discussion of why close() should not be retried after an error.

It obviously can fail due to a multitude of reasons.
  • AndyKelley a day ago

    It's unfortunate that the original authors of this interface didn't understand how important infallibility is to resource deallocation, and it's unfortunate that the NFS authors didn't think carefully about this at all. But if you follow the advice of the text you pasted and read the section about why you can't retry close() after an error, it is clear that close() is, in fact, a fundamentally infallible operation.
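To make the "report, but never retry" reading concrete, here is a minimal sketch in C. The helper name `close_once` is hypothetical, and the sketch assumes Linux behavior as described in the close(2) NOTES, where the descriptor is released even when close() returns an error.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: close *fd_ptr at most once and never retry.
 * Returns 0 on success, or the errno value that close() reported.
 * Either way the caller must treat the descriptor as gone
 * (assumption: Linux semantics, where the descriptor is released
 * even if close() returns an error). */
int close_once(int *fd_ptr)
{
    int fd = *fd_ptr;
    *fd_ptr = -1;               /* poison the handle before anything else */

    if (fd < 0)
        return EBADF;           /* already closed through this handle */

    if (close(fd) == -1) {
        int err = errno;        /* save errno before calling anything else */
        fprintf(stderr, "close(%d): %s (not retrying)\n", fd, strerror(err));
        return err;
    }
    return 0;
}
```

The error is still surfaced to the caller, so it can be logged or acted on, but deallocation itself is treated as having happened.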

    • MrResearcher 14 hours ago

      If the flush (syscall) fails, it's not possible to recover in user space, so the only sensible option is to abort() immediately. It's not even safe to perror("Mayday, mayday, flush() failed"); you must simply abort().

      And the moment you start flushing correctly, with if (flush(...)) { abort(); }, it becomes infallible from the program's point of view and can be safely invoked in destructors.
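A minimal sketch of that flush-or-abort pattern in C. There is no syscall literally named flush(), so this assumes fsync(2) is the call meant; the wrapper name `flush_or_abort` is hypothetical.

```c
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical wrapper: push fd's data to storage or terminate.
 * Assumption: fsync(2) stands in for the "flush" in the comment.
 * No message is printed on failure, matching the advice to abort()
 * without attempting any further I/O. */
void flush_or_abort(int fd)
{
    if (fsync(fd) == -1)
        abort();
}
```

From the caller's side the function either returns with the data flushed or the process is gone, which is what makes it usable from a destructor-like cleanup path.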

      File closure operations, on the other hand, do have legitimate reasons to fail. In one of my previous adventures, we were asking the operator to put the archival tape back, and then re-issuing the close() syscall, with the driver checking that the tape is inserted and passing the control to the mechanical arm for further positioning of the tape, all of that in the drivers running in the kernel space. The program actually had to retry close() syscalls, and kept asking the operator to handle the tape (there were multiple scenarios for how the operator should proceed).

      • zozbot234 12 hours ago

        > In one of my previous adventures, we were asking the operator to put the archival tape back, and then re-issuing the close() syscall, with the driver checking that the tape is inserted and passing the control to the mechanical arm for further positioning of the tape, all of that in the drivers running in the kernel space.

        Why can't the OS itself do the prompting in this case, as part of processing the original close()? MS-DOG had its (A)bort/(R)etry/(I)gnore prompt for failing I/O operations, and AmigaOS could track media labels and ask the user to "insert $MEDIA_LABEL in drive".

      • jcalvinowens 11 hours ago

        If the tape drive failed close() in a way that did not deallocate the file descriptor, that was just straight up a bug.

        Retrying close() is dangerous: if the file descriptor was successfully deallocated, it might have already been re-allocated by another thread. I'd guess the program you're describing was single-threaded, though (it can still bite there).
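The reuse hazard can be shown even without threads. The sketch below relies on the POSIX rule that open() returns the lowest-numbered free descriptor; the function name `fd_number_is_reused` and the use of /dev/null are illustrative choices, not anything from the thread.

```c
#include <fcntl.h>
#include <unistd.h>

/* Returns 1 if the descriptor number freed by close() is handed out
 * again by the very next open(), 0 if not, -1 on setup failure.
 * Assumes no other thread opens or closes descriptors meanwhile. */
int fd_number_is_reused(void)
{
    int first = open("/dev/null", O_WRONLY);
    if (first == -1)
        return -1;

    close(first);               /* the number `first` is now free */

    int second = open("/dev/null", O_RDONLY);
    if (second == -1)
        return -1;

    /* open() picks the lowest free number, which is `first` again. */
    int reused = (second == first);

    /* A "retry" of close(first) at this point would silently close
     * `second`, an unrelated file that happens to share the number. */
    close(second);
    return reused;
}
```

In a multithreaded program the second open() can come from any other thread, so a retried close() may tear down a descriptor the retrying code never owned.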

    • jcalvinowens 11 hours ago

      Yeah, close() can't fail, but it can return an error. It's kind of odd.

      How could one fix that though? It seems pretty unavoidable to me because write() is more or less asynchronous to actual disk I/O.

      You could add finalize() which is distinct from close(), but IMHO that's even more confusing.
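For illustration only, such a split could look like the sketch below. Both function names are hypothetical, and fsync(2) is used as a stand-in for the fallible "finalize" step; it is an assumption that this is the kind of operation meant, and fsync() does more work than close() normally would.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical finalize(): the fallible step. Asks the kernel to push
 * fd's data to storage and returns 0 on success or an errno value.
 * Assumption: fsync(2) is the stand-in for "finalize". */
int finalize(int fd)
{
    return fsync(fd) == -1 ? errno : 0;
}

/* Hypothetical release(): the step treated as infallible. Any error
 * from close() is deliberately ignored, and the caller must not use
 * the descriptor afterwards. */
void release(int fd)
{
    (void)close(fd);
}
```

A caller that cares about durability would check finalize() and handle the error while the descriptor is still valid, then call release() unconditionally.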