Comment by AndyKelley a day ago

It's unfortunate that the original authors of this interface didn't understand how important infallibility is to resource deallocation, and it's unfortunate that NFS authors didn't think carefully about this at all, but if you follow the advice of the text you pasted and read the section about how you can't retry close() after an error, it is clear that close is, in fact, a fundamentally infallible operation.
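
To make the "can't retry" point concrete, here is a minimal C sketch. It assumes Linux behaviour, where close() releases the descriptor number even when it reports an error, and a single-threaded process. The name retry_closes_wrong_file is made up for illustration, and the first close() here actually succeeds; it merely stands in for one that returned an error.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical demo: returns 1 if "retrying" close() on a stale descriptor
 * number ended up closing an unrelated, newly opened file, 0 otherwise. */
int retry_closes_wrong_file(void)
{
    int first = open("/dev/null", O_RDONLY);
    assert(first >= 0);

    /* Whatever close() returns, treat the descriptor as gone
     * (assumption: Linux semantics). */
    (void)close(first);

    /* open() hands out the lowest free descriptor number, so in a
     * single-threaded process it reuses the number that was just freed. */
    int second = open("/dev/null", O_RDONLY);
    assert(second >= 0);
    if (second != first) {
        (void)close(second);
        return 0;
    }

    /* The "retry" of the original close() now hits the new file. */
    (void)close(first);

    /* fcntl() fails (EBADF) if `second` was closed behind our back. */
    return fcntl(second, F_GETFD) == -1;
}
```

In a multi-threaded program the second open() could just as well come from another thread, which is why the retry can silently close someone else's file.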

MrResearcher 14 hours ago

If the flush (syscall) fails, it's not possible to recover in user space, so the only sensible option is to abort() immediately. It's not even safe to perror("Mayday, mayday, flush() failed"); you must simply abort().

And, the moment you start flushing correctly: if(flush(...)) { abort(); }, it becomes infallible from the program's point of view, and can be safely invoked in destructors.
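
A minimal sketch of that wrapper in C, under some assumptions: there is no syscall literally named flush(), so fsync() stands in for it here, and flush_or_abort is a hypothetical name.

```c
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: fsync() stands in for the flush() mentioned above.
 * It either succeeds or terminates the process, so callers (including
 * destructor-style cleanup code) never see an error. */
void flush_or_abort(int fd)
{
    assert(fd >= 0);        /* a bad descriptor is a caller bug */

    if (fsync(fd) != 0) {
        abort();            /* no perror(), no recovery attempt */
    }
}
```

Because the function returns void, the failure case simply does not exist from the caller's point of view.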

File closure operations, on the other hand, do have legitimate reasons to fail. In one of my previous adventures, we were asking the operator to put the archival tape back, and then re-issuing the close() syscall, with the driver checking that the tape is inserted and passing the control to the mechanical arm for further positioning of the tape, all of that in the drivers running in the kernel space. The program actually had to retry the close() syscall, and it kept prompting the operator to handle the tape (there were several scenarios for how the operator could proceed).

  • zozbot234 13 hours ago

    > In one of my previous adventures, we were asking the operator to put the archival tape back, and then re-issuing the close() syscall, with the driver checking that the tape is inserted and passing the control to the mechanical arm for further positioning of the tape, all of that in the drivers running in the kernel space.

    Why can't the OS itself do the prompting in this case, as part of processing the original close()? MS-DOG had its (A)bort/(R)etry/(I)gnore prompt for failing I/O operations, and AmigaOS could track media labels and ask the user to "insert $MEDIA_LABEL in drive".

    • MrResearcher 12 hours ago

      Because DOS relied on BIOS interrupt 10h for screen output:

        mov si, GREETINGS_STRING
        print_loop:
          lodsb                  ; Load next byte into AL, advance SI
          cmp al, 0              ; Check for null terminator
          je done
      
          mov ah, 0Eh            ; BIOS teletype output
          mov bh, 0              ; Page number = 0
          mov bl, 07h            ; Foreground color (only used in graphics modes)
          int 10h                ; Print character in AL
      
          jmp print_loop
        done:
          ...
      
        GREETINGS_STRING db "Hello, BIOS world!", 0
      
      Linux, by contrast, doesn't rely on the BIOS for output: it provides a TTY subsystem, and programs use devices like /dev/tty for I/O. Run $ lspci in your console: which of those devices should the kernel use for output? The kernel wouldn't know, and the BIOS is no longer of any help.

      • zozbot234 12 hours ago

        > which of those devices should the kernel use for output?

        Whatever facility it uses for showing kernel panics, perhaps. Though one could also use IPC facilities such as dbus to issue a prompt in the session of whatever user is currently managing that media device.

  • jcalvinowens 12 hours ago

    If the tape drive failed close() in a way that did not deallocate the file descriptor, that was just straight up a bug.

    Retrying close() is dangerous: if the file descriptor was successfully deallocated, it might already have been reallocated by another thread. I'd guess the program you're describing was single-threaded, though (it can still bite there).

jcalvinowens 12 hours ago

Yeah, close() can't fail, but it can return an error. It's kind of odd.

How could one fix that though? It seems pretty unavoidable to me because write() is more or less asynchronous to actual disk I/O.

You could add finalize() which is distinct from close(), but IMHO that's even more confusing.
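
For comparison, here is a rough sketch of what that split could look like in C. finalize(), release() and save() are hypothetical names, not an existing API, and fsync() is assumed to be the step where deferred write errors surface.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical finalize(): the fallible half. Forces buffered data out and
 * returns 0 on success or an errno value on failure. */
int finalize(int fd)
{
    return fsync(fd) == 0 ? 0 : errno;
}

/* The infallible half: gives the descriptor back and reports nothing. */
void release(int fd)
{
    (void)close(fd);
}

/* Usage sketch: every error the caller can act on is reported before the
 * descriptor is released. */
int save(const char *path, const void *buf, size_t len)
{
    assert(path != NULL && buf != NULL);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0) {
        return errno;
    }

    int err;
    ssize_t n = write(fd, buf, len);
    if (n < 0) {
        err = errno;
    } else if ((size_t)n != len) {
        err = EIO;          /* short write, treated as failure in this sketch */
    } else {
        err = finalize(fd);
    }

    release(fd);            /* never retried, never inspected */
    return err;
}
```

The cost is exactly the confusion mentioned above: callers now have to remember two calls, and nothing stops them from skipping finalize().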