Comment by newpavlov a day ago

No, the fundamental problem (in the context of io-uring) is that futures are managed by user code and can be dropped at any time. This is often referred to as "cancellation safety". Imagine a future has initiated completion-based IO with a buffer that is part of the future's state. User code can simply drop the future (e.g. if it was part of `select!`), and now we have a huge problem on our hands: the kernel will write into a dropped buffer! In the synchronous context, it's equivalent to deallocating a thread's stack out from under a thread that is blocked on a synchronous syscall. You obviously can't do that (using safe code) in thread-based code, but it's fine to do in async.
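The hazard can be reproduced without a real kernel. This is a minimal, std-only sketch under stated assumptions: `FakeKernel` and `ReadFuture` are hypothetical types, the "kernel" only records that a read is in flight, and no pointer is ever dereferenced. It polls a completion-style read once and then drops it, the way a losing `select!` branch would; `Drop` runs on the future's inline buffer while the simulated kernel still considers the read pending.

```rust
use std::cell::Cell;
use std::future::Future;
use std::pin::Pin;
use std::rc::Rc;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

/// Hypothetical stand-in for the kernel side of io_uring: it only records
/// whether a read is still in flight and never touches any memory.
#[derive(Default)]
struct FakeKernel {
    in_flight: Cell<bool>,
}

/// A completion-style read future whose buffer is part of its own state.
struct ReadFuture {
    buf: [u8; 16],
    kernel: Rc<FakeKernel>,
    freed_while_in_flight: Rc<Cell<bool>>,
}

impl Future for ReadFuture {
    type Output = usize;

    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<usize> {
        if !self.kernel.in_flight.get() {
            // A real backend would put this pointer into a submission queue entry.
            let _ptr: *mut u8 = self.buf.as_mut_ptr();
            self.kernel.in_flight.set(true);
        }
        Poll::Pending // the completion has not arrived yet
    }
}

impl Drop for ReadFuture {
    fn drop(&mut self) {
        // Drop cannot wait for the kernel: the inline buffer is released right now.
        if self.kernel.in_flight.get() {
            self.freed_while_in_flight.set(true);
        }
    }
}

/// A waker that does nothing, enough to poll a future by hand.
fn noop_waker() -> Waker {
    fn noop_clone(_: *const ()) -> RawWaker {
        RawWaker::new(std::ptr::null(), &VTABLE)
    }
    fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(noop_clone, noop, noop, noop);
    // SAFETY: every vtable function ignores the (null) data pointer.
    unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) }
}

/// Polls the read once, then drops it as a losing `select!` branch would.
/// Returns true if the buffer was freed while the fake kernel still held it.
fn simulate_cancelled_read() -> bool {
    let kernel = Rc::new(FakeKernel::default());
    let freed = Rc::new(Cell::new(false));
    let mut fut = ReadFuture {
        buf: [0; 16],
        kernel: Rc::clone(&kernel),
        freed_while_in_flight: Rc::clone(&freed),
    };
    let waker = noop_waker();
    let mut cx = Context::from_waker(&waker);
    assert!(Pin::new(&mut fut).poll(&mut cx).is_pending());
    drop(fut); // e.g. the other `select!` branch finished first
    freed.get()
}

fn main() {
    println!("buffer freed while read in flight: {}", simulate_cancelled_read());
}
```

With a real io_uring backend, the kernel would still write into that 16-byte array after the drop, which is the use-after-free described above.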

This is why you have to use various hacks when using io-uring-based executors with Rust async (like using polling mode, or ring-owned buffers and additional data copies). It could be "resolved" at the language level with an additional pile of hacks that would implement async Drop, but, in my opinion, it would only further hurt the consistency of the language.
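A rough sketch of the ring-owned-buffer workaround, assuming a hypothetical `BufferRing` type (not any real crate's API): the runtime owns every buffer the kernel may write into, an in-flight operation holds only an index, and data reaches the caller's buffer through an extra copy after completion. Dropping the caller's future then leaves only an index behind, which the runtime can reclaim when the completion arrives.

```rust
/// Hypothetical runtime-owned buffer pool: the kernel only ever sees these buffers.
struct BufferRing {
    bufs: Vec<Vec<u8>>,
    free: Vec<usize>, // indices of buffers not currently lent to the kernel
}

impl BufferRing {
    fn new(count: usize, size: usize) -> Self {
        BufferRing { bufs: vec![vec![0; size]; count], free: (0..count).collect() }
    }

    /// Lend a buffer to an in-flight operation; the user's future keeps only the index.
    fn acquire(&mut self) -> Option<usize> {
        self.free.pop()
    }

    /// Simulates the kernel finishing a read into ring buffer `id`.
    fn kernel_writes(&mut self, id: usize, data: &[u8]) -> usize {
        let n = data.len().min(self.bufs[id].len());
        self.bufs[id][..n].copy_from_slice(&data[..n]);
        n
    }

    /// The extra copy: move up to `n` bytes into the user's (possibly stack)
    /// buffer, then return the ring buffer to the pool.
    fn complete_into(&mut self, id: usize, n: usize, user: &mut [u8]) -> usize {
        let n = n.min(user.len());
        user[..n].copy_from_slice(&self.bufs[id][..n]);
        self.free.push(id);
        n
    }
}

fn main() {
    let mut ring = BufferRing::new(4, 64);
    let id = ring.acquire().expect("pool exhausted");
    let n = ring.kernel_writes(id, b"hello");
    let mut stack_buf = [0u8; 16]; // the user's buffer never reaches the kernel
    let copied = ring.complete_into(id, n, &mut stack_buf);
    assert_eq!(&stack_buf[..copied], b"hello");
    println!("copied {} bytes", copied);
}
```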

>He even calls out how naïve completion (callbacks) leads to more allocation on future composition and points to where green threads were abandoned.

I already addressed it in the other comment.

vlovich123 a day ago

I really don’t understand this argument. If you force the user to transfer ownership of the buffer into the I/O subsystem, the system can move that buffer into the async runtime instead of leaving it inside the cancellable future. The future then returns the buffer, which is handed back when the kernel reports completion. What am I missing?
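What vlovich123 proposes can be sketched as follows, assuming a hypothetical `Runtime` type driven by simulated completions. The shape is similar in spirit to crates such as tokio-uring, which hand the buffer back alongside the result, but this is not their API. The buffer moves into the runtime on submission; if the future is dropped, the runtime merely marks the operation as orphaned and frees the buffer only once the completion arrives.

```rust
use std::collections::HashMap;

/// Hypothetical runtime state: buffers of in-flight reads live here,
/// not inside the (cancellable) future.
#[derive(Default)]
struct Runtime {
    next_id: u64,
    in_flight: HashMap<u64, Vec<u8>>,
    orphaned: Vec<u64>, // ops whose future was dropped before completion
}

impl Runtime {
    /// Ownership of `buf` moves into the runtime; the future only keeps the id.
    fn submit_read(&mut self, buf: Vec<u8>) -> u64 {
        let id = self.next_id;
        self.next_id += 1;
        self.in_flight.insert(id, buf);
        id
    }

    /// Called from the future's Drop: the buffer stays alive until completion.
    fn cancel(&mut self, id: u64) {
        self.orphaned.push(id);
    }

    /// Simulated completion: the "kernel" fills the buffer. The buffer is handed
    /// back to the caller, unless the op was orphaned, in which case it is freed now.
    fn complete(&mut self, id: u64, data: &[u8]) -> Option<(usize, Vec<u8>)> {
        let mut buf = self.in_flight.remove(&id)?;
        let n = data.len().min(buf.len());
        buf[..n].copy_from_slice(&data[..n]);
        if let Some(pos) = self.orphaned.iter().position(|&o| o == id) {
            self.orphaned.swap_remove(pos);
            return None; // safe to free: the kernel is done with it
        }
        Some((n, buf))
    }
}

fn main() {
    let mut rt = Runtime::default();
    // Normal path: the buffer goes in and comes back out, filled.
    let id = rt.submit_read(vec![0; 8]);
    if let Some((n, buf)) = rt.complete(id, b"ping") {
        println!("read {}", String::from_utf8_lossy(&buf[..n]));
    }
    // Cancelled path: the future is gone, but the buffer outlives it until completion.
    let id = rt.submit_read(vec![0; 8]);
    rt.cancel(id);
    assert!(rt.complete(id, b"late").is_none());
    assert!(rt.in_flight.is_empty());
}
```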

  • Inufu a day ago

    Requiring ownership transfer gives up on one of the main selling points of Rust: being able to verify reference lifetimes and safety at compile time. If we have to give up on references, then a lot of Rust's complexity no longer buys us anything.

    • vlovich123 a day ago

      I'm not sure what you're trying to say, but the compile-time safety requirement isn't given up. It would look something like:

          self.buffer = io_read(self.buffer)?
      
      This isn't much different than

          io_read(&mut self.buffer)?
      
      since Rust doesn't permit simultaneous access when a mutable reference is taken.

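Taken literally, `self.buffer = io_read(self.buffer)?` does not compile inside a `&mut self` method, because it moves out of a borrowed field, and with `?` the buffer would also be lost on error. A minimal synchronous sketch, assuming a hypothetical `io_read_owned` helper over `std::io::Read` that returns the buffer alongside the result, uses `std::mem::take` to move the buffer out and back; the borrow checker still enforces exclusive access in both shapes.

```rust
use std::io::{self, Read};
use std::mem;

/// Hypothetical owned-buffer API: the buffer goes in and comes back out.
/// Returning it next to the result means it is not lost on error.
fn io_read_owned<R: Read>(src: &mut R, mut buf: Vec<u8>) -> (io::Result<usize>, Vec<u8>) {
    let res = src.read(&mut buf);
    (res, buf)
}

struct Conn<R> {
    src: R,
    buffer: Vec<u8>,
}

impl<R: Read> Conn<R> {
    /// Borrowed shape: `io_read(&mut self.buffer)`.
    fn read_borrowed(&mut self) -> io::Result<usize> {
        self.src.read(&mut self.buffer)
    }

    /// Owned shape: `self.buffer = io_read(self.buffer)?`, made to compile by
    /// temporarily taking the field (leaving an empty Vec) and restoring it.
    fn read_owned(&mut self) -> io::Result<usize> {
        let buf = mem::take(&mut self.buffer);
        let (res, buf) = io_read_owned(&mut self.src, buf);
        self.buffer = buf; // restored even if `res` is an error
        res
    }
}

fn main() -> io::Result<()> {
    let mut conn = Conn { src: &b"hello world"[..], buffer: vec![0; 5] };
    let a = conn.read_borrowed()?;
    let first = conn.buffer[..a].to_vec();
    let b = conn.read_owned()?;
    println!("[{}] [{}]", String::from_utf8_lossy(&first), String::from_utf8_lossy(&conn.buffer[..b]));
    Ok(())
}
```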
      • Inufu 16 hours ago

        It means you can no longer, for example, take multiple disjoint references into the same buffer for parallel reads/writes of independent chunks.

        Or rather, you can, using unsafe, Arc, and Mutex, but at that point the safety guarantees aren’t much better than what I get in well-designed C++.

        Don’t get me wrong, I still much prefer Rust, but I wish async and references worked together better.

        Source: I recently wrote a high-throughput RPC library in Rust (saturating > 100 Gbit NICs)
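With plain borrows, the disjoint-chunk pattern Inufu describes is straightforward in synchronous Rust: `chunks_mut` hands out non-overlapping `&mut` slices, and `std::thread::scope` can fill them in parallel. In this sketch `fill_chunk` is a hypothetical stand-in for a read into one chunk's byte range.

```rust
use std::thread;

/// Fills independent chunks of one buffer in parallel using disjoint `&mut` borrows.
fn parallel_fill(buf: &mut [u8], chunk_size: usize) {
    thread::scope(|s| {
        for (i, chunk) in buf.chunks_mut(chunk_size).enumerate() {
            // Each thread gets its own non-overlapping slice; no Arc or Mutex needed.
            s.spawn(move || fill_chunk(i, chunk));
        }
    }); // every spawned thread is joined here, so the borrows end before we return
}

/// Hypothetical stand-in for reading data into one chunk.
fn fill_chunk(index: usize, chunk: &mut [u8]) {
    for b in chunk.iter_mut() {
        *b = index as u8;
    }
}

fn main() {
    let mut buf = [0u8; 12];
    parallel_fill(&mut buf, 4);
    println!("{:?}", buf);
}
```

The scope may borrow `buf` because it is guaranteed to join every thread before returning. A future can be dropped or leaked mid-operation, so it cannot offer the same guarantee, which is why an owned-buffer API pushes you toward one allocation per chunk or shared ownership instead.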

  • newpavlov a day ago

    The goal of the async system is to let users write synchronous-looking code that is executed asynchronously, with all the associated benefits. "Forcing" users to do stuff like this shows a clear failure to achieve that goal. Additionally, passing ownership like this (instead of passing a mutable borrow) arguably goes against the zero-cost principle.

    • vlovich123 a day ago

      I don’t follow the zero-copy argument. You pass in an owned buffer and get an owned buffer back out; there’s no copying happening here. It’s your claim that async is supposed to look like synchronous code, but I don’t buy it. I don’t see why that’s a goal. Synchronous code is an anachronistic software paradigm for a computer hardware architecture that never really existed (electronics are concurrent and asynchronous by nature), and trying to make hardware work that way causes a lot of performance problems.

      Indeed, one thing I’ve always wondered is whether you can submit a read request for a page-aligned buffer and have the kernel arrange for data to be written directly into it without any additional copies. That’s probably not possible, since there’s routing happening in the kernel and it accumulates everything into sk_buffs.

      But maybe it could decouple the framing part of the packet from the data, so that it can just give you a mapping into the data region (maybe, instead of you even providing a buffer, it gives you back an address mapped into your address space). I’m not sure whether that TLB update would be more expensive than a single copy.

      • newpavlov a day ago

        You have an inevitable overhead of managing the owned buffer compared with simply passing a mutable borrow to an already existing buffer. Imagine if `io::Read` APIs were constructed as `fn read(&mut self, buf: Vec<u8>) -> io::Result<Vec<u8>>`.
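That hypothetical signature can be written as an adapter trait over the real `std::io::Read` to make the overhead visible. `OwnedRead` is invented for illustration and is not part of std. The borrowed call accepts any stack array, whereas the owned call needs a heap `Vec`, returns it shrunk to the bytes read, and drops it on error.

```rust
use std::io::{self, Read};

/// Hypothetical owned-buffer variant of `io::Read` (not a real std trait).
trait OwnedRead {
    fn read_owned(&mut self, buf: Vec<u8>) -> io::Result<Vec<u8>>;
}

impl<R: Read> OwnedRead for R {
    fn read_owned(&mut self, mut buf: Vec<u8>) -> io::Result<Vec<u8>> {
        let n = self.read(&mut buf)?; // on error, `buf` is dropped and lost to the caller
        buf.truncate(n); // report the length by shrinking the buffer
        Ok(buf)
    }
}

fn main() -> io::Result<()> {
    let mut src: &[u8] = b"abcdef";
    // Borrowed API: any stack buffer works.
    let mut stack = [0u8; 3];
    let n = src.read(&mut stack)?;
    // Owned API: must allocate, then take ownership of the result again.
    let buf = src.read_owned(vec![0; 3])?;
    println!("{:?} {:?}", &stack[..n], buf);
    Ok(())
}
```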

        Parity with synchronous programming is an explicit goal of Rust async that has been declared many times (e.g. see https://github.com/rust-lang/rust-project-goals/issues/105). I agree with your rant about the illusion of synchronicity, but it does not matter. The synchronous abstraction is immensely useful in practice, and the less leaky it is, the better.

      • namibj a day ago

        Such reads are in principle supported if you have sufficient hardware offloading of your stream. AFAIK io_uring got an update a while back specifically to make this practical for non-stream reads: you basically provide a slab-allocator region to the ring and tell reads to pick a free slot/slab in that region _only when they actually get the data_, instead of blocking DMA-capable memory for as long as the remote takes to send you the data.
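This likely refers to io_uring's provided buffers (registered with `IORING_OP_PROVIDE_BUFFERS` or, on newer kernels, a buffer ring, and selected per request with the `IOSQE_BUFFER_SELECT` flag). The simulation below models only the idea and makes no io_uring calls; `ProvidedPool` and `PendingRead` are hypothetical. Reads are submitted without a buffer, and a slot is taken from the shared pool only when data arrives, so many idle connections do not pin memory.

```rust
use std::collections::VecDeque;

/// Hypothetical pool shared by all pending reads (models a provided-buffer group).
struct ProvidedPool {
    slots: Vec<Vec<u8>>,
    free: VecDeque<usize>,
}

/// A pending read carries no buffer at all until data shows up.
struct PendingRead {
    conn: u32,
}

impl ProvidedPool {
    fn new(count: usize, size: usize) -> Self {
        ProvidedPool { slots: vec![vec![0; size]; count], free: (0..count).collect() }
    }

    /// Data arrived for a read: only now is a slot chosen. Returns (slot id, len),
    /// or None if the pool is empty (real io_uring fails the request with ENOBUFS).
    fn on_data(&mut self, _read: &PendingRead, data: &[u8]) -> Option<(usize, usize)> {
        let id = self.free.pop_front()?;
        let n = data.len().min(self.slots[id].len());
        self.slots[id][..n].copy_from_slice(&data[..n]);
        Some((id, n))
    }

    /// The application hands the slot back after consuming its contents.
    fn recycle(&mut self, id: usize) {
        self.free.push_back(id);
    }
}

fn main() {
    let mut pool = ProvidedPool::new(4, 32);
    // 100 reads are outstanding, but none of them holds any memory yet.
    let pending: Vec<PendingRead> = (0..100).map(|conn| PendingRead { conn }).collect();
    let (id, n) = pool.on_data(&pending[42], b"payload").expect("no free slot");
    println!("conn {} got {} bytes in slot {}", pending[42].conn, n, id);
    pool.recycle(id);
}
```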

duped a day ago

That problem exists regardless of whether you want to use stackful coroutines or not. The stack could be freed by user code at any time. It could also panic and drop buffers upon unwinding.

I wouldn't call async drop a pile of hacks, it's actually something that would be useful in this context.

And that said there's an easy fix: don't use the pointers supplied by the future!

  • newpavlov a day ago

    >That problem exists regardless of whether you want to use stackful coroutines or not. The stack could be freed by user code at any time. It could also panic and drop buffers upon unwinding.

    Nope. The problem does not exist in the stackful model, by virtue of the user being unable (in safe code) to drop the stack of a stackful task, similarly to how you cannot drop the stack of a thread. If you want to cancel a stackful task, you have to send a cancellation signal to it and wait for it to complete (i.e. cancellation is fully cooperative). And you cannot fundamentally panic while waiting for a completion event; the task code is "frozen" until the signal is received.
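The cooperative-cancellation argument can be approximated with OS threads standing in for stackful tasks. In this sketch the `Event` enum and the channel that delivers it are assumptions (a real runtime might submit `IORING_OP_ASYNC_CANCEL` and wait for the resulting completion instead): the task's buffer lives on its own stack, the task stays blocked until it receives either a completion or a cancellation request, and the canceller must `join` it before anything is torn down.

```rust
use std::sync::mpsc::{self, Receiver};
use std::thread;

/// Hypothetical events a blocked task can receive.
enum Event {
    Completed(Vec<u8>), // the "kernel" finished the read
    Cancelled,          // cooperative cancellation request
}

/// Task body: `buf` lives on this thread's stack for the whole wait,
/// and nothing outside the thread can free it.
fn task(events: Receiver<Event>) -> Result<usize, &'static str> {
    let mut buf = [0u8; 16];
    match events.recv() {
        Ok(Event::Completed(data)) => {
            let n = data.len().min(buf.len());
            buf[..n].copy_from_slice(&data[..n]);
            Ok(n)
        }
        // The task itself decides how to wind down; its stack is still intact here.
        Ok(Event::Cancelled) | Err(_) => Err("cancelled"),
    }
}

fn run(cancel: bool) -> Result<usize, &'static str> {
    let (tx, rx) = mpsc::channel();
    let handle = thread::spawn(move || task(rx));
    let event = if cancel { Event::Cancelled } else { Event::Completed(b"data".to_vec()) };
    tx.send(event).unwrap();
    // Cancellation is not "drop": we have to wait for the task to finish.
    handle.join().unwrap()
}

fn main() {
    println!("{:?} {:?}", run(false), run(true));
}
```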

    >it's actually something that would be useful in this context.

    Yes, it's useful for patching a bunch of holes introduced by the Rust async model, and only for that. This is why I call it a bunch of hacks, especially considering the fundamental issues that prevent implementation of async Drop. A properly designed system would have worked with the classic Drop.

    >And that said there's an easy fix: don't use the pointers supplied by the future!

    It's always amusing when Rust async advocates say that. Let me translate: don't use `let mut buf = [0u8; 16]; socket.read_all(&mut buf).await?;`. If you can't see why such arguments are bonkers, we don't have anything left to talk about.

    • oconnor663 a day ago

      > don't use `let mut buf = [0u8; 16]; socket.read_all(&mut buf).await?;`. If you can't see why such arguments are bonkers, we don't have anything left to talk about.

      It doesn't seem bonkers to me. I know you already know these details, but spelling it out: if I'm using select/poll/epoll in C to do non-blocking reads of a socket, then yes, I can use any old stack buffer to receive the bytes, because those are readiness APIs that only write through my pointer "now or never". But if I'm using IOCP/io_uring, I have to be careful not to use a stack buffer unless it outlives the whole IO loop, because those are completion APIs that write through my pointer "later". This isn't just a question of the borrow checker being smart enough to analyze our code; it's a genuine difference in what correct code needs to do in these two settings. So if async Rust forces us to use heap-allocated (or otherwise long-lived) buffers to do IOCP/io_uring reads, is that a failure of the async model, or is that just the nature of systems programming?
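The readiness/completion distinction shows up directly in the types. `ReadinessSocket` and `CompletionSocket` are hypothetical mocks, not real APIs: `try_read` borrows the buffer only for the duration of the call ("now or never"), while `submit_read` must take ownership because the write happens after the call has returned.

```rust
use std::io;

/// Readiness model: write through the buffer now, or report WouldBlock.
struct ReadinessSocket {
    ready: Vec<u8>,
}

impl ReadinessSocket {
    fn try_read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        if self.ready.is_empty() {
            return Err(io::ErrorKind::WouldBlock.into());
        }
        let n = self.ready.len().min(buf.len());
        buf[..n].copy_from_slice(&self.ready[..n]);
        self.ready.drain(..n);
        Ok(n)
    }
}

/// Completion model: the buffer is written after `submit_read` returns,
/// so the socket has to own it until then.
struct CompletionSocket {
    pending: Option<Box<[u8]>>,
}

impl CompletionSocket {
    fn submit_read(&mut self, buf: Box<[u8]>) {
        self.pending = Some(buf);
    }

    /// Simulated completion event: fill the owned buffer and hand it back.
    fn complete(&mut self, data: &[u8]) -> Option<(usize, Box<[u8]>)> {
        let mut buf = self.pending.take()?;
        let n = data.len().min(buf.len());
        buf[..n].copy_from_slice(&data[..n]);
        Some((n, buf))
    }
}

fn main() {
    let mut r = ReadinessSocket { ready: b"now".to_vec() };
    let mut stack_buf = [0u8; 8]; // fine: the borrow ends when try_read returns
    let n = r.try_read(&mut stack_buf).unwrap();

    let mut c = CompletionSocket { pending: None };
    c.submit_read(vec![0u8; 8].into_boxed_slice()); // a stack array could not be passed here
    let (m, buf) = c.complete(b"later").unwrap();
    println!("{:?} {:?}", &stack_buf[..n], &buf[..m]);
}
```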

      • newpavlov a day ago

        >is that a failure of the async model

        This, 100%. To be really generous, it can be called a leaky model that is poorly compatible with completion-based APIs.

    • duped a day ago

      > The problem does not exist in the stackful model by the virtue of user being unable (in safe code) to drop stack of a stackful task similarly to how you can not drop stack of a thread.

      If you're not doing things better than threads, then why don't you just use threads?

      > And you can not fundamentally panic while waiting for a completion event, the task code is "frozen" until the signal is received.

      So you only allow join/select at the task level? Sounds awful!

      > Let me translate: don't use `let mut buf = [0u8; 16]; socket.read_all(&mut buf).await?;`

      Yes, exactly. It's more like `let buf = socket.read(16);`

      • newpavlov a day ago

        >If you're not doing things better than threads then why don't you just use threads?

        Because green threads are more efficient than classical threads. You have less context switching, more control over concurrency (e.g. you can have application-level pseudo-critical sections and tools like `join!`/`select!`), and with io-uring you make far fewer syscalls.

        In other words, the memory footprint would be similar to that of classical threads, but runtime performance can be much higher.

        >So you only allow join/select at the task level? Sounds awful!

        What is the difference with join/select at the future level?

        Yes, with the most straightforward implementation you have to allocate a full stack for each sub-task (somewhat equivalent to boxing sub-futures). But it's theoretically possible to use the parent task's stack for sub-task stacks with the aforementioned compiler improvements.

        Another difference is that instead of just dropping the future state on the floor, you have to explicitly send a cancellation signal (e.g. based on `IORING_OP_ASYNC_CANCEL`) and wait for the sub-task to finish. Performance-wise, the difference compared with the hypothetical async Drop should be minimal.

        >Yes, exactly.

        Ok, I have nothing more to add then.