Comment by gavinhoward a day ago

8 replies

As the author of a semi-famous post about how Zig has function colors [1], I decided to read up on this.

I see that blocking I/O is an option:

> The most basic implementation of `Io` is one that maps to blocking I/O operations.

So far, so good, but blocking I/O is not async.

There is a thread pool that uses blocking I/O. Still good so far, but blocking I/O is still not async.

Then there's green threads:

> This implementation uses `io_uring` on Linux and similar APIs on other OSs for performing I/O combined with a thread pool. The key difference is that in this implementation OS threads will juggle multiple async tasks in the form of green threads.

Okay, they went the Go route on this one. Still (sort of) not async, but there is an important limitation:

> This implementation requires having the ability to perform stack swapping on the target platform, meaning that it will not support WASM, for example.

But still no function colors, right?

Unfortunately not:

> This implementation [stackless coroutines] won’t be available immediately like the previous ones because it depends on reintroducing a special function calling convention and rewriting function bodies into state machines that don’t require an explicit stack to run.

(Emphasis added.)

And the function colors appear again.

Now, to be fair, since there are multiple implementation options, you can avoid function colors, especially since `Io` is a value. But those options are either:

* Use blocking I/O.

* Use threads with blocking I/O.

* Use green threads, which Rust removed [2] for good reasons [3]. It only works in Go because of the garbage collector.
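To make the "`Io` is a value" point concrete, here is a minimal sketch of an interface value that the caller picks and passes down, much like an allocator. Everything in it is an assumption for illustration: the `Io` struct below is a hypothetical stand-in with a single `write` operation, not the real `std.Io` API, and `CountingIo` and `greet` are made-up names.

```zig
const std = @import("std");

// Hypothetical stand-in for an I/O interface value; NOT the real `std.Io`.
const Io = struct {
    ctx: *anyopaque,
    writeFn: *const fn (ctx: *anyopaque, bytes: []const u8) usize,

    fn write(io: Io, bytes: []const u8) usize {
        return io.writeFn(io.ctx, bytes);
    }
};

// One possible implementation. It handles each call synchronously and
// simply counts the bytes it was given.
const CountingIo = struct {
    total: usize = 0,

    fn write(ctx: *anyopaque, bytes: []const u8) usize {
        const self: *CountingIo = @ptrCast(@alignCast(ctx));
        self.total += bytes.len;
        return bytes.len;
    }

    fn io(self: *CountingIo) Io {
        return .{ .ctx = self, .writeFn = write };
    }
};

// Library code takes `Io` as an ordinary parameter and never learns which
// implementation is behind it.
fn greet(io: Io, name: []const u8) usize {
    return io.write("hello, ") + io.write(name);
}

test "caller picks the implementation" {
    var impl = CountingIo{};
    const written = greet(impl.io(), "zig");
    try std.testing.expectEqual(@as(usize, 10), written);
}
```

Swapping in a different implementation would only change the value passed to `greet`; whether the function body itself has to be compiled differently for a stackless implementation is the open question this thread is about.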

In short, the real options are:

* Block (not async).

* Use green threads (with their problems).

* Function colors.

It doesn't appear that the function colors problem has been defeated. Also, it appears to me that the Zig team decided to have every concurrency technique in the hope that it would appear innovative.

[1]: https://gavinhoward.com/2022/04/i-believe-zig-has-function-c...

[2]: https://github.com/aturon/rfcs/blob/remove-runtime/active/00...

[3]: https://www.open-std.org/JTC1/SC22/WG21/docs/papers/2018/p13...

audunw a day ago

It’s hard to judge before stackless coroutines are reintroduced. But I think you’re entirely wrong that the next version of it will have colored functions, even according to your definition.

It has been mentioned that it’s possible that the default for debug builds is that every single function is compiled as an async function. I.e. there is canonically only one function color. Changing function color could become an optimisation for release builds. This is really not much different from inlining functions or other tricks the compiler can do with the calling convention if it has perfect knowledge of all callers.

> it appears to me that the Zig team decided to have every concurrency technique in the hope that it would appear innovative.

That’s a really bad take. It’s not much different from what they did to make allocators explicit. It’s an excellent approach for what Zig is supposed to be. Different concurrency models have different performance tradeoffs, just like with allocators. If they can support different IO models without making the language complicated, that’s a huge win, and they seem to be achieving that.

I find this approach the opposite of “appear innovative”. They’ve moved away from designing in a bunch of fancy syntax that locks users into one particular concurrency model, and gone for a more explicit and boring design which puts power in the hands of the user. It may not be right for everyone, but for what Zig is setting out to do it’s perfect. A disciplined decision in my opinion.

Getting stackless coroutines right for a low level language like Zig would be somewhat innovative. But not in a way that’s flashy or super interesting.

  • gavinhoward 21 hours ago

    > It has been mentioned that it’s possible that the default for debug builds is that every single function is compiled as an async function. I.e. there is canonically only one function color.

    But then, for those that choose to only use blocking I/O or green threads, they still pay the penalty of async merely existing.

    > That’s a really bad take. It’s not much different from what they did to make allocators explicit.

    I mean, Zig explicit allocators are really the same thing as Go interfaces, just dressed up as an innovative feature by a specific use case. This is what I mean by "appearing" innovative: they are taking old and tested ideas and presenting them in new ways to make them appear new.

    Also, Zig could have had explicit allocators without needing to pass them to every function [1].

    > They’ve moved away from designing in a bunch of fancy syntax that locks users into one particular concurrency model, and gone for a more explicit and boring design which puts power in the hands of the user.

    Except that if every function is made async, they have actually removed from users the power to choose to not use async.

    [1]: https://jai.community/t/context/163

ozgrakkurt a day ago

Their bet seems to be that they can transparently implement real async inside an IO implementation using compiler magic. But then it means if you use that IO instance with the magic then your function gets transformed into a state machine?

Then this whole thing is useless for implementing cooperative scheduling async like in rust?
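For readers unfamiliar with the transformation being asked about, here is a hand-written sketch of what "a function turned into a state machine" can look like. It is only an illustration under assumptions: `AddTwoReads` and `step` are hypothetical names, and this is not what the Zig compiler emits or will emit. The point is that locals which live across a wait point move from the call stack into a struct (the "frame"), and a driver calls back in whenever the next input is ready.

```zig
const std = @import("std");

// Hand-written equivalent of: wait for a, wait for b, return a + b.
const AddTwoReads = struct {
    state: enum { waiting_a, waiting_b, done } = .waiting_a,
    a: u32 = 0, // local that must survive across the second wait
    result: u32 = 0,

    // Feed in the value that just became ready.
    // Returns true once the computation has finished.
    fn step(self: *AddTwoReads, input: u32) bool {
        switch (self.state) {
            .waiting_a => {
                self.a = input;
                self.state = .waiting_b;
                return false;
            },
            .waiting_b => {
                self.result = self.a + input;
                self.state = .done;
                return true;
            },
            .done => return true,
        }
    }
};

test "driver advances the frame as inputs arrive" {
    var frame = AddTwoReads{};
    try std.testing.expect(!frame.step(2));
    try std.testing.expect(frame.step(40));
    try std.testing.expectEqual(@as(u32, 42), frame.result);
}
```

No stack swapping is needed here, which is why this style can run on targets like WASM; the cost is that the function body has to be rewritten into this shape.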

  • flohofwoe a day ago

    > But then it means if you use that IO instance with the magic then your function gets transformed into a state machine?

    This is essentially how the old async/await implementation in Zig already worked. The same function gets the state-machine treatment if it was called in an async context, otherwise it's compiled as a 'regular' sequential function.

    E.g. at runtime there may be two versions of a function, but not in the code base. Not sure though how that same idea would be implemented with the new IO interface, but since Zig strictly uses a single-compilation-unit model the compiler might be able to trace the usage of a specific IO implementation throughout the control flow?

  • yxhuvud a day ago

    No, it just means the cooperative scheduler needs to provide an io implementation that works with the rest of the scheduler.

mlugg a day ago

> it depends on reintroducing a special function calling convention

This is an internal implementation detail rather than a fact which is usually exposed to the user. This is essentially just explaining that the Zig compiler needs to figure out which functions are async and lower them differently.

We do have an explicit calling convention, `CallingConvention.async`. This was necessary in the old implementation of async functions in order to make runtime function pointer calls work; the idea was that you would cast your `fn () void` to a `fn () callconv(.async) void`, and then you could call the resulting `*const fn () callconv(.async) void` at runtime with the `@asyncCall` builtin function. This was one of the biggest flaws in the design; you could argue that it introduced a form of coloring, but in practice it just made vtables incredibly undesirable to use, because (since nobody was actually doing the `@asyncCall` machinery in their vtable implementations) they effectively just didn't support async.

We're solving this with a new language feature [0]. The idea here is that when you have a virtual function -- for a simple example, let's say `alloc: *const fn (usize) ?[*]u8` -- you instead give it a "restricted function pointer type", e.g. `const AllocFn = @Restricted(*const fn (usize) ?[*]u8);` with `alloc: AllocFn`. The magic bit is that the compiler will track the full set of comptime-known function pointers which are coerced to `AllocFn`, so that it can know the full set of possible `alloc` functions; so, when a call to one is encountered, it knows whether or not the callee is an async function (in the "stackless async" sense). Even if some `alloc` implementations are async and some are not, the compiler can literally lower `vtable.alloc(123)` to `switch (vtable.alloc) { impl1 => impl1(123), impl2 => impl2(123), ... }`; that is, it can look at the pointer, and determine from that whether it needs to dispatch a synchronous or async call.
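The lowering described above can be imitated by hand in today's Zig, as a rough sketch. It does not use the proposed `@Restricted` builtin: the enum `SizeFn` is a manual stand-in for the closed set of implementations the compiler would collect, and `implRoundUp8` and `implIdentity` are hypothetical functions. Because each `switch` prong is a direct call to a statically known callee, the call sequence could differ per prong, which is the property the proposal relies on.

```zig
const std = @import("std");

// Hypothetical implementations that would all be coerced to one vtable slot.
fn implRoundUp8(n: usize) usize {
    return (n + 7) & ~@as(usize, 7);
}

fn implIdentity(n: usize) usize {
    return n;
}

// Manual stand-in for the set of known callees behind one function pointer.
const SizeFn = enum {
    round_up_8,
    identity,

    // Dispatch by inspecting the tag; every prong is a direct call.
    fn call(self: SizeFn, n: usize) usize {
        return switch (self) {
            .round_up_8 => implRoundUp8(n),
            .identity => implIdentity(n),
        };
    }
};

test "dispatch over a closed set of implementations" {
    try std.testing.expectEqual(@as(usize, 16), SizeFn.round_up_8.call(9));
    try std.testing.expectEqual(@as(usize, 9), SizeFn.identity.call(9));
}
```

The difference in the proposal is that the user keeps ordinary function pointer syntax and the compiler builds the equivalent of `SizeFn` itself.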

The end goal is that most function pointers in Zig should be used as restricted function pointers. We'll probably keep normal function pointers around, but they ideally won't be used at all often. If normal function pointers are kept, we might keep `CallingConvention.async` around, giving a way to call them as async functions if you really want to; but to be honest, my personal opinion is that we probably shouldn't do that. We end up with the constraint that unrestricted pointers to functions where the compiler has inferred the function as async (in a stackless sense) cannot become runtime-known, as that would lead to the compiler losing track of the calling convention it is using internally. This would be a very rare case provided we adequately encourage restricted function pointers. Hell, perhaps we'd just ban all unrestricted default-callconv function pointers from becoming runtime-known.

Note also that stackless coroutines do come with a couple of inherent limitations: in particular, they don't play nicely with FFI (you can't suspend across an FFI boundary; in other words, a function with a well-defined calling convention like the C calling convention is not allowed to be inferred as async). This is a limitation which seems perfectly acceptable, and yet I'm very confident that it will impact significantly more code than the calling convention thing might.

TL;DR: depending on where the design ends up, the "calling convention" mentioned is either entirely, or almost entirely, just an implementation detail. Even in the "almost entirely" case, it will be exceptionally rare for anyone to write code which could be affected by it, to the point that I don't think it's a case worth seriously worrying about unless it proves itself to actually be an issue in practice.

[0]: https://github.com/ziglang/zig/issues/23367

  • gavinhoward 21 hours ago

    From my experience, the calling convention was, in 0.9.x, just an implementation detail, until it wasn't. I think I may still reserve judgment for when async is fully implemented. Then I'll torture it again.