Comment by Fiveplus 5 days ago

>The goal is, that xfwl4 will offer the same functionality and behavior as xfwm4 does...

I wonder how strictly they interpret behavior here given the architectural divergence?

As an example, focus-stealing prevention. In xfwm4 (and x11 generally), this requires complex heuristics and timestamp checks because x11 clients are powerful and can aggressively grab focus. In wayland, the compositor is the sole arbiter of focus, hence clients can't steal it, they can only request it via xdg-activation. Porting the legacy x11 logic involves the challenge of actually designing a new policy that feels like the old heuristic but operates on wayland's strict authority model.
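
For flavor, the core of such a timestamp heuristic can be sketched in a few lines of Python (illustrative only; xfwm4's real logic also involves _NET_WM_USER_TIME, startup notification, and more):

```python
def may_take_focus(launch_timestamp_ms, last_user_interaction_ms):
    """A new window may claim focus only if the user action that
    launched it is at least as recent as the user's last interaction
    with the currently focused window."""
    return launch_timestamp_ms >= last_user_interaction_ms

# User clicked "open" at t=1000 but has typed elsewhere since (t=1500):
# granting focus now would be stealing, so the request is denied.
assert may_take_focus(1000, 1500) is False
# The launch is the newest user action, so focusing is allowed.
assert may_take_focus(2000, 1500) is True
```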

This leads to my main curiosity regarding the raw responsiveness of xfce. On potato hardware, xfwm4 often feels snappy because it can run as a distinct stacking window manager with the compositor disabled. Wayland, by definition forces compositing. While I am not concerned about rust vs C latency (since smithay compiles to machine code without a GC), I am curious about the mandatory compositing overhead. Can the compositor replicate the input-to-pixel latency of uncomposited x11 on low-end devices or is that a class of performance we just have to sacrifice for the frame-perfect rendering of wayland?

kelnos 5 days ago

(xfwl4 author here.)

> I wonder how strictly they interpret behavior here given the architectural divergence?

It's right there in the rest of the sentence (that you didn't quote all of): "... or as much as possible considering the differences between X11 and Wayland."

I'll do my best. It won't be exactly the same, of course, but it will be as close as I can get it.

> As an example, focus-stealing prevention.

Focus stealing prevention is a place where I think xfwl4 could be at an advantage over xfwm4. Xfwm4 does a great job at focus-stealing prevention, but it has to work on a bunch of heuristics, and sometimes it just does the wrong thing, and there's not much we can do about it. Wayland's model plus xdg-activation should at least make the focus-or-don't-focus decision much more consistent.

> I am curious about the mandatory compositing overhead. Can the compositor replicate the input-to-pixel latency of uncomposited x11 on low-end devices or is that a class of performance we just have to sacrifice for the frame-perfect rendering of wayland?

I'm not sure yet, but I suspect your fears are well-founded here. On modern (and even not-so-modern) hardware, even low-end GPUs should be fine with all this (on my four-year-old laptop with Intel graphics, I can't tell the difference performance-wise with xfwm4's compositor on or off). But I know people run Xfce/X11 on very-not-modern hardware, and those people may unfortunately be left behind. But we'll see.

  • argulane 5 days ago

    If xfwl4 plans to implement something like sway's `output max_render_time`, then input-to-pixel latency should be the same as X11, or even lower.
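
    A rough sketch of what that option buys, assuming sway's documented semantics (delay composition until N ms before the predicted vblank):

```python
def composite_wakeup_ms(next_vblank_ms, max_render_time_ms):
    """Instead of compositing right after the previous vblank, sleep
    until max_render_time ms before the next one, so client buffers
    submitted late in the frame still make this scanout."""
    return next_vblank_ms - max_render_time_ms

# 60 Hz: vblank at ~16.7 ms. With max_render_time=2 the compositor
# samples client buffers at t=14.7 instead of t=0, shaving almost a
# full frame off worst-case input-to-pixel latency.
assert round(composite_wakeup_ms(16.7, 2), 1) == 14.7
```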

pjmlp 5 days ago

At least they are honest about the reasons, not a wall of text to justify what boils down to "because I like it".

Naturally these kinds of language islands create some friction regarding build tooling, integration with the existing ecosystem, and who is able to contribute to what.

So let's see how it evolves. Even with my C bashing, I was a much happier XFCE user than with GNOME and GJS all over the place.

  • amazari 5 days ago

    You know that all the Wayland primitives, event handling and drawing in gnome-shell are handled in C/native code through Mutter, right? The JavaScript in gnome-shell is the cherry on top for scripting, similar to C#/Lua (or any GCed language) in game engines, elisp in Emacs, even JS in QtQuick/QML.

    It is not the performance bottleneck people seem to believe.

    • pjmlp 5 days ago

      I can dig out the old GNOME tickets and related blog posts...

      Implementation matters, including proper use of JIT/AOT toolchains.

      • joe_mamba 5 days ago

        >I can dig out the old GNOME tickets and related blog posts...

        That's the easiest way you can win any argument on gnome. You're going straight for the nuclear option.

    • ChocolateGod 5 days ago

      It has been the case that stalls in GJS land can stall the compositor, though, especially during a GC cycle.

simoncion 5 days ago

> ...or is that a class of performance we just have to sacrifice for the frame-perfect rendering of wayland?

I think I know what "frame perfect" means, and I'm pretty sure that you've been able to get that for ages on X11... at least with AMD/ATi hardware. Enable (or have your distro enable) the TearFree option, and there you go.

I read somewhere that TearFree is triple buffering, so -if true- it's my (perhaps mistaken) understanding that this adds a frame of latency.

  • wtallis 5 days ago

    > I read somewhere that TearFree is triple buffering, so -if true- it's my (perhaps mistaken) understanding that this adds a frame of latency.

    True triple buffering doesn't add one frame of latency, but since it enforces only whole frames be sent to the display instead of tearing, it can cause partial frames of latency. (It's hard to come up with a well-defined measure of frame latency when tearing is allowed.)

    But there have been many systems that abused the term "triple buffering" to refer to a three-frame queue, which always does add unnecessary latency, making it almost always the wrong choice for interactive systems.
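
    The arithmetic behind that distinction, as a toy Python model (numbers illustrative, assuming a 60 Hz display):

```python
FRAME_MS = 1000 / 60  # one refresh interval at 60 Hz, ~16.7 ms

# A three-deep FIFO swapchain that stays full: your newest frame waits
# behind two older ones, reaching the screen about three vblanks later.
fifo_queue_latency = 3 * FRAME_MS

# True triple buffering (mailbox): the display always takes the newest
# completed frame, so latency is bounded by a single refresh interval.
mailbox_worst_case = 1 * FRAME_MS

assert round(fifo_queue_latency, 1) == 50.0
assert round(mailbox_worst_case, 1) == 16.7
```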

  • gryn 5 days ago

    Only on the primary display. Once you had more than one display, there were only workarounds.

    • simoncion 4 days ago

      I don't know what "workarounds" you're talking about, or what unwanted behavior that I presume you're talking about. Would you be more specific?

      I ask because just a few minutes ago, I ran VRRTest [0] on my dual-monitor machine and saw no screen tearing on either monitor. Because VRR is disabled in multi-monitor setups, I saw juddering on both monitors when I commanded VRRTest render rates that weren't a multiple of the monitor's refresh rate, but no tearing at all.

      My setup:

      * Both monitors hooked up via DisplayPort

      * Radeon 9070 (non-XT)

      * Gentoo Linux, running almost all ~amd64 packages.

      * x11-base/xorg-server-21.1.20

      * x11-drivers/xf86-video-amdgpu-25.0.0-r1

      * x11-drivers/xf86-video-ati-22.0.0

      * sys-kernel/gentoo-sources-6.18.5

      * KDE and Plasma packages are either version 6.22.0 or 6.5.5. I CBA to get a complete list, as there are so many relevant packages.

      [0] <https://github.com/Nixola/VRRTest>

      • simoncion 4 days ago

        (I'm posting in a reply in part because the edit window is long since past.)

        Yeah. I'm actually quite interested in hearing what "workarounds" and/or misbehavior you're talking about. 'amdgpu(4)' says this about the TearFree property:

               Option "TearFree" "boolean"
                      Set the default value  of  the  per-output  ’TearFree’  property,
                      which  controls  tearing prevention using the hardware page flip‐
                      ping mechanism.  TearFree is on for any CRTC associated with  one
                      or  more  outputs with TearFree on.  Two separate scanout buffers
                      need to be allocated for each CRTC with TearFree on.  If this op‐
                      tion is set, the default value of the property is ’on’  or  ’off’
                      accordingly.   If this option isn’t set, the default value of the
                      property is auto, which means that TearFree  is  on  for  rotated
                      outputs,  outputs  with  RandR  transforms applied, for RandR 1.4
                      secondary outputs, and if ’VariableRefresh’ is enabled, otherwise
                      it’s off.
                      
        The explicit mention that "auto" enables TearFree for rotated and/or transformed outputs, for RandR 1.4 secondary outputs, and when 'VariableRefresh' is enabled seems to directly contradict what I think you're saying. And if "auto" enables TearFree on secondary displays, my recommendation of "on" certainly also does. But, yeah. I await clarification.

badsectoracula 5 days ago

One thing to keep in mind is that composition does not mean you have to do it with vsync, you can just refresh the screen the moment a client tells you the window has new contents.
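
To illustrate the point, a minimal damage-driven repaint loop (a sketch, not any real compositor's code) wakes on client commits rather than on vblank:

```python
import queue

damage = queue.Queue()   # client commits land here
repainted = []           # frames pushed to the screen

def client_commit(surface):
    damage.put(surface)

def compositor_loop(n_events):
    # No vsync wait: repaint the moment a client reports new contents.
    for _ in range(n_events):
        repainted.append(damage.get())

client_commit("xterm")
client_commit("video-player")
compositor_loop(2)
assert repainted == ["xterm", "video-player"]
```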

mikkupikku 5 days ago

Compositor overhead even with cheapo Intel laptop graphics is basically a non-issue these days. The people still rocking their 20 year old thinkpads might want to choose something else, but besides that kind of user I don't think it's worth worrying too much about.

  • josefx 5 days ago

    It isn't always pure overhead; there's also jitter, additional delays, and other issues caused by the indirection. Most systems have a way to mostly override the compositor for fullscreen windows, and for games and other applications where visible jitter and delays are an issue, you want that even on modern hardware.

    • kllrnohj 4 days ago

      > Most systems have a way to mostly override the compositor for fullscreen windows and for games

      No, they don't. I don't think Wayland ever supported exclusive fullscreen, MacOS doesn't, and Windows killed it a while back as well (in a Windows 10 update like 5-ish years ago?)

      Jitter is a non-issue for things you want vsync'd (like every UI), and for games the modern solution is gsync/freesync which is significantly better than tearing.

      • josefx 4 days ago

        > I don't think Wayland ever supported

        Isn't that true for even the most basic features you expect from a windowing system? X11 may have come with everything and the kitchen sink, Wayland drops all that fun on the implementations.

        GNOME does unredirect on Wayland since 2019: https://www.reddit.com/r/linux/comments/g2g99z/wayland_surfa...

        > Windows killed it

        They replaced it with "Fullscreen Optimisations", which is mostly the same, but more flexible as it leaves detection of fullscreen exclusive windows to the window manager.

        https://devblogs.microsoft.com/directx/demystifying-full-scr...

        As far as I can find, the update removed the option to turn this off.

      • account42 4 days ago

        X11 doesn't have an exclusive fullscreen mode either. [*] It has always relied on compositors and drivers to detect when fullscreen windows can be unredirected. Some programs chose to implement behavior like minimizing on focus loss or grabbing input that is closer to Windows's exclusive fullscreen mode, but the unredirecting of the display pipeline doesn't depend on that.

        [*] Well, there was an extension (can't recall the name right now) but not much used it and support was dropped at some point.

  • aktau 5 days ago

    That matches what I recall too, back when I ran a very cheap integrated intel (at least that's what I recall) card on my underpowered laptop. I posted a few days ago with screenshots of my 2009 setup with awesome+xcompmgr, and I remember it being very snappy (much more so than my tuned Windows XP install at the time). https://news.ycombinator.com/item?id=46717701

i80and 4 days ago

I ran xfwm's compositor back when it was first introduced on a 400 MHz Pentium II with a GeForce 2. It was fully fine.

The compositing tax is just waiting for vsync; unless your machine is, like, a Pentium Classic, compositing itself isn't a problem.

jchw 5 days ago

> Can the compositor replicate the input-to-pixel latency of uncomposited x11 on low-end devices or is that a class of performance we just have to sacrifice for the frame-perfect rendering of wayland?

I think this is ultimately correct. The compositor will have to render a frame at some point after the VBlank signal, and it will render the client buffers as they exist at that point, i.e. whatever the clients last drew into them.

This can be somewhat alleviated, though. Both KDE and GNOME have been getting progressively more aggressive about "unredirecting" surfaces into hardware accelerated DRM planes in more circumstances. In this situation, the unredirected planes will not suffer compositing latency, as their buffers will be scanned out by the GPU at scanout time with the rest of the composited result. In modern Wayland, this is accomplished via both underlays and overlays.

There is also a slight penalty to the latency of mouse cursor movement that is imparted by using atomic DRM commits. Since using atomic DRM is very common in modern Wayland, it is normal for the cursor to have at least a fraction of a frame of added latency (depending on many factors.)

I'm of two minds about this. One, obviously it's sad. The old hardware worked perfectly and never had latency issues like this. Could it be possible to implement Wayland without full compositing? Maybe, actually. But I don't expect anyone to try, because let's face it, people have simply accepted that we now live with slightly more latency on the desktop. But then again, "old" hardware is now hardware that can more often than not, handle high refresh rates pretty well on desktop. An on-average increase of half a frame of latency is pretty bad with 60 Hz: it's, what, 8.3ms? But half a frame at 144 Hz is much less at somewhere around 3.5ms of added latency, which I think is more acceptable. Combined with aggressive underlay/overlay usage and dynamic triple buffering, I think this makes the compositing experience an acceptable tradeoff.
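
The back-of-the-envelope numbers above fall out of a one-line model (the 0.5-frame figure is the rough average assumed here, not a measurement):

```python
def avg_compositor_latency_ms(refresh_hz, added_frames=0.5):
    """Average extra input-to-pixel latency from compositing, modeled
    as a constant fraction of the refresh interval."""
    return added_frames * 1000 / refresh_hz

assert round(avg_compositor_latency_ms(60), 1) == 8.3    # 60 Hz
assert round(avg_compositor_latency_ms(144), 1) == 3.5   # 144 Hz
```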

What about computers that really can't handle something like 144 Hz or higher output? Well, tough call. I mean, I have some fairly old computers that can definitely handle at least 100 Hz very well on desktop. I'm talking Pentium 4 machines with old GeForce cards. Linux is certainly happy to go older (though the baseline has been inching up there; I think you need at least Pentium now?) but I do think there is a point where you cross a line where asking for things to work well is just too much. At that point, it's not a matter of asking developers to not waste resources for no reason, but asking them to optimize not just for reasonably recent machines but also to optimize for machines from 30 years ago. At a certain point it does feel like we have to let it go, not because the computers are necessarily completely obsolete, but because the range of machines to support is too wide.

Obviously, though, simply going for higher refresh rates can't fix everything. Plenty of laptops have screens that can't go above 60 Hz, and they are forever stuck with a few extra milliseconds of latency when using a compositor. It is not ideal, but what are you going to do? Compositors offer many advantages; it seems straightforward to design for a future where they are always on.

  • drob518 5 days ago

    Love your post. So, don’t take this as disagreement.

    I’m always a little bewildered by frame rate discussions. Yes, I understand that more is better, but for non-gaming apps (e.g. “productivity” apps), do we really need much more than 60 Hz? Yes, you can get smoother fast scrolling with higher frame rate at 120 Hz or more, but how many people were complaining about that over the last decade?

    • jeroenhd 5 days ago

      I enjoy working on my computer more at 144Hz than 60Hz. Even on my phone, the switch from 60Hz to a higher frame rate is quite obvious. It makes the entire system feel more responsive and less glitchy. VRR also helps a lot in cases where the system is under load.

      60Hz is actually a downgrade from what people were used to. Sure, games and such struggled to get that kind of performance, but CRT screens did 75Hz/85Hz/100Hz quite well (perhaps at lower resolutions, because full-res 1200p sometimes made text difficult to read on a 21 inch CRT, with little benefit from the added smoothness as CRTs have a natural fuzzy edge around their straight lines anyway).

      There's nothing about programming or word processing that requires more than maybe 5 or 6 fps (very few people type more than 300 characters per minute anyway) but I feel much better working on a 60 fps screen than I do a 30 fps one.

      Everyone has different preferences, though. You can extend your laptop's battery life by quite a bit by reducing the refresh rate to 30Hz. If you're someone who doesn't really mind the frame rate of their computer, it may be worth trying!

      • rabf 5 days ago

        CRT screens did 75Hz/85Hz/100Hz quite well, but rendered only one pixel/dot at a time. This is in no way equivalent to 60Hz on a flat panel!

    • array_key_first 5 days ago

      I never complained about 60, then I went to 144 and 60 feels painful now. The latency is noticeable in every interaction, not just gaming. It's immediately evident: the computer just feels more responsive, like you're in complete control.

      Even phones have moved in this direction, and it's immediately noticeable when using one for the first time.

      I'm now on 240hz and the effect is very diminished, especially outside of gaming. But even then I notice it, although stepping down to 144 isn't the worst. 60, though, feels like ice on your teeth.

      • drob518 5 days ago

        Did you use the same computer at both 60 and 144? I have no doubt that 144 feels smoother for scrolling and things like that. It definitely should. But if you upgraded your system at the same time you upgraded your display, much of the responsiveness would be due to a faster system.

    • elektronika 5 days ago

      > how many people were complaining about that over the last decade?

      Quite a few. These articles tend to make the rounds when it comes up: https://danluu.com/input-lag/ https://lwn.net/Articles/751763/ Perception varies from person to person, but going from my 144hz monitor to my old 60hz work laptop is so noticeable to me that I switched it from a composited wayland DE to an X11 WM.

      • drob518 5 days ago

        Input lag is not the same as refresh rate. 60 Hz is 16.7 ms per frame. If it takes a long time for input to appear on screen it’s because of the layers and layers of bloat we have in our UI systems.

    • bee_rider 5 days ago

      If our mouse cursors are going to have half a frame of latency, I guess we will need 60Hz or 120Hz desktops, or whatever.

      I dunno. It does seem a bit odd, because who was thinking about the framerates of, like, desktops running productivity software, for the last couple decades? I guess I assumed this would never be a problem.

      • jchw 5 days ago

        Mouse cursor latency and window compositing latency are two separate things. I probably did not do a good enough job conveying this. In a typical Linux setup, the mouse cursor gets its own DRM plane, so it will be rendered on top of the desktop during scanout right as the video output goes to the screen.

        There are two things that typically impact mouse cursor latency, especially with regards to Wayland:

        - Software-rendering, which is sometimes used if hardware cursors are unavailable or buggy for driver/GPU reasons. In this case the cursor will be rendered onto the composited desktop frame and thus suffer compositor latency, which is tied to refresh rate.

        - Atomic DRM commits. Using atomic DRM commits, even hardware-rendered cursors can suffer additional latency. In this case, the added latency is not necessarily tied to frame times or refresh rates. Instead, it's tied to when during the refresh cycle the atomic commit is sent; specifically, how close to the deadline. I think in most cases we're talking a couple milliseconds of latency. It has been measured before, but I cannot find the source.

        Wayland compositors tend to use atomic DRM commits, hence a slightly more laggy mouse cursor. I honestly couldn't tell you if there is a specific reason why they must use atomic DRM, because I don't have knowledge that runs that deep, only that they seem to.
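
        A toy model of that deadline effect (illustrative numbers; real compositors predict the vblank deadline dynamically):

```python
def cursor_commit_latency_ms(commit_t, vblank_t, frame_ms=1000 / 60):
    """An atomic commit takes effect only at a vblank, so a cursor
    update waits for the next one; a legacy hardware-cursor update
    could instead land mid-scanout almost immediately."""
    if commit_t <= vblank_t:
        return vblank_t - commit_t          # made this frame's deadline
    return vblank_t + frame_ms - commit_t   # missed it, wait a frame

# Committing 2 ms before the deadline costs ~2 ms of cursor latency...
assert round(cursor_commit_latency_ms(14.7, 16.7), 1) == 2.0
# ...but committing just after it costs nearly a whole 60 Hz frame.
assert round(cursor_commit_latency_ms(17.0, 16.7)) == 16
```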

      • drob518 5 days ago

        Mouse being jumpy shouldn’t be related to refresh rate. The mouse driver and windowing system should keep track of the mouse position regardless of the video frame rate. Yes, the mouse may jump more per frame with a lower frame rate, but that should only be happening when you move the mouse a long distance quickly. Typically, when you do that, you’re not looking at the mouse itself but at the target. Then, once you’re near it, you slow down the movement and use fine motor skills to move it onto the target. That’s typically much slower and frame rate won’t matter much because the motion is so much smaller.

    • layer8 5 days ago

      I agree. Keyboard-action-to-result-on-screen latency is much more important, and we are typically way above 17 ms for that.

      • drob518 5 days ago

        Yep, agreed, though it’s not just keyboard to screen. It’s also mouse click to screen. Really, any event to screen.

        • layer8 5 days ago

          Initially I wrote “input device”, but since mouse movements aren’t generally a problem, I narrowed it to “keyboard”. ;) Mouse clicks definitely fall into the same category, though.

    • jchw 5 days ago

      Essentially, the only reason to go over 60 Hz for desktop is for a better "feel" and for lower latency. Compositing latency is mainly centered around frames, so the most obvious and simplest way to lower that latency is to shorten how long a frame is, hence higher frame rates.

      However, I do think that high refresh rates feel very nice to use even if they are not strictly necessary. I consider it a nice luxury.

  • michaelmrose 5 days ago

    I couldn't find ready stats on what percentage of displays are 60 Hz, but outside of gaming and high-end machines I suspect 60 Hz still covers the majority of machines used by actual users, meaning we should evaluate the latency as it is observed by most users.

    • jchw 5 days ago

      The point is that we can improve latency of even old machines by simply attaching a display output that supports a higher refresh rate, or perhaps even variable refresh rate. This can negate most of the unavoidable latency of a compositor, while other techniques can be used to avoid compositor latency in more specific scenarios and try to improve performance and frame pacing.

      A new display is usually going to be cheaper than a new computer. Displays which can actually deliver 240 Hz refresh rates can be had for under $200 on the lower end, whereas you can find 180 Hz displays for under $100, brand new. It's cheap enough that I don't think it's even terribly common to buy/sell the lower end ones second-hand.

      For laptops, well, there is no great solution there; older laptops with 60 Hz panels are stuck with worse latency when using a compositor.

      • com2kid 5 days ago

        Plenty of brand new displays are still sold that only go up to 60hz, especially if you want high quality IPS panels.

        They aren't as common now, but when making a list of screens to replace my current one, I am limiting myself to IPS panels and quite a few of the modern options are still 60hz.

        • jchw 4 days ago

          Yeah, I personally still have a lot of 60 Hz panels. One of my favorites is a 43" 4K IPS. I don't think I will be able to get that at 120+ Hz any time soon.

          Of course, this isn't a huge deal to me. The additional latency is not an unusable nightmare. I'm just saying that if you are particularly latency sensitive, it's something that you can affordably mitigate even when using a compositor. I think most people have been totally fine eating the compositor latency at 60 Hz.

account42 4 days ago

> As an example, focus-stealing prevention. In xfwm4 (and x11 generally), this requires complex heuristics and timestamp checks because x11 clients are powerful and can aggressively grab focus. In wayland, the compositor is the sole arbiter of focus, hence clients can't steal it, they can only request it via xdg-activation. Porting the legacy x11 logic involves the challenge of actually designing a new policy that feels like the old heuristic but operates on wayland's strict authority model.

Not that that's necessarily the best way to do it but nothing stops xfwl4 from simply granting every focus request and then applying their existing heuristics on the result of that.
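
A sketch of that grant-then-correct approach (purely hypothetical, not xfwl4 code; surface names and timestamps are made up):

```python
class Compositor:
    """Grant every xdg-activation request, then run the old heuristic
    on the result and revert the change if it looks like stealing."""

    def __init__(self):
        self.focused = "terminal"
        self.last_user_action_ms = 0

    def on_activation_request(self, surface, token_timestamp_ms):
        previous = self.focused
        self.focused = surface  # grant unconditionally...
        # ...then apply the heuristic: a request older than the user's
        # latest interaction counts as focus stealing, so undo it.
        if token_timestamp_ms < self.last_user_action_ms:
            self.focused = previous

c = Compositor()
c.last_user_action_ms = 1500
c.on_activation_request("popup", token_timestamp_ms=1000)
assert c.focused == "terminal"   # stale request reverted
c.on_activation_request("editor", token_timestamp_ms=2000)
assert c.focused == "editor"     # fresh request honored
```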

PunchyHamster 5 days ago

> Can the compositor replicate the input-to-pixel latency of uncomposited x11 on low-end devices or is that a class of performance we just have to sacrifice for the frame-perfect rendering of wayland?

Well, the answer is just no; Wayland has been consistently slower than X11, and nothing running on top of it can really get around that.

imcritic 5 days ago

Xfce / xfwm4 doesn't offer focus stealing prevention.