Comment by delusional 2 days ago
I agree. I wanted to make a note for anyone else seeing odd behavior, but I'm working on figuring out what sort of degenerate behavior this hits.
Nice dig. Could you share more about how you narrowed it down in the end? Is it a known issue and you just had to confirm it applies, or did you identify all of this yourself?
`perf` to get from "it's stuttering" to "it's spending a very long time in the GPU driver". GDB and printf debugging to get to "the sort in the driver is taking a long time because there is an excessively large number of TTM buffer objects, not because we are calling it too often". I could have made that leap faster, and I will next time, but this time that step took me a couple of hours. From there it was a question of who was creating those buffer objects, so it was back to GDB, which turned up nothing in sway/wlroots.
That was where I sort of ran out of good ideas. I have never worked with Wayland before. I figured it's a "protocol" so it must have a way to inspect it, and it does. `WAYLAND_DEBUG=1` lets you dump the Wayland messages, which I then manually inspected to find a discrepancy between allocations and deallocations. That's a client (aka Firefox) bug, so I looked through their issue tracker, where I found a somewhat similar bug[1]. I reported my findings there.
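That allocation/deallocation comparison can also be scripted instead of done by eye. Here is a minimal Python sketch of the idea, not what I actually ran. It assumes creation shows up in the log as `new id wl_buffer@N` and destruction as `wl_buffer@N.destroy(`; the exact line layout differs between libwayland versions, and the name `live_buffers` is made up for illustration.

```python
import re

# Assumed line shapes (these vary between libwayland versions):
#   [1.000]  -> wl_shm_pool@5.create_buffer(new id wl_buffer@6, ...)
#   [1.300]  -> wl_buffer@6.destroy()
CREATE = re.compile(r"new id wl_buffer@(\d+)")
DESTROY = re.compile(r"wl_buffer@(\d+)\.destroy\(")


def live_buffers(lines):
    """Return the ids of wl_buffer objects created but not yet destroyed."""
    live = set()
    for line in lines:
        destroyed = DESTROY.search(line)
        if destroyed:
            # Object ids get reused, so drop the id as soon as it is destroyed.
            live.discard(destroyed.group(1))
            continue
        created = CREATE.search(line)
        if created:
            live.add(created.group(1))
    return live
```

Run over a captured log (the `WAYLAND_DEBUG` output goes to stderr, so redirect that to a file and pass the open file in), a live count that keeps growing while the page runs would point at a leak; a count that stays roughly flat would not.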
Since then I've checked out the Firefox code (which I've also never worked with before). After more time in GDB and the logs, I think I know what's going wrong. You can read the Bugzilla report for that, though.
I have looked into it. This appears to be a Firefox bug that shows up when HDR is enabled on Wayland and the website uses WebGL. Firefox looks to be leaking wl_buffer objects, which causes a VRAM leak in the Wayland compositor, which in turn causes performance issues in the AMDGPU TTM buffer object management.