Comment by IshKebab
I started working on this with the Kitty image protocol, but unfortunately that protocol is really unsuited to this sort of thing. Performance will be awful.
The protocol is sort of:
1. App: I'd like you to display this PNG. Here's the data: ...
2. Terminal: OK, I've got the data.
3. App: OK, now display it at this position.
4. App: OK, now remove it from the screen.
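Here's roughly what those four steps look like on the wire. This is a minimal sketch, assuming the escape-sequence format from Kitty's graphics protocol docs: APC commands `ESC _G<keys>;<base64 payload> ESC \`, `a=t` to transmit without displaying, `f=100` for PNG, base64 payload split into chunks of at most 4096 bytes with `m=1`/`m=0`, `a=p` to place, and `a=d,d=i` to delete by id. The helper names, the image id, and positioning via a `CSI row;col H` cursor move are my own illustrative choices, not anything Kitty mandates.

```python
import base64

ESC = "\x1b"
APC_START = ESC + "_G"  # graphics commands are APC sequences: ESC _ G ...
APC_END = ESC + "\\"    # ...terminated by ST (ESC \)
CHUNK = 4096            # max base64 bytes per escape sequence


def transmit_png(png_bytes: bytes, image_id: int) -> str:
    """Step 1: send the PNG (f=100) without displaying it (a=t), chunked."""
    data = base64.standard_b64encode(png_bytes).decode("ascii")
    chunks = [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)] or [""]
    out = []
    for n, chunk in enumerate(chunks):
        more = 1 if n < len(chunks) - 1 else 0  # m=1 means "more chunks follow"
        # Only the first chunk carries the full keys; later ones carry just m.
        ctrl = f"a=t,f=100,i={image_id},m={more}" if n == 0 else f"m={more}"
        out.append(f"{APC_START}{ctrl};{chunk}{APC_END}")
    return "".join(out)


def is_ok_reply(reply: str, image_id: int) -> bool:
    """Step 2: the terminal answers with ESC _G i=<id>;OK ESC \\."""
    return reply == f"{APC_START}i={image_id};OK{APC_END}"


def place(image_id: int, row: int, col: int) -> str:
    """Step 3: move the cursor (1-based row/col), then place the image there."""
    return f"{ESC}[{row};{col}H{APC_START}a=p,i={image_id}{APC_END}"


def delete(image_id: int, free_data: bool = False) -> str:
    """Step 4: remove placements by id; uppercase I also frees the stored data."""
    which = "I" if free_data else "i"
    return f"{APC_START}a=d,d={which},i={image_id}{APC_END}"
```

For video you'd be running all of that for every single frame: a full PNG encode, base64 inflating it by about a third, and several escape sequences on top.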
We're talking motion-PNG here. Just think about how awful that is.
I wish someone would add some kind of AV1-over-terminal protocol. That would be actually useful.
The other thing I was going to try was a custom GUI that used normal terminal text for the text of widgets, but Kitty images for the rest. It's quite a hard problem, though.
What you're describing is a graphical shell. If you want it over the network, we already have a protocol for that: it's called X. Misusing a terminal for this is fundamentally pointless.