- but I assume Guacamole would run on the same machine as Proxmox, hence whether it talks VNC or RDP isn't as important (latency-wise).
Yes, from a network POV it may add only minimal latency (<<1 ms), but the re-translation step must happen too. Effectively, QEMU grabs the VM's screen, processes it and sends it to a local server, which processes it again before sending it over the WAN.
What is important is the WAN connection from the Proxmox/Guacamole server back to you. If this can run over a more efficient RDP-like connection (via Guacamole), that seems like a win, right?
I doubt that an external daemon, one that is not integrated with the system and so has effectively no meta-information about the stream, can do the same as RDP, which is a system-integrated daemon with all of that information. For example, RDP can even know if a video is playing somewhere on the screen and send only that region as the original encoded video stream (e.g., AV1, H.26x, ...), which means no transcoding (no extra delay or CPU time) and a stream that is already highly optimized. An external daemon that lacks this information can never optimize in the same way. Some heuristics may help, but it will probably still end up taking the whole frame and re-encoding it...
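To illustrate what "some heuristics" can look like for a daemon that only sees raw pixels: a common approach is to compare consecutive framebuffers tile by tile and only re-encode the tiles that changed. The sketch below is a minimal, hypothetical example (the `dirty_tiles` function, the 16-pixel tile size and the list-of-rows frame format are all assumptions, not how Guacamole or any specific VNC server is implemented). Note that a playing video changes its tiles on every frame, so they are always marked dirty and re-encoded, which is exactly the cost an integrated protocol can avoid by forwarding the original video stream.

```python
from typing import List, Tuple

# Hypothetical frame format: a 2D grid (list of rows) of pixel values.
Frame = List[List[int]]


def dirty_tiles(prev: Frame, curr: Frame, tile: int = 16) -> List[Tuple[int, int]]:
    """Return (row, col) origins of tiles whose pixels differ between two frames.

    This is the only knowledge a pixel-only daemon has: it cannot tell a video
    from any other screen change, it just sees which tiles differ.
    """
    height, width = len(curr), len(curr[0])
    changed = []
    for ty in range(0, height, tile):
        for tx in range(0, width, tile):
            # Compare the slice of each row that falls inside this tile.
            if any(prev[y][tx:tx + tile] != curr[y][tx:tx + tile]
                   for y in range(ty, min(ty + tile, height))):
                changed.append((ty, tx))
    return changed


if __name__ == "__main__":
    prev = [[0] * 32 for _ in range(32)]
    curr = [row[:] for row in prev]
    curr[20][5] = 255  # a single changed pixel
    print(dirty_tiles(prev, curr))  # only the tile containing (20, 5)
```

Only the changed tiles would then be re-encoded and sent; everything that actually changes, including video content, still goes through that decode-and-re-encode path.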
Anyway, my question about how you determined that it is as good as RDP (or better than noVNC) is still open:
Where did you derive this information? I'm honestly curious about the reasoning and the technical background.
Integrating such things comes with quite a bit of work and headaches, and we already have technology that provides browser-only VM display access without a fat client. So we need hard numbers or facts showing why this is better; otherwise we'd need to integrate ten such technologies a year.
Maybe take a look at SPICE? While it has no HTML client (yet), it does a lot of magic and is quite nice in general, IMO.