How to improve RDP user experience (Proxmox + Ceph NVMe + Mellanox fabric)

meckhardt

Active Member
Sep 23, 2020
Hi everyone,


I’d like to ask for advice on improving the user experience on two Windows Terminal Servers (around 15 users each, RDP over UDP).
After migrating from two standalone VMware hosts (EPYC 9654, local SSDs) to a Proxmox + Ceph cluster, users report that sessions feel slightly slower and less responsive.




Current Infrastructure


Cluster: Proxmox VE 8.4.1 + Ceph Squid 19.2.1
Backup: Separate PBS node (HDD SAS pool).
Storage: 3 Ceph storage nodes (each 4× NVMe U.2 3.84 TB) + HDD nodes for PBS.


Main compute nodes:


  • main1 and main2 (Dell R750 / Oracle X8-2L)
    • CPUs: Dual Xeon Platinum 8368 / 8270CL
    • RAM: 512GB ECC DDR4 — not all memory channels populated yet
    • Boot: local SSD
    • Data: Ceph NVMe pool (MTU 9000, dedicated VLAN)
    • Optional local RAID10 array (12 Gb/s SATA SSDs)
    • Tesla M10 GPU available (not installed yet)

Networking (current state):


  • Full 2× 10/40 Gb fabric with LAGG
  • Core: dual Mellanox SX6036 in MLAG (LACP L3+L4, MTU 9000)
  • Routers: two UDM Pro in HA — each connected via SFP+ 10 Gb breakout
    (active to SX6036-A, passive to SX6036-B)
  • Networks:
    • VLAN 1 → Ceph (private, MTU 9000)
    • VLAN 120 → Cluster network / Corosync, backups
    • VLAN 1020 → VM traffic (10.20.0.0/24, MTU 9000)
    • VLAN 10 → Management (ILOM / iDRAC)

This setup replaced a previous Cisco SG350XG / SG550XG fabric with LAGs (4× 10 Gb LAGG to all nodes) and an ER8411 router.
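Roughly, the relevant part of /etc/network/interfaces on the main nodes looks like this (simplified; interface names, addresses and the bridge name are placeholders, not my actual config):

Code:
auto bond0
iface bond0 inet static
        address 10.0.1.11/24        # Ceph private network (VLAN 1, assumed untagged on the bond)
        bond-slaves ens1f0 ens1f1
        bond-mode 802.3ad
        bond-xmit-hash-policy layer3+4
        bond-miimon 100
        mtu 9000

auto bond0.120
iface bond0.120 inet static
        address 10.1.120.11/24      # Cluster network / Corosync, backups (VLAN 120)
        mtu 9000

auto vmbr0
iface vmbr0 inet manual
        bridge-ports bond0.1020     # VM traffic (VLAN 1020)
        bridge-stp off
        bridge-fd 0
        mtu 9000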



⚙️ Workloads


  • 2 × Windows Terminal Servers (≈15 users each, RDP over UDP)
  • Several Linux database VMs

The environment is stable and throughput tests are strong, but RDP sessions feel less responsive than before.




❓ Questions


  1. Memory channels: Will fully populating all RAM channels (on both main hosts) noticeably improve responsiveness or latency for RDP sessions?
  2. GPU: If I install the Tesla M10 (for basic VDI / RDP graphics, no vGPU GRID), how much real-world improvement can I expect?
  3. Storage: Would moving the Terminal Server VMs from Ceph NVMe to a local SSD RAID10 array improve user experience, or is latency similar?
  4. Other tunings: Any Ceph / VirtIO / network settings that had the biggest impact for you on RDP smoothness or UI responsiveness?



Ceph benchmarks, CPU stress tests, and FIO runs all return solid and expected results. All NVMe drives (Oracle U.2 models) show 0% wear.
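On that note, for question 3 a single-threaded small-block latency run is probably more representative of RDP "feel" than throughput numbers. Something along these lines (the test VM and file path are just examples, not what I have actually run):

Code:
# QD1 4k sync-write latency — run inside a small Linux test VM whose disk sits on
# the Ceph NVMe pool, then repeat with the disk moved to the local RAID10.
fio --name=qd1-latency --filename=/root/fio-test.bin --size=2G \
    --rw=randwrite --bs=4k --iodepth=1 --numjobs=1 \
    --ioengine=libaio --direct=1 --fsync=1 \
    --runtime=60 --time_based --group_reporting
# compare the clat percentiles (not bandwidth) between the two runs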


This weekend I plan to:


  • Upgrade the cluster to Proxmox 9.
  • Fully populate memory with 8× 64 GB DDR4-3200 modules on both main1 and main2.
  • Migrate the Terminal Server VMs to local RAID10 SSD storage on main1.
  • Dedicate main1 exclusively to the Windows Server ecosystem (DC, TS, UPD, DFS, etc.), splitting the services across several VMs.
  • Use main2 for database workloads.
  • Install the Tesla M10 GPU on main1 and configure GPU passthrough for the Terminal Server VMs (rough sketch below).
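For that last point, the passthrough steps I have in mind are roughly the standard ones (the PCI address and VMID below are placeholders):

Code:
# On main1: enable the IOMMU (Intel host -> add intel_iommu=on to
# GRUB_CMDLINE_LINUX_DEFAULT, then update-grub), add vfio, vfio_iommu_type1 and
# vfio_pci to /etc/modules and run: update-initramfs -u -k all, then reboot.

# Find the M10 (it typically shows up as four separate NVIDIA devices)
lspci -nn | grep -i nvidia

# Pass one of them through to a Terminal Server VM
qm set 101 -hostpci0 0000:3b:00.0,pcie=1    # pcie=1 needs a q35 machine type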



Thanks a lot for any insights or tuning recommendations.
 
I'm not sure if this applies to you, but it sounds like a commercial scenario? In that case I would highly recommend taking out a full support subscription with Proxmox. As I understand it, they have in-house specialists for VMware-to-Proxmox migrations; I'm pretty sure they will just SSH in and help you out.

If this is purely for personal use, I think you'll be at a loss on these forums, because I can't imagine much of the non-subscription community runs multi-user RDP desktops and the like.

One thing I would suggest as a start is posting your VM config (i.e. the <vmid>.conf file under /etc/pve/qemu-server/), because then some obvious performance suggestions can be thrown your way after a quick glance at it.
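To give you an idea of what people will look for, a Terminal-Server-style config on Ceph often ends up looking something like this (purely illustrative; the VMID, sizes and storage name are made up):

Code:
# /etc/pve/qemu-server/101.conf
agent: 1
balloon: 0
cores: 16
cpu: host
memory: 65536
numa: 1
net0: virtio=BC:24:11:00:00:01,bridge=vmbr0
ostype: win11
scsihw: virtio-scsi-single
scsi0: ceph-nvme:vm-101-disk-0,cache=writeback,discard=on,iothread=1,ssd=1
sockets: 1

The usual suspects are the CPU type, scsihw/iothread, the cache mode, and whether the disk and NIC are VirtIO at all.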


PS: I've always found Ceph to be a little sluggish too until it is finely tuned for the hardware at hand. The problem is that the tuning varies from platform to platform; one size never fits all, but you already know that. That makes it very difficult to offer suggestions. The Proxmox guys do it night and day, but that granular service understandably doesn't come on these forums for free.
 
Disable any power-saving features / C-states and/or switch the BIOS to a high-performance/virtualization preset.
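A quick way to check and force this from the OS side as well (this resets on reboot unless you make it persistent):

Code:
# current governor on one core
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
# force performance on all cores
echo performance | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
# optionally limit deep C-states via the kernel command line, e.g.:
#   intel_idle.max_cstate=1 processor.max_cstate=1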

10/40 Gbit can be a bottleneck if used with fast flash storage! Especially if it shares the physical network with other services!
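It's also worth verifying that jumbo frames and latency actually hold end-to-end on the Ceph VLAN, for example (the peer IP is just an example):

Code:
ping -M do -s 8972 10.0.1.12     # 8972 + 28 bytes of headers = 9000; must pass unfragmented
ping -c 100 10.0.1.12            # round-trip times should stay well under a millisecond
iperf3 -c 10.0.1.12 -P 4         # bandwidth check (run 'iperf3 -s' on the peer first)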

Give Ceph its own physical network. With 3 nodes, you could think about a full-mesh setup if the cluster isn't expected to grow in the foreseeable future. This way you could dedicate faster NICs for Ceph without spending tons of money on the appropriate switches.
https://pve.proxmox.com/wiki/Full_Mesh_Network_for_Ceph_Server