Hello Proxmox Community,
I hope you’re all doing well.
We’ve set up a High Availability (HA) Proxmox cluster with 10 nodes (and growing), and we’re planning to host 1000+ Windows 11 VMs within this cluster. Each VM is dedicated to a single user, who connects via RDP. On average, each RDP session consumes about 100 Mbps of bandwidth.
To manage external access, we’ve placed a pfSense firewall running the HAProxy package in front of the cluster. The setup consists of:
- A CARP master VM with a Hetzner failover IP
- A CARP backup VM that takes over if the master becomes unreachable
At any given time, only one pfSense VM is active. Each Proxmox node also has its own public IP.
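For completeness, since it may matter for the design discussion: if the failover IP cannot simply float on a shared L2 segment and has to be re-pointed at a different physical host when the backup takes over, that switch goes through the Hetzner Robot webservice, roughly like this (IPs and credentials below are placeholders):

```
# Rough sketch only: re-route the Hetzner failover IP to the host currently
# running the active pfSense VM. 203.0.113.10 = failover IP (placeholder),
# 198.51.100.20 = main IP of the target host (placeholder).
curl -u "robot-user:robot-pass" \
  https://robot-ws.your-server.de/failover/203.0.113.10 \
  -d active_server_ip=198.51.100.20
```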
Users connect to their VMs using a dedicated FQDN and RDP port, for example:
vm{proxmox_vm_id}.my-domain.com:<rdp_port>
e.g.
vm123.my-domain.com:456
for VM ID 123 with RDP port 456. This mapping remains consistent even when a VM migrates between nodes. Please note that having the VMID in the FQDN is necessary for us, because we also use another service, WinRM, whose ports are hardcoded to 5985/5986.
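To make the forwarding concrete, here is roughly what the per-VM RDP mapping looks like in plain haproxy.cfg terms (the pfSense HAProxy package generates the actual config; 203.0.113.10 stands in for the failover IP and 10.10.0.123 for the VM’s internal address, both placeholders):

```
# Sketch of the per-VM RDP forwarding (placeholder addresses and ports).
frontend rdp_vm123
    bind 203.0.113.10:456      # vm123.my-domain.com resolves to the failover IP
    mode tcp
    option tcplog
    default_backend bk_rdp_vm123

backend bk_rdp_vm123
    mode tcp
    server vm123 10.10.0.123:3389 check
```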
The Challenge
When a large number of users connect simultaneously, the pfSense VM becomes a bottleneck. It is currently limited to 1 Gbps of network throughput, so at roughly 100 Mbps per session about ten concurrent RDP sessions are enough to saturate it, while 1000+ sessions would need on the order of 100 Gbps in aggregate.
The Question
How can we design the networking layer so that pfSense doesn’t become a bottleneck when supporting 1000+ concurrent RDP connections?
We’ve explored the idea of SNAT-based routing: inbound traffic entering via pfSense, with outbound traffic leaving directly through the Proxmox node’s public IP. However, this introduces complications when VMs are migrated between nodes (the outbound public IP then changes with the node), and we are not sure how to implement it properly.
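To make the idea concrete, the per-node piece would be something along these lines (just a sketch: 10.10.0.0/16 stands in for our internal VM subnet and vmbr0 for the bridge holding the node’s public IP):

```
# Sketch: masquerade VM-originated traffic out of this node's own public IP,
# so that outbound bandwidth no longer has to pass through the pfSense VM.
# 10.10.0.0/16 = internal VM subnet (placeholder), vmbr0 = public-facing bridge.
iptables -t nat -A POSTROUTING -s 10.10.0.0/16 -o vmbr0 -j MASQUERADE
```

That covers VM-initiated traffic; the part we are unsure about is how to get the RDP session traffic itself (which currently flows through the pfSense/HAProxy VM in both directions) off that 1 Gbps path, and what happens right after a live migration when the node, and therefore the public IP, changes.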
We’d really appreciate any insights or design recommendations from those who have dealt with similar high-scale setups.
Thank you in advance for your guidance!
Best regards