I've had a few posts in the last month about trialing Proxmox, and I've decided to give it a go. I have my hands on three 2U, 4-node cluster systems. They were new-old-stock Cohesity cluster systems based on the Inventec K900G4. (https://youtu.be/CfRZZbggzLk?si=fcyG4-w2Ge7vr2aH)
I've set up one of the 4-node servers for PVE/PBS. Here are the specs for each node:
2 x Intel Xeon Silver 4214
256GB RAM
2 x 240GB NVMe boot disks
6 x 1.92TB NVMe storage disks
1 x quad-port 10G SFP+ network card
Cisco Nexus 3172 SFP+ switch
Each node has ports 1 and 4 of the quad SFP+ card connected to the switch with 10G DAC cables.
Each node's OS was installed on the 2 x 240GB NVMe disks in a ZFS RAID mirror.
Nodes 1, 2 and 3 all have PVE 8.2.2 installed.
Node 4 has PBS 3.2-2 installed.
For nodes 1, 2 and 3 the PVE install created a vmbr0 bridge on port 1, and IPs 10.63.14.1, .2 and .3 /24 were added for the main network. I manually created vmbr1 on the other DAC-connected port (port 4) on each of the 3 PVE nodes, assigned 10.10.10.1, .2 and .3 /24, and selected that network when creating the 3-node cluster. A sketch of the resulting config is below.
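For reference, the vmbr1 stanza in /etc/network/interfaces on node 1 ends up looking roughly like this (the physical NIC name enp65s0f3 is just an example; the actual name will differ per system):

    auto vmbr1
    iface vmbr1 inet static
            address 10.10.10.1/24
            bridge-ports enp65s0f3
            bridge-stp off
            bridge-fd 0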
Each of nodes 1-3 has access to 6 disks. These are Intel D7-P5500 data center drives, rated for sequential R/W of 7000/4300 MB/s and random 4KB R/W of up to 1M/130K IOPS.
There are 18 OSDs in total, all created on the NVMe disks. The pool that all the VMs sit on was created with 128 PGs and a standard 3/2 (size/min_size) replication setup; the rough CLI equivalent is sketched below.
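Roughly the CLI equivalent of how the pool was set up (the pool name vmpool is a placeholder):

    pveceph pool create vmpool --size 3 --min_size 2 --pg_num 128
    # with 18 OSDs the autoscaler may suggest more than 128 PGs; worth checking:
    ceph osd pool autoscale-status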
To test the network I installed iperf3 on all nodes. Between the PVE nodes 1-3 and the PBS node 4, all speeds come back at approx. 9.4-9.5Gbps. I have some Linux VMs on each of the nodes with iperf3 as well, and between them I still average 9.3+ Gbps. I also have some Windows VMs (Server 2022 and Win 11), and running iperf3 from them as the client to a Linux VM or directly to one of the host nodes I get 9.3+ Gbps. When a Windows VM runs as the iperf3 server, anything connecting to it slows to about 8.2Gbps, regardless of whether the client is a Linux VM, a Windows VM, or one of the nodes' shells directly. Not critical, but worth noting: the Windows VMs move data a little slower on the receiving end than the Linux VMs or the nodes themselves do. The test commands are sketched below.
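The tests were along these lines (IPs illustrative); on the Windows side it may also be worth retrying with parallel streams, which can mask a per-stream receive limit:

    iperf3 -s                        # on the server end
    iperf3 -c 10.63.14.2 -t 30       # single stream from the client
    iperf3 -c 10.63.14.2 -t 30 -P 4  # 4 parallel streams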
When copying data between Windows VMs I am only getting around ~400MB/s. I would have thought that with the writes spread across multiple drives, the speed would be near the max of the 10Gbps network (approx. ~1200MB/s).
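To rule out SMB/guest overhead and test the Ceph pool directly, a rados bench from one of the node shells may help isolate the bottleneck (pool name vmpool is again a placeholder):

    rados bench -p vmpool 30 write --no-cleanup   # 30-second write test
    rados bench -p vmpool 30 seq                  # sequential read of the test objects
    rados -p vmpool cleanup                       # remove the benchmark objects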
For PBS on node 4 there is a single NVMe disk for backups for now, set up as a single-drive ZFS pool. When I back up any of the VMs to the PBS system, the max seems to be around ~200MiB/s for the read, and maybe a smidge less for the write, while it is "actually" reading and writing data. What I mean by that is this: for example, I created a 500GB VM with only 50GB used on it. When the backup starts, the log shows % complete in 1% increments per line, but once it gets past the used data (50.0 GiB of 500.0 GiB), the read speed jumps to ~9GiB/s, the writes drop to 0 B/s, and it moves multiple % per line. I'm assuming this is normal, since the last 450GB of data is blank, but why is the actual data at the start being read so slowly?
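One way to separate the PBS datastore/network side from the read side on the PVE hosts is the built-in benchmark, run from a PVE node's shell (the repository string below is a placeholder for this setup):

    proxmox-backup-client benchmark --repository root@pam@10.63.14.4:backups
    # reports TLS upload speed to the PBS host plus local chunk
    # compression/hash speeds, which bound the achievable backup rate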
Any advice on tuning this before I start converting VMware production VMs would be appreciated.
thanks, Paul