I realize this topic has possibly been discussed to death. We had our setup working well until the upgrade from 6.4 to 7.3. Things are less good now.
We have 5 front-end nodes connected via 40 Gb/s InfiniBand to a ZFS file server.
VM storage is mounted via NFS on each node and appears in the Proxmox GUI as just a directory (the location of the mount).
VM backups using the GUI vzdump (not PBS) run over CIFS and a 1 Gb/s Ethernet connection, also mounted per node and referred to as a directory in the GUI.
When we upgraded, performance all around was terrible until we downgraded NFS from 4.2 to 4.0 on the nodes (the node upgrade had bumped the mounts to vers=4.2) and rebooted them all. VM performance has been excellent since (that was about a week ago).
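In case it helps anyone hitting the same thing: we pinned the NFS version in the storage definition so a future upgrade doesn't silently bump it back to 4.2. Roughly like this in /etc/pve/storage.cfg (the storage name, server address, and export path below are placeholders, not our real ones):

```
# /etc/pve/storage.cfg -- NFS storage entry with the version pinned
# (names/addresses here are examples only)
nfs: vmstore
        server 10.0.0.10
        export /tank/vmstore
        path /mnt/pve/vmstore
        content images
        options vers=4.0
```

After editing, the mount picks up the new options on the next remount (we just rebooted each node to be safe).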
When the VM backups run, they're 20-50% faster than before, which is great. But any running VM with a normally high load (we stupidly run a small but busy mail server -- it uses the NFS over IB and its VHDs are on the shared-storage SSD pool) has its iowait cranked to 95%. Mail users are not happy. And this happens anywhere in the cluster: the mail server runs on VM5, the backup was running on VM3 last night... and things were a mess.
I'm wondering whether slowing the backups down with bwlimit in /etc/vzdump.conf would help. The topology essentially has the backup traffic sharing the IB link with the VM traffic: the backup data appears to be pulled through the front-end node from the storage server, so normal VM I/O mixes with the backup stream. I don't see a way around this short of doing VHD backups directly from the storage server, outside of Proxmox.
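For reference, this is the kind of change I have in mind; vzdump's bwlimit is specified in KiB/s, and the actual cap value below is just a guess I'd tune from there:

```
# /etc/vzdump.conf -- cluster-wide vzdump defaults
# bwlimit is in KiB/s; 102400 would cap each backup job at ~100 MiB/s
# (value is an example to tune, not a recommendation)
bwlimit: 102400
```

It can also be set per job with `vzdump --bwlimit <KiB/s>` for testing before committing it cluster-wide.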
Thank you for any tips.