Two Proxmox VE servers and two different I/O load behaviors during backup

jhr

Member
Nov 29, 2021
Hello,

I have two Proxmox VE servers on almost identical hardware; they differ only in how they were installed. PVE1 was installed on a 256GB NVMe M.2 drive, with 4x2TB HDDs in ZFS RAID10 for the VMs. PVE2 has no NVMe drive, so the OS was installed on the 4x2TB ZFS RAID10 pool. PVE2 has a subscription and the production-ready Enterprise repository enabled; PVE1 has only the no-subscription repository enabled. Each PVE runs one VM. Both PVE hosts and the PBS are connected to the same switch.

When backing up the VMs to PBS, both PVE hosts take almost the same time, but with one critical difference: the load average in the VM on PVE2 is about 5 during the backup, while in the VM on PVE1 it climbs above 50, the CPUs sit at 100% usage for a long time, and some processes inside the VM get killed.

When I back up not to PBS but directly to an NFS share, the load in the VM on PVE1 is also high, but not as high, and no processes are killed at all.

What is the source of my troubles? I have 3-4 candidates:
- Enterprise repo vs. no-subscription repo
- different kernel and package versions on each PVE
- NVMe installation vs. ZFS RAID10 installation
- of course, each VM has slightly different load and usage patterns all the time

Any advice?
 
Hi,
Is the CPU usage IO wait? There have been a few other reports of such issues during backup when ZFS was involved. Since Proxmox VE 7.3, you can configure a performance max-workers=<N> setting for a backup job. It's not yet exposed in the GUI, so you need to update the job via the API, e.g.
Code:
pvesh set /cluster/backup/backup-f1bd3b15-737e --performance max-workers=4
Of course your backup ID will be different; use cat /etc/pve/jobs.cfg to find it.
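For reference, a job entry in /etc/pve/jobs.cfg looks roughly like this (the schedule, storage name, and options shown here are made up for illustration):
Code:
vzdump: backup-f1bd3b15-737e
	schedule daily
	storage pbs-store
	mode snapshot
	enabled 1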

To test whether a lower worker count (the default without the setting is 8) improves the situation, you can also just pass it to a direct vzdump invocation. And if you want to apply it to all backups on the node, set it as a default in /etc/vzdump.conf, as sketched below.
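A minimal sketch of both variants; VM ID 100 and the storage name pbs-store are placeholders, and the vzdump.conf line assumes the option takes the same form there as on the command line:
Code:
# one-off test run against a single VM
vzdump 100 --storage pbs-store --performance max-workers=2

# node-wide default, added to /etc/vzdump.conf
performance: max-workers=2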

EDIT: For the change to apply, the VMs also need to be running with pve-qemu-kvm >= 7.0.0-4. After installing the update, a shutdown+start or a migration makes the VM pick up the new version.
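To verify which version is installed, something like this should do (VM ID 100 is an example):
Code:
pveversion -v | grep pve-qemu-kvm
# then restart the VM so it runs the new binary
qm shutdown 100 && qm start 100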
 
It is real CPU usage, but I/O waits are very high too: iostat shows %iowait at almost 100 and %idle at almost 0 (measured as sketched below).
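For anyone who wants to reproduce the measurement, a sketch of the commands (the pool name rpool is an example):
Code:
# inside the VM: extended per-device stats, refreshed every second
iostat -x 1
# on the PVE host: per-vdev statistics of the ZFS pool
zpool iostat -v rpool 1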
OK, I can upgrade and set max-workers, but why is the second server fine during backup?


Will try the lower worker count with a direct vzdump invocation.
 
I can confirm that the new performance max-workers=N parameter has a significant effect on performance. My I/O and CPU load were incredibly high while a backup was running; after I set performance max-workers=1, the backup takes a little longer, but the load and I/O are much better than before. Thanks for that.
 
I had similar problems: during backup, I/O delay increased up to 40-50%. With performance max-workers=1 it works fine. But I noticed that when I restore from a backup, I/O delay still increases up to 40-50%. Does the performance max-workers=1 parameter apply only to backups, not to restores?
 
Hi,
Restore uses a different mechanism; there are no QEMU worker threads involved. Maybe setting a bandwidth limit for the restore helps? That can be done in the storage configuration, in datacenter.cfg, or for the individual operation.
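A sketch of the three options; the storage name pbs-store and the limit of 51200 KiB/s (50 MiB/s) are example values:
Code:
# per-storage restore limit
pvesm set pbs-store --bwlimit restore=51200

# datacenter-wide default in /etc/pve/datacenter.cfg
bwlimit: restore=51200

# limit for a single restore operation (backup volume ID left as a placeholder)
qmrestore <backup-volid> 100 --bwlimit 51200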
 
@fiona and the Proxmox team: thank you very much for adding this to the solution! I can confirm too that the new performance max-workers=N parameter has a significant effect on performance when doing backups (I also run ZFS pools).
Once I add max-workers=1 or max-workers=2, the backups behave and the IO delay only hits 40% (before, it was at 95-99%...).

I can also confirm that the parameter has an effect both when backing up to a PBS and to direct storage. (With higher values of max-workers, a more high-end system is required.)

Thanks for the good work and for taking community requests seriously.
I like it! :)
 
