It is also necessary to restart or live migrate the virtual servers so they use the updated kvm binary.
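For example (a rough sketch; the VMID 106 and target node pve2 below are just placeholders), a live migration starts the guest on the target node under the new binary:
kvm --version                     # QEMU/KVM version installed on this node
qm migrate 106 pve2 --online      # live migrate VM 106 so it runs on the updated binary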
max-workers=1 seems to be working well so far.
On one node I restarted some, but not all, of the VMs.
In this IO wait graph, the arrows mark when vzdump began on a VM running the updated KVM; the spikes...
I've upgraded to Proxmox 7.3 and added the new setting to /etc/vzdump.conf
Will report back in a few days on whether it resolved the problem; I am confident it will.
Added to /etc/vzdump.conf:
performance: max-workers=1
Thanks @fiona
Yes, M, not G.
I think the 80M limit is faster because it reduces the IO load, just like 8 workers reduces the IO load vs 16 workers.
8 workers is not enough to completely alleviate my issues. It is an improvement, but the backup still causes other VMs on the same node to have extremely slow IO during...
Not a typo, data obtained from the emailed logs sent after each backup.
16 workers does improve backup speed, but it comes at the cost of 16 simultaneous IO requests for the duration of the backup.
On this particular node we had bwlimit set to 200M in Proxmox 6 and did not experience any issues.
After upgrading to Proxmox 7 we lowered bwlimit to 150M, then to 80M, in an effort to prevent the backup from causing IO stalls in other VMs on the node.
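For anyone copying this: as far as I understand, when set in /etc/vzdump.conf the bwlimit value is given in KiB/s, so an 80M limit would look roughly like:
bwlimit: 81920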
Last night was the first backup with...
Some initial testing shows the IO wait is much lower and backup times are not significantly slower with 8 workers.
VMs 109 and 110 are clones of 106.
I assume the clones back up faster with less IO wait, perhaps because they are not as fragmented.
VM | blocksize | backup with 16 workers | backup...
We see IO stalls in other VMs on the same node.
IO wait is just a metric I have recorded to allow comparison between the versions and demonstrate that something changed. I know that avoiding the IO stalls in the other VMs is what's important, not necessarily what the actual IO wait is...
I'll make the storage.cfg change, clone a VM, and see whether backing up the clone is any better.
I doubt it will help, though.
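For reference, the storage.cfg change I mean is the blocksize property on the zfspool storage, roughly like this (storage name and pool path are just placeholders; it only affects newly created zvols, existing disks keep their volblocksize):
zfspool: local-zfs
        pool rpool/data
        content images,rootdir
        blocksize 16k
        sparse 1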
Here is my understanding of the situation:
max_workers is the number of parallel IO requests made for the backup job.
Before this existed it was...
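If I'm reading the release notes right, it can also be passed per run instead of globally, something like this (the VMID is a placeholder and I have not double-checked the exact syntax):
vzdump 106 --performance max-workers=8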
@fiona
All of my testing has been on a single node, so this info only applies to it; the other nodes are similar, though.
I've not adjusted volblocksize; it is 8K.
We have 128G RAM
VMs are using about 14G RAM
I have ARC limited to 80G
This node has a mirrored SLOG on two NVMe drives and L2ARC on NVMe too...
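In case anyone wants to compare against their own nodes, these are the commands I used to check those values (the zvol name is just an example from this node):
zfs get volblocksize rpool/data/vm-106-disk-0    # volblocksize of one VM disk
cat /sys/module/zfs/parameters/zfs_arc_max       # ARC limit in bytes (0 = default)
zpool status rpool                               # shows log (SLOG) and cache (L2ARC) devices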
I am very happy you were able to reproduce the problem. :)
By any chance did the size of the IO requests also increase between the two versions?
According to iostat it looks like during the backup each request is 1M.
I believe that might be related to "max_chunk"
Maybe smaller sized IO requests...
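For comparison, this is how I was measuring it (with a recent sysstat; older versions report avgrq-sz in sectors instead):
iostat -x 5    # rareq-sz / wareq-sz are the average request sizes in KiB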
@fiona I should be able to get one node running on 5.11 this weekend and will report back the results.
A couple of things I think are important:
1. This is a problem affecting ALL Proxmox users who use ZFS; they just might not have noticed it. All 23 of our Proxmox servers have had an increase in...
@Ales_R I've not had a chance to try it, but it was suggested to see if the Proxmox 5.11 kernel helps. One of the threads I linked to in my post mentioned 5.11 works well.
So if you can, give that a try and report back your results. If 5.11 works OK, that will help us track down the cause of the...
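If memory serves, the opt-in kernel can be installed like this (package name from my notes, please double-check it against your repos):
apt install pve-kernel-5.11
# then reboot and pick the 5.11 kernel from the boot menu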
All of these threads seem to be the same problem I am having:
https://forum.proxmox.com/threads/vm-slow-after-proxmox-pve-upgrade-6-4-7-2.113133/
https://forum.proxmox.com/threads/massive-load-spikes-since-upgrade-to-7-2.112774/...
Did you find any solutions?
I am having a similar issue; the only thing in common I've found is ZFS.
On all nodes, SSD or HDD, IO has increased since upgrading, but on the HDD nodes the increase is huge.
Where we use ZFS we use RAID10 with 10 or more disks.
All of my nodes have ZFS and some of them...
@fiona The problem only happens when backing up VMs that are on ZFS. Here is IO wait over the entire backup of 10 VMs. The high spikes happen when backing up VMs on ZFS storage and the lows are when backing up from LVM storage.
Any idea what changed in ZFS between Proxmox 6 and 7 that would...
@udo
I am seeing IO related issues since upgrading to Proxmox 7 that did not exist in Proxmox 6.
For me the issue seems like it might be related to io_uring:
https://forum.proxmox.com/threads/high-io-wait-during-backups-after-upgrading-to-proxmox-7.113790/
Obviously io_uring is more efficient at doing IO.
But it seems that io_uring or something else is allowing a single process to consume all the IO resulting in other processes waiting on IO.
Would IO threads be sufficient, or is it also necessary to change Async IO to threads? What about the 'virtio scsi single' setting?
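For anyone else testing this, the settings I'm referring to can be changed per disk with qm, roughly like so (VMID, storage, and disk name are placeholders; the VM needs to be power-cycled for it to take effect):
qm set 106 --scsihw virtio-scsi-single
qm set 106 --scsi0 local-zfs:vm-106-disk-0,iothread=1,aio=threads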
Some of the nodes already had bwlimit set to 150MB/sec; I'll try lowering it more to see if that helps.
Is this a bug that is being investigated...
We recently completed the upgrade to Proxmox 7.
The issue exists on two different kernels
pveversion:
pve-manager/7.2-7/d0dd0e85 (running kernel: 5.15.39-1-pve)
and
pve-manager/7.2-7/d0dd0e85 (running kernel: 5.15.35-2-pve)
Since the upgrade, IO wait has increased dramatically during vzdump...
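For what it's worth, this is how I've been watching it during the backup window (PSI is available on these 5.15 kernels):
cat /proc/pressure/io    # IO pressure-stall percentages
vmstat 5                 # the wa column is IO wait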