[SOLVED] High IO wait during backups after upgrading to Proxmox 7

With a volblocksize of 16K you would get 4x 4K blocks so 4 mirrors could work in parallel.
I'll make the storage.cfg change and clone a VM and see if the situation is better backing up the cloned one or not.
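For reference, the change is just the blocksize property on the zfspool entry in /etc/pve/storage.cfg, roughly like this (storage and pool names here are only examples, and it only applies to newly created disks):

Code:
zfspool: local-zfs-16k
        pool tank/vmdata
        blocksize 16k
        content images,rootdir
        sparse 1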

I doubt it will help tho.
Here is my understanding of the situation:
max_workers is the number of parallel IO requests made for the backup job.
Before this existed it was essentially 1; the new default in QEMU is 64, and Proxmox lowered it to a more sane 16.

So basically when doing a backup in Proxmox 7 it tries to do 16x more read IO than it did in Proxmox 6.
Combine that with ZFS zvols not sharing IO fairly and you have my problem: this one process is allowed to saturate the IO, leaving little for other processes.
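You can watch the extra read pressure while a backup is running with something like this (the pool name is just an example):

Code:
# per-disk utilization and queue depth, refreshed every 5 seconds
iostat -x 5
# per-vdev view of the pool being read from
zpool iostat -v tank 5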
 
@fiona


All of my testing has been on a single node, so this info only applies to it; the other nodes are similar though.
I've not adjusted volblocksize, so it is the default 8K.
We have 128G RAM.
VMs are using about 14G RAM.
I have ARC limited to 80G.
This node has a mirrored SLOG on two NVMe drives and an L2ARC on NVMe too.
The pool is RAID 10 with 10 SATA III mechanical disks.
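For completeness, the 80G ARC limit is just the usual module parameter, set roughly like this:

Code:
# /etc/modprobe.d/zfs.conf -- 80 GiB in bytes
options zfs zfs_arc_max=85899345920
# then rebuild the initramfs and reboot, or write the same value to
# /sys/module/zfs/parameters/zfs_arc_max to apply it at runtime
update-initramfs -u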

I do not have any faster disks to back up onto, but I did have some slower ones.
The slower disks did cause slightly higher IO wait than the faster disk.
Might be a hint that the slow target storage is actually causing the issue? Do you get the IO stalls in the VMs that are being backed up or in other VMs on the same node?

Could you prepare a package with max_workers=1 that I could test with to see if it resolves my problem?
A build with max_workers = 8 is available here. I'd rather have you test something that we can at least consider rolling out in general ;) Even if this helps in your case, we'll need to check that it doesn't impact performance too much for other setups. The important thing is whether it helps with the IO stalls, of course; IO wait is purely secondary.
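Installing the test build on a node is just a local package install, for example (the file name depends on the exact build):

Code:
apt install ./pve-qemu-kvm_*.deb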
 
Do you get the IO stalls in the VMs that are being backed up or in other VMs on the same node?
We see IO stalls in other VMs on the same node.

IO wait is just a metric that has been recorded, allowing comparison between the versions to demonstrate that something changed. I know that avoiding the IO stalls in the other VMs is what's important, not necessarily what the actual IO wait is. Clearly the IO wait I'm seeing is high enough to cause issues, but getting it lowered just a little might be sufficient to make things acceptable; it might not be necessary to get it back to where it was in Proxmox 6.

I'll get the max_workers=8 installed on a few servers and see what happens during backups over the weekend and report the results Monday.

After installing the package I'll stop/start or live migrate to another node and back each VM to ensure they are running the modified code.
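To double-check, I'll verify both the installed package and what each VM is actually running, along these lines (the VM ID is just an example; if I remember correctly the verbose status shows the version the VM was started with):

Code:
# installed package version on the node
pveversion -v | grep pve-qemu-kvm
# version the running VM was started with
qm status 106 --verbose | grep running-qemu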
 
Some initial testing shows the IO wait is much lower and backup times are not significantly slower with 8 workers.

VMs 109 and 110 are clones of 106.
I assume the clones back up faster with less IO wait because they are probably not as fragmented.
VM    blocksize    backup with 16 workers    backup with 8 workers    backup on Proxmox 6
110   16k          752 seconds               756 seconds              N/A
106   8k           998 seconds               1162 seconds             1941 seconds
109   8k           760 seconds               838 seconds              N/A

With 16 workers:
[attachment: workers 16.png]
With 8 workers:
[attachment: workers8.png]

I'm off to install this on the servers that are affected by this the most so we can see what happens when the automated backups run this weekend.
 
On this particular node we had bwlimit set to 200M in Proxmox 6 and did not experience any issues.
After upgrading to Proxmox 7 we lowered bwlimit to 150M then to 80M in an effort to prevent the backup from causing IO stalls in other VMs on the node.
Last night was the first backup with max_workers=8 and we did not get alerts about IO stalls so that is an improvement.
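For anyone wanting to reproduce: however it is set, this ends up as the vzdump bwlimit, which in /etc/vzdump.conf is given in KiB/s, so our limits correspond roughly to:

Code:
# /etc/vzdump.conf -- bwlimit is in KiB/s
# ~200 MB/s:
bwlimit: 204800
# (150 MB/s would be 153600, 80 MB/s would be 81920)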

The table below demonstrates that on this node:
* 8 workers back up faster than 16 workers.
* Lowering bwlimit from 200M to 150M with 16 workers resulted in a faster backup.

So it seems like on this particular node, being less aggressive with the backup read IO results in a faster backup. Makes sense: not overloading the IO system should result in a faster backup.

I also want to see what the backup speed is with 8 workers and bwlimit set at 150M and 200M; I should have that info in a few days.

@fiona
While I understand that making max_workers configurable would be difficult, I would appreciate it if you would make a package with max_workers=1, and maybe even one with it set to 4, so we can see how they perform.
Based on what I see so far it looks like in some cases a lower max_workers actually performs better, which would make a compelling case for a configuration option.

I could also do some tests on our fastest IO/CPU servers so we can compare max_workers=1/8/16 there too.


                          Backup Time    Backup Size    Bwlimit
Proxmox 6                 9:28:42        586.16GB       200M
Proxmox 7 w/16 workers    6:05:25        602.16GB       200M
Proxmox 7 w/16 workers    04:34:46       612.31GB       150M
Proxmox 7 w/16 workers    12:04:26       611.74GB       80M
Proxmox 7 w/8 workers     08:36:39       611.69GB       80M
Proxmox 7 w/8 workers     09:08:49       612.02GB       150M

Edit: Added line for 8 workers/150M bwlimit to the table, fixed typo in bwlimit G to M
 
hello e100. 04:34:46 with bandwidth limit 150G was the fastest backup in your post today. was that a typo?

If not a typo, then why not go with 16 workers and bw limit 150? [ yes i should have re-read the entire thread .... ]
 
hello e100. 04:34:46 with bandwidth limit 150G was the fastest backup in your post today. was that a typo?

If not a typo, then why not go with 16 workers and bw limit 150? [ yes i should have re-read the entire thread .... ]
Not a typo; the data was obtained from the emailed logs sent after each backup.

16 workers does improve backup speed, but it comes at the cost of 16 simultaneous IO requests for the duration of the backup.
 
On this particular node we had bwlimit set to 200G in Proxmox 6 and did not experience any issues.
After upgrading to Proxmox 7 we lowered bwlimit to 150G then to 80G in an effort to prevent the backup from causing IO stalls in other VMs on the node.
Last night was the first backup with max_workers=8 and we did not get alerts about IO stalls so that is an improvement.
Good to hear! There's another user in the enterprise support with similar backup issues and we're currently waiting for feedback about the max_workers=8 build. Better to have more than one data point ;)

@fiona

While I understand that making max_workers configurable would be difficult, I would appreciate it if you would make a package with max_workers=1, and maybe even one with it set to 4, so we can see how they perform.
Based on what I see so far it looks like in some cases a lower max_workers actually performs better, which would make a compelling case for a configuration option.

I could also do some tests on our fastest IO/CPU servers so we can compare max_workers=1/8/16 there too.
max_workers=8 is something we can consider rolling out if testing on different setups goes well. Reducing things further likely will just hurt performance for many setups too much. I mean, there are a lot of setups where max_workers=16 is also working fine.

                          Backup Time    Backup Size    Bwlimit
Proxmox 6                 9:28:42        586.16GB       200G
Proxmox 7 w/16 workers    6:05:25        602.16GB       200G
Proxmox 7 w/16 workers    04:34:46       612.31GB       150G
Proxmox 7 w/16 workers    12:04:26       611.74GB       80G
Proxmox 7 w/8 workers     08:36:39       611.69GB       80G
Proxmox 7 w/8 workers     09:08:49       612.02GB       150G

Edit: Added line for 8 workers/150G bwlimit to the table
I guess the bwlimit is in MB/s, not GB/s ;)
Well, there's a few surprising things in the table, for example 8 workers/150 limit being slower than 8 workers/80 limit, but maybe there's just a lot of variance?
 
Well, there's a few surprising things in the table, for example 8 workers/150 limit being slower than 8 workers/80 limit, but maybe there's just a lot of variance?
Yes M not G
I think the 80M limit is faster because it reduces the IO load, just like 8 workers reduces the IO load vs 16 workers.

8 workers is not enough to completely alleviate my issues. It is an improvement but still causes other VMs on the same node to have extremely slow IO during backups.

When multiple workers are running does the read IO become more random?
That would explain why this has a significant impact on mechanical disks.
If more workers means more random IO, then on mechanical disks the fastest backup should be with a single worker.
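If I find time I'll try to test that hypothesis directly with fio on one of the zvols, comparing a single outstanding read against many, something like this (the zvol path is only an example; --readonly keeps it from touching the data):

Code:
# sequential reads, one outstanding request
fio --name=qd1 --readonly --filename=/dev/zvol/tank/vm-106-disk-0 \
    --rw=read --bs=64k --ioengine=libaio --direct=1 --iodepth=1 --runtime=60 --time_based
# the same with 16 outstanding requests, roughly like 16 backup workers
fio --name=qd16 --readonly --filename=/dev/zvol/tank/vm-106-disk-0 \
    --rw=read --bs=64k --ioengine=libaio --direct=1 --iodepth=16 --runtime=60 --time_based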

I do not think we need to directly configure max_workers but I think it would be nice if we could select from:
fast = 16 workers
normal ( default ) = 8 workers
slow = 1 worker

That could be an option or even just a differently named qemu package we could install:
fast = pve-qemu-kvm-fast-backup
normal = pve-qemu-kvm
slow = pve-qemu-kvm-slow-backup

I just do not see how a single setting could possibly satisfy everyone.
On one extreme, users with slow disks end up with crippling performance issues, while on the other extreme, users with expensive, super fast disks cannot take full advantage of what their hardware has to offer.
 
Doing local backups to a container with PBS (before a pull from an external PBS) involves threads for reading (from fast ZFS, with decompression), compressing, and transmitting on the PVE side, plus receiving and writing (to a slow ZFS pool, possibly with compression) on the local PBS side. PVE also swaps a bit during backups, with zram and compression, so even more threads. I noticed that this can use a lot of threads, and with an 8-core/16-thread CPU there were sometimes stutters in the VMs. I upgraded to a 16-core/32-thread CPU and it works without stutters. I wouldn't mind if this could be tuned a bit (by using max_workers on the PVE side).
 
We were also hit by this backup IO wait issue. It happens primarily with very large VMs/diffs of over 50GB and introduces 20-40% IO wait inside the VM, which is not really traceable to anything other than the running backups. Once the running backup is stopped, the VM immediately becomes responsive again.

We are using Ceph RBD (librbd) as primary storage (all enterprise NVMe/SSD, separate pools) and PBS on a separate (off-cluster) hypervisor for backups. We have temporarily lowered the frequency of backups from once every two hours to nightly, with a maximum transfer speed of 100 MB/s, to mitigate. It would be great if one could set max_workers in the cluster options so everyone can tweak this setting to their own preference, with a sane default of 8 or 16.

EDIT: Just found this issue as well: https://forum.proxmox.com/threads/vms-freezing-and-unreachable-when-backup-server-is-slow.96521. I guess my problems are mostly caused by the architecture of PBS backups; the symptoms seem similar. I am not sure how lowering max_workers would actually improve the case of a (sometimes) slower or busier backup server, so mileage may vary.
 
Hi @leesteken and @servada,
did you already test the build with max_workers=8 to see what difference it makes for you? As always, stop/starting the VM after installing the new version or migrating to a node with the new version installed is necessary to actually pick up the new version.
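For example (VM ID and target node are placeholders):

Code:
qm stop 100 && qm start 100
# or, without downtime:
qm migrate 100 targetnode --online
# a reboot from inside the guest is not enough, it does not restart the QEMU process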
 
Hi @leesteken and @servada,
did you already test the build with max_workers=8 to see what difference it makes for you? As always, stop/starting the VM after installing the new version or migrating to a node with the new version installed is necessary to actually pick up the new version.
We have not tested the modified build yet. We're a bit wary of installing this on our production cluster. Is the new package modified against the latest version available in the enterprise repo? I would love to test and report back, but I cannot afford any more stalls on workload resources, so I would first have to create some big 'test' VMs. I will try to see if I can get that sorted out in our environment.
 
We have not tested the modified build yet. We're a bit wary of installing this on our production cluster. Is the new package modified against the latest version available in the enterprise repo?
Currently the version in the enterprise repository is 7.0.0-2, but the modified build is on top of 7.0.0-3, which additionally includes two snapshot-related fixes, but that's it.
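You can compare what is installed with what the repository currently offers via:

Code:
apt update
apt-cache policy pve-qemu-kvm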
 
Whatever you do, don't upgrade to the latest version of Proxmox VE, 7.2.11, because it has IOPS issues.
 
Hi @leesteken and @servada,
did you already test the build with max_workers=8 to see what difference it makes for you? As always, stop/starting the VM after installing the new version or migrating to a node with the new version installed is necessary to actually pick up the new version.
I replaced the current 7.0.0-3 with your version and backups run fine. CPU usage peaks are now at 55% instead of 90% or so, which makes it run more smoothly. The duration (of a cold backup of the whole system) does not appear to change, which is good.
 
Hi,
Whatever you do, don't upgrade to the latest version of Proxmox VE, 7.2.11, because it has IOPS issues.
do you experience the issues only during backup or in general? In the first case, please try the QEMU build with reduced max_workers. In the latter case, please open a new thread with more information.

I replaced the current 7.0.0-3 with your version and backups run fine. CPU usage peaks are now at 55% instead of 90% or so, which makes it run more smoothly. The duration (of a cold backup of the whole system) does not appear to change, which is good.
Thank you for the feedback! The customer in the enterprise portal also gave positive feedback (although they upgraded to kernel 5.19 at the same time, I'd say the probability that the max_workers build made the most difference is rather high). So there are three data points now.

We'll re-think if we can reduce the setting in general or if it's worth exposing (likely as a vzdump setting). The latter would likely be the only way to make everybody happy, but it's really not too nice from a design perspective.
 
Thank you for the feedback! The customer in the enterprise portal also gave positive feedback (although they upgraded to kernel 5.19 at the same time, I'd say the probability that the max_workers build made the most difference is rather high). So there are three data points now.
I have also been using the 5.19 kernel since before my test. Sorry, I did not realize that it mattered, but I did see the load reduced with your change on the same kernel.
We'll re-think if we can reduce the setting in general or if it's worth exposing (likely as a vzdump setting). The latter would likely be the only way to make everybody happy, but it's really not too nice from a design perspective.
PBS, ZFS, zstd, PVE, and probably some other parts of the same system all assume they can use all the threads to maximize performance. The context switching and deep IO queues caused by all this can actually hurt latency and the smoothness of the experience. But this might be an uncommon case, as I run everything on a single CPU (16c/32t).
Some kind of configuration (a "max workers" option, or something in /etc/vzdump.conf) would be nice, to at least have a chance to tune how "invasive" regular background backups are.
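Right now the only global knobs I know of there are the bandwidth limit and IO priority in /etc/vzdump.conf, for example:

Code:
# /etc/vzdump.conf
# read/write limit for the backup, in KiB/s
bwlimit: 102400
# IO priority of the backup process (0-8)
ionice: 8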
 
Hi,

do you experience the issues only during backup or in general? In the first case, please try the QEMU build with reduced max_workers. In the latter case, please open a new thread with more information.


Thank you for the feedback! The customer in the enterprise portal also gave positive feedback (although they upgraded to kernel 5.19 at the same time, I'd say the probability that the max_workers build made the most difference is rather high). So there are three data points now.

We'll re-think if we can reduce the setting in general or if it's worth exposing (likely as a vzdump setting). The latter would likely be the only way to make everybody happy, but it's really not too nice from a design perspective.

7.0-4 includes https://git.proxmox.com/?p=pve-qemu.git;a=commit;h=ed01236593ef55a0a3e646ab307dc1b0728563d0 now; how can we configure the max_workers setting?
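From the commit it looks like this gets wired up as a new vzdump "performance" setting; my guess at the syntax (untested, please correct me if the released packages differ) would be something like:

Code:
# /etc/vzdump.conf (or per backup job) -- syntax as I understand it, untested
performance: max-workers=8
# the CLI equivalent would presumably be: vzdump <vmid> --performance max-workers=8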
 