Server disk I/O delay 100% during cloning and backup

After 2 days of testing I wasn't able to reproduce my IO load issue on the test systems.

I'm running the same hardware as production: HP DL360 Gen10 w/ P408i-a, HP DL360 Gen9 w/ P440ar and HP DL360p Gen8 w/ P420i (4 x 2TB 870 EVO in RAID10, Smart Path enabled).
And the exact same PVE versions: 8.3.0 (which I'm migrating from) and 9.0.11

So far I haven't seen the same "dd" process pop up at the end of the migration, use 100% of I/O, freeze the system, and crash some of the VMs, no matter what I tried: cloning, live migration, or migration of shut-down VMs

Also, my production PVE 9.0.11 HP DL360 Gen10 w/ P408i-a has no issues when running backup tasks and/or cloning, only when migrating VMs from PVE 8.3.0 or restoring them from the not-yet-updated PBS v3.4.6

It's weird... I'll keep trying to replicate and diagnose the issue and update here if I make any progress.

Thanks!
 
I’m still having this problem and I’m not really making any progress.
Does anyone have any ideas on what else I could try to get closer to a solution?
I’m completely stuck and can’t figure it out on my own.
 
Isn't this what happens when cloning with consumer NVMe?

kioxia kbg40zns256gb (consumer NVMe): IO Pressure Stall 60%

This doesn't happen with enterprise SAS.

husmm3280ass200 (enterprise SAS): IO Pressure Stall 1.4%

Did you purchase an enterprise SSD?

*The above shows the results of cloning the same virtual machine onto each SSD after reviewing this thread.
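
One way to sample these stall figures yourself while a clone or backup is running, assuming a kernel with PSI enabled (the exact tool doesn't matter):

# Show the kernel's I/O pressure stall information during the clone/backup
cat /proc/pressure/io
# "some" = share of time at least one task was stalled on I/O
# "full" = share of time all non-idle tasks were stalled on I/O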
 
I’m still having this problem and I’m not really making any progress.
Does anyone have any ideas on what else I could try to get closer to a solution?
I’m completely stuck and can’t figure it out on my own.
Were these SSDs used before? I had very similar symptoms with ZFS; it turned out that unless you explicitly set the org.debian:periodic-trim pool property to enable, the monthly trim cron job won't do anything with SATA and SAS disks. The default behavior, with the property set to auto, only trims NVMe disks. You can manually start a pool trim with zpool trim <poolname>, and you can check when the pools were last trimmed with zpool status -t.
I've seen that you mentioned ESXi on the same hardware; on VMFS, automatic space reclamation is enabled by default.
 
The exact SSD model has not been specified yet; please provide the exact model.

Furthermore, using volumes from the HP Smart Array P408i-a SR Gen10 with ZFS is not only not recommended, it is explicitly discouraged.

*Nobody deliberately builds an environment that is explicitly not recommended.
 
Were these SSDs used before? I had very similar symptoms with ZFS; it turned out that unless you explicitly set the org.debian:periodic-trim pool property to enable, the monthly trim cron job won't do anything with SATA and SAS disks. The default behavior, with the property set to auto, only trims NVMe disks. You can manually start a pool trim with zpool trim <poolname>, and you can check when the pools were last trimmed with zpool status -t.
I've seen that you mentioned ESXi on the same hardware; on VMFS, automatic space reclamation is enabled by default.
Yes, the SSDs have been in use the whole time
The problem does not occur only with ZFS; it also happens with hardware RAID (yes, I know hardware RAID is not recommended, but this is the hardware I have)

I have not explicitly enabled TRIM
Apart from the package repositories and the analysis tools mentioned in my previous posts here, PVE is still in its default (factory) state
 
The exact SSD model has not been specified yet; please provide the exact model.

Furthermore, using volumes from the HP Smart Array P408i-a SR Gen10 with ZFS is not only not recommended, it is explicitly discouraged.

*Nobody deliberately builds an environment that is explicitly not recommended.
I don't have the hardware with me right now, but these are definitely enterprise SSDs, not consumer-grade drives
As I already mentioned in the previous post, I’m using the hardware exactly as it is

If the RAID controller is in HBA mode (which it is in the ZFS setup), what impact is it supposed to have?
In HBA mode it should just pass the disks through 1:1 and not do anything on its own

I need the RAID controller in order to connect the drives to the server

How do you know that this RAID controller is discouraged?
Is there a list of discouraged/unsupported RAID controllers somewhere?
 
Please understand that hardware RAID is not recommended when building a ZFS system.

In a sense, it's close to common sense. You should look into it.

Edit: I'm correcting this because I just learned the controller is being used in HBA mode.
I don't know what results you'd get in HBA mode either; if the controller doesn't handle I/O itself, it should behave the same as a plain HBA.
 
Yes, the SSDs have been in use the whole time
The problem does not occur only with ZFS; it also happens with hardware RAID (yes, I know hardware RAID is not recommended, but this is the hardware I have)

I have not explicitly enabled TRIM
Apart from the package repositories and the analysis tools mentioned in my previous posts here, PVE is still in its default (factory) state
If at least as much data has been written to the SSDs as their total capacity without any trimming, this kind of slowdown is expected regardless of the storage subsystem in use. With LVM on a hardware RAID controller, I would create a separate logical volume for the PVE installation and one for the VM data, blkdiscard the VM data block device first, and only then create the LVM-thin storage on top of it. Setting issue_discards = 1 in lvm.conf could also help, as would enabling discard on the individual VM disks. The latter should be done with ZFS too, so that the trim/discard commands from the VM can reach the host's storage system.
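Roughly, as a sketch (the device path, VM ID, and storage name below are made-up examples, not your actual setup):

# Discard every block on the still-empty device you intend to use for VM data
blkdiscard /dev/sdb

# /etc/lvm/lvm.conf, "devices" section: pass discards down when LVs are removed or reduced
#   issue_discards = 1

# Enable discard on an existing VM disk (example: disk scsi0 of VM 100 on storage local-lvm)
qm set 100 --scsi0 local-lvm:vm-100-disk-0,discard=on

# Inside the guest, trim the mounted filesystems so the discards actually reach the host
fstrim -av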
 
Please understand that hardware RAID is not recommended when building a ZFS system.

In a sense, it's close to common sense. You should look into it.
It may be true that a hardware RAID controller is generally “not recommended” in some setups
However, as I already explained, it is required in my case in order to connect the drives directly to the server

Also, I can reproduce the exact same problem on a desktop PC with two 2.5" SATA enterprise SSDs
There is no RAID controller installed there, and the result is identical

From my perspective, this means the issue cannot be caused by the RAID controller - regardless of whether it is recommended or not
 
If at least as much data has been written to the SSDs as their total capacity without any trimming, this kind of slowdown is expected regardless of the storage subsystem in use. With LVM on a hardware RAID controller, I would create a separate logical volume for the PVE installation and one for the VM data, blkdiscard the VM data block device first, and only then create the LVM-thin storage on top of it. Setting issue_discards = 1 in lvm.conf could also help, as would enabling discard on the individual VM disks. The latter should be done with ZFS too, so that the trim/discard commands from the VM can reach the host's storage system.
No, the drives are 4 TB each and there are 8 drives installed in total
The VM data itself is only around 500 GB

Also, discard is enabled on the virtual disks of my VMs
 
I am not familiar with enterprise-grade SSDs in capacities such as 256GB, 2TB, or 4TB, so I will refrain from commenting unless specific models are provided.
 
I have set a bwlimit
With that, I can prevent the utilization from reaching 100%
With the following configuration, it only goes up to about 60%:

--bwlimit clone=500000,migration=500000,move=500000,restore=500000
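
For completeness, a minimal sketch of where such a limit can live, assuming it is set cluster-wide (it can also be passed per operation or per storage); the values are in KiB/s:

# /etc/pve/datacenter.cfg
bwlimit: clone=500000,migration=500000,move=500000,restore=500000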
 
No, the drives are 4 TB each and there are 8 drives installed in total
The VM data itself is only around 500 GB

Also, discard is enabled on the virtual disks of my VMs
Sorry, I wasn't clear enough :) I meant the data written to the disks over their entire lifetime up to now.
If all of the blocks have been written to, without discarding the blocks that contain deleted data, then every new write first requires the SSD controller to erase a block before the actual write can start, which increases latency.
 
Sorry, I wasn't clear enough :) I meant the data written to the disks over their entire lifetime up to now.
If all of the blocks have been written to, without discarding the blocks that contain deleted data, then every new write first requires the SSD controller to erase a block before the actual write can start, which increases latency.
Okay, that makes sense to me

Can you tell me how I can enable that?
I’d like to test whether this fixes the problem
 
Okay, that makes sense to me

Can you tell me how I can enable that?
I’d like to test whether this fixes the problem
You can test it with the zpool trim <poolname> command. It shouldn't take much time, and you can check its status with zpool status <poolname> -t. If that was the problem, you can enable periodic trimming with zfs set org.debian:periodic-trim=enable <poolname>.
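Put together, assuming a pool named rpool (replace it with your pool's name):

# Start a manual trim and check its progress
zpool trim rpool
zpool status -t rpool

# If that fixes the stalls, enable the periodic trim via the property mentioned above
zfs set org.debian:periodic-trim=enable rpool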
 
You can test it with the zpool trim <poolname> command. It shouldn't take much time, and you can check its status with zpool status <poolname> -t. If that was the problem, you can enable periodic trimming with zfs set org.debian:periodic-trim=enable <poolname>.
Thank you very much!
I will test it and provide feedback