Server-Disk I/O delay 100% during cloning and backup

LSX3 · Sep 30, 2025

Hello everyone,

I am experiencing a reproducible issue with Proxmox VE 9.0. After a fresh installation, I am using the default LVM-Thin datastore that PVE creates during setup.

Whenever I create a full clone from a VM template (disk format is raw), the I/O delay immediately spikes to 100%. As a result, other running VMs freeze completely and no longer respond. Even the QEMU guest agent does not respond anymore. The only way to recover is to issue a Stop command on the affected VMs.

Steps to reproduce:

Fresh install of PVE 9.0
Keep the default LVM-Thin datastore
Create a VM template
Perform a full clone from the template (disk = raw)
I/O delay spikes to 100% → other VMs freeze → only “Stop” helps

Additional notes:

The problem occurs on multiple reinstalled hosts, always the same result.
No special configuration changes were made to storage or VMs.
Using full clone, not linked clone.
Disk format = raw.

Has anyone else seen this behavior on PVE 9.0 with LVM-Thin?

Any suggestions or workarounds would be greatly appreciated.

Thanks in advance!

Impact · Sep 30, 2025

Please share lsblk -o+FSTYPE,MODEL. I want to see what kind of disks you're using.

LSX3 · Sep 30, 2025

NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
sda 8:0 1 0B 0 disk
sdb 8:16 0 22.4T 0 disk
├─sdb1 8:17 0 1007K 0 part
├─sdb2 8:18 0 1G 0 part /boot/efi
└─sdb3 8:19 0 22.4T 0 part
├─pve-swap 252:0 0 8G 0 lvm [SWAP]
├─pve-root 252:1 0 200G 0 lvm /
├─pve-data_tmeta 252:2 0 15.9G 0 lvm
│ └─pve-data-tpool 252:4 0 22.1T 0 lvm
│ ├─pve-data 252:5 0 22.1T 1 lvm
└─pve-data_tdata 252:3 0 22.1T 0 lvm
└─pve-data-tpool 252:4 0 22.1T 0 lvm
├─pve-data 252:5 0 22.1T 1 lvm

Impact · Sep 30, 2025

Hmm, no proper model shown. Judging from the size and name I assume this is some kind of RAID or fake RAID? Can you check the model number of the drive(s) and what kind of RAID it is?

LSX3 · Sep 30, 2025

Impact said:
Hmm, no proper model shown. Judging from the size and name I assume this is some kind of RAID or fake RAID? Can you check the model number of the drive(s) and what kind of RAID it is?

Yes, this is an HP ML350 Gen10
It is equipped with an HP Smart Array P408i-a SR Gen10 RAID controller (2GB cache, 8-port modular, RAID levels 0, 1, 10, 5, 50, 6, 60, HBA mode, including BBU)
There are 8 enterprise SSDs installed, configured as a RAID6
The RAID controller provides the RAID6 volume, which is used by PVE as the datastore

LSX3 · Oct 1, 2025

I have now removed my hard drives from the RAID controller and connected them directly
After that, I created a ZFS pool
I'm still able to reproduce the same behavior
Can anyone else reproduce this issue as well?

Impact · Oct 1, 2025

Maybe you could try some IO troubleshooting via iotop-c, zpool iostat, iostat and so on: https://gist.github.com/Impact123/3dbd7e0ddaf47c5539708a9cbcaab9e3#io-debugging

LSX3 · Oct 1, 2025

Can you test whether you can reproduce the problem on your system?

This is my VM configuration:

LSX3 · Oct 4, 2025

Can you test whether you can reproduce the problem on your system?

LSX3 · Oct 12, 2025

I’m experiencing the same issue where the I/O utilization goes up to 100% when restarting VMs — apparently whenever some load is generated.
Is anyone else having this problem?
Can anyone reproduce this behavior?

The problem occurs with both an LVM thin volume and ZFS.

LSX3 · Oct 22, 2025

I'm completely out of ideas. I'm seeing the same issue with the backup, and it occurs consistently.
Has anyone else experienced or can reproduce this behavior?

LSX3 · Oct 26, 2025

I'm running into the same problem again, and I'm close to not being able to properly use Proxmox because my VMs keep freezing.
Can anyone confirm this behavior?
Is anyone else experiencing this, or has someone maybe already found a solution?

ripper12004 · Oct 27, 2025

Hello,

Running into the same issue here with PVE 9.0.11 running on a HP DL360 Gen10 using P408i-a SR and 4 consumer SSDs in RAID10 (used as PVE datastore).

Whenever I migrate or restore a VM from backup the IO usage spikes to 90% for minutes at a time (depending on how large the VM is) and all running VMs freeze (some recover, some do not and require a forced stop).

This issue seems to be worse when migrating/restoring larger VMs (over 32GB), because that's when the IO usage stays high for 3-10 minutes at a time.

Haven't been able to find a solution for this yet, but I can consistently replicate the issue.
Will try to troubleshoot with iotop-c, but at this time I do not have the delayacct kernel arg active at boot.

Edit: IO load during backup seems normal, around 5-6%

Edit 2: iotop-c shows dd using 100% of IO (for about 2 minutes after copying the data from the remote host) during a migration from PVE8 to PVE9 with the VM powered off

_gabriel · Oct 27, 2025

ripper12004 said:
4 consumer SSDs

it's expected as they can't substain writes.

ripper12004 · Oct 27, 2025

I know and expect some issues using consumer SSDs (these are 2TB SSDs with DRAM cache).

What I can't explain is why the same 4 SSDs in RAID10 on a HP DL360 Gen9 using Smart Array P840 and running PVE8, and even the same 4 SSDs in RAID10 on a HP DL360 Gen8 using Smart Array P420i also running PVE8, do not have this issue regarless of the VM size.
To me, at least right now, it seems to be something related to HP Gen10 with P408i RAID controller and PVE9...

_gabriel · Oct 27, 2025

no ZFS ?

ripper12004 · Tuesday at 06:40

No ZFS, the drives are not exposed to PVE.
Just the RAID controller running in Mixed mode, using a RAID10 array of 4 Samsung 870 2TB SSDs as datastore.

I have a couple of Gen10 machines with the same P408i controller I'm not using at the moment.
I'll spin up a PVE cluster with them to test different storage configurations and settings and report here in a couple of days.
Thanks

_gabriel · Tuesday at 08:10

if these are Samsung QVO, they are not suitable at all.
QLC drives write very slow after cache (which is different from DRAM cache).
There is plenty topics on the forum about it, mainly with ZFS because it reveals faster the slowness.

ripper12004 · Tuesday at 09:15

All drives are the EVO variant as I know to stay away from the QVO ones.
I will install PVE 8 and 9 on the same Gen10 hardware and compare between them, and I will also install PVE 9 on Gen9 hardware.
Will post some conclusions here as soon as possible.
Thank you

LSX3 · Thursday at 12:42

I’m experiencing the same problem with my test host.
This host has the following hardware:

Intel Xeon CPU E5-2698 v3 – 2.30GHz – 2 sockets
600 GB RAM
6 × 2 TB enterprise SSDs
1 × 256 GB enterprise SSD (for the PVE operating system)

The six drives are configured as a ZFS RAIDZ2 (RAID6).
I can reproduce the problem on this system as well.

Server-Disk I/O delay 100% during cloning and backup

New Member

Renowned Member

New Member

Renowned Member

New Member

New Member

Renowned Member

New Member

New Member

New Member

New Member

New Member

New Member

Famous Member

New Member

Famous Member

New Member

Famous Member

New Member

New Member

We value your privacy