[SOLVED] Huge I/O delays on backup/restore jobs

molnart

I am running PVE with the following setup: a single SSD contains the PVE root and the LVM volumes for the VMs. One VM runs OpenMediaVault, and all the HDDs are passed through to this VM. OMV then provides NFS shares to the other VMs as well as to Proxmox itself for VM backups.
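For context, the backup share is defined in /etc/pve/storage.cfg roughly along these lines (server address, export path and storage name here are only illustrative, not my exact config):
Code:
nfs: omv-backup
    server 192.168.1.10
    export /export/backups
    path /mnt/pve/omv-backup
    content backup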

Now my issue is that every time I do a task which requires heavy disk access (even on the local LVM volumes, not the storage shared by the OMV VM), I get huge I/O delays, effectively stalling all my VMs.

Cloning a VM is outright impossible: even when I shut down all the other VMs, the clone stalls at around 6% and nothing happens. The IO delay stays high even after I stop the cloning task.

Backing up a VM shows read/write speeds occasionally dropping to <5 MB/s, then recovering to "normal" speeds.

I feel there are some filesystem optimizations I did not perform that could improve this.
 
Hi,
could you share more details about the hardware you use (disk models, CPU, RAM, etc.)? Also, please use smartctl -H to check if your disks are healthy.
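For example, a quick health check across all disks (device names are just an example):
Code:
for disk in /dev/sd[a-e]; do smartctl -H "$disk"; done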

For the storage(s) accessible from PVE, you can set bandwidth limits by adding the bwlimit option with appropriate values to your storage definition in /etc/pve/storage.cfg, for example:
Code:
lvmthin: local-lvm
    thinpool data
    vgname pve
    content rootdir,images
    bwlimit default=200000,clone=100000,restore=150000
The values are in KiB/s.
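As far as I know, the limit can also be overridden for a single operation on the command line, e.g. for a restore (archive path and VM ID are placeholders, value again in KiB/s):
Code:
qmrestore /path/to/vzdump-qemu-XXX.vma.zst <vmid> --storage local-lvm --bwlimit 153600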
 
I am using a Xeon E3-1265L v2 CPU (4c/8t), 16 GB ECC RAM, a 525 GB Crucial MX300 SSD for PVE and VM data, and 4x WD Red EFRX drives passed through to the OMV VM. I regularly monitor the SMART data of my drives and they don't seem to have any issues.

However, I have already twice faced an issue with a VM disk dying: the VM got stuck rebooting, and looking at the VM disk from GParted showed only unallocated space. Not sure if the IO delays and this are related.
 
If the SSD is the culprit, that would explain why cloning (where the SSD is both source and target of the operation) has a bigger impact than backing up. You said cloning even fails after you shut down all the other VMs. What if you shut down all VMs? Is there any difference on the SSD when you disable passthrough for the other disks?

However, I have already twice faced an issue with a VM disk dying: the VM got stuck rebooting, and looking at the VM disk from GParted showed only unallocated space. Not sure if the IO delays and this are related.
How much unallocated space? Was any of the data within the VM affected?
 
Or maybe it's LVM corruption, as I did not face any issue with Proxmox itself. I also have some 20 GB LXC containers on the local volume that are unaffected. hdparm read and dd write tests seem to be OK.
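For reference, these are the kind of commands I mean (device and file names are just examples):
Code:
hdparm -tT /dev/sda
dd if=/dev/zero of=/root/ddtest.bin bs=1M count=1024 oflag=direct
rm /root/ddtest.bin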

Yesterday two of my VMs failed with unbootable disks. I was not able to recover any data from the virtual drives, as they show up as unallocated. I have regular automated weekly backups, but I was not even able to restore them to the SSD: it was very slow, then it got stuck at 100% for 10 minutes (this was a 2 GB VM drive). This happened with all VMs stopped.

This is what my logical volumes look like:
Code:
root@pve:/# lvs
File descriptor 7 (pipe:[2296777]) leaked on lvs invocation. Parent PID 38988: bash
  LV                            VG  Attr       LSize   Pool Origin                        Data%  Meta%  Move Log Cpy%Sync Convert
  data                          pve twi-aotz-- 359.66g                                    31.08  1.89
  root                          pve -wi-ao----  27.75g
  snap_vm-103-disk-0_v3_11_6    pve Vri---tz-k   8.00g data vm-103-disk-0
  snap_vm-103-disk-0_v3_12      pve Vri---tz-k   8.00g data vm-103-disk-0
  snap_vm-104-disk-0_full_setup pve Vri---tz-k  64.00g data
  snap_vm-104-disk-0_v20200821  pve Vri---tz-k  64.00g data
  swap                          pve -wi-ao----   8.00g
  vm-100-disk-0                 pve Vwi-a-tz--  64.00g data                               31.52
  vm-101-disk-0                 pve Vwi-a-tz--   8.00g data                               40.14
  vm-101-disk-1                 pve Vwi-a-tz--   8.00g data                               40.14
  vm-102-disk-0                 pve Vwi-aotz--  80.00g data                               19.00
  vm-103-disk-0                 pve Vwi-aotz--   8.00g data                               4.29
  vm-104-disk-0                 pve Vwi-aotz--  64.00g data snap_vm-104-disk-0_full_setup 67.46
  vm-104-state-full_setup       pve Vwi-a-tz-- <16.49g data                               33.44
  vm-104-state-v20200821        pve Vwi-a-tz-- <16.49g data                               5.45
  vm-105-disk-0                 pve Vwi-a-tz--  64.00g data                               0.01

That file descriptor message at the beginning shows up with all LVM commands, but according to Google it's harmless.

Edit: I have tried to restore a backup once again. The disk is 64 GB, thin-provisioned with less than 9 GB of actual data. I set a limit of 50 MB/s and the restore took 1h25m with 50-60% IO delay.

Code:
()
applying read rate limit: 51200
restore vma archive: cstream -t 52428800 -- /mnt/backup/Backups/dump/vzdump-qemu-100-2020_09_05-03_00_03.vma.zst | zstd -q -d -c - | vma extract -v -r /var/tmp/vzdumptmp50300.fifo - /var/tmp/vzdumptmp50300
CFG: size: 1253 name: qemu-server.conf
DEV: dev_id=1 size: 68719476736 devname: drive-sata0
CTIME: Sat Sep  5 03:00:04 2020
  Logical volume "vm-100-disk-0" successfully removed
  WARNING: You have not turned on protection against thin pools running out of space.
  WARNING: Set activation/thin_pool_autoextend_threshold below 100 to trigger automatic extension of thin pools before they get full.
  Logical volume "vm-100-disk-0" created.
  WARNING: Sum of all thin volume sizes (<472.98 GiB) exceeds the size of thin pool pve/data and the amount of free space in volume group (88.13 GiB).
new volume ID is 'local-lvm:vm-100-disk-0'
map 'drive-sata0' to '/dev/pve/vm-100-disk-0' (write zeros = 0)
progress 1% (read 687210496 bytes, duration 3 sec)
progress 2% (read 1374420992 bytes, duration 43 sec)
progress 3% (read 2061631488 bytes, duration 160 sec)
progress 4% (read 2748841984 bytes, duration 272 sec)
progress 5% (read 3435986944 bytes, duration 390 sec)
progress 6% (read 4123197440 bytes, duration 504 sec)
progress 7% (read 4810407936 bytes, duration 603 sec)
progress 8% (read 5497618432 bytes, duration 693 sec)
progress 9% (read 6184763392 bytes, duration 771 sec)
progress 10% (read 6871973888 bytes, duration 845 sec)
progress 11% (read 7559184384 bytes, duration 874 sec)
progress 12% (read 8246394880 bytes, duration 918 sec)
progress 13% (read 8933539840 bytes, duration 1000 sec)
progress 14% (read 9620750336 bytes, duration 1180 sec)
progress 15% (read 10307960832 bytes, duration 1241 sec)
progress 16% (read 10995171328 bytes, duration 1436 sec)
progress 17% (read 11682316288 bytes, duration 1722 sec)
progress 18% (read 12369526784 bytes, duration 1919 sec)
progress 19% (read 13056737280 bytes, duration 2094 sec)
progress 20% (read 13743947776 bytes, duration 2252 sec)
progress 21% (read 14431092736 bytes, duration 2512 sec)
progress 22% (read 15118303232 bytes, duration 2674 sec)
progress 23% (read 15805513728 bytes, duration 2794 sec)
progress 24% (read 16492724224 bytes, duration 2963 sec)
progress 25% (read 17179869184 bytes, duration 3092 sec)
progress 26% (read 17867079680 bytes, duration 3224 sec)
progress 27% (read 18554290176 bytes, duration 3418 sec)
progress 28% (read 19241500672 bytes, duration 3450 sec)
progress 29% (read 19928711168 bytes, duration 3472 sec)
progress 30% (read 20615856128 bytes, duration 3476 sec)
progress 31% (read 21303066624 bytes, duration 3478 sec)
progress 32% (read 21990277120 bytes, duration 3558 sec)
progress 33% (read 22677487616 bytes, duration 3569 sec)
progress 34% (read 23364632576 bytes, duration 3580 sec)
progress 35% (read 24051843072 bytes, duration 3580 sec)
progress 36% (read 24739053568 bytes, duration 3581 sec)
progress 37% (read 25426264064 bytes, duration 3581 sec)
progress 38% (read 26113409024 bytes, duration 3581 sec)
progress 39% (read 26800619520 bytes, duration 3582 sec)
progress 40% (read 27487830016 bytes, duration 3582 sec)
progress 41% (read 28175040512 bytes, duration 3582 sec)
progress 42% (read 28862185472 bytes, duration 3582 sec)
progress 43% (read 29549395968 bytes, duration 3596 sec)
progress 44% (read 30236606464 bytes, duration 3599 sec)
progress 45% (read 30923816960 bytes, duration 3600 sec)
progress 46% (read 31610961920 bytes, duration 3601 sec)
progress 47% (read 32298172416 bytes, duration 3601 sec)
progress 48% (read 32985382912 bytes, duration 3602 sec)
progress 49% (read 33672593408 bytes, duration 3602 sec)
progress 50% (read 34359738368 bytes, duration 3602 sec)
progress 51% (read 35046948864 bytes, duration 3605 sec)
progress 52% (read 35734159360 bytes, duration 3605 sec)
progress 53% (read 36421369856 bytes, duration 3607 sec)
progress 54% (read 37108580352 bytes, duration 3607 sec)
progress 55% (read 37795725312 bytes, duration 3607 sec)
progress 56% (read 38482935808 bytes, duration 3607 sec)
progress 57% (read 39170146304 bytes, duration 3608 sec)
progress 58% (read 39857356800 bytes, duration 3613 sec)
progress 59% (read 40544501760 bytes, duration 3613 sec)
progress 60% (read 41231712256 bytes, duration 3613 sec)
progress 61% (read 41918922752 bytes, duration 3618 sec)
progress 62% (read 42606133248 bytes, duration 3618 sec)
progress 63% (read 43293278208 bytes, duration 3618 sec)
progress 64% (read 43980488704 bytes, duration 3622 sec)
progress 65% (read 44667699200 bytes, duration 3622 sec)
progress 66% (read 45354909696 bytes, duration 3626 sec)
progress 67% (read 46042054656 bytes, duration 3686 sec)
progress 68% (read 46729265152 bytes, duration 3687 sec)
progress 69% (read 47416475648 bytes, duration 3687 sec)
progress 70% (read 48103686144 bytes, duration 3687 sec)
progress 71% (read 48790831104 bytes, duration 3687 sec)
progress 72% (read 49478041600 bytes, duration 3687 sec)
progress 73% (read 50165252096 bytes, duration 3699 sec)
progress 74% (read 50852462592 bytes, duration 3857 sec)
progress 75% (read 51539607552 bytes, duration 3911 sec)
progress 76% (read 52226818048 bytes, duration 3911 sec)
progress 77% (read 52914028544 bytes, duration 3912 sec)
progress 78% (read 53601239040 bytes, duration 3912 sec)
progress 79% (read 54288449536 bytes, duration 4131 sec)
progress 80% (read 54975594496 bytes, duration 4298 sec)
progress 81% (read 55662804992 bytes, duration 4298 sec)
progress 82% (read 56350015488 bytes, duration 4298 sec)
progress 83% (read 57037225984 bytes, duration 4299 sec)
progress 84% (read 57724370944 bytes, duration 4299 sec)
progress 85% (read 58411581440 bytes, duration 4345 sec)
progress 86% (read 59098791936 bytes, duration 4346 sec)
progress 87% (read 59786002432 bytes, duration 4347 sec)
progress 88% (read 60473147392 bytes, duration 4347 sec)
progress 89% (read 61160357888 bytes, duration 4347 sec)
progress 90% (read 61847568384 bytes, duration 4347 sec)
progress 91% (read 62534778880 bytes, duration 4349 sec)
progress 92% (read 63221923840 bytes, duration 4349 sec)
progress 93% (read 63909134336 bytes, duration 4349 sec)
progress 94% (read 64596344832 bytes, duration 4349 sec)
progress 95% (read 65283555328 bytes, duration 4349 sec)
progress 96% (read 65970700288 bytes, duration 4349 sec)
progress 97% (read 66657910784 bytes, duration 4352 sec)
progress 98% (read 67345121280 bytes, duration 4589 sec)
progress 99% (read 68032331776 bytes, duration 4612 sec)
progress 100% (read 68719476736 bytes, duration 4612 sec)
total bytes read 68719476736, sparse bytes 47230992384 (68.7%)
space reduction due to 4K zero blocks 0.795%
rescan volumes...
TASK OK

Is there a way to do a performance test on the LVM part of the disk?
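For reference, one way I could imagine benchmarking the thin pool directly is with fio against a throwaway thin LV, so no VM data is touched (volume name and sizes are just examples):
Code:
# create a temporary thin volume in the existing pool
lvcreate -V 10G -T pve/data -n perftest
# sequential write test directly against the block device
fio --name=seqwrite --filename=/dev/pve/perftest --rw=write --bs=1M --size=8G --direct=1 --ioengine=libaio --iodepth=16
# remove the test volume afterwards
lvremove -y pve/perftest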
 
So this seems like a kernel issue (?)

Any idea how to trace it further?

Code:
[ 1813.398698] INFO: task systemd-udevd:6145 blocked for more than 362 seconds.
[ 1813.398720]       Tainted: P           O      5.4.60-1-pve #1
[ 1813.398732] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1813.398749] systemd-udevd   D    0  6145    394 0x00000324
[ 1813.398751] Call Trace:
[ 1813.398758]  __schedule+0x2e6/0x6f0
[ 1813.398761]  ? fsnotify_grab_connector+0x4e/0x90
[ 1813.398762]  schedule+0x33/0xa0
[ 1813.398764]  schedule_preempt_disabled+0xe/0x10
[ 1813.398765]  __mutex_lock.isra.10+0x2c9/0x4c0
[ 1813.398768]  ? exact_lock+0x11/0x20
[ 1813.398769]  ? disk_map_sector_rcu+0x70/0x70
[ 1813.398770]  __mutex_lock_slowpath+0x13/0x20
[ 1813.398771]  mutex_lock+0x2c/0x30
[ 1813.398772]  __blkdev_get+0x7a/0x560
[ 1813.398774]  blkdev_get+0xef/0x150
[ 1813.398775]  ? blkdev_get_by_dev+0x50/0x50
[ 1813.398776]  blkdev_open+0x87/0xa0
[ 1813.398778]  do_dentry_open+0x143/0x3a0
[ 1813.398779]  vfs_open+0x2d/0x30
[ 1813.398781]  path_openat+0x2e9/0x16f0
[ 1813.398784]  ? unlock_page_memcg+0x12/0x20
[ 1813.398786]  ? page_add_file_rmap+0x131/0x190
[ 1813.398787]  ? wp_page_copy+0x37b/0x750
[ 1813.398789]  do_filp_open+0x93/0x100
[ 1813.398791]  ? __alloc_fd+0x46/0x150
[ 1813.398792]  do_sys_open+0x177/0x280
[ 1813.398793]  __x64_sys_openat+0x20/0x30
[ 1813.398795]  do_syscall_64+0x57/0x190
[ 1813.398797]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 1813.398798] RIP: 0033:0x7fbc5735e1ae
[ 1813.398802] Code: Bad RIP value.
[ 1813.398803] RSP: 002b:00007ffedf8ee220 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
[ 1813.398804] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fbc5735e1ae
[ 1813.398804] RDX: 0000000000080000 RSI: 0000557adbeb6950 RDI: 00000000ffffff9c
[ 1813.398805] RBP: 00007fbc56b7dc60 R08: 0000557adb452270 R09: 000000000000000f
[ 1813.398805] R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
[ 1813.398806] R13: 0000000000000000 R14: 0000000000000000 R15: 0000557adbebe450

Seems like the same issue described here: https://forum.proxmox.com/threads/thin-lvm-problem.71582/
 
Searched around a bit more. Here might be a possible explanation. Did you already try the workarounds mentioned here, i.e. running udevadm trigger while the operation is hanging, or changing obtain_device_list_from_udev from 1 to 0 in /etc/lvm/lvm.conf?
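For reference, the two workarounds boil down to something like this:
Code:
# while the operation is hanging, in another shell:
udevadm trigger
# or persistently, in the devices { } section of /etc/lvm/lvm.conf:
obtain_device_list_from_udev = 0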
 
I suspected this was a disk issue, but apparently not. I did the following:
- replaced my system SSD with an old 120 GB SSD that is actually near death (150 TBW). Reinstalled PVE on it, tried to restore the backup of one VM, and it went flawlessly at 200 MB/s
- tested the original SSD in a Windows system; it indeed showed some sectors with slow write speeds, so I concluded the disk was the problem
- bought a brand new Samsung 860 EVO SSD, reinstalled PVE again, tried to restore one VM, and again it runs at 12 MB/s with huge IO delays

udevadm trigger does not change anything. The udevd blocked message has not appeared (yet), but there is something seriously wrong with my system...

It seems as if it might be related to the size of the volume: the old 120 GB disk works fine, while the two current 500 GB disks are slow as hell.
 
Was the restore done via the NFS shares? Is it also slow when you restore from a backup on the SSD itself? Also try immediately after a reboot; if this makes a significant difference, the swap might be a problem.
With your kind of hardware, you might also want to consider not using LVM and simply using a filesystem for the VM images instead. When you install PVE on top of Debian, you have more options when partitioning.
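For illustration, such a directory-based storage would be defined in /etc/pve/storage.cfg along these lines (name and path are just examples):
Code:
dir: vmstore
    path /mnt/vmstore
    content images,rootdir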
 
[SOLVED]
In the end the issue was unrelated to Proxmox or the disk drives. It seems that for some reason my HP MicroServer Gen8 has very slow disk speeds when the controller is in AHCI mode. Enabling RAID mode increased disk I/O significantly. I don't know why this is happening, as the performance of the two modes should be the same, but that's for a different investigation.
 
molnart said:
In the end the issue was unrelated to Proxmox or the disk drives. It seems that for some reason my HP MicroServer Gen8 has very slow disk speeds when the controller is in AHCI mode. Enabling RAID mode increased disk I/O significantly. I don't know why this is happening, as the performance of the two modes should be the same, but that's for a different investigation.
I'm just curious... I'm having a similar issue with dead slow restores from backup on a Dell PowerEdge. I have 3 enormous 4 TB SATA disks in a RAIDZ1 ZFS pool, AHCI mode. I've seen in the BIOS that RAID mode is also available (which sounds weird to me because there is NO RAID card on the server), but I fear I'll break the ZFS pool...
 
molnart said:
In the end the issue was unrelated to Proxmox or the disk drives. It seems that for some reason my HP MicroServer Gen8 has very slow disk speeds when the controller is in AHCI mode. Enabling RAID mode increased disk I/O significantly. I don't know why this is happening, as the performance of the two modes should be the same, but that's for a different investigation.
Hi,
I have the same problem with my HP ProLiant DL360p Gen8 server (53 minutes to clone an Ubuntu VM with a 32 GB disk).
Can you tell me how to verify whether my controller is in AHCI mode?

Best regards,
 
molnart said:
In the end the issue was unrelated to Proxmox or the disk drives. It seems that for some reason my HP MicroServer Gen8 has very slow disk speeds when the controller is in AHCI mode. Enabling RAID mode increased disk I/O significantly. I don't know why this is happening, as the performance of the two modes should be the same, but that's for a different investigation.

I can second this.

I have an ML110 Gen7 with a 512 GB SSD and five 2 TB SATA disks.
Restoring took forever.
I set the controller to RAID mode, created a logical drive with just the SSD, and Proxmox booted normally,
and of course restoring/moving guests is now very fast...
 
molnart said:
In the end the issue was unrelated to Proxmox or the disk drives. It seems that for some reason my HP MicroServer Gen8 has very slow disk speeds when the controller is in AHCI mode. Enabling RAID mode increased disk I/O significantly. I don't know why this is happening, as the performance of the two modes should be the same, but that's for a different investigation.
Been lurking here for months, but I needed to post to thank you for being such a legend. This problem had been driving me insane for a solid week of troubleshooting, and this ended up being the exact solution, so thank you so much for following up once you figured it out.
 
