High IO during restore

Pavel Olenev

New Member
May 10, 2012
Some time ago I noticed that when restoring virtual machine backups I see high iowait on the host. But there is a gigabit network between the backup server and the Proxmox node, and local storage is LVM-Thin on a RAID-10 of four SSD drives. The saddest part is that during a restore from backup, the other virtual machines on the host almost completely stall.
Any ideas on how to diagnose the bottleneck?

proxmox-ve: 6.0-2 (running kernel: 5.0.18-1-pve)
pve-manager: 6.0-6 (running version: 6.0-6/c71f879f)
pve-kernel-5.0: 6.0-6
pve-kernel-helper: 6.0-6
pve-kernel-5.0.18-1-pve: 5.0.18-3
pve-kernel-5.0.15-1-pve: 5.0.15-1
ceph-fuse: 12.2.11+dfsg1-2.1
corosync: 3.0.2-pve2
criu: 3.11-3
glusterfs-client: 5.5-3
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.10-pve2
libpve-access-control: 6.0-2
libpve-apiclient-perl: 3.0-2
libpve-common-perl: 6.0-4
libpve-guest-common-perl: 3.0-1
libpve-http-server-perl: 3.0-2
libpve-storage-perl: 6.0-7
libqb0: 1.0.5-1
lvm2: 2.03.02-pve3
lxc-pve: 3.1.0-64
lxcfs: 3.0.3-pve60
novnc-pve: 1.0.0-60
openvswitch-switch: 2.10.0+2018.08.28+git.8ca7c82b7d+ds1-12
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.0-7
pve-cluster: 6.0-5
pve-container: 3.0-5
pve-docs: 6.0-4
pve-edk2-firmware: 2.20190614-1
pve-firewall: 4.0-7
pve-firmware: 3.0-2
pve-ha-manager: 3.0-2
pve-i18n: 2.0-2
pve-qemu-kvm: 4.0.0-5
pve-xtermjs: 3.13.2-1
qemu-server: 6.0-7
smartmontools: 7.0-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.1-pve1


You can use atop to monitor network and disk I/O, or iotop to monitor disk I/O.
For the restore you can rate-limit the read rate to prevent such issues. Simply provide a rate in MiB/s in the corresponding field.
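If atop or iotop aren't installed, iowait can also be sampled directly from /proc/stat; a rough sketch:

```shell
# Sample CPU iowait over one second by diffing the first line of /proc/stat.
# Fields after "cpu": user nice system idle iowait ...
read -r _ u1 n1 s1 i1 w1 rest < /proc/stat
sleep 1
read -r _ u2 n2 s2 i2 w2 rest < /proc/stat
# Approximate total (ignores irq/softirq/steal jiffies)
total=$(( (u2 + n2 + s2 + i2 + w2) - (u1 + n1 + s1 + i1 + w1) ))
wait_d=$(( w2 - w1 ))
echo "iowait: $(( 100 * wait_d / total ))%"
```

A value consistently above a few tens of percent during the restore would confirm the storage, not the network, is the choke point.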
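If you restore from the CLI instead of the GUI, qmrestore takes a --bwlimit option in KiB/s (the GUI field above is in MiB/s); the archive path and VM ID below are placeholders:

```shell
# Cap the restore read rate at 50 MiB/s; qmrestore's --bwlimit is in KiB/s.
BWLIMIT=$((50 * 1024))   # 51200 KiB/s = 50 MiB/s
echo "would run: qmrestore /var/lib/vz/dump/vzdump-qemu-100-example.vma.lzo 100 --storage local-lvm --bwlimit $BWLIMIT"
# qmrestore /var/lib/vz/dump/vzdump-qemu-100-example.vma.lzo 100 \
#   --storage local-lvm --bwlimit "$BWLIMIT"
```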
Hi, I'm experiencing exactly the same; my version is 5.4-13.
I don't remember having problems with this on older versions.

I was blaming my slow 7.2K drives, but you have SSDs and have the same problem...

Playing with the restore read rate worked, but I ended up as low as 10 MiB/s, which is rather slow.
Yes, this helps, but is it a normal solution? Is it okay that one write operation kills the entire drive's performance? I guess yes.

My setup is a DELL R720, H710P Mini, RAID1 of two HGST 2.7T drives.
Is it ok, that one write operation kills the entire drive performance?

One write operation, over a 100 MB/s network between the backup server and the Proxmox node, onto SSD drives with a write speed of about 300 MB/s each, combined into a hardware RAID10 - I think that must not be OK)))
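The back-of-envelope math behind that: RAID10 on four drives stripes writes across two mirror pairs, so the array should sustain roughly twice a single drive's sequential write speed, far above what the network can deliver:

```shell
# Rough estimate; assumes 4-drive RAID10 = 2 striped mirror pairs and
# the ~300 MB/s per-SSD write speed quoted above.
DRIVE_MBS=300
PAIRS=2
NET_MBS=100
RAID_MBS=$((DRIVE_MBS * PAIRS))
echo "array write ~${RAID_MBS} MB/s vs network ${NET_MBS} MB/s"
```

So on paper the array has roughly 6x headroom over the incoming stream; the stall must come from something else (write amplification on LVM-Thin, controller cache behavior, or unthrottled sequential writes starving the VMs' random I/O).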
That is correct, but in my case I suspect the drives really are the problem. I was using rather slow 7.2K SAS drives in RAID1, and on that storage a medium-load e-shop with MySQL is running (lots of random writes and reads). When I tried to restore a VM to this drive, that VM crashed because it couldn't write to the drive (timeouts in the log).
I was monitoring the Proxmox host with iotop and it showed high iowait on the kvm processes of my VMs.
I have not yet figured out how to measure the I/O of each process and find out whether the total is the problem.
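For what it's worth, the per-process counters that iotop aggregates live in /proc/&lt;pid&gt;/io (reading other users' processes needs root); e.g. for the current shell:

```shell
# Cumulative bytes actually issued to the storage layer by this process.
# For a VM, substitute the kvm process PID for "self".
cat /proc/self/io
```

Diffing read_bytes/write_bytes between two reads a few seconds apart gives a per-process throughput without any extra tools.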
Check what the I/O load on your drives is without a restore running.
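To get that baseline, the raw per-device counters can be snapshotted from /proc/diskstats (take two snapshots a few seconds apart and diff them); the device-name pattern below assumes sd*/nvme* disks:

```shell
# Per-device I/O counters: field 3 = device name, 4 = reads completed,
# 8 = writes completed, 13 = ms spent doing I/O. Whole disks only.
awk '$3 ~ /^(sd[a-z]+|nvme[0-9]+n[0-9]+)$/ {
  printf "%-10s reads=%d writes=%d io_ms=%d\n", $3, $4, $8, $13
}' /proc/diskstats
```

If io_ms grows by nearly 1000 ms per elapsed second, the device is already saturated before the restore even starts.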

