Is it possible to throttle backup and restore disk I/O?

Before the backup:
https://pastebin.com/DPvREayk
During the backup:
https://pastebin.com/AThQeRJy

Code:
free -m
              total        used        free      shared  buff/cache   available
Mem:          48308       32520         394         147       15392       15071
Swap:          8191         119        8072

I can see that the ARC has more or less been dropped, but even before I start the heavy I/O task I have 10 GB of free RAM.

Which operation are you doing / what is causing the I/O? Could you include the output of pveversion -v as well?
 
I did a vzdump snapshot

Code:
proxmox-ve: 5.1-43 (running kernel: 4.15.17-1-pve)
pve-manager: 5.1-52 (running version: 5.1-52/ba597a64)
pve-kernel-4.13: 5.1-44
pve-kernel-4.15: 5.1-4
pve-kernel-4.15.17-1-pve: 4.15.17-8
pve-kernel-4.15.15-1-pve: 4.15.15-6
pve-kernel-4.15.10-1-pve: 4.15.10-4
pve-kernel-4.15.3-1-pve: 4.15.3-1
pve-kernel-4.13.16-2-pve: 4.13.16-48
pve-kernel-4.13.16-1-pve: 4.13.16-46
pve-kernel-4.13.13-6-pve: 4.13.13-42
pve-kernel-4.13.13-5-pve: 4.13.13-38
pve-kernel-4.13.13-3-pve: 4.13.13-34
pve-kernel-4.13.4-1-pve: 4.13.4-26
pve-kernel-4.10.17-4-pve: 4.10.17-24
pve-kernel-4.10.17-3-pve: 4.10.17-23
pve-kernel-4.10.17-2-pve: 4.10.17-20
pve-kernel-4.10.17-1-pve: 4.10.17-18
pve-kernel-4.10.8-1-pve: 4.10.8-7
pve-kernel-4.10.5-1-pve: 4.10.5-5
pve-kernel-4.10.1-2-pve: 4.10.1-2
pve-kernel-4.4.35-1-pve: 4.4.35-77
corosync: 2.4.2-pve5
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.0-8
libpve-apiclient-perl: 2.0-4
libpve-common-perl: 5.0-31
libpve-guest-common-perl: 2.0-15
libpve-http-server-perl: 2.0-8
libpve-storage-perl: 5.0-21
libqb0: 1.0.1-1
lvm2: 2.02.168-pve6
lxc-pve: 3.0.0-3
lxcfs: 3.0.0-1
novnc-pve: 0.6-4
proxmox-widget-toolkit: 1.0-17
pve-cluster: 5.0-27
pve-container: 2.0-22
pve-docs: 5.1-17
pve-firewall: 3.0-8
pve-firmware: 2.0-4
pve-ha-manager: 2.0-5
pve-i18n: 1.0-4
pve-libspice-server1: 0.12.8-3
pve-qemu-kvm: 2.11.1-5
pve-xtermjs: 1.0-3
pve-zsync: 1.6-15
qemu-server: 5.0-25
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.8-pve1~bpo9
 
Sorry for the back and forth - would it also be possible to get a backup log, the VM config, and the storage config, just to get the complete picture? AFAICT ZFS thinks you are running out of RAM and drops the ARC to the configured minimum value (which is of course bad for I/O performance / system load).
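
For reference, the configured ARC limits and the current ARC size can be checked directly on the host; a minimal sketch using the standard ZFS-on-Linux module parameters and kstat file:

Code:
# configured ARC limits in bytes (0 means the built-in default)
cat /sys/module/zfs/parameters/zfs_arc_min
cat /sys/module/zfs/parameters/zfs_arc_max

# current ARC size and targets
grep -E "^(size|c|c_min|c_max) " /proc/spl/kstat/zfs/arcstats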
 
Code:
INFO: starting new backup job: vzdump 101 --mode snapshot
INFO: Starting Backup of VM 101 (qemu)
INFO: status = running
INFO: update VM 101: -lock backup
INFO: VM Name: pfsense
INFO: include disk 'scsi0' 'local-zfs:vm-101-disk-1' 32G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating archive '/var/lib/vz/dump/vzdump-qemu-101-2018_05_11-11_44_50.vma'
INFO: started backup task '41b0205a-0f11-48a3-86c1-e7223c4469b6'
INFO: status: 1% (677838848/34359738368), sparse 1% (641093632), duration 3, read/write 225/12 MB/s
INFO: status: 4% (1472200704/34359738368), sparse 3% (1335742464), duration 6, read/write 264/33 MB/s
INFO: status: 8% (2797338624/34359738368), sparse 7% (2575880192), duration 9, read/write 441/28 MB/s
INFO: status: 13% (4583653376/34359738368), sparse 12% (4280025088), duration 12, read/write 595/27 MB/s
INFO: status: 15% (5379325952/34359738368), sparse 14% (4989726720), duration 15, read/write 265/28 MB/s
INFO: status: 16% (5528551424/34359738368), sparse 14% (4992667648), duration 19, read/write 37/36 MB/s
INFO: status: 17% (5930745856/34359738368), sparse 15% (5243269120), duration 23, read/write 100/37 MB/s
INFO: status: 18% (6253117440/34359738368), sparse 15% (5266898944), duration 29, read/write 53/49 MB/s
INFO: status: 19% (6565527552/34359738368), sparse 15% (5458432000), duration 32, read/write 104/40 MB/s
INFO: status: 20% (6897991680/34359738368), sparse 16% (5499826176), duration 37, read/write 66/58 MB/s
INFO: status: 22% (7579435008/34359738368), sparse 17% (6015127552), duration 40, read/write 227/55 MB/s
INFO: status: 24% (8585674752/34359738368), sparse 20% (7006806016), duration 43, read/write 335/4 MB/s
INFO: status: 26% (9118023680/34359738368), sparse 21% (7477268480), duration 46, read/write 177/20 MB/s
INFO: status: 27% (9508028416/34359738368), sparse 22% (7866019840), duration 49, read/write 130/0 MB/s
INFO: status: 29% (10046537728/34359738368), sparse 24% (8384069632), duration 52, read/write 179/6 MB/s
INFO: status: 35% (12190547968/34359738368), sparse 30% (10508570624), duration 55, read/write 714/6 MB/s
INFO: status: 36% (12671713280/34359738368), sparse 31% (10988552192), duration 64, read/write 53/0 MB/s
INFO: status: 37% (12748914688/34359738368), sparse 32% (11065753600), duration 69, read/write 15/0 MB/s
INFO: status: 38% (13077381120/34359738368), sparse 33% (11394220032), duration 84, read/write 21/0 MB/s
INFO: status: 39% (13542817792/34359738368), sparse 34% (11858472960), duration 98, read/write 33/0 MB/s
INFO: status: 40% (13780123648/34359738368), sparse 35% (12095778816), duration 106, read/write 29/0 MB/s
INFO: status: 41% (14103674880/34359738368), sparse 36% (12402135040), duration 117, read/write 29/1 MB/s
INFO: status: 42% (14472970240/34359738368), sparse 37% (12770246656), duration 134, read/write 21/0 MB/s
INFO: status: 43% (14793441280/34359738368), sparse 38% (13090717696), duration 143, read/write 35/0 MB/s
INFO: status: 44% (15143141376/34359738368), sparse 39% (13439234048), duration 154, read/write 31/0 MB/s
INFO: status: 45% (15492186112/34359738368), sparse 40% (13788213248), duration 170, read/write 21/0 MB/s
INFO: status: 46% (15877734400/34359738368), sparse 41% (14172577792), duration 179, read/write 42/0 MB/s
INFO: status: 47% (16152330240/34359738368), sparse 42% (14447173632), duration 189, read/write 27/0 MB/s
INFO: status: 48% (16614752256/34359738368), sparse 43% (14908411904), duration 202, read/write 35/0 MB/s
INFO: status: 49% (16872570880/34359738368), sparse 44% (15166230528), duration 206, read/write 64/0 MB/s
INFO: status: 50% (17181048832/34359738368), sparse 45% (15473520640), duration 224, read/write 17/0 MB/s
INFO: status: 51% (17543004160/34359738368), sparse 46% (15835475968), duration 230, read/write 60/0 MB/s
INFO: status: 52% (17880907776/34359738368), sparse 47% (16172195840), duration 242, read/write 28/0 MB/s
INFO: status: 53% (18213634048/34359738368), sparse 48% (16504922112), duration 254, read/write 27/0 MB/s
INFO: status: 54% (18577293312/34359738368), sparse 49% (16853114880), duration 270, read/write 22/0 MB/s
ERROR: interrupted by signal
INFO: aborting backup job
ERROR: Backup of VM 101 failed - interrupted by signal
ERROR: Backup job failed - interrupted by signal

TASK ERROR: interrupted by signal

Code:
agent: 1
balloon: 0
bios: ovmf
boot: cdn
bootdisk: scsi0
cores: 10
cpu: EPYC
efidisk0: local-zfs:vm-100-disk-3,size=128K
hostpci0: 07:00,pcie=1,x-vga=on
hotplug: network,usb
ide0: none,media=cdrom
keyboard: de
machine: q35
memory: 12288
name: win10
net0: virtio=72:DC:5B:51:AD:D2,bridge=vmbr0
numa: 1
onboot: 1
ostype: win10
scsi0: ssdpool:vm-100-disk-1,cache=none,iothread=1,size=447G
scsi2: local-zfs:vm-100-disk-2,backup=0,iothread=1,replicate=0,size=1000G
scsihw: virtio-scsi-single
smbios1: uuid=76521a3b-440f-4ccb-980a-8f85d6043e98
sockets: 1
tablet: 0
usb0: host=046d:0a01
usb1: host=046d:c051
usb2: host=413c:2005
usb3: host=046d:c52b
hugepages: 2

Code:
balloon: 0
bootdisk: scsi0
cores: 2
ide2: none,media=cdrom
keyboard: de
memory: 1024
name: pfsense
net0: virtio=4E:44:08:32:94:3E,bridge=vmbr0
net1: virtio=42:5B:FF:FA:BC:9B,bridge=vmbr1
numa: 0
onboot: 1
ostype: l26
scsi0: local-zfs:vm-101-disk-1,cache=unsafe,size=32G
smbios1: uuid=8449f963-cb43-412e-9bc8-d3fa65e9a423
sockets: 1
startup: order=1
tablet: 0

Edit, some more tests and results:
After this, the ARC was never increased above arc_min_size again, so after it was dropped during the vzdump it stayed limited to 1.47 GB (= arc_min) instead of the full 7 GB until I rebooted.

Then I set arc_min == arc_max and noticed that during a vzdump the ARC still shrinks quickly and the freezes are still there, but at least it grows back again after I cancel the vzdump.
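
For reference, this is roughly how arc_min and arc_max can be pinned to the same value (a minimal sketch; 7516192768 bytes = 7 GiB is just an example, adjust to your RAM):

Code:
# persist the limits across reboots via the ZFS module options
echo "options zfs zfs_arc_min=7516192768 zfs_arc_max=7516192768" > /etc/modprobe.d/zfs.conf
update-initramfs -u

# apply at runtime without rebooting (set max first so min <= max always holds)
echo 7516192768 > /sys/module/zfs/parameters/zfs_arc_max
echo 7516192768 > /sys/module/zfs/parameters/zfs_arc_min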

Meanwhile I had plenty (10G++) of free RAM.

And it starts to swap... (per-process swap usage: name, PID, kB swapped):

Code:
pve-ha-crm 16486 2948 kB
pve-ha-lrm 16525 2368 kB
spiceproxy 16785 2268 kB
spiceproxy work16803 2184 kB
pvedaemon 16444 1884 kB
pveproxy 16729 1872 kB
pvedaemon worke16449 1416 kB
pvedaemon worke16447 1396 kB
pvedaemon worke16448 1320 kB
pveproxy worker19686 1136 kB
pveproxy worker12382 1076 kB
pveproxy worker11918 1072 kB
pve-firewall 16353 912 kB
pvestatd 16367 408 kB
bash 21311 368 kB
bash 22854 364 kB
bash 22855 348 kB
sshd 20915 12 kB
lxc-start 20680 4 kB

Edit2:
I tried to fix my typos, but got the following error:
Your content can not be submitted. This is likely because your content is spam-like or contains inappropriate elements. Please change your content or try again later. If you still have problems, please contact an administrator.
 
And the backup storage is also (a directory on) ZFS?
 
I tried to reproduce this using a similar setup, but was not successful. Could you try monitoring the situation with atop and arcstat and see how the memory and ARC develop over the course of the backup?
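
Something along these lines should be enough (depending on the zfsutils version the tool is called arcstat or arcstat.py):

Code:
# ARC size, target and hit rate, sampled every second while the backup runs
arcstat 1

# overall CPU / memory / disk pressure, refreshed every 2 seconds
atop 2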
 
I'm also busy trying to track down problems during backups where VMs (Ubuntu 16.04) that use ZFS inside, on top of a ZFS-based Proxmox host/hypervisor, throw kernel panics and warnings while the backup runs.

proxmox-ve: 5.2-2 (running kernel: 4.15.18-5-pve)
pve-manager: 5.2-9 (running version: 5.2-9/4b30e8f9)
pve-kernel-4.15: 5.2-8
pve-kernel-4.15.18-5-pve: 4.15.18-24
pve-kernel-4.15.15-1-pve: 4.15.15-6
pve-kernel-4.13.13-6-pve: 4.13.13-42
pve-kernel-4.13.13-5-pve: 4.13.13-38
pve-kernel-4.13.4-1-pve: 4.13.4-26
pve-kernel-4.4.83-1-pve: 4.4.83-96
pve-kernel-4.4.76-1-pve: 4.4.76-94
pve-kernel-4.4.67-1-pve: 4.4.67-92
pve-kernel-4.4.40-1-pve: 4.4.40-82
pve-kernel-4.4.19-1-pve: 4.4.19-66
pve-kernel-4.4.15-1-pve: 4.4.15-60
pve-kernel-4.2.8-1-pve: 4.2.8-41
corosync: 2.4.2-pve5
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.0-8
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-38
libpve-guest-common-perl: 2.0-18
libpve-http-server-perl: 2.0-10
libpve-storage-perl: 5.0-29
libqb0: 1.0.1-1
lvm2: 2.02.168-pve6
lxc-pve: 3.0.2+pve1-2
lxcfs: 3.0.2-2
novnc-pve: 1.0.0-2
openvswitch-switch: 2.7.0-3
proxmox-widget-toolkit: 1.0-20
pve-cluster: 5.0-30
pve-container: 2.0-27
pve-docs: 5.2-8
pve-firewall: 3.0-14
pve-firmware: 2.0-5
pve-ha-manager: 2.0-5
pve-i18n: 1.0-6
pve-libspice-server1: 0.12.8-3
pve-qemu-kvm: 2.11.2-1
pve-xtermjs: 1.0-5
qemu-server: 5.0-35
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.11-pve1~bpo1

https://bugzilla.proxmox.com/show_bug.cgi?id=1453#c7 was closed without mentioning where/how to limit it for backups only. man datacenter.cfg only documents a bwlimit for "default" and restore/move, but not for backup.

The backup edit window also doesn't show anything about ionice :(
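
As far as I know, the backup-specific limits live in /etc/vzdump.conf rather than datacenter.cfg; a minimal sketch with placeholder values:

Code:
# /etc/vzdump.conf - node-wide defaults for vzdump
bwlimit: 51200   # KiB/s read limit for backups (~50 MiB/s)
ionice: 7        # lowest best-effort I/O priority (only effective with the CFQ scheduler)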
 
Why is vzdump's bwlimit so dumb?
It runs for a few seconds at full speed, then waits without transferring anything, so only the average ends up equal to bwlimit.
I expected a constant transfer rate at the bwlimit value.

[attachment: bwlimit.jpg]
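
For completeness, the same limit can also be passed per job on the command line (value again in KiB/s; 51200 is just an example) - whether that smooths out the bursts is a separate question:

Code:
# one-off snapshot backup of VM 101 with an explicit bandwidth limit (~50 MiB/s)
vzdump 101 --mode snapshot --bwlimit 51200 --ionice 7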
 
