Proxmox backup slowing down VMs/CTs

MH_MUC

Good afternoon!
I am running Proxmox with ZFS on 2 HDDs.

In total there are 2 CTs to be backed up.
The backup takes hours and drags down the CTs; I observe cron jobs failing during the night.

I run vzdump with the following args:
ionice -c3 nice -n19 vzdump --maxfiles 1 --exclude-path '/var/lib/psa/dumps/.+' $i
(The exclude-path is ignored, but that is a different problem)

Does anyone have an idea?
Priority should go to the VMs/CTs, not the backup. Any ideas beyond using nice and ionice?

Code:
#vzdump.conf
bwlimit: 51200
compress: zstd
zstd: 2
ionice: 7
mailnotification: failure
mailto: xxx@xxx.de
mode: snapshot
maxfiles: 3

Vzdump-Log:
Code:
2020-11-22 22:00:03 INFO: Starting Backup of VM 100 (lxc)
2020-11-22 22:00:03 INFO: status = running
2020-11-22 22:00:03 INFO: CT Name: web20.xxx.de
2020-11-22 22:00:03 INFO: including mount point rootfs ('/') in backup
2020-11-22 22:00:03 INFO: backup mode: snapshot
2020-11-22 22:00:03 INFO: bandwidth limit: 51200 KB/s
2020-11-22 22:00:03 INFO: ionice priority: 7
2020-11-22 22:00:03 INFO: create storage snapshot 'vzdump'
2020-11-22 22:00:04 INFO: creating vzdump archive '/var/lib/vz/dump/vzdump-lxc-100-2020_11_22-22_00_03.tar.zst'
2020-11-23 02:18:59 INFO: Total bytes written: 149183938560 (139GiB, 9.2MiB/s)
2020-11-23 02:18:59 INFO: archive file size: 85.93GB
2020-11-23 02:18:59 INFO: delete old backup '/var/lib/vz/dump/vzdump-lxc-100-2020_11_21-22_00_09.tar.zst'
2020-11-23 02:19:00 INFO: remove vzdump snapshot
2020-11-23 02:19:03 INFO: Finished Backup of VM 100 (04:19:00)

Code:
pveperf /rpool
CPU BOGOMIPS:      54276.48
REGEX/SECOND:      1411096
HD SIZE:           660.72 GB (rpool)
FSYNCS/SECOND:     485.54
DNS EXT:           9.57 ms
DNS INT:           5.89 ms
 
Hi,
what kind of hardware do you have and what else is producing load during the backup? Are the jobs running at the same time or sequentially?

I'm not sure the ionice and nice settings from the outside have an effect, because vzdump executes a new process internally. But the ionice setting can be configured via the vzdump.conf or via a CLI argument --ionice, which is set to 7 already (the maximum for the best-effort class). If you set the value to 8 or higher, vzdump will start the process with the idle class instead.
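If it helps, a minimal sketch of that change in vzdump.conf (the value 8 is simply the threshold mentioned above, everything else can stay as in your original config):

Code:
# vzdump.conf -- with ionice set to 8 or higher, vzdump runs the backup process in the idle I/O class
ionice: 8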

Regarding exclude-path: This is the old syntax with regular expressions. We changed to shell globs (see man 7 glob) back in 2016; you need to use
'/var/lib/psa/dumps/?*' instead.
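Applied to the command from your first post, the call would look roughly like this (untested sketch, $i being the VMID variable from your loop):

Code:
# Same invocation as before, only with the exclude pattern rewritten as a shell glob
ionice -c3 nice -n19 vzdump --maxfiles 1 --exclude-path '/var/lib/psa/dumps/?*' $i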
 
Hi Fabian,

thank you for the quick reply.
Proxmox is running on a machine with
CPU:
Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
RAM: 32 GB

Disks: Two 2 TB disks in a ZFS RAID 1 (mirror) configuration.
Code:
sudo hdparm -I /dev/sdb

/dev/sdb:

ATA device, with non-removable media
        Model Number:       HGST HUS722T2TALA604
        Serial Number:      WMC6N0PATUEZ
        Firmware Revision:  RAGNWA09
        Transport:          Serial, SATA 1.0a, SATA II Extensions, SATA Rev 2.5, SATA Rev 2.6, SATA

There are 2 Linux CTs and 1 Windows VM on the same machine. Updates run during the night. There are no significant processes running apart from some default web-server cron jobs and regular web-server load.

I rechecked, and I think it is less a problem of the vzdump task itself than of any task putting load on the hard drives.
For example, creating a tar of all vzdump archives in one file generates an IO delay of 38, even if I run it with "ionice -c3 nice -n19".
Extracting a 190 MB zip file in a Windows VM also takes very long and causes a lot of IO load.
So I believe it is more a problem with ZFS itself, which is unfortunate, because that is hard to change in a production system.
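As a rough sketch, the test looked like this (paths are illustrative, not the exact ones on my system):

Code:
# Even with the idle I/O class and lowest CPU priority, this drives the IO delay up
ionice -c3 nice -n19 tar -cf /rpool/all-dumps.tar /var/lib/vz/dump/*.tar.zst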

Thank you for the hint concerning the exclude-path. I will give it a try.

Thank you for any further advice.
 
What kind of controller do you use? Having the wrong one might explain the bad performance. ZFS works best with direct access to the disks.
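If it helps, a quick generic way to check what the disks sit behind (not specific to your box):

Code:
# List storage-related PCI devices to see whether a RAID controller or a plain SATA/HBA controller is in the path
lspci | grep -iE 'sata|sas|raid'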
 
What about RAM usage in general and during those jobs? Do you run into a bottleneck there?
Is there anything suspicious in dmesg or syslog about the drives? Does smartctl show them as healthy?
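For example (device name taken from the hdparm output above):

Code:
# Overall SMART health verdict plus the attribute table (reallocated/pending sectors etc.)
smartctl -H -A /dev/sdb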
 
Well, how would I find out?
As far as I can see there is no overcommitment.
C_Max (max ARC size) is limited to 14 GB. The VMs have a maximum of 20 GB. The server has 32 GB.
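For reference, the ARC limit is set the usual way via a module parameter (the file path and the exact byte value for 14 GiB are only shown as an illustration):

Code:
# /etc/modprobe.d/zfs.conf -- cap the ZFS ARC at 14 GiB (value in bytes)
# after editing: update-initramfs -u and reboot for it to take effect
options zfs zfs_arc_max=15032385536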

When I observe the server doing encrypted uploads to a remote storage and becoming very slow, I see high disk IO while memory usage is still at 28/32 GB.

I added a log/cache device to the pool this morning and started monitoring with atop. We will see how this turns out.
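Roughly what that looked like (device paths are placeholders, not the actual ones):

Code:
# Attach a separate log (SLOG) and a cache (L2ARC) device to the existing pool
zpool add rpool log /dev/disk/by-id/<ssd>-part1
zpool add rpool cache /dev/disk/by-id/<ssd>-part2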
 
