Backup restore on ZFS produces high load

I've got a dual 6-core E5-2600v3 system with 64 GB RAM, an LSI 9300-8i 12 Gb/s SATA/SAS HBA, and 4 Intel DC SSDs in a ZFS RAID10 pool plus 1 Intel DC SSD for log/cache.

During a restore the system load climbs to 40-50, with many ZFS processes showing up in "top". It makes no difference whether I restore from NFS or from a local ZFS-backed directory.

Is there any way to reduce the load caused by a restore job? From time to time the whole system hangs for 1-2 seconds while a restore is running.

I'm running the following versions:
Code:
proxmox-ve: 5.1-32 (running kernel: 4.13.13-2-pve)
pve-manager: 5.1-41 (running version: 5.1-41/0b958203)
pve-kernel-4.13.13-2-pve: 4.13.13-32
libpve-http-server-perl: 2.0-8
lvm2: 2.02.168-pve6
corosync: 2.4.2-pve3
libqb0: 1.0.1-1
pve-cluster: 5.0-19
qemu-server: 5.0-18
pve-firmware: 2.0-3
libpve-common-perl: 5.0-25
libpve-guest-common-perl: 2.0-14
libpve-access-control: 5.0-7
libpve-storage-perl: 5.0-17
pve-libspice-server1: 0.12.8-3
vncterm: 1.5-3
pve-docs: 5.1-12
pve-qemu-kvm: 2.9.1-5
pve-container: 2.0-18
pve-firewall: 3.0-5
pve-ha-manager: 2.0-4
ksm-control-daemon: 1.2-2
glusterfs-client: 3.8.8-1
lxc-pve: 2.1.1-2
lxcfs: 2.0.8-1
criu: 2.11.1-1~bpo90
novnc-pve: 0.6-4
smartmontools: 6.5+svn4324-1
zfsutils-linux: 0.7.3-pve1~bpo9
 
It makes no difference whether I restore from NFS or from a local ZFS-backed directory
Really "dir"? So you are not using a zvol (the default)? Please post some information:
Code:
pvesm status
zpool list
zpool status
zfs list
pveperf /mountpoint-zpool
 
Yeah, "dir" ;) I restore the backup from /var/lib/vz/dump/, not from NFS.

Code:
root@proxnode3:~# pvesm status
Name             Type     Status           Total            Used       Available        %
backup            nfs     active      5759822848      3848716288      1911090176   66.82%
local             dir     active       206999552         9813888       197185664    4.74%
local-zfs     zfspool     active       464767584       267581852       197185732   57.57%
Code:
root@proxnode3:~# zpool list
NAME    SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
rpool   476G   266G   210G         -    15%    55%  1.00x  ONLINE  -
root@proxnode3:~# zpool list -v
NAME   SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
rpool   476G   266G   210G         -    15%    55%  1.00x  ONLINE  -
  mirror   238G   133G   105G         -    14%    56%
    sda2      -      -      -         -      -      -
    sdb2      -      -      -         -      -      -
  mirror   238G   132G   106G         -    16%    55%
    sdc      -      -      -         -      -      -
    sdd      -      -      -         -      -      -
log      -      -      -         -      -      -
  sde1  59.5G  1.59M  59.5G         -     0%     0%
cache      -      -      -         -      -      -
  sde2   178G   134G  44.8G         -     0%    74%
Code:
root@proxnode3:~# zfs list
NAME                       USED  AVAIL  REFER  MOUNTPOINT
rpool                      273G   188G   104K  /rpool
rpool/ROOT                9.36G   188G    96K  /rpool/ROOT
rpool/ROOT/pve-1          9.36G   188G  9.36G  /
rpool/data                 255G   188G    96K  /rpool/data
rpool/data/vm-109-disk-1  53.5G   188G  53.5G  -
rpool/data/vm-203-disk-1  35.3G   188G  35.3G  -
rpool/data/vm-300-disk-1  12.8G   188G  12.8G  -
rpool/data/vm-300-disk-2  48.6M   188G  48.6M  -
rpool/data/vm-300-disk-3  56.5G   188G  56.5G  -
rpool/data/vm-303-disk-1  97.1G   188G  97.1G  -
rpool/swap                8.50G   196G   950M  -
Code:
root@proxnode3:~# pveperf /rpool
CPU BOGOMIPS:      115205.76
REGEX/SECOND:      2216472
HD SIZE:           188.05 GB (rpool)
FSYNCS/SECOND:     310.98
DNS EXT:           53.95 ms
DNS INT:           14.71 ms

I don't know why pveperf reports such low fsyncs. The system runs fast and all VMs have perfect disk I/O. The only problem is this high load while restoring backups.
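
As a rough cross-check of the fsync number, a quick synchronous-write test can be run directly on the pool's root filesystem. A minimal sketch, loosely comparable to what pveperf's fsync test exercises; the test file path is just an example:
Code:
# 1000 x 4k synchronous writes on the rpool mountpoint
dd if=/dev/zero of=/rpool/synctest bs=4k count=1000 oflag=dsync
rm /rpool/synctest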

PS: I have another production system with "normal" fsyncs and the same load problem. It's a newer dual 12-core E5-2600v4 system with about 128 GB of RAM, but the same HBA, with enterprise SAS and consumer SSDs in ZFS RAID10.
So is it a Proxmox-related issue? Writing 100 GB directly onto an empty zvol keeps the load low.
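
For context, the kind of raw zvol write test described above could look like the sketch below; the test zvol name and size are examples, and since zeros compress away when compression is enabled the numbers are only a ballpark:
Code:
# create a throwaway 100G test zvol
zfs create -V 100G rpool/data/testvol
# sequential 100 GB write straight to the zvol, bypassing the page cache
dd if=/dev/zero of=/dev/zvol/rpool/data/testvol bs=1M count=102400 oflag=direct status=progress
# clean up
zfs destroy rpool/data/testvol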
 
Since the disks in your RAID are all SSDs I would remove the log and cache devices. In your setup they can easily reduce performance. As a general rule of thumb: unless the log device has at least 4-5 times the random performance of the disks in the array, it often only lowers overall performance. In your case that means the log device would need to be an NVMe SSD (M.2 or PCIe) to actually give an improvement. I am sure your mainboard supports more RAM, so I would recommend swapping the log/cache device for additional RAM.
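
For reference, a minimal sketch of how the log and cache devices could be detached, assuming the device names from the zpool list -v output above (sde1 as log, sde2 as cache); double-check the names on your system first:
Code:
# remove the SLOG (log) vdev
zpool remove rpool sde1
# remove the L2ARC (cache) vdev
zpool remove rpool sde2
# verify they are gone
zpool status rpool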
 
My first test was without log/cache; currently I am testing with log/cache. Only the SAS system gains a bit of performance from it. This high load is not an SSD problem, though: the SAS RAID10 zpool has the same issue.
Still no explanation for the high load during the restore process.
 
Does anyone have a Proxmox 5 setup with ZFS that does not show this problem during restore?

We are testing a ZFS environment, but during the restore of a single VM all services stall, we suspect because of excessive disk I/O.
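
One way to confirm that disk I/O is the bottleneck is to watch the pool while the restore runs. A minimal sketch, assuming the pool is named rpool as earlier in this thread:
Code:
# per-vdev bandwidth and IOPS, refreshed every 2 seconds
zpool iostat -v rpool 2
# CPU, iowait and load while the restore runs
top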
 
Thanks, just yesterday we ran a test with that and it seems to work well. We are still adjusting the configuration. Thanks very much for your reply.
 
I don't see the bwlimit option anymore:

Code:
:~# man qmrestore
QMRESTORE(1)               Proxmox VE Documentation               QMRESTORE(1)

NAME
       qmrestore - Restore QemuServer `vzdump` Backups

SYNOPSIS
       qmrestore help

       qmrestore <archive> <vmid> [OPTIONS]

       Restore QemuServer vzdump backups.

       <archive>: <string>
           The backup file. You can pass - to read from standard input.

       <vmid>: <integer> (1 - N)
           The (unique) ID of the VM.

       --force <boolean>
           Allow to overwrite existing VM.

       --pool <string>
           Add the VM to the specified pool.

       --storage <string>
           Default storage.

       --unique <boolean>
           Assign a unique random ethernet address.

The restore is slowly working, but all VMs are stalling. Any way to fix this?
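
Since the man page above still accepts "-" as the archive, one possible workaround on this version is to throttle the archive stream before it reaches qmrestore. This is an untested sketch: it assumes an lzo-compressed .vma archive, that qmrestore handles the uncompressed stream on stdin, and that cstream is installed; the file name and VM ID are examples only:
Code:
# decompress, cap the stream at ~50 MB/s, feed it to qmrestore via stdin
lzop -dc /var/lib/vz/dump/vzdump-qemu-999-example.vma.lzo \
  | cstream -t 50000000 \
  | qmrestore - 999 --storage local-zfs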
 
What was your fix in the end? Is there any news that would help with the high load? What bwlimit did you set?
Thanks

For others reading this and wondering about bwlimit:
vi /etc/vzdump.conf
bwlimit: 50000
for example.
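
For clarity, the same setting as a config snippet; per the vzdump documentation the value is given in KBytes per second, so 50000 corresponds to roughly 50 MB/s (the exact value is just an example):
Code:
# /etc/vzdump.conf
# limit vzdump I/O bandwidth to ~50 MB/s (value in KBytes/s)
bwlimit: 50000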
 