KVM Snapshot Backup Slow

matthew

I installed Proxmox 4 on a Supermicro motherboard with a Xeon X3430 2.4 GHz CPU and 16 GB of RAM. I installed using ZFS RAID1 across two 4 TB SATA 7200 rpm drives. I also added a 2 TB 5400 rpm SATA drive, formatted ext4, that is used only for backups.

I created a KVM VM with 200 GB of disk space, 1 GB of RAM and everything else at the defaults, and installed CentOS 7 on it. The installer seems to have defaulted to XFS on thin LVM. It seems to work OK; it is only running BIND and is not very busy right now. It is the only VM on the Proxmox server that is not idle.

When I do a snapshot backup of the KVM machine, it takes about 10 minutes and the VM slows to a near standstill. It drops most pings, and if I have an SSH session open to it, I get dropped. When the backup is done, everything returns to normal.

Any idea why this is?
 
Normally, this should be fast.
Please provide load and iowait figures, and run zpool iostat 5 while the backup is running. The tool dstat may also provide some info with dstat -a -D total,sda,sdb,sdc 5 (if those are your devices).
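For example, something along these lines in a second shell on the host while the backup runs (the sdX names are only placeholders for your two pool disks and the backup disk; iostat comes from the sysstat package):

# zpool iostat 5
# iostat -x 5
# dstat -a -D total,sda,sdb,sdc 5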

For ZFS-based volumes, did you enable discard, use SCSI as the disk type and VirtIO as the SCSI backend? This can reduce the number of stored blocks and therefore speed up the backup.
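As an illustration only (the storage and disk names below are placeholders, not taken from your setup), the relevant lines in /etc/pve/qemu-server/<vmid>.conf would look roughly like this:

scsihw: virtio-scsi-pci
scsi0: yourstorage:vm-100-disk-1,discard=on,size=200G

For discard to actually free blocks, the guest also has to mount its filesystems with the discard option or run fstrim from time to time.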
 
I've seen that ZFS as local storage for qcow2 images (and probably for other file-based formats as well) is not a good idea. Better follow the wiki and create a dedicated dataset for volumes (e.g. zfs create rpool/zfsdisks, then Datacenter -> [Storage] -> Add -> ZFS, select "rpool/zfsdisks" as the "ZFS Pool", enable "Thin provisioning", and so on).
I have mixed feelings about ZFS (even with ZIL and L2ARC on SSD): sometimes it is blazingly fast, sometimes it thrashes and hangs the whole server (e.g. during a restore when the ZFS plugin is not used). I also had a VM swapping internally that made the server perform very badly.
In any case, let me know :)
 
After installing on a two-drive ZFS RAID1 array, I created my storage dataset like so:

zfs create rpool/zfs_storage_pool

Then in Proxmox I created:

ID: zfs_storage
ZFS Pool: rpool/zfs_storage_pool
Enable and Thin Provision checked.
Disk Image and Container

The backups go to a separate 2 TB ext4 drive.
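For completeness, the resulting entry in /etc/pve/storage.cfg should look roughly like this (reconstructed from the settings above, not pasted from the actual file):

zfspool: zfs_storage
        pool rpool/zfs_storage_pool
        sparse
        content images,rootdir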
 
There was really not much change, if any, when the backup started and the VM began to freeze. top and w did not really indicate that the host was bogged down.

# zpool iostat 5
               capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
rpool       6.37G  3.62T      0     32    988   140K
rpool       6.37G  3.62T      0     34      0   138K
rpool       6.37G  3.62T      0     31      0   119K
rpool       6.37G  3.62T      0     31      0   119K
rpool       6.37G  3.62T      0     32      0   126K
rpool       6.37G  3.62T      0     37      0   146K
rpool       6.37G  3.62T      0     39      0   157K
rpool       6.37G  3.62T      0     32      0   122K
rpool       6.37G  3.62T      0     32      0   122K
rpool       6.37G  3.62T      0     31      0   118K
rpool       6.37G  3.62T      0     35      0   145K
rpool       6.37G  3.62T      0     37      0   144K
rpool       6.37G  3.62T      0     31      0   123K
rpool       6.37G  3.62T      0     37      0   150K
rpool       6.37G  3.62T      0     32      0   122K
rpool       6.37G  3.62T      0     34      0   136K
rpool       6.37G  3.62T      0     32      0   125K
rpool       6.37G  3.62T      0     40      0   165K
rpool       6.37G  3.62T      0     42      0   175K
rpool       6.37G  3.62T      0     33      0   127K
rpool       6.37G  3.62T      0     31      0   120K
rpool       6.37G  3.62T      0     33      0   130K
rpool       6.37G  3.62T      0     32      0   124K
rpool       6.37G  3.62T      0     36      0   150K
rpool       6.37G  3.62T      0     33      0   132K


# w
10:36:40 up 11 days, 22:21, 1 user, load average: 0.11, 0.45, 0.34

# iostat -x
Linux 4.2.8-1-pve (6) 03/02/2016 _x86_64_ (4 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.91    0.00    0.53    0.67    0.00   97.88

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     0.03    0.01    8.45     0.49   154.91    36.73     0.02    2.75    6.29    2.75   2.71   2.29
sdb               0.00     0.03    0.01    8.40     0.51   154.91    36.96     0.02    2.88    5.73    2.87   2.84   2.39
sdc               0.00     0.00    0.00    0.00     0.00     2.32  1486.68     0.00  470.65    1.46  545.86   7.43   0.00
zd0               0.00     0.00    0.00    0.00     0.00     0.00    16.00     0.00    0.02    0.02    0.00   0.02   0.00
zd16              0.00     0.00    0.24    0.06     0.93     2.24    20.90     0.00    0.41    0.01    1.90   0.41   0.01
zd32              0.00     0.00    9.89    0.17   612.00     3.08   122.38     0.00    0.10    0.04    3.31   0.10   0.10
zd48              0.00     0.00    0.19    0.07     0.79     2.72    27.75     0.00    0.55    0.01    2.05   0.55   0.01



# w
10:37:02 up 11 days, 22:22, 1 user, load average: 0.23, 0.45, 0.34

# iostat -x
Linux 4.2.8-1-pve (6) 03/02/2016 _x86_64_ (4 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.91    0.00    0.53    0.67    0.00   97.88

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     0.03    0.01    8.45     0.49   154.91    36.73     0.02    2.75    6.29    2.75   2.71   2.29
sdb               0.00     0.03    0.01    8.40     0.51   154.91    36.96     0.02    2.88    5.73    2.87   2.84   2.39
sdc               0.00     0.00    0.00    0.00     0.00     2.32  1479.90     0.00  468.52    1.46  542.99   7.44   0.00
zd0               0.00     0.00    0.00    0.00     0.00     0.00    16.00     0.00    0.02    0.02    0.00   0.02   0.00
zd16              0.00     0.00    0.24    0.06     0.93     2.24    20.90     0.00    0.41    0.01    1.90   0.41   0.01
zd32              0.00     0.00   10.12    0.17   627.28     3.08   122.51     0.00    0.10    0.04    3.32   0.10   0.10
zd48              0.00     0.00    0.19    0.07     0.79     2.72    27.75     0.00    0.55    0.01    2.05   0.55   0.01
 
Hmm, the machine really does nothing :-/

Could you please benchmark your devices (the zpool and the backup device) with fio, to find out whether the bottleneck is host I/O?
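For example, something along these lines, once against the pool and once against the backup drive (the paths and the size are placeholders, adjust them to where the pool and the backup disk are actually mounted):

# fio --name=zfs-test --directory=/rpool/zfs_storage_pool --rw=write --bs=1M --size=4G --ioengine=libaio --end_fsync=1
# fio --name=backup-test --directory=/mnt/backup --rw=write --bs=1M --size=4G --ioengine=libaio --end_fsync=1

A sequential write is roughly what the backup does on the target side; if either number is very low, the bottleneck is most likely host I/O.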

Have you set I/O throttling on the VM? IMHO this affects the backup bandwidth as well.
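A throttled disk shows up with extra limit options on its line in the VM config, roughly like this (the names and values are only an example):

ide0: yourstorage:vm-100-disk-1,size=200G,mbps_rd=50,mbps_wr=50

If no mbps_* or iops_* options appear on your disk line, no limit is set.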
 
# cat /etc/pve/qemu-server/104.conf
bootdisk: ide0
cores: 1
ide0: zfs_storage:vm-104-disk-1,size=200G
memory: 1024
name: resolver1.********
net0: e1000=*************,bridge=vmbr0
numa: 0
onboot: 1
ostype: l26
smbios1: uuid=***************************
sockets: 1
 
Any reason why you use IDE? Don't use it! Change it to e.g. VirtIO (you may need to change fstab entries) and try again.
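For reference, a rough sketch of how the disk lines would look afterwards, using the disk from your config above (inside the guest the device then appears as /dev/vda instead of /dev/sda, so fstab only needs changes if it references those device paths directly rather than UUIDs or LVM names):

bootdisk: virtio0
virtio0: zfs_storage:vm-104-disk-1,size=200G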
 
I used IDE because it was the default. I installed another VM with VirtIO. It seems to do the same thing, but not quite as badly. I am running a constant ping from the VM while the backup runs. Occasionally I even get this error inside the VM:

Message from syslogd@rs2 at Mar 3 10:52:36 ...
kernel:BUG: soft lockup - CPU#0 stuck for 28s! [swapper/0:0]
 
# cat /etc/pve/qemu-server/106.conf
bootdisk: virtio0
cores: 1
ide2: local:iso/CentOS-7-x86_64-Minimal.iso,media=cdrom
memory: 1024
name: rs2.*************
net0: e1000=**************,bridge=vmbr0
numa: 0
ostype: l26
smbios1: uuid=*******************************************
sockets: 1
virtio0: zfs_storage:vm-106-disk-1,size=200G
 
Could you please post zfs get recordsize <pool>/<fs> for each involved filesystem? There is some noise in the forums about bad performance with ZFS at an 8k recordsize.
 
# zfs get recordsize
NAME                                      PROPERTY    VALUE  SOURCE
rpool                                     recordsize  128K   default
rpool/ROOT                                recordsize  128K   default
rpool/ROOT/pve-1                          recordsize  128K   default
rpool/swap                                recordsize  -      -
rpool/zfs_storage_pool                    recordsize  128K   default
rpool/zfs_storage_pool/subvol-100-disk-1  recordsize  128K   default
rpool/zfs_storage_pool/subvol-101-disk-1  recordsize  128K   default
rpool/zfs_storage_pool/subvol-102-disk-1  recordsize  128K   default
rpool/zfs_storage_pool/vm-103-disk-1      recordsize  -      -
rpool/zfs_storage_pool/vm-104-disk-1      recordsize  -      -
rpool/zfs_storage_pool/vm-105-disk-1      recordsize  -      -
rpool/zfs_storage_pool/vm-106-disk-1      recordsize  -      -
rpool/zfs_storage_pool/vm-107-disk-1      recordsize  -      -

104 is the only VM with any real activity.
 
Looks normal, so it is not the problem I was referring to. You could try disabling PVE 4's automatic watchdog timer, which IMHO restarted your machine. The next step would be trying to debug the kernel, but that is very time-consuming.

Another option would be to swap the disks into another server and try there; the system should boot directly on the other hardware.
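If you want to check whether the host-side watchdog is active at all before changing anything, something like this should give a first idea (just a starting point, not a complete procedure):

# systemctl status watchdog-mux
# lsmod | grep -e softdog -e ipmi_watchdog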
 
Hi @matthew

Did you solve the issue? I have the same problem, and as my KVM VMs grow, the problem gets bigger. I made a snapshot of a live VM just moments ago, and for 7 straight minutes the console was blocked and no web requests were answered.

Thanks in advance,

Luis Miguel
 
