VM interruption during backups - vzdump Disable RAM in snapshot mode?

mko

Feb 3, 2021
With backups (vzdump) running in snapshot mode, I am seeing 4-5 seconds of interruption to guest VMs during backup runs, which affects certain network-sensitive applications. The same interruption happens when a manual snapshot is taken with RAM included, so I presume vzdump is including RAM in snapshot mode and that this is the cause. When RAM is not included in a snapshot, there is no interruption to the VMs.

I understand this is done to improve backup consistency, but I am looking for a way to exclude RAM from vzdump snapshots to avoid disrupting sensitive VMs during the several backups we run per day (accepting the risk of filesystem inconsistencies). Is there currently a way to do this, or could an option be added in a future release? Ideally it could be set per backup job, so that only the sensitive VMs have their RAM excluded from snapshots.

Thanks in advance.
 
No, vzdump doesn't dump the guest's RAM. Instead, it uses the qemu agent to freeze/thaw the filesystems inside the guest to ensure consistency. When you say 4-5 seconds of interruption, do you mean a network outage of the VM?
 
Yes, the interruption is a network outage of the VM at the exact moment the backup job is backing up that particular VM. Other VMs are not affected until the backup job gets to them. I may have been incorrectly relating it to snapshots with RAM, because VMs also stop responding for a few seconds during snapshots unless RAM is excluded.

Most of them are BSD guests, so the qemu agent isn't available for them. Since there is no FS freeze/thaw going on without the agent, does anyone have other ideas about what could be causing the brief VM network outages during backups?

Here is an example backup log for one of the VMs. A network outage was detected from 01:10:14 to 01:10:19:

Code:
INFO: Starting Backup of VM 134 (qemu)
INFO: Backup started at 2021-02-03 01:10:12
INFO: status = running
INFO: VM Name: test1
INFO: include disk 'scsi0' 'vmdata1:vm-134-disk-0' 10G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating Proxmox Backup Server archive 'vm/134/2021-02-03T07:10:12Z'
INFO: started backup task '0fafe056-4b3b-43c2-88e3-4ada596eced2'
INFO: resuming VM again
INFO: scsi0: dirty-bitmap status: OK (96.0 MiB of 10.0 GiB dirty)
INFO: using fast incremental mode (dirty-bitmap), 96.0 MiB dirty of 10.0 GiB total
INFO: 100% (96.0 MiB of 96.0 MiB) in  1s, read: 96.0 MiB/s, write: 84.0 MiB/s
INFO: backup was done incrementally, reused 9.92 GiB (99%)
INFO: transferred 96.00 MiB in 1 seconds (96.0 MiB/s)
INFO: Finished Backup of VM 134 (00:00:08)
INFO: Backup finished at 2021-02-03 01:10:20

Backup target is a Proxmox Backup Server over a dedicated 10G network separate from VM traffic. Source host CPU utilization is under 10%, memory ~15%, and source datastore is ZFS RAID10 with enterprise SSDs.

EDIT: I am also seeing the VM interruption during backups on Windows guests with the qemu agent installed. A network outage was detected from 17:13:05 until ~17:13:23:

Code:
INFO: Starting Backup of VM 195 (qemu)
INFO: Backup started at 2021-02-03 17:13:00
INFO: status = running
INFO: VM Name: win2016-prod1
INFO: include disk 'scsi0' 'vmdata1:vm-195-disk-0' 300G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating Proxmox Backup Server archive 'vm/195/2021-02-03T23:13:00Z'
INFO: issuing guest-agent 'fs-freeze' command
INFO: issuing guest-agent 'fs-thaw' command
INFO: started backup task '4b135f20-0326-461f-bdc5-ebebb3a4f140'
INFO: resuming VM again
INFO: scsi0: dirty-bitmap status: OK (3.0 GiB of 300.0 GiB dirty)
INFO: using fast incremental mode (dirty-bitmap), 3.0 GiB dirty of 300.0 GiB total
INFO:  25% (772.0 MiB of 3.0 GiB) in  3s, read: 257.3 MiB/s, write: 254.7 MiB/s
INFO:  38% (1.1 GiB of 3.0 GiB) in  6s, read: 134.7 MiB/s, write: 132.0 MiB/s
INFO:  49% (1.5 GiB of 3.0 GiB) in  9s, read: 112.0 MiB/s, write: 112.0 MiB/s
INFO:  63% (1.9 GiB of 3.0 GiB) in 12s, read: 141.3 MiB/s, write: 134.7 MiB/s
INFO:  80% (2.4 GiB of 3.0 GiB) in 15s, read: 164.0 MiB/s, write: 117.3 MiB/s
INFO:  95% (2.8 GiB of 3.0 GiB) in 18s, read: 152.0 MiB/s, write: 106.7 MiB/s
INFO: 100% (3.0 GiB of 3.0 GiB) in 21s, read: 48.0 MiB/s, write: 40.0 MiB/s
INFO: backup was done incrementally, reused 297.37 GiB (99%)
INFO: transferred 2.96 GiB in 28 seconds (108.1 MiB/s)
INFO: Finished Backup of VM 195 (00:00:55)
INFO: Backup finished at 2021-02-03 17:13:55
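
To line the reported outage windows up against the backup timeline, a small log-parsing helper can be useful. This is a minimal sketch, not part of vzdump or PVE: the names `backup_window` and `outage_offsets` are hypothetical, it only reads the "Backup started at" and "Finished Backup of VM ... (HH:MM:SS)" lines in the format shown in the logs above, and it assumes the observed outage times are on the same date and in the same local timezone as the log.

```python
import re
from datetime import datetime, timedelta
from typing import Dict, Tuple

# Patterns for the two vzdump log lines this sketch reads (format taken from the logs above).
START_RE = re.compile(r"Backup started at (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})")
DURATION_RE = re.compile(r"Finished Backup of VM \d+ \((\d{2}):(\d{2}):(\d{2})\)")
TS_FMT = "%Y-%m-%d %H:%M:%S"


def backup_window(log: str) -> Tuple[datetime, datetime]:
    """Return (start, end) of one VM's backup: start time plus the reported job duration."""
    start = datetime.strptime(START_RE.search(log).group(1), TS_FMT)
    h, m, s = map(int, DURATION_RE.search(log).groups())
    return start, start + timedelta(hours=h, minutes=m, seconds=s)


def outage_offsets(log: str, outage_start: str, outage_end: str) -> Dict[str, float]:
    """Place an observed outage (HH:MM:SS) relative to the backup job.

    Hypothetical helper; assumes the outage happened on the same day and in the
    same timezone as the log's "Backup started at" timestamp.
    """
    start, end = backup_window(log)
    day = start.strftime("%Y-%m-%d")
    o_start = datetime.strptime(f"{day} {outage_start}", TS_FMT)
    o_end = datetime.strptime(f"{day} {outage_end}", TS_FMT)
    return {
        "begins_after_s": (o_start - start).total_seconds(),
        "outage_length_s": (o_end - o_start).total_seconds(),
        "job_length_s": (end - start).total_seconds(),
    }


# Excerpts of the two logs above (only the lines the helper reads).
LOG_134 = """INFO: Backup started at 2021-02-03 01:10:12
INFO: Finished Backup of VM 134 (00:00:08)"""
LOG_195 = """INFO: Backup started at 2021-02-03 17:13:00
INFO: Finished Backup of VM 195 (00:00:55)"""

print(outage_offsets(LOG_134, "01:10:14", "01:10:19"))
# -> {'begins_after_s': 2.0, 'outage_length_s': 5.0, 'job_length_s': 8.0}
print(outage_offsets(LOG_195, "17:13:05", "17:13:23"))
# -> {'begins_after_s': 5.0, 'outage_length_s': 18.0, 'job_length_s': 55.0}
```

On these two logs, the outage begins within the first few seconds of the job in both cases (2 s into an 8 s job for VM 134, 5 s into a 55 s job for VM 195), rather than near the end of the backup.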
 
Hello mko,

I'm facing the same issues.
Have you been able to fix it in the meantime?

Cheers,
luphi
 
Unfortunately, we have not been able to solve this. I did find several other posts and bug reports suggesting that older ZFS versions might be part of the problem. We will be upgrading from PVE 6.3 to 6.4 very soon to see whether the new major ZFS version helps, but given that only the VM being backed up sees the network outage, and no other VMs or the host system are impacted, I am not sure it will.

Which PVE version are you running and are you also using ZFS?
 
The issue still exists even in PVE 7, and yes, I'm using ZFS.
I'm going to switch the network driver from virtio to e1000 tomorrow.
 
Has anybody found a workaround for this issue? I have Proxmox 7.1-10 with e1000, and the backup still freezes the VM for a couple of seconds. I only need the disk contents, and I need the VM to keep working without any interruption.

Is using automatic ZFS snapshots the only solution? (https://github.com/zfsonlinux/zfs-auto-snapshot)

Here is the (anonymized) log output of the backup task that freezes the VM for a couple of seconds:
Code:
INFO: starting new backup job: vzdump NNN --storage ..... --mailto ..... --mode snapshot --compress zstd --mailnotification failure --quiet 1
INFO: Starting Backup of VM NNN (qemu)
INFO: Backup started at 2023-06-05 03:10:02
INFO: status = running
INFO: VM Name: XXXXX
INFO: include disk 'sata0' 'local-zfs...:vm-NNN-disk-0' 70G
INFO: include disk 'sata1' 'local-zfs...:vm-NNN-disk-1' 2G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating vzdump archive '/....../dump/vzdump-qemu-NNN-2023_06_05-03_10_02.vma.zst'
INFO: started backup task '44f7915a-163c-4385-9a77-d93076823499'
INFO: resuming VM again
INFO:   3% (2.3 GiB of 72.0 GiB) in 3s, read: 799.9 MiB/s, write: 92.4 MiB/s
INFO:   4% (3.0 GiB of 72.0 GiB) in 8s, read: 137.0 MiB/s, write: 137.0 MiB/s
 