Freezing of the VM at the time of a live snapshot on ZFS

BiB

New Member
Aug 23, 2019
Hello

I have encountered freezing of the virtual machine at the time of a live snapshot with vmstate (when saving the VM RAM).
The freeze appears as a loss of response from the VM for a few seconds, after which the VM continues to work without
losing TCP connections.

Proxmox Virtual Environment 6.0-7 on a ZFS pool of two SSDs in RAID 1 (ZFS v0.8.1).

VM: Ubuntu; RAM: 5 GB; hard disk: VirtIO SCSI, 10 GB on a ZFS volume (zvol).

Freeze looks like this:

(running snapshot with saving RAM state to ZFS volume)
# time qm snapshot 114 testsnap --vmstate 1

real 0m8.963s
user 0m0.285s
sys 0m0.080s

(testing ping to VM at the same time)
# ping 10.9.2.189
PING 10.9.2.189 (10.9.2.189): 56 data bytes
64 bytes from 10.9.2.189: icmp_seq=0 ttl=64 time=0.259 ms
64 bytes from 10.9.2.189: icmp_seq=1 ttl=64 time=0.347 ms
64 bytes from 10.9.2.189: icmp_seq=2 ttl=64 time=0.302 ms
64 bytes from 10.9.2.189: icmp_seq=3 ttl=64 time=7667.822 ms <- snapshot started
64 bytes from 10.9.2.189: icmp_seq=4 ttl=64 time=6636.308 ms
64 bytes from 10.9.2.189: icmp_seq=5 ttl=64 time=5581.749 ms
64 bytes from 10.9.2.189: icmp_seq=6 ttl=64 time=4527.823 ms
64 bytes from 10.9.2.189: icmp_seq=7 ttl=64 time=3515.349 ms
64 bytes from 10.9.2.189: icmp_seq=8 ttl=64 time=2503.208 ms
64 bytes from 10.9.2.189: icmp_seq=9 ttl=64 time=1446.281 ms
64 bytes from 10.9.2.189: icmp_seq=10 ttl=64 time=410.162 ms
64 bytes from 10.9.2.189: icmp_seq=11 ttl=64 time=0.248 ms
64 bytes from 10.9.2.189: icmp_seq=12 ttl=64 time=0.265 ms
64 bytes from 10.9.2.189: icmp_seq=13 ttl=64 time=0.284 ms
^C
--- 10.9.2.189 ping statistics ---
14 packets transmitted, 14 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 0.248/2306.458/7667.822/2684.618 ms

If the VM disks are qcow2 files (stored on a ZFS dataset), freezing is not observed, or it is negligible.

(running snapshot with saving RAM state to file)
# time qm snapshot 116 testsnap --vmstate 1
Formatting '/var/lib/vz/images/116/vm-116-state-testsnap.raw', fmt=raw size=11261706240

real 0m0.985s
user 0m0.315s
sys 0m0.044s

(testing ping to VM at the same time)
# ping 10.9.2.189
PING 10.9.2.189 (10.9.2.189): 56 data bytes
64 bytes from 10.9.2.189: icmp_seq=0 ttl=64 time=0.261 ms
64 bytes from 10.9.2.189: icmp_seq=1 ttl=64 time=0.283 ms
64 bytes from 10.9.2.189: icmp_seq=2 ttl=64 time=0.343 ms
64 bytes from 10.9.2.189: icmp_seq=3 ttl=64 time=0.317 ms
64 bytes from 10.9.2.189: icmp_seq=4 ttl=64 time=0.137 ms
64 bytes from 10.9.2.189: icmp_seq=5 ttl=64 time=0.236 ms
64 bytes from 10.9.2.189: icmp_seq=6 ttl=64 time=0.263 ms
^C
--- 10.9.2.189 ping statistics ---
7 packets transmitted, 7 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 0.137/0.263/0.343/0.061 ms


So there is some problem with saving the VM RAM state to a ZFS volume.

I also tried changing the sync mode of the ZFS pool to always, standard, and disabled.

This had no effect on taking a snapshot of a VM on a ZFS volume: the freeze is always present.

But there is a visible effect when taking snapshots of VMs in qcow2:
sync=always: every ping time increases to 200-400 ms
sync=standard: no freeze
sync=disabled: 7-second freeze (same as for a VM on a ZFS volume).
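For anyone reproducing this, the sync modes above can be toggled per dataset with zfs set. A minimal sketch; "rpool/data" is a placeholder for the actual dataset backing the VM images:

```shell
# Switch the ZFS sync behaviour for the dataset holding the VM images.
# "rpool/data" is a placeholder; substitute your own dataset name.
zfs set sync=always rpool/data     # force synchronous commits for every write
zfs set sync=standard rpool/data   # default POSIX behaviour
zfs set sync=disabled rpool/data   # treat all writes as asynchronous

# Check which value is currently active:
zfs get sync rpool/data
```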

So the problem with live snapshots of VMs on ZFS volumes may be related to ZFS write caching.

When using ZFS volumes on HDDs instead of SSDs, the freeze is about 30-35 seconds for a VM with only 4 GB of RAM.

Does anyone have ideas why these freezes occur, or how to avoid them when using ZFS?
 
If you save the memory state of the VM, there may be a situation where you cannot avoid pausing the VM to write the memory contents to disk,
especially if there is memory activity in the VM and the storage is not ultra fast (writing 4 GB of memory in 8 seconds amounts to ~500 MB/s write speed, which is not slow).
In our QEMU savevm code, we try to minimize the time the VM is actually paused, but as I said, it may not be completely avoidable, depending on memory activity and write speed.
 
The tests were performed on a server without load.

It is not clear why there is no freeze at all when using qcow2 (on ZFS with sync=always or sync=standard), and why there are freezes when using ZFS volumes as storage (which is the preferred and fastest storage mode for Proxmox with ZFS).

We wanted to create a live backup scheme that saves the RAM state for the best backup consistency, and it seems to be impossible with Proxmox on ZFS so far.
It would be very helpful if you could investigate what causes this problem and how to avoid the freezing on ZFS.
 
It is not clear why there is no freeze at all when using qcow2 (on ZFS with sync=always or sync=standard), and why there are freezes when using ZFS volumes as storage (which is the preferred and fastest storage mode for Proxmox with ZFS).
Probably different caching behaviour (page cache/ARC vs. no cache, etc.).

We wanted to create a live backup scheme that saves the RAM state for the best backup consistency, and it seems to be impossible with Proxmox on ZFS so far.
It would be very helpful if you could investigate what causes this problem and how to avoid the freezing on ZFS.
As I already said, this mostly depends on the underlying speed of the hardware and/or the load of the guest, and a little pause is not really avoidable.


Edit: what I forgot to mention:
You can put the vmstate files/volumes (which contain the memory) on a different storage than your disks, with the 'vmstatestorage' option in the VM config
(see man qm for details).
This way you can experiment with whether putting the memory state onto a different storage works for your use case.
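As a sketch, the option can be set from the CLI; "local" is just an example storage ID from a default setup, and VM 114 is the VM from the first post:

```shell
# Point the memory state of VM 114 at a storage different from the one
# holding its disks ("local" is an example storage ID; see "man qm"):
qm set 114 --vmstatestorage local

# Subsequent snapshots with RAM state then write the vmstate there:
qm snapshot 114 testsnap2 --vmstate 1
```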
 
I have a similar problem in version 6.0.
I have a script running in cron that takes a live snapshot of a VM with the command "qm snapshot ID --vmstate 1" every 3 hours and deletes old snapshots from the last day.
After a few days, this command hangs the VM, and I have to kill -9 the process. After that, and after starting the machine again, everything is OK for the next few days.
Any suggestions?
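For reference, a minimal sketch of such a rotation script, assuming snapshot names embed a Unix timestamp so their age can be computed later. The VM ID, the "auto" name prefix, and the parsing of the qm listsnapshot output are assumptions, not a tested implementation:

```shell
#!/bin/sh
# Sketch: take a live snapshot with RAM state, then prune snapshots
# older than one day. Assumes snapshot names look like "auto<epoch>".

rotate_snapshots() {
    vmid=$1
    now=$(date +%s)

    # Take the new snapshot, saving the RAM state as in the posts above.
    qm snapshot "$vmid" "auto$now" --vmstate 1

    # Prune: the epoch embedded in the name gives the snapshot's age.
    # (The column layout of "qm listsnapshot" output is an assumption.)
    qm listsnapshot "$vmid" | awk '{print $2}' | grep '^auto' | \
    while read -r snap; do
        ts=${snap#auto}
        if [ $((now - ts)) -gt 86400 ]; then
            qm delsnapshot "$vmid" "$snap"
        fi
    done
}
```

A cron wrapper could then call, e.g., rotate_snapshots 114 every 3 hours.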
 
