Freezing of the VM at the time of the live snapshot on ZFS

BiB (New Member) · Oct 27, 2019

Hello

I have found that the virtual machine freezes during a live snapshot with vmstate (i.e. when the VM RAM is saved).
The freeze shows up as the VM not responding for a few seconds, after which it continues to work without losing TCP connections.

Proxmox Virtual Environment 6.0-7 on a ZFS pool of two SSDs in a RAID 1 mirror (ZFS v0.8.1).

VM: Ubuntu; RAM: 5 GB; hard disk: VirtIO SCSI, 10 GB, on a ZFS volume (zvol).

The freeze looks like this:

(taking a snapshot that saves the RAM state to a ZFS volume)
# time qm snapshot 114 testsnap --vmstate 1

real 0m8.963s
user 0m0.285s
sys 0m0.080s

(pinging the VM at the same time)
# ping 10.9.2.189
PING 10.9.2.189 (10.9.2.189): 56 data bytes
64 bytes from 10.9.2.189: icmp_seq=0 ttl=64 time=0.259 ms
64 bytes from 10.9.2.189: icmp_seq=1 ttl=64 time=0.347 ms
64 bytes from 10.9.2.189: icmp_seq=2 ttl=64 time=0.302 ms
64 bytes from 10.9.2.189: icmp_seq=3 ttl=64 time=7667.822 ms <- snapshot started
64 bytes from 10.9.2.189: icmp_seq=4 ttl=64 time=6636.308 ms
64 bytes from 10.9.2.189: icmp_seq=5 ttl=64 time=5581.749 ms
64 bytes from 10.9.2.189: icmp_seq=6 ttl=64 time=4527.823 ms
64 bytes from 10.9.2.189: icmp_seq=7 ttl=64 time=3515.349 ms
64 bytes from 10.9.2.189: icmp_seq=8 ttl=64 time=2503.208 ms
64 bytes from 10.9.2.189: icmp_seq=9 ttl=64 time=1446.281 ms
64 bytes from 10.9.2.189: icmp_seq=10 ttl=64 time=410.162 ms
64 bytes from 10.9.2.189: icmp_seq=11 ttl=64 time=0.248 ms
64 bytes from 10.9.2.189: icmp_seq=12 ttl=64 time=0.265 ms
64 bytes from 10.9.2.189: icmp_seq=13 ttl=64 time=0.284 ms
^C
--- 10.9.2.189 ping statistics ---
14 packets transmitted, 14 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 0.248/2306.458/7667.822/2684.618 ms
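The replies for icmp_seq=3 to 10 come back with round-trip times that shrink by roughly one second per packet, which is what you would expect if the guest stopped answering for about 7.7 seconds and then answered all queued echo requests at once; the largest RTT therefore approximates the length of the freeze. A minimal sketch that pulls that figure out of saved ping output, assuming reply lines of the form "icmp_seq=N ... time=X ms" as above and a hypothetical threshold (THRESHOLD_MS, default 100 ms) for counting a reply as delayed:

```shell
# POSIX sh sketch: summarize ping output read on stdin.
# Assumes reply lines contain "icmp_seq=N ... time=X ms" as in the output above;
# THRESHOLD_MS (hypothetical, default 100) decides which replies count as delayed.
freeze_stats() {
    awk -v threshold="${THRESHOLD_MS:-100}" '
        /time=/ {
            # Extract the number that follows "time=" on each reply line.
            if (match($0, /time=[0-9.]+/)) {
                rtt = substr($0, RSTART + 5, RLENGTH - 5) + 0
                if (rtt > max) max = rtt
                if (rtt > threshold + 0) delayed++
            }
        }
        END { printf "max_rtt_ms=%.3f delayed=%d\n", max, delayed }
    '
}
```

With the ping output saved to a file (for example ping.log), run freeze_stats < ping.log; for the first test above it prints max_rtt_ms=7667.822 delayed=8, and for the qcow2 test it reports no delayed replies.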

If the VM disks are stored in qcow2 files (on a ZFS dataset), no freeze is observed, or it is negligible.

(taking a snapshot that saves the RAM state to a file)
# time qm snapshot 116 testsnap --vmstate 1
Formatting '/var/lib/vz/images/116/vm-116-state-testsnap.raw', fmt=raw size=11261706240

real 0m0.985s
user 0m0.315s
sys 0m0.044s

(pinging the VM at the same time)
# ping 10.9.2.189
PING 10.9.2.189 (10.9.2.189): 56 data bytes
64 bytes from 10.9.2.189: icmp_seq=0 ttl=64 time=0.261 ms
64 bytes from 10.9.2.189: icmp_seq=1 ttl=64 time=0.283 ms
64 bytes from 10.9.2.189: icmp_seq=2 ttl=64 time=0.343 ms
64 bytes from 10.9.2.189: icmp_seq=3 ttl=64 time=0.317 ms
64 bytes from 10.9.2.189: icmp_seq=4 ttl=64 time=0.137 ms
64 bytes from 10.9.2.189: icmp_seq=5 ttl=64 time=0.236 ms
64 bytes from 10.9.2.189: icmp_seq=6 ttl=64 time=0.263 ms
^C
--- 10.9.2.189 ping statistics ---
7 packets transmitted, 7 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 0.137/0.263/0.343/0.061 ms

So there is some problem with saving the VM RAM state to a ZFS dataset.

I also tried changing the sync property of the ZFS pool to always, standard and disabled.

This has no effect on snapshots of the VM on the ZFS volume: the freeze is always present.

But there is a visible effect when taking snapshots of VMs on qcow2:
sync=always: every ping time increases to 200-400 ms
sync=standard: no freeze
sync=disabled: 7-second freeze (the same as for the VM on the ZFS volume).

So the problem with live snapshots of VMs on ZFS volumes may be related to ZFS write caching.
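The sync experiment above can be repeated with a small loop. This is a sketch under stated assumptions, not the poster's procedure: the dataset name rpool/data, VM ID 116 and snapshot name testsnap are hypothetical defaults, and a DRY_RUN=1 switch (also an assumption) prints the commands instead of running them.

```shell
# POSIX sh sketch: repeat the RAM-snapshot test under each ZFS sync setting.
# Hypothetical defaults, adjust to your setup: DATASET=rpool/data (the dataset
# holding the VM disks), VMID=116, snapshot name "testsnap".
# With DRY_RUN=1 the commands are printed instead of executed.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

sync_sweep() {
    dataset=${DATASET:-rpool/data}
    vmid=${VMID:-116}
    for mode in always standard disabled; do
        run zfs set "sync=$mode" "$dataset"
        # Ping the guest from another machine while this runs, as in the tests above.
        run qm snapshot "$vmid" testsnap --vmstate 1
        # Drop the test snapshot so the next iteration can reuse the name.
        run qm delsnapshot "$vmid" testsnap
    done
    # standard is the ZFS default; restore it when done.
    run zfs set sync=standard "$dataset"
}
```

Running DRY_RUN=1 sync_sweep first shows exactly which zfs and qm commands would be issued; without DRY_RUN the commands are executed for real.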

When using ZFS volumes on HDDs instead of SSDs, the freeze lasts about 30-35 seconds for a VM with only 4 GB of RAM.

Does anyone have an idea why these freezes occur, or how to avoid them when using ZFS?

dcsapak (Proxmox Staff Member) · Oct 28, 2019

If you save the memory state of the VM, there may be situations where you cannot avoid pausing the VM to write the memory contents to disk,
especially if there is memory activity in the VM and the storage is not ultra fast (writing 4 GB of memory in 8 seconds amounts to ~500 MB/s write speed, which is not slow).
In our QEMU savevm code we try to minimize the time the VM is actually paused, but as I said, it may not be completely avoidable, depending on memory activity and write speed.
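The throughput estimate above is simple division. A minimal sketch, treating the whole RAM as if it were written during the pause (a simplification, since the savevm code tries to keep the actual pause short) and using integer shell arithmetic, so results are rounded down; the function name is hypothetical:

```shell
# Back-of-the-envelope write throughput needed to dump guest RAM during a pause:
# throughput (MiB/s) = RAM (MiB) / pause (s), rounded down by integer arithmetic.
required_mib_per_s() {
    ram_mib=$1
    pause_s=$2
    echo $(( ram_mib / pause_s ))
}

required_mib_per_s 4096 8    # 4 GiB in 8 s: prints 512
required_mib_per_s 4096 35   # HDD case, 4 GiB in 35 s: prints 117
required_mib_per_s 4096 30   # HDD case, 4 GiB in 30 s: prints 136
```

512 MiB/s is about 537 MB/s, matching the ~500 MB/s figure; the 30-35 second HDD freezes from the first post correspond to roughly 117-136 MiB/s.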

BiB (New Member) · Oct 29, 2019

The tests were performed on a server with no load.

It is not clear why there is no freeze at all when using qcow2 (on ZFS with sync=always or sync=standard), but there are freezes when using ZFS volumes as storage (which is the preferred and fastest storage mode for Proxmox with ZFS).

We wanted to create a live backup scheme that saves the RAM state for the best backup consistency, and so far this seems to be impossible with Proxmox on ZFS.
It would be very helpful if you could investigate what causes this problem and how to avoid the freezes on ZFS.

dcsapak (Proxmox Staff Member) · Oct 29, 2019

BiB said:
It is not clear why there is no freeze at all when using qcow2 (on ZFS with sync=always or sync=standard), but there are freezes when using ZFS volumes as storage (which is the preferred and fastest storage mode for Proxmox with ZFS).

Probably different caching behaviour (page cache/ARC vs. no cache, etc.).

BiB said:
We wanted to create a live backup scheme that saves the RAM state for the best backup consistency, and so far this seems to be impossible with Proxmox on ZFS.
It would be very helpful if you could investigate what causes this problem and how to avoid the freezes on ZFS.

As I already said, this mostly depends on the speed of the underlying hardware and/or the load of the guest, and a short pause is not really avoidable.

Edit: what I forgot to mention:
you can put the vmstate files/volumes (which contain the memory) on a different storage than your disks with the 'vmstatestorage' option in the VM config
(see man qm for details).
This way you could test whether putting the memory state onto a different storage works for your use case.
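A minimal sketch of that experiment, assuming VM 114 from the first post and a directory storage named local that is configured to hold disk images (both are example values, not a recommendation); the DRY_RUN=1 switch is a hypothetical convenience that prints the commands instead of running them:

```shell
# POSIX sh sketch: keep the disks on ZFS but send the vmstate to another storage.
# Hypothetical example values: VM 114 and a directory storage called "local"
# that is configured to hold disk images. With DRY_RUN=1 commands are only printed.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

use_vmstate_storage() {
    vmid=$1
    storage=$2
    # Sets the 'vmstatestorage' option in the VM config (see man qm).
    run qm set "$vmid" --vmstatestorage "$storage"
    # A RAM snapshot taken afterwards should put its state volume on that storage.
    run qm snapshot "$vmid" testsnap --vmstate 1
}
```

For example, use_vmstate_storage 114 local, followed by the same ping test as above, would show whether writing the memory state to a file storage avoids the freeze while the disks stay on the ZFS volume.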

WOJCIECH (Active Member) · Nov 29, 2020

I have a similar problem in version 6.0.
I have a script running from cron that takes a live snapshot of the VM with the command "qm snapshot ID --vmstate 1" every 3 hours and deletes the old snapshots from the previous day.
After a few days, this command hangs the VM, and I have to kill the process with kill -9. After that, and after starting the machine again, everything is OK for the next few days.
Any suggestions?
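For reference, the selection step of such a rotation could look like the sketch below. It is an assumed reconstruction, not the poster's script: the naming scheme autoYYYYMMDDHHMM, the script path /root/snaprotate.sh and the function name are hypothetical, and it relies on 64-bit shell arithmetic (as in bash and dash on amd64).

```shell
# POSIX sh sketch of the selection step of such a rotation (an assumed
# reconstruction, not the poster's script). Hypothetical naming scheme:
# snapshots are called autoYYYYMMDDHHMM, e.g. created with
#   qm snapshot "$VMID" "auto$(date +%Y%m%d%H%M)" --vmstate 1
# from a crontab entry such as: 0 */3 * * * /root/snaprotate.sh
# Prints every auto* snapshot name older than the cutoff name.
stale_snapshots() {
    cutoff=$1    # e.g. "auto$(date -d '1 day ago' +%Y%m%d%H%M)" with GNU date
    shift
    for name in "$@"; do
        case $name in
            auto[0-9]*)
                # The fixed-width timestamp compares correctly as an integer.
                if [ "${name#auto}" -lt "${cutoff#auto}" ]; then
                    echo "$name"
                fi
                ;;
        esac
    done
}
```

Each printed name would then be removed with qm delsnapshot "$VMID" "$name"; snapshots that do not follow the naming scheme (such as testsnap) are left alone.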
