Hello
I found that a virtual machine freezes during a live snapshot with vmstate (that is, while the VM's RAM is being saved).
The freeze shows up as the VM not responding for a few seconds, after which it continues to work without
losing TCP connections.
Setup: Proxmox Virtual Environment 6.0-7 on a ZFS pool of two SSDs in RAID 1 (ZFS v0.8.1).
VM: Ubuntu; RAM: 5 GB; hard disk: VirtIO SCSI, 10 GB, on a ZFS volume dataset.
The freeze looks like this:
(taking a snapshot with the RAM state saved to a ZFS volume)
# time qm snapshot 114 testsnap --vmstate 1
real 0m8.963s
user 0m0.285s
sys 0m0.080s
(pinging the VM at the same time)
# ping 10.9.2.189
PING 10.9.2.189 (10.9.2.189): 56 data bytes
64 bytes from 10.9.2.189: icmp_seq=0 ttl=64 time=0.259 ms
64 bytes from 10.9.2.189: icmp_seq=1 ttl=64 time=0.347 ms
64 bytes from 10.9.2.189: icmp_seq=2 ttl=64 time=0.302 ms
64 bytes from 10.9.2.189: icmp_seq=3 ttl=64 time=7667.822 ms <- snapshot started
64 bytes from 10.9.2.189: icmp_seq=4 ttl=64 time=6636.308 ms
64 bytes from 10.9.2.189: icmp_seq=5 ttl=64 time=5581.749 ms
64 bytes from 10.9.2.189: icmp_seq=6 ttl=64 time=4527.823 ms
64 bytes from 10.9.2.189: icmp_seq=7 ttl=64 time=3515.349 ms
64 bytes from 10.9.2.189: icmp_seq=8 ttl=64 time=2503.208 ms
64 bytes from 10.9.2.189: icmp_seq=9 ttl=64 time=1446.281 ms
64 bytes from 10.9.2.189: icmp_seq=10 ttl=64 time=410.162 ms
64 bytes from 10.9.2.189: icmp_seq=11 ttl=64 time=0.248 ms
64 bytes from 10.9.2.189: icmp_seq=12 ttl=64 time=0.265 ms
64 bytes from 10.9.2.189: icmp_seq=13 ttl=64 time=0.284 ms
^C
--- 10.9.2.189 ping statistics ---
14 packets transmitted, 14 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 0.248/2306.458/7667.822/2684.618 ms
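The round-trip times shrink by about one second per packet, which suggests that the queued requests were all answered at nearly the same moment, once the VM resumed. A minimal sketch for checking this, assuming ping sends one request per second starting at t=0 (its default interval). The function name estimate_resume and the 100 ms threshold are illustrative choices, not part of any tool:

```shell
# Estimate when each delayed reply arrived, relative to the first request.
# Assumption: request N was sent at t = N seconds (default 1 s ping interval).
estimate_resume() {
  awk '/icmp_seq=/ {
    seq = ""; rtt = ""
    for (i = 1; i <= NF; i++) {
      if ($i ~ /^icmp_seq=/) { seq = substr($i, 10) }  # text after "icmp_seq="
      if ($i ~ /^time=/)     { rtt = substr($i, 6) }   # text after "time=", in ms
    }
    # Only report replies slower than 100 ms (arbitrary threshold for "delayed").
    if (rtt + 0 > 100) {
      printf "seq=%d arrived at t=%.2f s\n", seq, seq + rtt / 1000
    }
  }'
}
```

Fed the ping output above (for example `estimate_resume < ping.log`, where ping.log is a hypothetical saved copy), it places the replies to icmp_seq 3 through 10 between roughly t=10.4 s and t=10.7 s, which corresponds to a stall of about 7.7 seconds.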
If the VM disks are qcow2 files (stored on a ZFS dataset), no freeze is observed, or it is negligible.
(taking a snapshot with the RAM state saved to a file)
# time qm snapshot 116 testsnap --vmstate 1
Formatting '/var/lib/vz/images/116/vm-116-state-testsnap.raw', fmt=raw size=11261706240
real 0m0.985s
user 0m0.315s
sys 0m0.044s
(pinging the VM at the same time)
# ping 10.9.2.189
PING 10.9.2.189 (10.9.2.189): 56 data bytes
64 bytes from 10.9.2.189: icmp_seq=0 ttl=64 time=0.261 ms
64 bytes from 10.9.2.189: icmp_seq=1 ttl=64 time=0.283 ms
64 bytes from 10.9.2.189: icmp_seq=2 ttl=64 time=0.343 ms
64 bytes from 10.9.2.189: icmp_seq=3 ttl=64 time=0.317 ms
64 bytes from 10.9.2.189: icmp_seq=4 ttl=64 time=0.137 ms
64 bytes from 10.9.2.189: icmp_seq=5 ttl=64 time=0.236 ms
64 bytes from 10.9.2.189: icmp_seq=6 ttl=64 time=0.263 ms
^C
--- 10.9.2.189 ping statistics ---
7 packets transmitted, 7 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 0.137/0.263/0.343/0.061 ms
So there seems to be a problem with saving the VM RAM state to a ZFS volume dataset.
I also tried changing the sync mode of the ZFS pool to always, standard, and disabled.
This has no effect on snapshots of a VM on a ZFS volume: the freeze is always present.
But it has a visible effect on snapshots of VMs on qcow2:
sync=always: every ping time increased to 200-400 ms
sync=standard: no freeze
sync=disabled: 7-second freeze (same as for the VM on a ZFS volume).
So the problem with live snapshots of VMs on ZFS volumes may be related to ZFS write caching.
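For anyone who wants to repeat the sync comparison, here is a rough sketch of the sweep. The dataset name rpool/data is an assumption (substitute your own), and the helper names run and sync_sweep are made up for illustration. It is a dry run by default and only prints the commands. The ping has to run in a second terminal, as in the tests above.

```shell
#!/bin/sh
# Assumptions: dataset name, VM ID and snapshot name; adjust to your setup.
DATASET="${DATASET:-rpool/data}"
VMID="${VMID:-114}"
SNAP="${SNAP:-testsnap}"

# Print the command instead of running it unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Take and delete one vmstate snapshot per sync mode.
sync_sweep() {
  for mode in always standard disabled; do
    run zfs set "sync=$mode" "$DATASET"
    run qm snapshot "$VMID" "$SNAP" --vmstate 1
    run qm delsnapshot "$VMID" "$SNAP"
  done
  # Leave the dataset on sync=standard, which is the ZFS default.
  run zfs set sync=standard "$DATASET"
}
```

Calling `sync_sweep` prints the ten commands. With `DRY_RUN=0` it executes them, so it should only be run that way on a test VM.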
With ZFS volumes on HDDs instead of SSDs, the freeze lasts about 30-35 seconds for a VM with only 4 GB of RAM.
Does anyone have an idea why these freezes occur, or how to avoid them when using ZFS?