Hi,
In a cluster with several nodes replicating to each other, we noticed that a ZFS replication task sometimes gets stuck on Windows VMs while freezing the guest filesystem. The guest agents are updated to the latest version.
Here's the replication log when the issue happens:
Code:
2024-02-08 13:50:01 321-2: start replication job
2024-02-08 13:50:01 321-2: guest => VM 321, running => 1737073
2024-02-08 13:50:01 321-2: volumes => local-zfs:vm-321-disk-0,local-zfs:vm-321-disk-1
2024-02-08 13:50:02 321-2: freeze guest filesystem
2024-02-08 14:50:02 321-2: create snapshot '__replicate_321-2_1707396601__' on local-zfs:vm-321-disk-0
2024-02-08 14:50:02 321-2: create snapshot '__replicate_321-2_1707396601__' on local-zfs:vm-321-disk-1
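To put a number on the stall, the gap between the freeze line and the next log line can be computed from the timestamps. This is only a minimal sketch: the helper names (`parse_ts`, `freeze_stall_seconds`) are my own, and it assumes every log line starts with a `YYYY-MM-DD HH:MM:SS` timestamp as in the excerpt above.

```python
from datetime import datetime

# Two relevant lines copied from the replication log above.
LOG = """\
2024-02-08 13:50:02 321-2: freeze guest filesystem
2024-02-08 14:50:02 321-2: create snapshot '__replicate_321-2_1707396601__' on local-zfs:vm-321-disk-0
"""


def parse_ts(line):
    """Parse the leading 'YYYY-MM-DD HH:MM:SS' timestamp of a log line."""
    return datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")


def freeze_stall_seconds(log_text):
    """Return the seconds between the freeze line and the line after it.

    Returns None if the log has no freeze line followed by another line.
    """
    lines = [l for l in log_text.splitlines() if l.strip()]
    for cur, nxt in zip(lines, lines[1:]):
        if "freeze guest filesystem" in cur:
            return (parse_ts(nxt) - parse_ts(cur)).total_seconds()
    return None


print(freeze_stall_seconds(LOG))  # prints 3600.0
```

So in this case the job sat in the freeze step for exactly one hour before it moved on to the snapshots.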
Today it happened again, but this time the log stops before the snapshot step is logged.
Code:
2024-02-15 08:31:43 501-2: start replication job
2024-02-15 08:31:43 501-2: guest => VM 501, running => 164978
2024-02-15 08:31:43 501-2: volumes => local-zfs:vm-501-disk-0,local-zfs:vm-501-disk-1
It does not happen if I disable the QEMU guest agent.
The main problem is that this kind of failure blocks all other pending replications, which stay queued until the failing one reaches its timeout. Maybe an option to set a lower timeout for the guest filesystem freeze could be a solution?
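To illustrate what I mean by a shorter timeout, here is a rough sketch of a bounded probe that gives up after a few seconds instead of waiting for the full timeout. It is not how the replication code works, just an external check under my own assumptions: `responds_within` is a hypothetical helper, and the `qm guest cmd <vmid> ping` call in the comment is only an example of a command one might probe with.

```python
import subprocess


def responds_within(cmd, timeout_s):
    """Run cmd and report whether it exits successfully within timeout_s seconds.

    Returns False on timeout, on a non-zero exit status, or if the
    executable cannot be found.
    """
    try:
        subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
            check=True,
        )
        return True
    except (subprocess.TimeoutExpired, subprocess.CalledProcessError, FileNotFoundError):
        return False


# Example usage on a PVE node (assumption: probing the agent of VM 321):
# responds_within(["qm", "guest", "cmd", "321", "ping"], 10)
```

A check like this only shows whether the agent answers at that moment; it cannot prevent a freeze from hanging later.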
The underlying storage on all nodes is 4 x NVMe configured as ZFS striped mirrors.
Here are the package versions:
Code:
proxmox-ve: 7.4-1 (running kernel: 5.15.108-1-pve)
pve-manager: 7.4-16 (running version: 7.4-16/0f39f621)
pve-kernel-5.15: 7.4-4
pve-kernel-5.15.108-1-pve: 5.15.108-2
pve-kernel-5.15.102-1-pve: 5.15.102-1
ceph-fuse: 15.2.17-pve1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx4
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.4
libproxmox-backup-qemu0: 1.3.1-1
libproxmox-rs-perl: 0.2.1
libpve-access-control: 7.4.1
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.4-2
libpve-guest-common-perl: 4.2-4
libpve-http-server-perl: 4.2-3
libpve-rs-perl: 0.7.7
libpve-storage-perl: 7.4-3
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.2-2
lxcfs: 5.0.3-pve1
novnc-pve: 1.4.0-1
openvswitch-switch: 2.15.0+ds1-2+deb11u4
proxmox-backup-client: 2.4.2-1
proxmox-backup-file-restore: 2.4.2-1
proxmox-kernel-helper: 7.4-1
proxmox-mail-forward: 0.1.1-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.7.3
pve-cluster: 7.3-3
pve-container: 4.4-6
pve-docs: 7.4-2
pve-edk2-firmware: 3.20230228-4~bpo11+1
pve-firewall: 4.3-4
pve-firmware: 3.6-5
pve-ha-manager: 3.6.1
pve-i18n: 2.12-1
pve-qemu-kvm: 7.2.0-8
pve-xtermjs: 4.16.0-2
qemu-server: 7.4-4
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+3
vncterm: 1.7-1
zfsutils-linux: 2.1.11-pve1
Thank you