[SOLVED] ZFS replication: "base snapshot doesn't exist" error on a VM with 3 disks

zikann

New Member
Sep 10, 2024
Hi,
my setup:
3-node Proxmox cluster with HA enabled
"zfs-raid" ZFS pool on each node for the Windows 11 VM's main drive
"zfs-raid/files" dataset, added as a separate ZFS storage "files", for the Windows 11 VM's additional files drive
"zfs-raid/firebirddb" dataset, added as a separate ZFS storage "firebirddb", for the Windows 11 VM's additional Firebird DB drive

When the guest had only the one drive attached, replication from node 2 to nodes 1 and 3 ran fine with zero errors.
But since I added the additional drives, I get this error most of the time:

2024-09-10 09:13:07 100-1: base snapshot 'zfs-raid/vm-100-disk-4@__replicate_100-1_1725952020__' doesn't exist

The replication interval is configured for 1 minute, but the errors persist with a 5-minute interval as well.
Additionally, I think the replications that do succeed are full replicas, not incremental ones.
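To check whether a sync was really incremental, the replication snapshots for the affected disk can be compared on the source and target nodes, roughly like this (dataset name taken from the log below; adjust to your pool layout):

# run on the source node and on the target node, then compare the two lists
zfs list -t snapshot -o name,creation -s creation | grep vm-100-disk-4

# replication job state as Proxmox sees it (last sync, duration, errors)
pvesr status --guest 100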

Does anyone have an idea what the issue could be?

Example replication log:
(Proxmox Virtual Environment 8.2.4, VM 100 "win11alter" on node 'pve2')
2024-09-10 09:13:00 100-1: start replication job
2024-09-10 09:13:00 100-1: guest => VM 100, running => 23301
2024-09-10 09:13:00 100-1: volumes => files:vm-100-disk-4,firebirddb:vm-100-disk-3,zfs-raid:vm-100-disk-0,zfs-raid:vm-100-disk-1,zfs-raid:vm-100-disk-2
2024-09-10 09:13:02 100-1: (remote_prepare_local_job) 100-1: delete stale volume 'files:vm-100-disk-0'
2024-09-10 09:13:02 100-1: (remote_prepare_local_job) 100-1: delete stale volume 'files:vm-100-disk-2'
2024-09-10 09:13:02 100-1: (remote_prepare_local_job) 100-1: delete stale volume 'files:vm-100-disk-3'
2024-09-10 09:13:03 100-1: (remote_prepare_local_job) 100-1: delete stale volume 'files:vm-100-disk-1'
2024-09-10 09:13:04 100-1: (remote_prepare_local_job) 100-1: delete stale volume 'firebirddb:vm-100-disk-1'
2024-09-10 09:13:04 100-1: (remote_prepare_local_job) 100-1: delete stale volume 'firebirddb:vm-100-disk-4'
2024-09-10 09:13:04 100-1: (remote_prepare_local_job) 100-1: delete stale volume 'firebirddb:vm-100-disk-2'
2024-09-10 09:13:04 100-1: (remote_prepare_local_job) 100-1: delete stale volume 'firebirddb:vm-100-disk-0'
2024-09-10 09:13:04 100-1: (remote_prepare_local_job) 100-1: delete stale volume 'zfs-raid:vm-100-disk-3'
2024-09-10 09:13:04 100-1: (remote_prepare_local_job) 100-1: delete stale volume 'zfs-raid:vm-100-disk-4'
2024-09-10 09:13:04 100-1: freeze guest filesystem
2024-09-10 09:13:05 100-1: create snapshot '__replicate_100-1_1725952380__' on files:vm-100-disk-4
2024-09-10 09:13:05 100-1: create snapshot '__replicate_100-1_1725952380__' on firebirddb:vm-100-disk-3
2024-09-10 09:13:06 100-1: create snapshot '__replicate_100-1_1725952380__' on zfs-raid:vm-100-disk-0
2024-09-10 09:13:06 100-1: create snapshot '__replicate_100-1_1725952380__' on zfs-raid:vm-100-disk-1
2024-09-10 09:13:06 100-1: create snapshot '__replicate_100-1_1725952380__' on zfs-raid:vm-100-disk-2
2024-09-10 09:13:06 100-1: thaw guest filesystem
2024-09-10 09:13:06 100-1: using secure transmission, rate limit: none
2024-09-10 09:13:06 100-1: incremental sync 'files:vm-100-disk-4' (__replicate_100-1_1725952020__ => __replicate_100-1_1725952380__)
2024-09-10 09:13:07 100-1: send from @__replicate_100-1_1725952020__ to zfs-raid/vm-100-disk-4@__replicate_100-2_1725952179__ estimated size is 146K
2024-09-10 09:13:07 100-1: send from @__replicate_100-2_1725952179__ to zfs-raid/vm-100-disk-4@__replicate_100-1_1725952380__ estimated size is 122K
2024-09-10 09:13:07 100-1: total estimated size is 268K
2024-09-10 09:13:07 100-1: TIME SENT SNAPSHOT zfs-raid/vm-100-disk-4@__replicate_100-2_1725952179__
2024-09-10 09:13:07 100-1: TIME SENT SNAPSHOT zfs-raid/vm-100-disk-4@__replicate_100-1_1725952380__
2024-09-10 09:13:07 100-1: base snapshot 'zfs-raid/vm-100-disk-4@__replicate_100-1_1725952020__' doesn't exist
2024-09-10 09:13:08 100-1: cannot send 'zfs-raid/vm-100-disk-4': I/O error
2024-09-10 09:13:08 100-1: command 'zfs send -Rpv -I __replicate_100-1_1725952020__ -- zfs-raid/vm-100-disk-4@__replicate_100-1_1725952380__' failed: exit code 1
2024-09-10 09:13:08 100-1: delete previous replication snapshot '__replicate_100-1_1725952380__' on files:vm-100-disk-4
2024-09-10 09:13:08 100-1: delete previous replication snapshot '__replicate_100-1_1725952380__' on firebirddb:vm-100-disk-3
2024-09-10 09:13:08 100-1: delete previous replication snapshot '__replicate_100-1_1725952380__' on zfs-raid:vm-100-disk-0
2024-09-10 09:13:08 100-1: delete previous replication snapshot '__replicate_100-1_1725952380__' on zfs-raid:vm-100-disk-1
2024-09-10 09:13:08 100-1: delete previous replication snapshot '__replicate_100-1_1725952380__' on zfs-raid:vm-100-disk-2
2024-09-10 09:13:09 100-1: end replication job with error: command 'set -o pipefail && pvesm export files:vm-100-disk-4 zfs - -with-snapshots 1 -snapshot __replicate_100-1_1725952380__ -base __replicate_100-1_1725952020__ | /usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=pve3' -o 'UserKnownHostsFile=/etc/pve/nodes/pve3/ssh_known_hosts' -o 'GlobalKnownHostsFile=none' root@fc00::3 -- pvesm import files:vm-100-disk-4 zfs - -with-snapshots 1 -snapshot __replicate_100-1_1725952380__ -allow-rename 0 -base __replicate_100-1_1725952020__' failed: exit code 1
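As far as I understand it, the "base snapshot ... doesn't exist" message comes from pvesm import on the target node: the target dataset is missing the snapshot the incremental send uses as its base, so the receive aborts and the source side then reports the I/O error. A possible way to recover (untested here, and it resends the affected disk in full) would be to drop the stale copy on the target and let the next run start over, e.g.:

# on the target node: check which __replicate_ snapshots actually exist
zfs list -t snapshot -o name zfs-raid/vm-100-disk-4

# if the base snapshot is really gone, remove the stale target copy
# (this deletes only the replicated copy on the target, not the source disk)
zfs destroy -r zfs-raid/vm-100-disk-4

# then trigger the job again from the source node; it will do a full sync
pvesr schedule-now 100-1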
 
OK, it looks like the separate datasets were the problem. After moving the drives to the "main" zpool "zfs-raid", everything has been running smoothly so far.
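For anyone doing the same: the move can be done from the GUI (VM -> Hardware -> select the disk -> Disk Action -> Move Storage) or on the CLI with qm. A rough sketch, assuming the extra disks are scsi1 and scsi2 (check the real IDs with qm config 100):

# move the extra disks from the 'files' / 'firebirddb' storages onto 'zfs-raid'
qm disk move 100 scsi1 zfs-raid --delete 1
qm disk move 100 scsi2 zfs-raid --delete 1

# re-run the replication job once the disks are on the new storage
pvesr schedule-now 100-1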
 
