[SOLVED] ZFS replication: "base snapshot doesn't exist" error on a VM with 3 disks

zikann

New Member
Sep 10, 2024
Hi,
my setup:
3-node Proxmox cluster with HA enabled
"zfs-raid" ZFS pool on each node for the Windows 11 VM's main drive
"zfs-raid/files" dataset, added as a separate ZFS storage "files", for the Windows 11 VM's additional files drive
"zfs-raid/firebirddb" dataset, added as a separate ZFS storage "firebirddb", for the Windows 11 VM's additional Firebird DB drive

When the guest had only the one drive attached, replication from node 2 to nodes 1 and 3 ran fine with zero errors.
But since I added the additional drives, I get this error most of the time:

2024-09-10 09:13:07 100-1: base snapshot 'zfs-raid/vm-100-disk-4@__replicate_100-1_1725952020__' doesn't exist

The replication interval is configured for 1 minute, but the errors persist with a 5-minute interval as well.
Additionally, I think the replications that do succeed are full replicas, not incremental ones.
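To check whether a sync was really incremental, the replication snapshots for the affected disk can be compared on the source and target nodes, roughly like this (dataset name taken from the log below; adjust to your pool layout):

# run on the source node and on the target node, then compare the two lists
zfs list -t snapshot -o name,creation -s creation | grep vm-100-disk-4

# replication job state as Proxmox sees it (last sync, duration, errors)
pvesr status --guest 100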

Does anyone have an idea what the issue could be?

Example replication log:
(Proxmox Virtual Environment 8.2.4, VM 100 "win11alter" on node 'pve2')
2024-09-10 09:13:00 100-1: start replication job
2024-09-10 09:13:00 100-1: guest => VM 100, running => 23301
2024-09-10 09:13:00 100-1: volumes => files:vm-100-disk-4,firebirddb:vm-100-disk-3,zfs-raid:vm-100-disk-0,zfs-raid:vm-100-disk-1,zfs-raid:vm-100-disk-2
2024-09-10 09:13:02 100-1: (remote_prepare_local_job) 100-1: delete stale volume 'files:vm-100-disk-0'
2024-09-10 09:13:02 100-1: (remote_prepare_local_job) 100-1: delete stale volume 'files:vm-100-disk-2'
2024-09-10 09:13:02 100-1: (remote_prepare_local_job) 100-1: delete stale volume 'files:vm-100-disk-3'
2024-09-10 09:13:03 100-1: (remote_prepare_local_job) 100-1: delete stale volume 'files:vm-100-disk-1'
2024-09-10 09:13:04 100-1: (remote_prepare_local_job) 100-1: delete stale volume 'firebirddb:vm-100-disk-1'
2024-09-10 09:13:04 100-1: (remote_prepare_local_job) 100-1: delete stale volume 'firebirddb:vm-100-disk-4'
2024-09-10 09:13:04 100-1: (remote_prepare_local_job) 100-1: delete stale volume 'firebirddb:vm-100-disk-2'
2024-09-10 09:13:04 100-1: (remote_prepare_local_job) 100-1: delete stale volume 'firebirddb:vm-100-disk-0'
2024-09-10 09:13:04 100-1: (remote_prepare_local_job) 100-1: delete stale volume 'zfs-raid:vm-100-disk-3'
2024-09-10 09:13:04 100-1: (remote_prepare_local_job) 100-1: delete stale volume 'zfs-raid:vm-100-disk-4'
2024-09-10 09:13:04 100-1: freeze guest filesystem
2024-09-10 09:13:05 100-1: create snapshot '__replicate_100-1_1725952380__' on files:vm-100-disk-4
2024-09-10 09:13:05 100-1: create snapshot '__replicate_100-1_1725952380__' on firebirddb:vm-100-disk-3
2024-09-10 09:13:06 100-1: create snapshot '__replicate_100-1_1725952380__' on zfs-raid:vm-100-disk-0
2024-09-10 09:13:06 100-1: create snapshot '__replicate_100-1_1725952380__' on zfs-raid:vm-100-disk-1
2024-09-10 09:13:06 100-1: create snapshot '__replicate_100-1_1725952380__' on zfs-raid:vm-100-disk-2
2024-09-10 09:13:06 100-1: thaw guest filesystem
2024-09-10 09:13:06 100-1: using secure transmission, rate limit: none
2024-09-10 09:13:06 100-1: incremental sync 'files:vm-100-disk-4' (__replicate_100-1_1725952020__ => __replicate_100-1_1725952380__)
2024-09-10 09:13:07 100-1: send from @__replicate_100-1_1725952020__ to zfs-raid/vm-100-disk-4@__replicate_100-2_1725952179__ estimated size is 146K
2024-09-10 09:13:07 100-1: send from @__replicate_100-2_1725952179__ to zfs-raid/vm-100-disk-4@__replicate_100-1_1725952380__ estimated size is 122K
2024-09-10 09:13:07 100-1: total estimated size is 268K
2024-09-10 09:13:07 100-1: TIME SENT SNAPSHOT zfs-raid/vm-100-disk-4@__replicate_100-2_1725952179__
2024-09-10 09:13:07 100-1: TIME SENT SNAPSHOT zfs-raid/vm-100-disk-4@__replicate_100-1_1725952380__
2024-09-10 09:13:07 100-1: base snapshot 'zfs-raid/vm-100-disk-4@__replicate_100-1_1725952020__' doesn't exist
2024-09-10 09:13:08 100-1: cannot send 'zfs-raid/vm-100-disk-4': I/O error
2024-09-10 09:13:08 100-1: command 'zfs send -Rpv -I __replicate_100-1_1725952020__ -- zfs-raid/vm-100-disk-4@__replicate_100-1_1725952380__' failed: exit code 1
2024-09-10 09:13:08 100-1: delete previous replication snapshot '__replicate_100-1_1725952380__' on files:vm-100-disk-4
2024-09-10 09:13:08 100-1: delete previous replication snapshot '__replicate_100-1_1725952380__' on firebirddb:vm-100-disk-3
2024-09-10 09:13:08 100-1: delete previous replication snapshot '__replicate_100-1_1725952380__' on zfs-raid:vm-100-disk-0
2024-09-10 09:13:08 100-1: delete previous replication snapshot '__replicate_100-1_1725952380__' on zfs-raid:vm-100-disk-1
2024-09-10 09:13:08 100-1: delete previous replication snapshot '__replicate_100-1_1725952380__' on zfs-raid:vm-100-disk-2
2024-09-10 09:13:09 100-1: end replication job with error: command 'set -o pipefail && pvesm export files:vm-100-disk-4 zfs - -with-snapshots 1 -snapshot __replicate_100-1_1725952380__ -base __replicate_100-1_1725952020__ | /usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=pve3' -o 'UserKnownHostsFile=/etc/pve/nodes/pve3/ssh_known_hosts' -o 'GlobalKnownHostsFile=none' root@fc00::3 -- pvesm import files:vm-100-disk-4 zfs - -with-snapshots 1 -snapshot __replicate_100-1_1725952380__ -allow-rename 0 -base __replicate_100-1_1725952020__' failed: exit code 1
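As far as I understand it, the "base snapshot ... doesn't exist" message comes from pvesm import on the target node: the target dataset is missing the snapshot the incremental send uses as its base, so the receive aborts and the source side then reports the I/O error. A possible way to recover (untested here, and it resends the affected disk in full) would be to drop the stale copy on the target and let the next run start over, e.g.:

# on the target node: check which __replicate_ snapshots actually exist
zfs list -t snapshot -o name zfs-raid/vm-100-disk-4

# if the base snapshot is really gone, remove the stale target copy
# (this deletes only the replicated copy on the target, not the source disk)
zfs destroy -r zfs-raid/vm-100-disk-4

# then trigger the job again from the source node; it will do a full sync
pvesr schedule-now 100-1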
 
OK, it looks like the separate datasets were the problem. After moving the drives to the "main" zpool "zfs-raid", everything has been running smoothly so far.
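For anyone doing the same: the move can be done from the GUI (VM -> Hardware -> select the disk -> Disk Action -> Move Storage) or on the CLI with qm. A rough sketch, assuming the extra disks are scsi1 and scsi2 (check the real IDs with qm config 100):

# move the extra disks from the 'files' / 'firebirddb' storages onto 'zfs-raid'
qm disk move 100 scsi1 zfs-raid --delete 1
qm disk move 100 scsi2 zfs-raid --delete 1

# re-run the replication job once the disks are on the new storage
pvesr schedule-now 100-1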
 
