Hello everyone,
I am running into an issue with replication on one Debian LXC container.
I have 3 nodes in my PVE cluster:
- pve (primary, master)
- dr (secondary failover)
- Raspberry Pi Zero (QDevice)
The container, CT 100, has two subvols attached to it:
- One on local-zfs
- One on data-zfs
The first replication job completes successfully, but every subsequent run fails with an I/O error on the data-zfs subvol.
This is the replication log:
Code:
2023-05-18 12:37:05 100-0: start replication job
2023-05-18 12:37:05 100-0: guest => CT 100, running => 1
2023-05-18 12:37:05 100-0: volumes => data-zfs:subvol-100-disk-0,local-zfs:subvol-100-disk-0
2023-05-18 12:37:06 100-0: freeze guest filesystem
2023-05-18 12:37:06 100-0: create snapshot '__replicate_100-0_1684406225__' on data-zfs:subvol-100-disk-0
2023-05-18 12:37:06 100-0: create snapshot '__replicate_100-0_1684406225__' on local-zfs:subvol-100-disk-0
2023-05-18 12:37:06 100-0: thaw guest filesystem
2023-05-18 12:37:06 100-0: using secure transmission, rate limit: none
2023-05-18 12:37:06 100-0: full sync 'data-zfs:subvol-100-disk-0' (__replicate_100-0_1684406225__)
2023-05-18 12:37:07 100-0: full send of data-zfs/subvol-100-disk-0@__replicate_100-0_1684406225__ estimated size is 246G
2023-05-18 12:37:07 100-0: total estimated size is 246G
2023-05-18 12:37:08 100-0: volume 'data-zfs/subvol-100-disk-0' already exists
2023-05-18 12:37:08 100-0: warning: cannot send 'data-zfs/subvol-100-disk-0@__replicate_100-0_1684406225__': signal received
2023-05-18 12:37:08 100-0: cannot send 'data-zfs/subvol-100-disk-0': I/O error
2023-05-18 12:37:08 100-0: command 'zfs send -Rpv -- data-zfs/subvol-100-disk-0@__replicate_100-0_1684406225__' failed: exit code 1
2023-05-18 12:37:08 100-0: delete previous replication snapshot '__replicate_100-0_1684406225__' on data-zfs:subvol-100-disk-0
2023-05-18 12:37:08 100-0: delete previous replication snapshot '__replicate_100-0_1684406225__' on local-zfs:subvol-100-disk-0
2023-05-18 12:37:08 100-0: end replication job with error: command 'set -o pipefail && pvesm export data-zfs:subvol-100-disk-0 zfs - -with-snapshots 1 -snapshot __replicate_100-0_1684406225__ | /usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=dr' root@10.10.0.15 -- pvesm import data-zfs:subvol-100-disk-0 zfs - -with-snapshots 1 -snapshot __replicate_100-0_1684406225__ -allow-rename 0' failed: exit code 255
I already tried deleting the subvol on the receiving data-zfs storage, but the result is always the same. Restarting the whole cluster doesn't change anything either.
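For reference, this is roughly how I removed the subvol on the receiving node (dr) before retrying; the dataset path is based on the default Proxmox naming, so adjust it if your pool layout differs:

Code:
# On the receiving node (dr): list the target dataset and any leftover snapshots
zfs list -t all -r data-zfs/subvol-100-disk-0

# Destroy the stale copy (and its snapshots) so the next replication run can start with a clean full sync
zfs destroy -r data-zfs/subvol-100-disk-0

Even after that, the next replication attempt ends with the same "volume already exists" / I/O error.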