[SOLVED] Replication failing in cluster but only for some CT/VMs

helojunkie

I have a four-server cluster running HA and replication. Most of my replication jobs run fine, but several keep failing with the output below, and I cannot figure out why. I have deleted and re-added the failing replication jobs, to no avail.

In this particular case, replication is working perfectly for about a dozen VMs and CTs between the various nodes in the cluster, so it is not a machine-specific issue. Any help would be greatly appreciated.



Code:
2022-04-06 19:47:00 116-0: start replication job
2022-04-06 19:47:00 116-0: guest => CT 116, running => 1
2022-04-06 19:47:00 116-0: volumes => ssdimages:subvol-116-disk-0
2022-04-06 19:47:02 116-0: freeze guest filesystem
2022-04-06 19:47:02 116-0: create snapshot '__replicate_116-0_1649299620__' on ssdimages:subvol-116-disk-0
2022-04-06 19:47:02 116-0: thaw guest filesystem
2022-04-06 19:47:02 116-0: using secure transmission, rate limit: none
2022-04-06 19:47:02 116-0: full sync 'ssdimages:subvol-116-disk-0' (__replicate_116-0_1649299620__)
2022-04-06 19:47:03 116-0: full send of ssdimages/subvol-116-disk-0@__replicate_116-0_1649299620__ estimated size is 1.03G
2022-04-06 19:47:03 116-0: total estimated size is 1.03G
2022-04-06 19:47:04 116-0: Unknown option: snapshot
2022-04-06 19:47:04 116-0: 400 unable to parse option
2022-04-06 19:47:04 116-0: pvesm import <volume> <format> <filename> [OPTIONS]
2022-04-06 19:47:04 116-0: warning: cannot send 'ssdimages/subvol-116-disk-0@__replicate_116-0_1649299620__': signal received
2022-04-06 19:47:04 116-0: cannot send 'ssdimages/subvol-116-disk-0': I/O error
2022-04-06 19:47:04 116-0: command 'zfs send -Rpv -- ssdimages/subvol-116-disk-0@__replicate_116-0_1649299620__' failed: exit code 1
2022-04-06 19:47:04 116-0: delete previous replication snapshot '__replicate_116-0_1649299620__' on ssdimages:subvol-116-disk-0
2022-04-06 19:47:04 116-0: end replication job with error: command 'set -o pipefail && pvesm export ssdimages:subvol-116-disk-0 zfs - -with-snapshots 1 -snapshot __replicate_116-0_1649299620__ | /usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=proxmox01' root@10.200.70.2 -- pvesm import ssdimages:subvol-116-disk-0 zfs - -with-snapshots 1 -snapshot __replicate_116-0_1649299620__ -allow-rename 0' failed: exit code 255
 
Hi,
are you running mixed versions, i.e. some on 6.x and some on 7.x? Is the target of the replication fully upgraded (libpve-storage-perl should be the relevant package here)?
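One way to check this is to compare the installed libpve-storage-perl version on every node. Below is a minimal sketch, assuming root SSH works between the nodes (replication already relies on it). The helper name report_mixed_versions is made up for illustration, and proxmox02 and proxmox03 are placeholder node names, since only proxmox01 and proxmox04 are named in this thread.

```shell
# Sketch: group nodes by package version and flag a mismatch.
# Input on stdin: one "node version" pair per line.
# Output: one line per distinct version, followed by the nodes that have it.
# Returns 1 if more than one distinct version is seen, 0 otherwise.
# report_mixed_versions is a hypothetical helper, not a Proxmox tool.
report_mixed_versions() {
    sort -k2,2 -k1,1 | awk '
        NF < 2 { next }                  # skip blank or malformed lines
        $2 != prev {                     # a new version group starts
            if (count > 0) printf "\n"
            printf "%s:", $2
            prev = $2
            count++
        }
        { printf " %s", $1 }
        END {
            if (count > 0) printf "\n"
            exit (count > 1 ? 1 : 0)
        }
    '
}

# Example usage (node list is an assumption, adjust to your cluster):
#   for n in proxmox01 proxmox02 proxmox03 proxmox04; do
#       v=$(ssh "root@$n" "dpkg-query -W -f='\${Version}' libpve-storage-perl")
#       echo "$n $v"
#   done | report_mixed_versions
```

If more than one version line is printed, the cluster is running mixed versions of the storage library. Running pveversion -v on each node shows the same information along with the other Proxmox VE packages.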
 
Thank you @Fabian_E

OK, I have four systems. All of them were built and put into production the same day, all have been updated continually, and an apt update on every system shows it as up to date.

Of the four systems, replication is working as expected except that the 4th system (aptly named proxmox04) cannot SEND replication jobs. However, all other nodes can replicate TO proxmox04 with no problems.


Checking libpve-storage-perl specifically, three of the systems have 6.4-1, but the system with the issue (proxmox04) is running 7.1-1, and this completely confuses me! All systems show as up to date with no updates available, yet the 4th system (which cannot SEND replication jobs but can RECEIVE them fine) is on a different version.

Can I just copy the 7.1-1 version to the other nodes?

All of the nodes show Virtual Environment 6.4-14
 
Checking libpve-storage-perl specifically, three of the systems have 6.4-1, but the system with the issue (proxmox04) is running 7.1-1, and this completely confuses me!
Was that node installed with Proxmox VE 7 or upgraded at some point? Running with mixed versions in a cluster should only be done during a major version upgrade, as it can lead to such problems.

All systems show as up to date with no updates available, yet the 4th system (which cannot SEND replication jobs but can RECEIVE them fine) is on a different version.

Can I just copy the 7.1-1 version to the other nodes?
I'd recommend either re-installing the fourth node with Proxmox VE 6.4 (see here) or properly upgrading the whole cluster (please note that Proxmox VE 6.x will be end-of-life in a few months, so that should be done anyway at some point). See
https://pve.proxmox.com/wiki/Upgrade_from_6.x_to_7.0
for the upgrade guide.

All of the nodes show Virtual Environment 6.4-14
OK, then it sounds like the node might have been partially upgraded. You can attempt to upgrade it, but if you want to be sure, I'd say re-installing is the way to go. (If you plan to upgrade the cluster to 7.x, you can of course install that directly before joining the cluster again.)
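One possible cause of a partial upgrade (a guess, nothing in this thread confirms it) is an APT source on that node pointing at a newer Debian release. Proxmox VE 6.x is based on Debian 10 (buster) and 7.x on Debian 11 (bullseye), so a node that lists both codenames can pull in packages from both. A minimal sketch for checking this, assuming the classic one-line sources.list format; list_apt_codenames is a made-up helper name:

```shell
# Sketch: print each distinct release codename used by "deb" or "deb-src"
# lines in the given APT source files (one-line format only).
# list_apt_codenames is a hypothetical helper, not part of apt or Proxmox.
list_apt_codenames() {
    awk '
        $1 != "deb" && $1 != "deb-src" { next }   # skips comments and blanks too
        {
            i = 2
            if ($i ~ /^\[/) {                      # skip an optional [option=...] group
                while (i <= NF && $i !~ /\]$/) i++
                i++
            }
            suite = $(i + 1)                       # field after the URI
            sub("[-/].*", "", suite)               # buster-updates, buster/updates -> buster
            if (suite != "") print suite
        }
    ' "$@" | sort -u
}

# Example usage on a node:
#   list_apt_codenames /etc/apt/sources.list /etc/apt/sources.list.d/*.list
```

A consistent node should print a single codename. If both buster and bullseye show up, the sources are mixed. To see which repository a specific package came from, apt-cache policy libpve-storage-perl lists the installed version and its candidates.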
 
Thank you so much for your help. They were all installed at the same time, but something obviously got upgraded on that one node! I will work on upgrading the rest of the nodes one at a time and let you know if that does the trick.

Thank you again for your timely help and direction!
 
Just a follow-up in case anyone runs across this: that was the issue. One of the systems somehow had a different version on it, and after upgrading them all to 7, everything works again!
 
