Hello all,
I have been scratching my head for days now, poring through the forums looking for solutions, as I can't be the only one who has had this issue. I've found some other posts, but nothing in those worked, and I did not want to necro an old thread.
EDIT: Source thread that I followed along with the official docs.
First off, this is all pure LAB.
Firewall is off.
Proxmox Version 8.3.2 on all nodes
Right now I have one VM (ID 102), a small Windows VM, that I'm trying to replicate to the other nodes in the cluster.
Replication works on the "default" mgmt network but not on the separate cluster/migration network.
Error log
2025-01-10 15:51:00 102-1: start replication job
2025-01-10 15:51:00 102-1: guest => VM 102, running => 3000
2025-01-10 15:51:00 102-1: volumes => local-zfs:vm-102-disk-0
2025-01-10 15:53:02 102-1: (remote_prepare_local_job) Connection closed by 10.249.10.53 port 22
2025-01-10 15:53:02 102-1: end replication job with error: command '/usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=RAYLAB-PMO3' -o 'UserKnownHostsFile=/etc/pve/nodes/RAYLAB-PMO3/ssh_known_hosts' -o 'GlobalKnownHostsFile=none' root@10.249.10.53 -- pvesr prepare-local-job 102-1 --scan local-zfs local-zfs:vm-102-disk-0 --last_sync 0' failed: exit code 255
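To rule out the replication stack itself, I've also been reproducing just the SSH leg by hand, using the same options as the failing pvesr command (the -vvv and the trailing "true" are my additions, so only the connection is tested):

# Manually reproduce the SSH connection that pvesr opens for replication.
# Same options as in the error log; -vvv adds verbose client-side debugging,
# and "true" replaces the pvesr call so only the connection itself is tested.
/usr/bin/ssh -vvv -e none \
  -o BatchMode=yes \
  -o HostKeyAlias=RAYLAB-PMO3 \
  -o UserKnownHostsFile=/etc/pve/nodes/RAYLAB-PMO3/ssh_known_hosts \
  -o GlobalKnownHostsFile=none \
  root@10.249.10.53 -- true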
I have 3 nodes; here are the networks on them:
vmbr0 - 10.249.0.51, .52, and .53/24 - intended as mgmt for the web UI
vmbr1 - 10.249.10.51, .52, and .53/24 - intended as the cluster and migration network (I eventually want migration separated onto its own network, but I can't even get this to work)
All nodes can ping each other. Nodes can SSH to each other via the shell on 10.249.0.0/24 but not on 10.249.10.0/24.
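For reference, this is roughly how I've been checking reachability and whether sshd is even listening on the 10.249.10.x addresses (IPs and node names are from my lab):

# Basic reachability over the cluster network (this part works).
ping -c 3 10.249.10.53
# Check which addresses sshd is bound to (0.0.0.0 would mean all of them).
ss -tlnp | grep ':22'
# Plain SSH over the cluster network (this is what fails for me).
ssh root@10.249.10.53 hostname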
3 node cluster config
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: RAYLAB-PMO1
    nodeid: 1
    quorum_votes: 2
    ring0_addr: 10.249.10.51
  }
  node {
    name: RAYLAB-PMO2
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 10.249.10.52
  }
  node {
    name: RAYLAB-PMO3
    nodeid: 3
    quorum_votes: 1
    ring0_addr: 10.249.10.53
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: RAYLAB-CLUSTER
  config_version: 4
  interface {
    linknumber: 0
  }
  ip_version: ipv4-6
  link_mode: passive
  secauth: on
  version: 2
}
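In case it helps with diagnosis, these are the checks I know of to confirm corosync is actually talking over the 10.249.10.x link (run on any node):

# Cluster membership and quorum summary.
pvecm status
# Per-node link/ring status; should show the 10.249.10.x addresses on link 0.
corosync-cfgtool -s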
--------------------------
When I was reading about possible solutions, I tried disabling the secure migration requirement. This did not work either.
In /etc/pve/datacenter.cfg:
keyboard: en-us
migration: network=10.249.10.0/24,type=insecure
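To check whether that setting is actually being honored, my understanding from the qm man page is that a manual migration can be pointed at the network explicitly (treat the flags below as my reading of the docs, not something I've confirmed fixes anything):

# Kick off a test migration of VM 102 over the cluster network.
qm migrate 102 RAYLAB-PMO3 --online \
  --migration_network 10.249.10.0/24 \
  --migration_type insecure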
-------------------------
Can someone help out and let me know what I'm doing wrong? Thank you!