Cluster in different subnets

Aleksej

Well-Known Member
Feb 25, 2018
Hello.

I have a working cluster with 2 nodes in the 192.168.88.0/24 subnet.
I have also set up an EoIP tunnel to another site and need to add a node there with the IP 192.168.88.105/25.

All the network configuration is done and all IPs can reach each other over TCP and UDP: I verified this with ping, nmap -sU, and iperf3 in both TCP and UDP mode.
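For reference, the connectivity checks looked roughly like this (a sketch only; which node each command was run from, and the exact options, are assumptions on my part):

ping -c 4 192.168.88.105                 # basic reachability across the tunnel
nmap -sU -p 5405 192.168.88.105          # UDP scan of the corosync port
iperf3 -s                                # on the new node
iperf3 -c 192.168.88.105                 # TCP throughput test from an existing node
iperf3 -c 192.168.88.105 -u -b 10M       # UDP throughput test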

The problem is that when I add the node 192.168.88.105, the whole cluster halts.
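For context, the join itself was done in the usual way from the new node, roughly like this (the target address is a placeholder for one of the existing cluster nodes):

pvecm add <existing-cluster-node-ip>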
On the node being added I see the following in the logs:
Jan 8 23:21:06 pski11 pmxcfs[800]: [main] notice: exit proxmox configuration filesystem (0)
Jan 8 23:21:06 pski11 pmxcfs[31354]: [quorum] crit: quorum_initialize failed: 2
Jan 8 23:21:06 pski11 pmxcfs[31354]: [quorum] crit: can't initialize service
Jan 8 23:21:06 pski11 pmxcfs[31354]: [confdb] crit: cmap_initialize failed: 2
Jan 8 23:21:06 pski11 pmxcfs[31354]: [confdb] crit: can't initialize service
Jan 8 23:21:06 pski11 pmxcfs[31354]: [dcdb] crit: cpg_initialize failed: 2
Jan 8 23:21:06 pski11 pmxcfs[31354]: [dcdb] crit: can't initialize service
Jan 8 23:21:06 pski11 pmxcfs[31354]: [status] crit: cpg_initialize failed: 2
Jan 8 23:21:06 pski11 pmxcfs[31354]: [status] crit: can't initialize service
Jan 8 23:21:12 pski11 pmxcfs[31354]: [status] notice: update cluster info (cluster name obulerm, version = 3)
Jan 8 23:21:13 pski11 pmxcfs[31354]: [dcdb] notice: cpg_join retry 10
Jan 8 23:21:14 pski11 pmxcfs[31354]: [dcdb] notice: cpg_join retry 20
Jan 8 23:21:15 pski11 pmxcfs[31354]: [dcdb] notice: cpg_join retry 30
...
Jan 8 23:21:24 pski11 pmxcfs[31354]: [dcdb] notice: cpg_join retry 120
Jan 8 23:21:25 pski11 pmxcfs[31354]: [dcdb] notice: cpg_join retry 130
Jan 8 23:21:26 pski11 pmxcfs[31354]: [dcdb] notice: cpg_join retry 140
Jan 8 23:22:03 pski11 pmxcfs[31354]: [status] notice: members: 3/31354
Jan 8 23:22:03 pski11 pmxcfs[31354]: [status] notice: all data is up to date
Jan 8 23:22:03 pski11 pmxcfs[31354]: [dcdb] notice: members: 3/31354
Jan 8 23:22:03 pski11 pmxcfs[31354]: [dcdb] notice: all data is up to date
Jan 8 23:22:06 pski11 pmxcfs[31354]: [status] notice: cpg_send_message retry 10
Jan 8 23:22:07 pski11 pmxcfs[31354]: [status] notice: cpg_send_message retry 20
...
Jan 8 23:22:14 pski11 pmxcfs[31354]: [status] notice: cpg_send_message retry 90
Jan 8 23:22:15 pski11 pmxcfs[31354]: [status] notice: cpg_send_message retry 100
Jan 8 23:22:15 pski11 pmxcfs[31354]: [status] notice: cpg_send_message retried 100 times
Jan 8 23:22:15 pski11 pmxcfs[31354]: [status] crit: cpg_send_message failed: 6
Jan 8 23:22:16 pski11 pmxcfs[31354]: [status] notice: cpg_send_message retry 10
Jan 8 23:22:17 pski11 pmxcfs[31354]: [status] notice: cpg_send_message retry 20

All nodes have been updated to the latest version:
proxmox-ve: 6.4-1 (running kernel: 5.4.140-1-pve)
pve-manager: 6.4-13 (running version: 6.4-13/9f411e79)
pve-kernel-5.4: 6.4-11
pve-kernel-helper: 6.4-11
pve-kernel-5.4.157-1-pve: 5.4.157-1
pve-kernel-5.4.140-1-pve: 5.4.140-1
pve-kernel-5.4.106-1-pve: 5.4.106-1
pve-kernel-5.4.103-1-pve: 5.4.103-1
pve-kernel-5.4.78-2-pve: 5.4.78-2
pve-kernel-5.4.73-1-pve: 5.4.73-1
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.1.5-pve2~bpo10+1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.22-pve2~bpo10+1
libproxmox-acme-perl: 1.1.0
libproxmox-backup-qemu0: 1.1.0-1
libpve-access-control: 6.4-3
libpve-apiclient-perl: 3.1-3
libpve-common-perl: 6.4-4
libpve-guest-common-perl: 3.1-5
libpve-http-server-perl: 3.2-3
libpve-storage-perl: 6.4-1
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.6-2
lxcfs: 4.0.6-pve1
novnc-pve: 1.1.0-1
proxmox-backup-client: 1.1.13-2
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.6-1
pve-cluster: 6.4-1
pve-container: 3.3-6
pve-docs: 6.4-2
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-4
pve-firmware: 3.3-2
pve-ha-manager: 3.1-1
pve-i18n: 2.3-1
pve-qemu-kvm: 5.2.0-6
pve-xtermjs: 4.7.0-3
qemu-server: 6.4-2
smartmontools: 7.2-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 2.0.6-pve1~bpo10+1
One node has not booted into the updated kernel, because it cannot be rebooted (it runs 5.4.140 while the latest is 5.4.157).
But the nodes were already running different kernels when I created the cluster, so I don't think the kernel is the problem.

What could it be?

I thought it might be because multicast is not working through the tunnel, but I checked: the working cluster does not use multicast at all, only UDP port 5405.
All ports were tested with telnet and nmap -sU; everything is OK.
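If it helps, this is roughly how I checked it (a sketch; capturing on "any" is just for illustration):

corosync-cfgtool -s                      # knet link status as corosync sees it
tcpdump -ni any udp port 5405            # only unicast UDP 5405 shows up, no multicast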


P. S.
I created a test server with the IP 192.168.88.50/24 and made a cluster between 192.168.88.150 and 192.168.88.105 (in a different subnet, via the EoIP tunnel). Everything works fine.
I compared the corosync.conf files of the two clusters; they are identical.
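For illustration, the nodelist of the test cluster looks roughly like this (node names and IDs are placeholders, not the real ones):

nodelist {
  node {
    name: testnode1
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 192.168.88.150
  }
  node {
    name: testnode2
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 192.168.88.105
  }
}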


So, what can I change in the cluster config to make this work? Or could this be a bug when adding a node?
 