Cluster in different subnets

Aleksej

Well-Known Member
Feb 25, 2018
Hello.

I have a working cluster with 2 nodes in the 192.168.88.0/24 subnet.
I have also set up an EoIP tunnel to another site and need to add a node there with the IP 192.168.88.105/25.

All the network configuration is done and all IPs can reach each other over TCP and UDP: I verified this with ping, nmap -sU, and iperf3 in both TCP and UDP mode.
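For reference, the connectivity checks looked roughly like this (a sketch only; which node each command was run from, and the exact options, are assumptions on my part):

ping -c 4 192.168.88.105                 # basic reachability across the tunnel
nmap -sU -p 5405 192.168.88.105          # UDP scan of the corosync port
iperf3 -s                                # on the new node
iperf3 -c 192.168.88.105                 # TCP throughput test from an existing node
iperf3 -c 192.168.88.105 -u -b 10M       # UDP throughput test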

The problem is that when I add the node 192.168.88.105, the whole cluster halts.
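For context, the join itself was done in the usual way from the new node, roughly like this (the target address is a placeholder for one of the existing cluster nodes):

pvecm add <existing-cluster-node-ip>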
On the node being added I see the following in the logs:
Jan 8 23:21:06 pski11 pmxcfs[800]: [main] notice: exit proxmox configuration filesystem (0)
Jan 8 23:21:06 pski11 pmxcfs[31354]: [quorum] crit: quorum_initialize failed: 2
Jan 8 23:21:06 pski11 pmxcfs[31354]: [quorum] crit: can't initialize service
Jan 8 23:21:06 pski11 pmxcfs[31354]: [confdb] crit: cmap_initialize failed: 2
Jan 8 23:21:06 pski11 pmxcfs[31354]: [confdb] crit: can't initialize service
Jan 8 23:21:06 pski11 pmxcfs[31354]: [dcdb] crit: cpg_initialize failed: 2
Jan 8 23:21:06 pski11 pmxcfs[31354]: [dcdb] crit: can't initialize service
Jan 8 23:21:06 pski11 pmxcfs[31354]: [status] crit: cpg_initialize failed: 2
Jan 8 23:21:06 pski11 pmxcfs[31354]: [status] crit: can't initialize service
Jan 8 23:21:12 pski11 pmxcfs[31354]: [status] notice: update cluster info (cluster name obulerm, version = 3)
Jan 8 23:21:13 pski11 pmxcfs[31354]: [dcdb] notice: cpg_join retry 10
Jan 8 23:21:14 pski11 pmxcfs[31354]: [dcdb] notice: cpg_join retry 20
Jan 8 23:21:15 pski11 pmxcfs[31354]: [dcdb] notice: cpg_join retry 30
...
Jan 8 23:21:24 pski11 pmxcfs[31354]: [dcdb] notice: cpg_join retry 120
Jan 8 23:21:25 pski11 pmxcfs[31354]: [dcdb] notice: cpg_join retry 130
Jan 8 23:21:26 pski11 pmxcfs[31354]: [dcdb] notice: cpg_join retry 140
Jan 8 23:22:03 pski11 pmxcfs[31354]: [status] notice: members: 3/31354
Jan 8 23:22:03 pski11 pmxcfs[31354]: [status] notice: all data is up to date
Jan 8 23:22:03 pski11 pmxcfs[31354]: [dcdb] notice: members: 3/31354
Jan 8 23:22:03 pski11 pmxcfs[31354]: [dcdb] notice: all data is up to date
Jan 8 23:22:06 pski11 pmxcfs[31354]: [status] notice: cpg_send_message retry 10
Jan 8 23:22:07 pski11 pmxcfs[31354]: [status] notice: cpg_send_message retry 20
...
Jan 8 23:22:14 pski11 pmxcfs[31354]: [status] notice: cpg_send_message retry 90
Jan 8 23:22:15 pski11 pmxcfs[31354]: [status] notice: cpg_send_message retry 100
Jan 8 23:22:15 pski11 pmxcfs[31354]: [status] notice: cpg_send_message retried 100 times
Jan 8 23:22:15 pski11 pmxcfs[31354]: [status] crit: cpg_send_message failed: 6
Jan 8 23:22:16 pski11 pmxcfs[31354]: [status] notice: cpg_send_message retry 10
Jan 8 23:22:17 pski11 pmxcfs[31354]: [status] notice: cpg_send_message retry 20

All nodes have been updated to the latest version:
proxmox-ve: 6.4-1 (running kernel: 5.4.140-1-pve)
pve-manager: 6.4-13 (running version: 6.4-13/9f411e79)
pve-kernel-5.4: 6.4-11
pve-kernel-helper: 6.4-11
pve-kernel-5.4.157-1-pve: 5.4.157-1
pve-kernel-5.4.140-1-pve: 5.4.140-1
pve-kernel-5.4.106-1-pve: 5.4.106-1
pve-kernel-5.4.103-1-pve: 5.4.103-1
pve-kernel-5.4.78-2-pve: 5.4.78-2
pve-kernel-5.4.73-1-pve: 5.4.73-1
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.1.5-pve2~bpo10+1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.22-pve2~bpo10+1
libproxmox-acme-perl: 1.1.0
libproxmox-backup-qemu0: 1.1.0-1
libpve-access-control: 6.4-3
libpve-apiclient-perl: 3.1-3
libpve-common-perl: 6.4-4
libpve-guest-common-perl: 3.1-5
libpve-http-server-perl: 3.2-3
libpve-storage-perl: 6.4-1
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.6-2
lxcfs: 4.0.6-pve1
novnc-pve: 1.1.0-1
proxmox-backup-client: 1.1.13-2
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.6-1
pve-cluster: 6.4-1
pve-container: 3.3-6
pve-docs: 6.4-2
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-4
pve-firmware: 3.3-2
pve-ha-manager: 3.1-1
pve-i18n: 2.3-1
pve-qemu-kvm: 5.2.0-6
pve-xtermjs: 4.7.0-3
qemu-server: 6.4-2
smartmontools: 7.2-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 2.0.6-pve1~bpo10+1
One node has not booted into the updated kernel, because it cannot be rebooted (it runs 5.4.140 while the latest is 5.4.157).
But the nodes were already running different kernels when I created the cluster, so I don't think the kernel is the problem.

What could it be?

I thought it might be because multicast is not working through the tunnel, but I checked: the working cluster does not use multicast at all, only UDP port 5405.
All ports were tested with telnet and nmap -sU; everything is OK.
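If it helps, this is roughly how I checked it (a sketch; capturing on "any" is just for illustration):

corosync-cfgtool -s                      # knet link status as corosync sees it
tcpdump -ni any udp port 5405            # only unicast UDP 5405 shows up, no multicast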


P. S.
I created a test server with the IP 192.168.88.50/24 and made a cluster between 192.168.88.150 and 192.168.88.105 (in a different subnet, via the EoIP tunnel). Everything works fine.
I compared the corosync.conf files of the two clusters; they are identical.
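For illustration, the nodelist of the test cluster looks roughly like this (node names and IDs are placeholders, not the real ones):

nodelist {
  node {
    name: testnode1
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 192.168.88.150
  }
  node {
    name: testnode2
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 192.168.88.105
  }
}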


So, what can I change in the cluster config to make this work? Or could this be a bug when adding a node?
 