Add to Cluster problem

PretoX

Hi guys,

I have a problem:
Master server:
# pveversion -v
proxmox-ve: 4.3-71 (running kernel: 4.4.21-1-pve)
pve-manager: 4.3-9 (running version: 4.3-9/f7c6f0cd)
pve-kernel-4.4.6-1-pve: 4.4.6-48
pve-kernel-4.4.13-1-pve: 4.4.13-56
pve-kernel-4.2.6-1-pve: 4.2.6-36
pve-kernel-4.4.13-2-pve: 4.4.13-58
pve-kernel-4.4.21-1-pve: 4.4.21-71
pve-kernel-4.2.8-1-pve: 4.2.8-41
pve-kernel-4.4.19-1-pve: 4.4.19-66
pve-kernel-4.4.10-1-pve: 4.4.10-54
lvm2: 2.02.116-pve3
corosync-pve: 2.4.0-1
libqb0: 1.0-1
pve-cluster: 4.0-46
qemu-server: 4.0-92
pve-firmware: 1.1-10
libpve-common-perl: 4.0-79
libpve-access-control: 4.0-19
libpve-storage-perl: 4.0-68
pve-libspice-server1: 0.12.8-1
vncterm: 1.2-1
pve-docs: 4.3-12
pve-qemu-kvm: 2.7.0-4
pve-container: 1.0-80
pve-firewall: 2.0-31
pve-ha-manager: 1.0-35
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u2
lxc-pve: 2.0.5-1
lxcfs: 2.0.4-pve2
criu: 1.6.0-1
novnc-pve: 0.5-8
smartmontools: 6.5+svn4324-1~pve80
zfsutils: 0.6.5.8-pve13~bpo80
Hosts file:
172.16.0.3 clustermaster.corosync
172.16.0.2 pm1node.corosync

# ifconfig
bond0 Link encap:Ethernet
inet addr:192.168.153.120 Bcast:192.168.153.255 Mask:255.255.255.0
UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1
RX packets:767188627 errors:0 dropped:4868 overruns:0 frame:0
TX packets:712119867 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:617556309213 (575.1 GiB) TX bytes:710819810083 (662.0 GiB)

bond0:0 Link encap:Ethernet
inet addr:172.16.0.3 Bcast:172.16.0.255 Mask:255.255.255.0
UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1

vmbr0 Link encap:Ethernet
inet addr:xxx.xxx.xxx.120 Bcast:xxx.xxx.xxx.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:143372848 errors:0 dropped:4838 overruns:0 frame:0
TX packets:3527001 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:463234219769 (431.4 GiB) TX bytes:573326205 (546.7 MiB)
# cat /etc/pve/corosync.conf
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: PM1-BNE2
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 172.16.0.3
  }

  node {
    name: pm1
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 172.16.0.2
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: INGPMForest
  config_version: 11
  ip_version: ipv4
  secauth: on
  version: 2

  interface {
    bindnetaddr: 172.16.0.3
    ringnumber: 0
  }
}
Slave server config:
# pveversion -v
proxmox-ve: 4.3-66 (running kernel: 4.4.19-1-pve)
pve-manager: 4.3-1 (running version: 4.3-1/e7cdc165)
pve-kernel-4.4.19-1-pve: 4.4.19-66
lvm2: 2.02.116-pve3
corosync-pve: 2.4.0-1
libqb0: 1.0-1
pve-cluster: 4.0-46
qemu-server: 4.0-88
pve-firmware: 1.1-9
libpve-common-perl: 4.0-73
libpve-access-control: 4.0-19
libpve-storage-perl: 4.0-61
pve-libspice-server1: 0.12.8-1
vncterm: 1.2-1
pve-qemu-kvm: 2.6.1-6
pve-container: 1.0-75
pve-firewall: 2.0-29
pve-ha-manager: 1.0-35
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u2
lxc-pve: 2.0.4-1
lxcfs: 2.0.3-pve1
criu: 1.6.0-1
novnc-pve: 0.5-8
zfsutils: 0.6.5.7-pve10~bpo80
Hosts file:
172.16.0.3 clustermaster.corosync
172.16.0.2 pm1node.corosync

# ifconfig
vmbr0 Link encap:Ethernet
inet addr:yyy.yyy.yyy.50 Bcast:yyy.yyy.yyy.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:23838606 errors:0 dropped:6208 overruns:0 frame:0
TX packets:109785 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:1489568250 (1.3 GiB) TX bytes:14887235 (14.1 MiB)

vmbr0:1 Link encap:Ethernet
inet addr:172.16.0.2 Bcast:172.16.0.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1


I'm trying to execute this on the slave:
# pvecm add 172.16.0.3 -ring0_addr 172.16.0.2 -f
# cat corosync.conf
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: PM1-BNE2
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 172.16.0.3
  }

  node {
    name: pm1
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 172.16.0.2
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: INGPMForest
  config_version: 11
  ip_version: ipv4
  secauth: on
  version: 2

  interface {
    bindnetaddr: 172.16.0.3
    ringnumber: 0
  }
}
# pvecm add 172.16.0.3 -ring0_addr 172.16.0.2 -f
can't create shared ssh key database '/etc/pve/priv/authorized_keys'
node pm1 already defined
copy corosync auth key
stopping pve-cluster service
backup old database
Job for corosync.service failed. See 'systemctl status corosync.service' and 'journalctl -xn' for details.
waiting for quorum...

# systemctl status corosync.service
● corosync.service - Corosync Cluster Engine
Loaded: loaded (/lib/systemd/system/corosync.service; enabled)
Active: failed (Result: exit-code) since Wed 2016-12-07 19:53:25 AEST; 1min 33s ago
Process: 22276 ExecStart=/usr/share/corosync/corosync start (code=exited, status=1/FAILURE)

Dec 07 19:52:25 pm1 corosync[22290]: [QB ] withdrawing server sockets
Dec 07 19:52:25 pm1 corosync[22290]: [SERV ] Service engine unloaded: corosync cluster closed process group service v1.01
Dec 07 19:52:25 pm1 corosync[22290]: [QB ] withdrawing server sockets
Dec 07 19:52:25 pm1 corosync[22290]: [SERV ] Service engine unloaded: corosync cluster quorum service v0.1
Dec 07 19:52:25 pm1 corosync[22290]: [SERV ] Service engine unloaded: corosync profile loading service
Dec 07 19:52:25 pm1 corosync[22290]: [MAIN ] Corosync Cluster Engine exiting normally
Dec 07 19:53:25 pm1 systemd[1]: corosync.service: control process exited, code=exited status=1
Dec 07 19:53:25 pm1 systemd[1]: Failed to start Corosync Cluster Engine.
Dec 07 19:53:25 pm1 systemd[1]: Unit corosync.service entered failed state.
Dec 07 19:53:25 pm1 corosync[22276]: Starting Corosync Cluster Engine (corosync): [FAILED]

# journalctl -xn
-- Logs begin at Thu 2016-12-01 22:48:16 AEST, end at Wed 2016-12-07 19:55:05 AEST. --
Dec 07 19:55:02 pm1 pveproxy[23708]: worker 22737 started
Dec 07 19:55:02 pm1 pveproxy[23708]: worker 22733 finished
Dec 07 19:55:02 pm1 pveproxy[23708]: starting 1 worker(s)
Dec 07 19:55:02 pm1 pveproxy[22737]: /etc/pve/local/pve-ssl.key: failed to load local private key (key_file or key) at /usr/share/perl5/PVE/HTTPServer.pm line 1626.
Dec 07 19:55:02 pm1 pveproxy[23708]: worker 22738 started
Dec 07 19:55:02 pm1 pveproxy[22738]: /etc/pve/local/pve-ssl.key: failed to load local private key (key_file or key) at /usr/share/perl5/PVE/HTTPServer.pm line 1626.
Dec 07 19:55:05 pm1 pmxcfs[22264]: [quorum] crit: quorum_initialize failed: 2
Dec 07 19:55:05 pm1 pmxcfs[22264]: [confdb] crit: cmap_initialize failed: 2
Dec 07 19:55:05 pm1 pmxcfs[22264]: [dcdb] crit: cpg_initialize failed: 2
Dec 07 19:55:05 pm1 pmxcfs[22264]: [status] crit: cpg_initialize failed: 2

# cat corosync.conf
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: PM1-BNE2
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 172.16.0.3
  }

  node {
    name: pm1
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 172.16.0.2
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: INGPMForest
  config_version: 11
  ip_version: ipv4
  secauth: on
  version: 2

  interface {
    bindnetaddr: 172.16.0.3
    ringnumber: 0
  }
}
I have a subscription for the slave server too, but at this stage that shouldn't matter, right?

I have tested exactly the same configuration on 2 clean PVE nodes and everything works fine.

Please help
 
It seems there is a problem with multicast on your network. Please test using "omping":

https://pve.proxmox.com/wiki/Multicast_notes
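For example, run the tests on both nodes at the same time (a rough sketch following the wiki page above - I am reusing the names from your hosts file, so substitute whatever resolves to your ring0 addresses):

# short burst test, run simultaneously on both nodes
omping -c 10000 -i 0.001 -F -q clustermaster.corosync pm1node.corosync

# longer test (roughly 10 minutes) to catch IGMP snooping/querier timeouts
omping -c 600 -i 1 -q clustermaster.corosync pm1node.corosync

If the reported multicast loss is significantly above 0%, corosync will not form a stable membership over that link.
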
That's funny:
# corosync-cmapctl | grep mcast
runtime.config.totem.rrp_problem_count_mcast_threshold (u32) = 100
runtime.totem.pg.mrp.srp.mcast_retx (u64) = 0
runtime.totem.pg.mrp.srp.mcast_rx (u64) = 0
runtime.totem.pg.mrp.srp.mcast_tx (u64) = 12473
totem.interface.0.mcastaddr (str) = 239.192.88.198
totem.interface.0.mcastport (u16) = 5405

Where did this IP come from?
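(I assume corosync picked it itself - as far as I know, when no mcastaddr is set, corosync 2.x derives a default 239.192.x.x address from the cluster_name. If it ever needed to be pinned explicitly, a minimal sketch of the totem block would look like the following, reusing the values corosync already reports and bumping config_version:)

totem {
  cluster_name: INGPMForest
  config_version: 12
  ip_version: ipv4
  secauth: on
  version: 2

  interface {
    ringnumber: 0
    bindnetaddr: 172.16.0.3
    mcastaddr: 239.192.88.198
    mcastport: 5405
  }
}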


Maybe I should remap 172.16.0.3 from bond0:0 to vmbr0:0?

Update: remapped 172.16.0.3 from bond0:0 to vmbr0:0 - same result, no multicast. Strangely, installing a new PVE node in the same subnet and joining it to this new node works. So multicast works for all devices except this node; probably a Dell NIC issue...
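
For what it's worth, before blaming the NIC I'd also rule out IGMP snooping, along the lines of the multicast notes linked above (a rough sketch - vmbr0 is the bridge that now carries the ring0 address; switch-side IGMP snooping/querier settings are vendor-specific and not shown):

# check whether IGMP snooping is enabled on the bridge (1 = on, 0 = off)
cat /sys/class/net/vmbr0/bridge/multicast_snooping

# temporarily disable it to rule it out, then re-run the omping tests
echo 0 > /sys/class/net/vmbr0/bridge/multicast_snooping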
 
