[SOLVED] Corosync error joining node to cluster (old configs)

proxman4

Member
Mar 23, 2020

Hi,

Proxmox is a terrific technology that I need to get to grips with.

First of all, thank you to all the staff and the community for this great product! It's awesome.

I need your help because I'm stuck at this point. I'm pretty sure you can help me and that there's something obvious I'm not seeing.

I ran into a problem when joining a node to the cluster.

Here is the setup:

oOP1 and oOP2 were joined years ago (long before I was hired by the company; I guess Proxmox < 5).


Now we need to migrate oOP1 [10.17.254.3] and oOP2 [10.17.254.2] to OP1 [10.17.254.5] and OP2 [10.17.254.4] (new servers).

We planned to do a pvecm add to join the new servers to the cluster so we could migrate quickly and easily... but...

Once Debian 10.5 and the whole Proxmox stack were installed, we did this on OP1 [10.17.254.5]:

pvecm add 10.17.254.3 -link0 10.17.254.5


Here are the nodes:

Code:
    Nodeid      Votes Name
0x00000001          1 10.17.254.2
0x00000002          1 10.17.254.3 (local)
0x00000003          1 10.17.254.5
0x00000004          1 10.17.254.4


We wanted to join the last node with this command on OP2:

pvecm add 10.17.254.2 -link0 10.17.254.5



For now the last node, OP2, refuses to join the cluster and displays this kind of error:
Code:
[QUORUM] This node is within the non-primary component and will NOT provide any services.

[KNET  ] host: host: 1 has no active links

[TOTEM ] Token has not been received in 84 ms

[MAIN  ] interface section bindnetaddr is used together with nodelist. Nodelist one is going to be used.

[MAIN  ] Please migrate config file to nodelist.

[KNET  ] udp: Received ICMP error from 10.17.254.4: No route to host 10.17.254.5
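
For completeness, these are the kind of basic checks I plan to run on OP2 to rule out a plain network problem on the corosync link (just a sketch, IPs taken from the scheme above):

Code:
# From OP2: are the corosync link addresses reachable at all?
ping -c 3 10.17.254.5
ping -c 3 10.17.254.3

# Corosync 3 (kronosnet) listens on UDP port 5405 by default
ss -ulpn | grep corosync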

When I look at /etc/pve/.members,

I can see the nodes, but with public IPs:

Code:
{
"nodename": "OP2",
"version": 514,
"cluster": { "name": "CLUSTER0", "version": 15, "nodes": 4, "quorate": 1 },
"nodelist": {
"oOP1": { "id": 2, "online": 1, "ip": "PUBLIC"},
"OP1": { "id": 3, "online": 1, "ip": "PUBLIC"},
"OP2": { "id": 4, "online": 1, "ip": "PUBLIC"},
"oOP2": { "id": 1, "online": 1, "ip": "PUBLIC"}
}
}
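
(As far as I understand, the IP shown in .members comes from how each node name resolves, so I suspect our /etc/hosts entries point the node names at the public interface instead of the cluster LAN. Roughly, I would expect them to need to look like this; the domain names here are just placeholders:)

Code:
10.17.254.3   oOP1.mydomain.local   oOP1
10.17.254.2   oOP2.mydomain.local   oOP2
10.17.254.5   OP1.mydomain.local    OP1
10.17.254.4   OP2.mydomain.local    OP2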

The /etc/pve/corosync.conf:

Code:
logging {
  debug: on
  to_syslog: yes
}

nodelist {
  node {
    name: OP1
    nodeid: 3
    quorum_votes: 1
    ring0_addr: 10.17.254.5
  }
  node {
    name: OP2
    nodeid: 4
    quorum_votes: 1
    ring0_addr: 10.17.254.4
  }
  node {
    name: oOP1
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 10.17.254.3
  }
  node {
    name: oOP2
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.17.254.2
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: CLUSTER0
  config_version: 15
  interface {
    bindnetaddr: 10.17.254.2
    ringnumber: 0
  }
  ip_version: ipv4
  secauth: on
  version: 2
}

Can I change the bindnetaddr line without breaking anything? We have production servers on oOP1 and oOP2.
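
If changing it is safe, I assume the edit itself should follow the procedure from the admin guide, i.e. work on a copy and bump config_version, roughly like this (a sketch, not something we have run yet):

Code:
cp /etc/pve/corosync.conf /etc/pve/corosync.conf.new
nano /etc/pve/corosync.conf.new    # adjust bindnetaddr, set config_version: 16
cp /etc/pve/corosync.conf /etc/pve/corosync.conf.bak
mv /etc/pve/corosync.conf.new /etc/pve/corosync.conf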

As the corosync.conf manual says:

Code:
       bindnetaddr (udp only)
              This specifies the network address the corosync executive should bind to when using udp.

              bindnetaddr (udp only) should be an IP address configured on the system, or a network address.

              For  example,  if the local interface is 192.168.5.92 with netmask 255.255.255.0, you should set bindnetaddr to 192.168.5.92 or 192.168.5.0.  If the local interface is 192.168.5.92 with
              netmask 255.255.255.192, set bindnetaddr to 192.168.5.92 or 192.168.5.64, and so forth.

              This may also be an IPV6 address, in which case IPV6 networking will be used.  In this case, the exact address must be specified and there is no automatic selection of the  network  in‐
              terface within a specific subnet as with IPv4.

              If IPv6 networking is used, the nodeid field in nodelist must be specified.

When I run watch -n1 pvecm status, I can see:

Code:
Votequorum information
----------------------
Expected votes:   4
Highest expected: 4
Total votes:      1
Quorum:           3 Activity blocked
Flags:

I know there must be a trick to make this work properly. Unfortunately I'm pretty new to corosync/PVE cluster management, and the work with my colleague wasn't well synchronized, so we did this a little messily and quickly.

I can provide logs and more configuration if needed, of course. I have been through so many logs and config files that I'm a little puzzled.

Thanks in advance.
 
Hi,

As you talk about new and old nodes, I think the first thing to clear up is which PVE version is running on each node.
Node joining is normally only guaranteed to be compatible between nodes with the same major version.

Can you please post the output of pveversion -v from each cluster node here?
 
@t.lamprecht

Thanks for your response.


pveversion -v

Code:
proxmox-ve: 6.2-1 (running kernel: 5.4.55-1-pve)
pve-manager: 6.2-11 (running version: 6.2-11/22fb4983)
pve-kernel-5.4: 6.2-5
pve-kernel-helper: 6.2-5
pve-kernel-5.4.55-1-pve: 5.4.55-1
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.4-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
libjs-extjs: 6.0.1-10
libknet1: 1.16-pve1
libproxmox-acme-perl: 1.0.4
libpve-access-control: 6.1-2
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.1-5
libpve-guest-common-perl: 3.1-2
libpve-http-server-perl: 3.0-6
libpve-storage-perl: 6.2-6
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.3-1
lxcfs: 4.0.3-pve3
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.2-10
pve-cluster: 6.1-8
pve-container: 3.1-12
pve-docs: 6.2-5
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-2
pve-firmware: 3.1-2 
pve-ha-manager: 3.0-9   
pve-i18n: 2.1-3
pve-qemu-kvm: 5.0.0-12
pve-xtermjs: 4.7.0-1
qemu-server: 6.2-11
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 0.8.4-pve1

We ensured all PVE versions were identical.

We solved the problem by:

1 - Removing the node from the cluster
2 - Reinstalling the node from scratch
3 - Changing the LAN interface IP to .6
4 - Changing its hostname
5 - Verifying that /etc/hosts on all nodes had the proper hostnames/IPs
6 - Joining the cluster from the last node with:

pvecm add 10.17.254.2 -link0 10.17.254.6

And it worked!
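
(As a sanity check, pvecm status / pvecm nodes run on any member should now list all four nodes and report the cluster as quorate; something along these lines:)

Code:
pvecm status | grep -E 'Quorate|Total votes'
pvecm nodes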

That is not the proper way to do it, but it is the simplest one I found.

By the way, the next step is to remove 10.17.254.2 from the cluster, but I need to read the manual to understand how to change the cluster IP.
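
If I read the docs right, removing the old node itself boils down to something like this once its guests are migrated off and it is powered down (a sketch, to be double-checked before touching prod):

Code:
# run on one of the remaining cluster nodes (oOP2 is the node with 10.17.254.2)
pvecm delnode oOP2
pvecm status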

From what I read, changing the cluster address can be managed by changing the bindnetaddr line in /etc/pve/corosync.conf from:

bindnetaddr: 10.17.254.2
to
bindnetaddr: 10.17.254.0

The next step needs research and probably another post.

To be continued.
 
From what I read, changing the cluster address can be managed by changing the bindnetaddr line in /etc/pve/corosync.conf from:

bindnetaddr: 10.17.254.2
to
bindnetaddr: 10.17.254.0

The "bindnetaddr" property is deprecated in the corosync version 3 used by Proxmox VE 6.x, so you can just drop it.
 
