pvecm addnode via ssh

tirili

Is there any way, like

pvecm add NODEIP -use_ssh

to do the same for

pvecm addnode?

Is there any way to change the node's name afterwards?

Having

# pvecm status
Quorum information
------------------
Date: Mon Oct 1 16:10:23 2018
Quorum provider: corosync_votequorum
Nodes: 1
Node ID: 0x00000001
Ring ID: 1/604
Quorate: Yes

Votequorum information
----------------------
Expected votes: 2
Highest expected: 2
Total votes: 2
Quorum: 2
Flags: Quorate Qdevice

Membership information
----------------------
Nodeid Votes Qdevice Name
0x00000001 1 A,V,NMW 1.2.3.1 (local)
0x00000000 1 Qdevice


how can I change the name shown there to the correct node name?
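For context, the name shown in the membership table seems to come from the nodelist entry (ring0_addr) in corosync.conf, so what I imagine (only a sketch, not an official procedure) is editing that entry and bumping the version:

Code:
# edit the cluster-wide config: adjust the node's name / ring0_addr in the
# nodelist and ALWAYS increment config_version in the totem section
vi /etc/pve/corosync.conf
# then restart corosync so the change takes effect
systemctl restart corosync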

Thanks for your help in advance
 
Hi,

pvecm addnode is for internal use and you should not use it directly.
 
Hello Wolfgang, ok, thanks for the info.
I am having issues with mcast and have configured udpu; the pvecm add on the 2nd node ends in a disaster, as it keeps waiting for quorum and never gets any.

What is the recommended way to add a new node and configure it to connect / use quorum afterwards? (I tried with -votes 0 as well.)
I configured corosync-qdevice on the 1st node, which connects perfectly to the corosync-qnetd.
The second node does not connect using corosync-qdevice, as its corosync.conf does not exist.
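For reference, this is roughly what I mean by switching to udpu (only a sketch of the totem section; the values are placeholders from my test setup, and transport: udpu is corosync's unicast transport):

Code:
totem {
  cluster_name: CLUSTER
  config_version: 2          # must be incremented on every change
  ip_version: ipv4
  secauth: on
  version: 2
  transport: udpu            # UDP unicast instead of multicast
  interface {
    bindnetaddr: 10.40.20.0  # cluster network (example value)
    ringnumber: 0
  }
}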

Any further help is appreciated.
Best regards
Thomas
 
We need some clarification.

omping for the nodes 10.40.20.84 and 10.40.20.20 is working:

10.40.20.84 : unicast, xmt/rcv/%loss = 5/5/0%, min/avg/max/std-dev = 2.790/2.868/2.925/0.060
10.40.20.84 : multicast, xmt/rcv/%loss = 5/5/0%, min/avg/max/std-dev = 2.797/2.882/2.940/0.063
10.40.20.20 : unicast, xmt/rcv/%loss = 5/5/0%, min/avg/max/std-dev = 2.782/2.853/2.918/0.048
10.40.20.20 : multicast, xmt/rcv/%loss = 5/5/0%, min/avg/max/std-dev = 2.830/2.892/2.933/0.038

Then I created the cluster

pvecm create CLUSTER -bindnet0_addr 10.40.20.20 --ring0_addr 10.40.20.20

Corosync Cluster Engine Authentication key generator.
Gathering 1024 bits for key from /dev/urandom.
Writing corosync key to /etc/corosync/authkey.
Writing corosync config to /etc/pve/corosync.conf
Restart corosync and cluster filesystem

pvecm status
Quorum information
------------------
Date: Thu Oct 4 13:47:04 2018
Quorum provider: corosync_votequorum
Nodes: 1
Node ID: 0x00000001
Ring ID: 1/4
Quorate: Yes

Votequorum information
----------------------
Expected votes: 1
Highest expected: 1
Total votes: 1
Quorum: 1
Flags: Quorate

Membership information
----------------------
Nodeid Votes Name
0x00000001 1 10.40.20.20 (local)


Now I tried to add the other node and did on the node 10.40.20.84:

pvecm add 10.40.20.20 -votes 0 -ring0_addr 10.40.20.84 -use_ssh

The authenticity of host '10.40.20.20 (10.40.20.20)' can't be established.
ECDSA key fingerprint is SHA256:XXXXXXXXXXXX/yGIzWjSteDbjCuTkUl4tBFeDI.
Are you sure you want to continue connecting (yes/no)? yes
copy corosync auth key
stopping pve-cluster service
backup old database to '/var/lib/pve-cluster/backup/config-1538653699.sql.gz'
waiting for quorum...OK
(re)generate node files
generate new node certificate
merge authorized SSH keys and known hosts
generated new node certificate, restart pveproxy and pvedaemon services
successfully added node '10.40.20.20' to cluster.



What I am really wondering about:

I want to add the node 10.40.20.84 to the cluster, not the other way around!
The documentation says

pvecm add <hostname.of.existing.cluster.member> -ring0_addr <hostname.of.this.node.which.should.be.added>

so I expected a different output, like "successfully added node '10.40.20.84' to cluster."

Any help/clarification is highly appreciated!
Best regards
Thomas
 
/etc/hosts and /etc/hostname are not identical.

I have in /etc/hosts

10.40.20.20 v20-int
10.40.20.84 v84-int
# and external, official IP
5.9.20.20 v20
188.9.20.84 v84

The hostname matches the name of the official IP.

Do you have any ideas how to solve this?
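For clarity, this is roughly the consistent state I am aiming for (a sketch; names are examples, assuming the cluster should run over the internal 10.40.20.0/24 network):

Code:
# /etc/hosts -- identical on all nodes, cluster names resolve to the internal IPs
10.40.20.20  vm20-int  vm20
10.40.20.84  vm84-int  vm84
# external, official IPs kept under separate names
5.9.20.20    v20-ext
188.9.20.84  v84-ext

# /etc/hostname then matches the name that resolves to the cluster IP, e.g. on the first node
vm20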
 
What does the parameter "votes" do?
And I use -use_ssh, as I always get an error 500 when trying it via the API.
 
pvecm nodes only shows the local node,

root@vm20 ~ # pvecm nodes

Membership information
----------------------
Nodeid Votes Name
1 1 10.40.20.20 (local)

root@vm84 ~ # pvecm nodes

Membership information
----------------------
Nodeid Votes Name
2 1 10.40.20.84 (local)

But corosync.conf on both nodes looks ok

logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: vm20
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.40.20.20
  }
  node {
    name: vm84
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 10.40.20.84
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: cpmx
  config_version: 2
  interface {
    bindnetaddr: 10.40.20.0
    ringnumber: 0
  }
  ip_version: ipv4
  secauth: on
  version: 2
}

So what is wrong, and how can I get the cluster up and running?

I expect pvecm nodes to show both nodes on each system.

Is there a manual way to get this solved?
 
Please run a longer omping test; 5 pings are not representative.
If your IGMP snooping is configured wrongly, the first packets will not be blocked (the problem only shows up later).

Code:
omping -c 10000 -i 0.001 -F -q node1 node2 node3
 
We have now

10.40.20.92 : unicast, xmt/rcv/%loss = 9309/9309/0%, min/avg/max/std-dev = 0.214/0.254/0.417/0.023
10.40.20.92 : multicast, xmt/rcv/%loss = 9309/9309/0%, min/avg/max/std-dev = 0.232/0.290/0.465/0.023
10.40.20.20 : unicast, xmt/rcv/%loss = 10000/9999/0%, min/avg/max/std-dev = 2.726/2.773/2.923/0.034
10.40.20.20 : multicast, xmt/rcv/%loss = 10000/9999/0%, min/avg/max/std-dev = 2.751/2.815/2.947/0.029


Best regards
Thomas
 
Still the question: is there any way to add cluster nodes manually? It is really hard to see that pvecm add compromises the master node, as the new node requires quorum... Is there any way to add the new node and have it participate in the quorum?
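What I have in mind is something along these lines (only a sketch, and it would cover just the corosync side; pvecm add additionally handles the pve-cluster database, SSH keys and certificates, as its output above shows):

Code:
# on an existing, quorate node: add a node { } stanza for the new node to the
# nodelist in /etc/pve/corosync.conf and increment config_version
vi /etc/pve/corosync.conf
# copy the corosync key and config to the new node
scp /etc/corosync/authkey root@10.40.20.84:/etc/corosync/authkey
scp /etc/pve/corosync.conf root@10.40.20.84:/etc/corosync/corosync.conf
# on the new node
systemctl restart corosync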
 
The config you sent is correct, so the question is why the second node failed.
Normally you will get an error when you check

systemctl status corosync

Also, check on the second node whether /etc/pve/corosync.conf exists and is the same as the one on the first node.
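For example, from the first node (hostnames as used in this thread):

Code:
# is the config present on the second node and identical to the local one?
ssh root@vm84 cat /etc/pve/corosync.conf | diff /etc/pve/corosync.conf -
ssh root@vm84 systemctl status corosync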
 
So, from the beginning:
omping -c 10000 -i 0.001 -F -q 10.40.20.84 10.40.20.92 10.40.20.20
works:
10.40.20.84 : unicast, xmt/rcv/%loss = 10000/9998/0%, min/avg/max/std-dev = 2.724/2.770/2.920/0.032
10.40.20.84 : multicast, xmt/rcv/%loss = 10000/9998/0%, min/avg/max/std-dev = 2.743/2.811/2.930/0.029
10.40.20.92 : unicast, xmt/rcv/%loss = 9154/9152/0%, min/avg/max/std-dev = 2.725/2.773/2.933/0.029
10.40.20.92 : multicast, xmt/rcv/%loss = 9154/9152/0%, min/avg/max/std-dev = 2.755/2.811/2.942/0.025
10.40.20.20 : unicast, xmt/rcv/%loss = 10000/9998/0%, min/avg/max/std-dev = 2.726/2.776/2.905/0.029
10.40.20.20 : multicast, xmt/rcv/%loss = 10000/9998/0%, min/avg/max/std-dev = 2.747/2.813/2.964/0.026


all /etc/hosts
10.40.20.84 vm84-int.proxmox.com vm84-int vm84 vm84.proxmox.com
10.40.20.20 vm20-int.proxmox.com vm20-int vm20 vm20.proxmox.com
10.40.20.92 vm92-int.proxmox.com vm92-int vm92 vm92.proxmox.com


on vm20:

root@vm20 ~ # pvecm status
Corosync config '/etc/pve/corosync.conf' does not exist - is this node part of a cluster?
Cannot initialize CMAP service
For bindnet0_addr we take the complete cluster network, so we specify 10.40.20.0 (netmask /24):
root@vm20 ~ # pvecm create pmxc -bindnet0_addr 10.40.20.0 -ring0_addr 10.40.20.20
Corosync Cluster Engine Authentication key generator.
Gathering 1024 bits for key from /dev/urandom.
Writing corosync key to /etc/corosync/authkey.
Writing corosync config to /etc/pve/corosync.conf
Restart corosync and cluster filesystem
root@vm20 ~ # pvecm status
Quorum information
------------------
Date: Wed Oct 10 21:39:10 2018
Quorum provider: corosync_votequorum
Nodes: 1
Node ID: 0x00000001
Ring ID: 1/4
Quorate: Yes
Votequorum information
----------------------
Expected votes: 1
Highest expected: 1
Total votes: 1
Quorum: 1
Flags: Quorate
Membership information
----------------------
Nodeid Votes Name
0x00000001 1 10.40.20.20 (local)
root@vm20 ~ #


Now edit the config for the qdevice and increment config_version (from 1 to 2):

vi /etc/corosync/corosync.conf

logging {
  debug: off
  to_syslog: yes
}
nodelist {
  node {
    name: vm20
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.40.20.20
  }
}
quorum {
  provider: corosync_votequorum
  device {
    model: net
    votes: 1
    net {
      tls: off
      host: 9.17.8.23
      port: 5403
      algorithm: ffsplit
    }
  }
}
totem {
  cluster_name: pmxc
  config_version: 2
  interface {
    bindnetaddr: 10.40.20.0
    ringnumber: 0
  }
  ip_version: ipv4
  secauth: on
  version: 2
}

root@vm20 ~ # pvecm status
Quorum information
------------------
Date: Wed Oct 10 21:42:46 2018
Quorum provider: corosync_votequorum
Nodes: 1
Node ID: 0x00000001
Ring ID: 1/4
Quorate: Yes
Votequorum information
----------------------
Expected votes: 1
Highest expected: 1
Total votes: 1
Quorum: 1
Flags: Quorate
Membership information
----------------------
Nodeid Votes Name
0x00000001 1 10.40.20.20 (local)


Now start corosync-qdevice

root@vm20 ~ # systemctl start corosync-qdevice
Job for corosync-qdevice.service failed because the control process exited with error code.
See "systemctl status corosync-qdevice.service" and "journalctl -xe" for details.


Failed:

root@vm20 ~ # systemctl status corosync-qdevice.service
● corosync-qdevice.service - Corosync Qdevice daemon
Loaded: loaded (/lib/systemd/system/corosync-qdevice.service; disabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Wed 2018-10-10 21:43:01 CEST; 1min 33s ago
Docs: man:corosync-qdevice
Process: 24504 ExecStart=/usr/sbin/corosync-qdevice -f $COROSYNC_QDEVICE_OPTIONS (code=exited, status=1/FAILURE)
Main PID: 24504 (code=exited, status=1/FAILURE)
CPU: 3ms
Oct 10 21:43:01 vm20 systemd[1]: Starting Corosync Qdevice daemon...
Oct 10 21:43:01 vm20 corosync-qdevice[24504]: Can't read quorum.device.model cmap key.
Oct 10 21:43:01 vm20 systemd[1]: corosync-qdevice.service: Main process exited, code=exited, status=1/FAILURE
Oct 10 21:43:01 vm20 systemd[1]: Failed to start Corosync Qdevice daemon.
Oct 10 21:43:01 vm20 systemd[1]: corosync-qdevice.service: Unit entered failed state.
Oct 10 21:43:01 vm20 systemd[1]: corosync-qdevice.service: Failed with result 'exit-code'.
root@vm20 ~ #
root@vm20 ~ # systemctl status corosync
● corosync.service - Corosync Cluster Engine
Loaded: loaded (/lib/systemd/system/corosync.service; enabled; vendor preset: enabled)
Active: active (running) since Wed 2018-10-10 21:45:01 CEST; 3s ago
Docs: man:corosync
man:corosync.conf
man:corosync_overview
Main PID: 24674 (corosync)
Tasks: 2 (limit: 4915)
Memory: 37.5M
CPU: 80ms
CGroup: /system.slice/corosync.service
└─24674 /usr/sbin/corosync -f
Oct 10 21:45:01 vm20 corosync[24674]: [SERV ] Service engine loaded: corosync watchdog service [7]
Oct 10 21:45:01 vm20 corosync[24674]: [QUORUM] Using quorum provider corosync_votequorum
Oct 10 21:45:01 vm20 corosync[24674]: [SERV ] Service engine loaded: corosync vote quorum service v1.0 [5]
Oct 10 21:45:01 vm20 corosync[24674]: [QB ] server name: votequorum
Oct 10 21:45:01 vm20 corosync[24674]: [SERV ] Service engine loaded: corosync cluster quorum service v0.1 [3]
Oct 10 21:45:01 vm20 corosync[24674]: [QB ] server name: quorum
Oct 10 21:45:01 vm20 corosync[24674]: [TOTEM ] A new membership (10.40.20.20:8) was formed. Members joined: 1
Oct 10 21:45:01 vm20 corosync[24674]: [CPG ] downlist left_list: 0 received
Oct 10 21:45:01 vm20 corosync[24674]: [QUORUM] Members[1]: 1
Oct 10 21:45:01 vm20 corosync[24674]: [MAIN ] Completed service synchronization, ready to provide service.


So now restart corosync:

root@vm20 ~ # systemctl restart corosync
root@vm20 ~ # systemctl status corosync
● corosync.service - Corosync Cluster Engine
Loaded: loaded (/lib/systemd/system/corosync.service; enabled; vendor preset: enabled)
Active: active (running) since Wed 2018-10-10 21:45:01 CEST; 3s ago
Docs: man:corosync
man:corosync.conf
man:corosync_overview
Main PID: 24674 (corosync)
Tasks: 2 (limit: 4915)
Memory: 37.5M
CPU: 80ms
CGroup: /system.slice/corosync.service
└─24674 /usr/sbin/corosync -f
Oct 10 21:45:01 vm20 corosync[24674]: [SERV ] Service engine loaded: corosync watchdog service [7]
Oct 10 21:45:01 vm20 corosync[24674]: [QUORUM] Using quorum provider corosync_votequorum
Oct 10 21:45:01 vm20 corosync[24674]: [SERV ] Service engine loaded: corosync vote quorum service v1.0 [5]
Oct 10 21:45:01 vm20 corosync[24674]: [QB ] server name: votequorum
Oct 10 21:45:01 vm20 corosync[24674]: [SERV ] Service engine loaded: corosync cluster quorum service v0.1 [3]
Oct 10 21:45:01 vm20 corosync[24674]: [QB ] server name: quorum
Oct 10 21:45:01 vm20 corosync[24674]: [TOTEM ] A new membership (10.40.20.20:8) was formed. Members joined: 1
Oct 10 21:45:01 vm20 corosync[24674]: [CPG ] downlist left_list: 0 received
Oct 10 21:45:01 vm20 corosync[24674]: [QUORUM] Members[1]: 1
Oct 10 21:45:01 vm20 corosync[24674]: [MAIN ] Completed service synchronization, ready to provide service.


and start corosync-qdevice

root@vm20 ~ # systemctl start corosync-qdevice
root@vm20 ~ # systemctl status corosync-qdevice
● corosync-qdevice.service - Corosync Qdevice daemon
Loaded: loaded (/lib/systemd/system/corosync-qdevice.service; disabled; vendor preset: enabled)
Active: active (running) since Wed 2018-10-10 21:45:21 CEST; 13s ago
Docs: man:corosync-qdevice
Main PID: 24716 (corosync-qdevic)
Tasks: 1 (limit: 4915)
Memory: 1.1M
CPU: 4ms
CGroup: /system.slice/corosync-qdevice.service
└─24716 /usr/sbin/corosync-qdevice -f
Oct 10 21:45:21 vm20 systemd[1]: Starting Corosync Qdevice daemon...
Oct 10 21:45:21 vm20 systemd[1]: Started Corosync Qdevice daemon.
root@vm20 ~ # corosync-quorumtool -s
Quorum information
------------------
Date: Wed Oct 10 21:46:42 2018
Quorum provider: corosync_votequorum
Nodes: 1
Node ID: 1
Ring ID: 1/8
Quorate: Yes
Votequorum information
----------------------
Expected votes: 2
Highest expected: 2
Total votes: 2
Quorum: 2
Flags: Quorate Qdevice
Membership information
----------------------
Nodeid Votes Qdevice Name
1 1 A,V,NMW 10.40.20.20 (local)
0 1 Qdevice


pvecm status looks similar:

root@vm20 ~ # pvecm status
Quorum information
------------------
Date: Wed Oct 10 21:46:57 2018
Quorum provider: corosync_votequorum
Nodes: 1
Node ID: 0x00000001
Ring ID: 1/8
Quorate: Yes
Votequorum information
----------------------
Expected votes: 2
Highest expected: 2
Total votes: 2
Quorum: 2
Flags: Quorate Qdevice
Membership information
----------------------
Nodeid Votes Qdevice Name
0x00000001 1 A,V,NMW 10.40.20.20 (local)
0x00000000 1 Qdevice
 
Now go to the next node, as vm92 should be added to the cluster

root@vm92 ~ # pvecm status
Corosync config '/etc/pve/corosync.conf' does not exist - is this node part of a cluster?
Cannot initialize CMAP service
root@vm92 ~ # time pvecm add vm92 -ring0_addr 10.40.20.92
Please enter superuser (root) password for 'vm92':
Password for root@vm92: *********


But here is the point!
pvecm help add says "add the current node to the cluster".
So why am I asked for the root password of the node that should be added?
Or should I run this command from the first node?
But that does not work either:

root@vm20 ~ # pvecm add vm92 -ring0_addr 10.40.20.20
detected the following error(s):
* authentication key '/etc/corosync/authkey' already exists
* cluster config '/etc/pve/corosync.conf' already exists
* corosync is already running, is this node already in a cluster?!
Check if node may join a cluster failed!


So any help is appreciated!
 
You have to run the pvecm add command on the new node and not on a cluster member.
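With the names used in this thread, that means running it on vm92 and pointing it at the existing member vm20, with -ring0_addr set to the new node's own cluster address (this is also the form used further down):

Code:
# run on the NEW node vm92
pvecm add vm20 -ring0_addr 10.40.20.92 -use_ssh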
 
Hello Wolfgang,

I am not running it on a cluster member.

But I had not referenced vm20 (the node that created the cluster above), so now I ran:

pvecm add vm20 -ring0_addr 10.40.20.20 -use_ssh

Now I end up with

The authenticity of host 'vm20 (10.40.20.20)' can't be established.
ECDSA key fingerprint is SHA256:vzaqUBkjCXXXXXXfla+Uoj0T2ZxspL4wSmU.
Are you sure you want to continue connecting (yes/no)? yes
copy corosync auth key
stopping pve-cluster service
backup old database to '/var/lib/pve-cluster/backup/config-1539257263.sql.gz'
waiting for quorum...


Why is it waiting for quorum for so long?
And I cannot start corosync-qdevice now, as the corosync.conf that has just been created does not contain the first cluster member's qdevice configuration.

root@VM92 ~ # cat /etc/corosync/corosync.conf
logging {
  debug: off
  to_syslog: yes
}
nodelist {
  node {
    name: x0720
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.40.20.20
  }
  node {
    name: x1892
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 10.40.20.92
  }
}
quorum {
  provider: corosync_votequorum
}
totem {
  cluster_name: pmxc
  config_version: 2
  interface {
    bindnetaddr: 10.40.20.0
    ringnumber: 0
  }
  ip_version: ipv4
  secauth: on
  version: 2
}


On the node vm20 (the master), as shown above, the corosync config still contains the previously posted qnetd config.

On the node vm92, which is currently waiting for quorum while being added, pvecm status shows:

root@vm92 ~ # pvecm status
Quorum information
------------------
Date: Thu Oct 11 13:32:09 2018
Quorum provider: corosync_votequorum
Nodes: 1
Node ID: 0x00000002
Ring ID: 2/264
Quorate: No
Votequorum information
----------------------
Expected votes: 2
Highest expected: 2
Total votes: 1
Quorum: 2 Activity blocked
Flags:
Membership information
----------------------
Nodeid Votes Name
0x00000002 1 10.40.20.92 (local)


Do you have any ideas how to fix this?
 
Oh, now I see the problem.
While running the pvecm add command, vm20's corosync.conf gets replaced!
There is no information about the qnetd device or my former config anymore.
Is this a bug, or what were we doing wrong?

Best regards
Thomas
 
I don't know what you are doing, but your output shows that you created the cluster and then added the node on the same host, which is not correct.

Is this a bug, or what were we doing wrong?
This is not a bug; the qdevice is not integrated into this process and must be added at the end, not at the start of the cluster creation.
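So the order would roughly be (a sketch based on the commands already used in this thread):

Code:
# 1) on the first node: create the cluster
pvecm create pmxc -bindnet0_addr 10.40.20.0 -ring0_addr 10.40.20.20
# 2) on each additional node: join the cluster
pvecm add vm20 -ring0_addr <this.nodes.cluster.ip> -use_ssh
# 3) only afterwards: add the quorum device stanza (as posted above) to the
#    cluster-wide /etc/pve/corosync.conf, increment config_version, then on every node
systemctl restart corosync
systemctl start corosync-qdevice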
 
I tried the following approach, which resulted in the weird message that vm20 (and not the new node vm92) was added to the cluster.
So while vm92 was waiting for quorum, I added the config stanza for the qnetd on vm20 (the master).

root@vm20 ~ # cat /etc/corosync/corosync.conf
logging {
  debug: off
  to_syslog: yes
}
nodelist {
  node {
    name: vm20
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.40.20.20
  }
  node {
    name: vm92
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 10.40.20.92
  }
}
quorum {
  provider: corosync_votequorum
  device {
    model: net
    votes: 1
    net {
      tls: off
      host: 9.17.8.23
      port: 5403
      algorithm: ffsplit
    }
  }
}
totem {
  cluster_name: pmxc
  config_version: 3
  interface {
    bindnetaddr: 10.40.20.0
    ringnumber: 0
  }
  ip_version: ipv4
  secauth: on
  version: 2
}

I incremented config_version by 1 and copied this config over to the new cluster member, which was still waiting for quorum.

Then I restarted corosync on both nodes using

systemctl restart corosync

Then I started the corosync-qdevice service on the node that was still waiting for quorum, and got progress.

root@vm92 ~ # pvecm add vm20 -ring0_addr 10.40.20.92 -use_ssh
The authenticity of host 'vm20 (10.40.20.20)' can't be established.
ECDSA key fingerprint is SHA256:vzaqUBkjCrj/jY5m48Nl0aCfla+Uoj0T2ZxspL4wSmU.
Are you sure you want to continue connecting (yes/no)? yes
copy corosync auth key
stopping pve-cluster service
backup old database to '/var/lib/pve-cluster/backup/config-1539257263.sql.gz'
waiting for quorum...OK
(re)generate node files
generate new node certificate
merge authorized SSH keys and known hosts
generated new node certificate, restart pveproxy and pvedaemon services
successfully added node 'vm20' to cluster.


But the last message is the disaster: "successfully added node 'vm20' to cluster." I expected vm92 to be added to the cluster.
And pvecm status on the new node looks like:

root@vm92 ~ # pvecm status
Quorum information
------------------
Date: Thu Oct 11 13:54:04 2018
Quorum provider: corosync_votequorum
Nodes: 1
Node ID: 0x00000002
Ring ID: 2/1500
Quorate: Yes
Votequorum information
----------------------
Expected votes: 2
Highest expected: 2
Total votes: 2
Quorum: 2
Flags: Quorate Qdevice
Membership information
----------------------
Nodeid Votes Qdevice Name
0x00000002 1 A,V,NMW 10.40.20.92 (local)
0x00000000 1 Qdevice


And on the master node vm20, pvecm status does not show any information about the complete cluster.

root@vm20 ~ # pvecm status
Quorum information
------------------
Date: Thu Oct 11 13:55:02 2018
Quorum provider: corosync_votequorum
Nodes: 1
Node ID: 0x00000001
Ring ID: 1/1556
Quorate: No
Votequorum information
----------------------
Expected votes: 2
Highest expected: 2
Total votes: 1
Quorum: 2 Activity blocked
Flags:
Membership information
----------------------
Nodeid Votes Name
0x00000001 1 10.40.20.20 (local)
 
I don't know what you are doing, but your output shows that you created the cluster and then added the node on the same host, which is not correct.

And as you can see, the pvecm create was done on vm20, and the pvecm add was initiated on vm92.
 
