PVE Cluster setup

DynFi User

Hello,

I have a couple of questions about PVE clusters.


First question:

I am trying to set up a Proxmox VE cluster.
I have three nodes, but two of them have already been configured and are now in production.

Is it safe to configure the cluster in this scenario?

==

Second question:

I have a freshly installed node. How do I completely remove it from the cluster and still be able to access the node from the GUI?

==

Third question:

The first node is a very critical one.
It was set up and used to work, but since I tried adding a second node, it gives me the following:

root@proxmaster:/etc/pve# pvecm status
Cannot initialize CMAP service


On the second node (brand new):

root@proxmonster:/home/gregober# pvecm status
Quorum information
------------------
Date: Tue Jun 21 16:36:22 2016
Quorum provider: corosync_votequorum
Nodes: 1
Node ID: 0x00000002
Ring ID: 28
Quorate: No


Votequorum information
----------------------
Expected votes: 2
Highest expected: 2
Total votes: 1
Quorum: 2 Activity blocked
Flags:


Membership information
----------------------
Nodeid Votes Name
0x00000002 1 192.168.210.10 (local)


The third node hasn't been added yet.

I am not sure where to go from here.
I would like:

  1. to set up this cluster if possible, without destroying anything
  2. to get back to the initial state (no cluster) safely, with zero side effects for the VMs in production
 
Hi,
if "two of them already have been configured" means you have a running 2-node cluster, then yes.
But your further questions suggest that your cluster isn't ready yet! To be safe, you should save your VM config on both cluster nodes, like this:
Code:
tar cvf /root/pve_save.tar /etc/pve
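
A slightly more defensive variant of the backup above, as a sketch only: it writes a timestamped archive so that repeated runs do not overwrite each other, and lists the archive afterwards to confirm it can be read back. The function name `backup_pve_config` and the default destination `/root` are assumptions for illustration, not part of any Proxmox tooling.

```shell
# Sketch: archive a config directory (default /etc/pve) into a timestamped tarball.
# backup_pve_config is a hypothetical helper name, not a Proxmox command.
backup_pve_config() {
    src="${1:-/etc/pve}"          # absolute path of the directory to save
    dest_dir="${2:-/root}"        # where the archive is written
    archive="$dest_dir/pve_save_$(date +%Y%m%d-%H%M%S).tar"

    # -C / plus a relative path avoids tar's "Removing leading /" warning
    tar -cf "$archive" -C / "${src#/}" || return 1

    # Listing the archive verifies that it is readable
    tar -tf "$archive" >/dev/null || return 1

    echo "$archive"
}
```

Calling `backup_pve_config` with no arguments on each node prints the path of the archive it created.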
Regarding your second question: the default way is a fresh install, but use the forum search for "declustering" (if I remember right).
Regarding your third question: does multicast ping work?
http://pve.proxmox.com/wiki/Troubleshooting_multicast,_quorum_and_cluster_issues
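
As far as I recall, the wiki page linked above tests multicast with `omping`, started at roughly the same time on every node. A sketch of that test, assuming the `omping` package is installed and using the two node addresses from this thread. The wrapper name `multicast_test_cmd` is hypothetical; it only prints the command so it can be reviewed and then run on all nodes simultaneously.

```shell
# Sketch: print the omping command to run at the same time on every node.
# multicast_test_cmd is a hypothetical helper; it only builds the command line.
multicast_test_cmd() {
    # -c: number of probes, -i: interval in seconds, -q: quiet (summary only)
    echo "omping -c 600 -i 1 -q $*"
}

multicast_test_cmd 192.168.210.11 192.168.210.10
```

If multicast works, each node should report the other nodes with little or no loss in the multicast summary lines.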

Udo
 

I have already backed up /etc/pve on the main node.
I will do the same on the third node.

The second node is not up and running.

About declustering with a fresh install: OK, I will try that, but I would rather get the cluster working.
Do you think it is safe to move on with this cluster config, knowing that we have really critical nodes? What would you advise?


As for multicast: yes, everything works. The hosts are on the same subnet, and an HP 5200 connects them all, which is quite robust.
 
Hi Greg,
the content of /etc/pve on the node where you defined the cluster is safe. On a node that joins the cluster, the content below /etc/pve is replaced by the cluster's content. So with critical nodes but no working cluster it is not easy to say, but with a backup of /etc/pve you are on the safe side.

How does .members look on both nodes?
Code:
cat /etc/pve/.members

Have you restarted pve-cluster on your first node?
And corosync?

Udo
 
On the master node, .members looks like this:

{
"nodename": "proxmaster",
"version": 4,
"cluster": { "name": "pmox-osnet", "version": 2, "nodes": 2, "quorate": 0 },
"nodelist": {
"proxmonster": { "id": 2, "online": 0},
"proxmaster": { "id": 1, "online": 1, "ip": "192.168.210.11"}
}
}
And the pve-cluster status is as follows:

root@proxmaster:/etc/pve# service pve-cluster status
● pve-cluster.service - The Proxmox VE cluster filesystem
Loaded: loaded (/lib/systemd/system/pve-cluster.service; enabled)
Active: active (running) since Wed 2016-06-15 11:02:09 CEST; 1 weeks 0 days ago
Process: 3276 ExecStartPost=/usr/bin/pvecm updatecerts --silent (code=exited, status=0/SUCCESS)
Process: 3205 ExecStart=/usr/bin/pmxcfs $DAEMON_OPTS (code=exited, status=0/SUCCESS)
Main PID: 3274 (pmxcfs)
CGroup: /system.slice/pve-cluster.service
└─3274 /usr/bin/pmxcfs

Jun 22 21:36:04 proxmaster pmxcfs[3274]: [dcdb] crit: cpg_initialize failed: 2
Jun 22 21:36:04 proxmaster pmxcfs[3274]: [status] crit: cpg_initialize failed: 2
Jun 22 21:36:10 proxmaster pmxcfs[3274]: [quorum] crit: quorum_initialize failed: 2
Jun 22 21:36:10 proxmaster pmxcfs[3274]: [confdb] crit: cmap_initialize failed: 2
Jun 22 21:36:10 proxmaster pmxcfs[3274]: [dcdb] crit: cpg_initialize failed: 2
Jun 22 21:36:10 proxmaster pmxcfs[3274]: [status] crit: cpg_initialize failed: 2
Jun 22 21:36:16 proxmaster pmxcfs[3274]: [quorum] crit: quorum_initialize failed: 2
Jun 22 21:36:16 proxmaster pmxcfs[3274]: [confdb] crit: cmap_initialize failed: 2
Jun 22 21:36:16 proxmaster pmxcfs[3274]: [dcdb] crit: cpg_initialize failed: 2
Jun 22 21:36:16 proxmaster pmxcfs[3274]: [status] crit: cpg_initialize failed: 2


After a restart I got:

root@proxmaster:/etc/pve# service pve-cluster status
● pve-cluster.service - The Proxmox VE cluster filesystem
Loaded: loaded (/lib/systemd/system/pve-cluster.service; enabled)
Active: active (running) since Wed 2016-06-22 21:37:59 CEST; 1min 14s ago
Process: 16866 ExecStartPost=/usr/bin/pvecm updatecerts --silent (code=exited, status=0/SUCCESS)
Process: 16860 ExecStart=/usr/bin/pmxcfs $DAEMON_OPTS (code=exited, status=0/SUCCESS)
Main PID: 16863 (pmxcfs)
CGroup: /system.slice/pve-cluster.service
└─16863 /usr/bin/pmxcfs

Jun 22 21:37:58 proxmaster pmxcfs[16863]: [confdb] crit: can't initialize service
Jun 22 21:37:58 proxmaster pmxcfs[16863]: [dcdb] crit: cpg_initialize failed: 2
Jun 22 21:37:58 proxmaster pmxcfs[16863]: [dcdb] crit: can't initialize service
Jun 22 21:37:58 proxmaster pmxcfs[16863]: [status] crit: cpg_initialize failed: 2
Jun 22 21:37:58 proxmaster pmxcfs[16863]: [status] crit: can't initialize service
Jun 22 21:38:04 proxmaster pmxcfs[16863]: [status] notice: update cluster info (cluster name pmox-osnet, version = 2)
Jun 22 21:38:04 proxmaster pmxcfs[16863]: [dcdb] notice: members: 1/16863
Jun 22 21:38:04 proxmaster pmxcfs[16863]: [dcdb] notice: all data is up to date
Jun 22 21:38:04 proxmaster pmxcfs[16863]: [status] notice: members: 1/16863
Jun 22 21:38:04 proxmaster pmxcfs[16863]: [status] notice: all data is up to date


=====================================

On the second node, .members looks like this:

{
"nodename": "proxmonster",
"version": 0
}


And the pve-cluster status:

root@proxmonster:/home/gregober# service pve-cluster status
● pve-cluster.service - The Proxmox VE cluster filesystem
Loaded: loaded (/lib/systemd/system/pve-cluster.service; enabled)
Active: active (running) since Tue 2016-06-21 18:08:27 CEST; 1 day 4h ago
Process: 6394 ExecStartPost=/usr/bin/pvecm updatecerts --silent (code=exited, status=0/SUCCESS)
Process: 6364 ExecStart=/usr/bin/pmxcfs $DAEMON_OPTS (code=exited, status=0/SUCCESS)
Main PID: 6392 (pmxcfs)
CGroup: /system.slice/pve-cluster.service
└─6392 /usr/bin/pmxcfs

Jun 22 14:24:10 proxmonster pmxcfs[6392]: [dcdb] crit: cpg_initialize failed: 2
Jun 22 14:24:10 proxmonster pmxcfs[6392]: [status] crit: cpg_initialize failed: 2
Jun 22 14:24:16 proxmonster pmxcfs[6392]: [quorum] crit: quorum_initialize failed: 2
Jun 22 14:24:16 proxmonster pmxcfs[6392]: [confdb] crit: cmap_initialize failed: 2
Jun 22 14:24:16 proxmonster pmxcfs[6392]: [dcdb] crit: cpg_initialize failed: 2
Jun 22 14:24:16 proxmonster pmxcfs[6392]: [status] crit: cpg_initialize failed: 2
Jun 22 14:24:22 proxmonster pmxcfs[6392]: [quorum] crit: quorum_initialize failed: 2
Jun 22 14:24:22 proxmonster pmxcfs[6392]: [confdb] crit: cmap_initialize failed: 2
Jun 22 14:24:22 proxmonster pmxcfs[6392]: [dcdb] crit: cpg_initialize failed: 2
Jun 22 14:24:22 proxmonster pmxcfs[6392]: [status] crit: cpg_initialize failed: 2


What should I do from here?
 
Hi,
proxmaster doesn't know the IP of proxmonster...
Something must have gone wrong during the cluster join.
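
To spot this condition quickly, here is a sketch that lists the nodes in `.members` that have no `ip` field. It assumes the one-node-per-line layout shown in this thread and uses plain `grep` and `sed` rather than a JSON parser, so treat it as a convenience and not a robust tool. The function name `nodes_without_ip` is hypothetical.

```shell
# Sketch: print node names from a .members file whose entry lacks an "ip" field.
# Assumes one node entry per line, as in the output posted in this thread.
nodes_without_ip() {
    members_file="${1:-/etc/pve/.members}"
    grep '"id":' "$members_file" \
        | grep -v '"ip":' \
        | sed -n 's/^[[:space:]]*"\([^"]*\)":.*/\1/p'
}
```

Run against the `.members` content posted from proxmaster, this would print `proxmonster`, the node whose entry has an id but no address.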

Can you run "ssh root@192.168.210.11" from proxmonster without entering a password?

If yes, does the same work with "ssh root@proxmaster"?
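
Both checks can be scripted with `BatchMode=yes`, which makes `ssh` fail instead of prompting for a password (the same option appears in the `pvecm add` error output later in this thread). A sketch; the function name `check_passwordless_ssh` and the 5-second connect timeout are assumptions.

```shell
# Sketch: report whether root can log in to each host without a password.
# check_passwordless_ssh is a hypothetical helper name.
check_passwordless_ssh() {
    for host in "$@"; do
        # BatchMode=yes disables password prompts, so a missing key fails fast
        if ssh -o BatchMode=yes -o ConnectTimeout=5 "root@$host" true 2>/dev/null; then
            echo "$host: passwordless SSH OK"
        else
            echo "$host: passwordless SSH FAILED"
        fi
    done
}
```

On proxmonster this could be called as `check_passwordless_ssh 192.168.210.11 proxmaster`; running it in the other direction from proxmaster covers both sides of the join.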

What happens if you join the cluster again from proxmonster?
Code:
root@proxmonster:~# pvecm add 192.168.210.11 -force

Udo
 
Passwordless SSH from proxmaster to proxmonster does not work.
The opposite direction (from proxmonster to proxmaster) works, though.


root@proxmonster:/home/gregober# pvecm add 192.168.210.11 -force
can't create shared ssh key database '/etc/pve/priv/authorized_keys'
cluster not ready - no quorum?
unable to add node: command failed (ssh 192.168.210.11 -o BatchMode=yes pvecm addnode proxmonster --force 1)

Thank you very much for your help!
 
Hi,
yep - proxmaster has no quorum... you can reach quorum with
Code:
root@proxmaster:~# pvecm expect 1
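
Before and after changing the expected votes, it helps to confirm what the node itself reports. A sketch that reads `pvecm status` output on stdin and succeeds only if the `Quorate:` line says `Yes`. The helper name `is_quorate` is hypothetical, and the parsing relies on the output format shown earlier in this thread.

```shell
# Sketch: exit 0 if `pvecm status` output on stdin reports "Quorate: Yes".
# is_quorate is a hypothetical helper; it only parses text and changes nothing.
is_quorate() {
    awk '$1 == "Quorate:" { found = 1; ok = ($2 == "Yes") }
         END { if (found && ok) exit 0; exit 1 }'
}
```

For example, `pvecm status | is_quorate && echo "quorate" || echo "no quorum"`. If `pvecm status` prints nothing on stdout, as with the CMAP error above, the check also reports no quorum.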

This will certainly help, but my problem is also clearly related to this bug:

https://forum.proxmox.com/threads/pve-ssl-key-not-installed-with-new-installation.27845/

/etc/pve/priv is not created by default on my second cluster node.
This is most probably why we get these errors when trying to join the node to the cluster.

/etc/pve/priv does not exist and can't be created on the second cluster node, so I can't join the node or copy keys into this directory.

Do you know how to solve this by hand?

What am I supposed to do to solve this issue and add my node to the cluster?
 

root@proxmaster:/etc/pve/priv# pvecm expect 1
root@proxmaster:/etc/pve/priv# pvecm add 192.168.210.10 -force
root@192.168.210.10's password:
unable to copy ssh ID: bash: line 2: .ssh/authorized_keys: No such file or directory
No luck with pvecm add: I still can't copy anything. The host is sort of locked down for the time being.
 
Hi,
it looks like your system isn't up to date.

If you don't have a PVE subscription, you must enable the pve-no-subscription repository and run apt-get update and apt-get dist-upgrade:
Code:
cat /etc/apt/sources.list.d/pve-no-subscription.list
deb http://download.proxmox.com/debian jessie pve-no-subscription

What does "pveversion -v" show?

Udo
 
The system has been fully reinstalled.
This solved the issue, though it is quite annoying to be forced to reinstall the system to solve such problems.
 
