Cluster Issues

okieunix1957

Member
Feb 11, 2020
I followed the instructions on your site for removing a node, and now I have a cluster issue.

host07# pvecm nodes
Cannot initialize CMAP service
host07:/etc/pve#



host08:~# pvecm status
Quorum information
------------------
Date: Thu Feb 20 15:58:18 2020
Quorum provider: corosync_votequorum
Nodes: 4
Node ID: 0x00000002
Ring ID: 2/3704
Quorate: Yes

Votequorum information
----------------------
Expected votes: 6
Highest expected: 6
Total votes: 4
Quorum: 4
Flags: Quorate

Membership information
----------------------
Nodeid Votes Name
0x00000002 1 xxx.xx.xx.19 (local)
0x00000003 1 xxx.xx.xx.20
0x00000005 1 xxx.xx.xx.22
0x00000006 1 xxx.xx.xx.23
host08:~#

As you can see, I am supposed to have the .18 and .21 servers as well. I only wanted to remove one node; now 2 of them are gone.

host07:/etc/pve# pvecm nodes
Cannot initialize CMAP service
host07:/etc/pve#

So how do I correct this mess of mine?
 
Here is the status of corosync:

# systemctl status corosync
● corosync.service - Corosync Cluster Engine
Loaded: loaded (/lib/systemd/system/corosync.service; enabled; vendor preset: enabled)
Active: inactive (dead) since Thu 2020-02-20 15:02:04 PST; 43min ago
Condition: start condition failed at Thu 2020-02-20 15:45:52 PST; 2s ago
└─ ConditionPathExists=/etc/corosync/corosync.conf was not met
Docs: man:corosync
man:corosync.conf
man:corosync_overview
Main PID: 10928 (code=exited, status=0/SUCCESS)
CPU: 1d 12h 25min 7.657s

Feb 20 15:02:03 host07 corosync[10928]: [SERV ] Service engine unloaded: corosync cluster quorum service
Feb 20 15:02:04 host07 corosync[10928]: notice [SERV ] Service engine unloaded: corosync profile loading
Feb 20 15:02:04 host07 corosync[10928]: [SERV ] Service engine unloaded: corosync profile loading servic
Feb 20 15:02:04 host07 corosync[10928]: notice [SERV ] Service engine unloaded: corosync resource monito
Feb 20 15:02:04 host07 corosync[10928]: [SERV ] Service engine unloaded: corosync resource monitoring se
Feb 20 15:02:04 host07 corosync[10928]: notice [SERV ] Service engine unloaded: corosync watchdog servic
Feb 20 15:02:04 host07 corosync[10928]: [SERV ] Service engine unloaded: corosync watchdog service
Feb 20 15:02:04 host07 corosync[10928]: notice [MAIN ] Corosync Cluster Engine exiting normally
Feb 20 15:02:04 host07 corosync[10928]: [MAIN ] Corosync Cluster Engine exiting normally
Feb 20 15:02:04 host07 systemd[1]: Stopped Corosync Cluster Engine.
 
I thought this was really bad, but in the end this is what I did to fix it:
1. cd /etc/corosync and saw that nothing was there.
2. Copied the files and directory over from another node via ftp.
3. Restarted corosync, and that worked.
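
The steps above can be sketched as a small script. This is only a sketch under stated assumptions: host08 is assumed to be a healthy donor node, scp stands in for the ftp transfer used originally, and the script defaults to a dry run that prints the commands instead of executing them. The names DONOR, CONF_DIR, DRY_RUN, run and restore_corosync_conf are made up for illustration.

```shell
#!/bin/sh
# Sketch of the recovery steps. DONOR is an assumed healthy cluster node.
# With DRY_RUN=1 (the default here) commands are printed, not executed.
DONOR="${DONOR:-host08}"
CONF_DIR="${CONF_DIR:-/etc/corosync}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

restore_corosync_conf() {
    # corosync.service does not start while this file is missing
    # (the ConditionPathExists check seen in the systemctl status output).
    if [ -e "$CONF_DIR/corosync.conf" ]; then
        echo "corosync.conf already present, nothing to copy"
        return 0
    fi
    # scp replaces the ftp transfer from the post; -p keeps file modes,
    # -r copies the whole directory into the parent of CONF_DIR.
    run scp -rp "root@$DONOR:$CONF_DIR" "$(dirname "$CONF_DIR")/"
    run systemctl restart corosync
    run pvecm status
}

restore_corosync_conf
```

Running it with DRY_RUN=0 as root would actually copy the directory and restart the service, so it is worth reading the printed commands first.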

Whew!

host07:/etc/pve# pvecm status
Quorum information
------------------
Date: Thu Feb 20 16:34:36 2020
Quorum provider: corosync_votequorum
Nodes: 5
Node ID: 0x00000001
Ring ID: 1/3708
Quorate: Yes

Votequorum information
----------------------
Expected votes: 6
Highest expected: 6
Total votes: 5
Quorum: 4
Flags: Quorate

Membership information
----------------------
Nodeid Votes Name
0x00000001 1 x.x.x.18 (local)
0x00000002 1 x.x.x.19
0x00000003 1 x.x.x.20
0x00000005 1 x.x.x.22
0x00000006 1 x.x.x.23

Now to try and add .21 back into the cluster.
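
For anyone reading the vote numbers in the pvecm status output: the Quorum value follows from the expected votes by a simple majority rule. A minimal sketch of that arithmetic, assuming the default votequorum behaviour with no special options such as two_node or last_man_standing; the function names quorum_for and is_quorate are made up for illustration.

```shell
#!/bin/sh
# Majority quorum: floor(expected_votes / 2) + 1.
quorum_for() {
    expected=$1
    echo $(( expected / 2 + 1 ))
}

# Prints "yes" when the votes present reach quorum, "no" otherwise.
is_quorate() {
    expected=$1
    total=$2
    if [ "$total" -ge "$(quorum_for "$expected")" ]; then
        echo yes
    else
        echo no
    fi
}

quorum_for 6      # expected votes 6 gives quorum 4, as in the output
is_quorate 6 4    # host08's view with two nodes missing
is_quorate 6 5    # after host07 came back
```

With 6 expected votes the cluster stays quorate down to 4 votes, which is why it kept working with two nodes gone, but one more loss would have cost quorum.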
 
But I now have 2 issues.

1. I can no longer get the join cluster info.
2. The node I want to remove is still present.

This is supposed to be a 6-node cluster. What I see in the pvecm nodes list is the correct set of nodes minus 1, but the cluster won't let me get a copy of the cluster info.

I wish you guys would work on making this a lot easier. I know it's complex, but this cluster can get messy if things are not done right. I need to be able to add the nodes back in WITHOUT REBUILDING THEM. I don't have the luxury or the ROOM to move things around. These are critical production servers.
 
