Cluster Issues

okieunix1957

Member
Feb 11, 2020
I followed your site's guide on removing a node, and now I have a cluster issue.

host07# pvecm nodes
Cannot initialize CMAP service
host07:/etc/pve#



host08:~# pvecm status
Quorum information
------------------
Date: Thu Feb 20 15:58:18 2020
Quorum provider: corosync_votequorum
Nodes: 4
Node ID: 0x00000002
Ring ID: 2/3704
Quorate: Yes

Votequorum information
----------------------
Expected votes: 6
Highest expected: 6
Total votes: 4
Quorum: 4
Flags: Quorate

Membership information
----------------------
Nodeid Votes Name
0x00000002 1 xxx.xx.xx.19 (local)
0x00000003 1 xxx.xx.xx.20
0x00000005 1 xxx.xx.xx.22
0x00000006 1 xxx.xx.xx.23
host08:~#

As you can see, I am supposed to have the .19 and .21 servers. I only wanted to remove one node, but now two of them are gone.

host07:/etc/pve# pvecm nodes
Cannot initialize CMAP service
host07:/etc/pve#

So how do I correct this mess of mine?
 
Here is the status of corosync:

# systemctl status corosync
● corosync.service - Corosync Cluster Engine
Loaded: loaded (/lib/systemd/system/corosync.service; enabled; vendor preset: enabled)
Active: inactive (dead) since Thu 2020-02-20 15:02:04 PST; 43min ago
Condition: start condition failed at Thu 2020-02-20 15:45:52 PST; 2s ago
└─ ConditionPathExists=/etc/corosync/corosync.conf was not met
Docs: man:corosync
man:corosync.conf
man:corosync_overview
Main PID: 10928 (code=exited, status=0/SUCCESS)
CPU: 1d 12h 25min 7.657s

Feb 20 15:02:03 host07 corosync[10928]: [SERV ] Service engine unloaded: corosync cluster quorum service
Feb 20 15:02:04 host07 corosync[10928]: notice [SERV ] Service engine unloaded: corosync profile loading
Feb 20 15:02:04 host07 corosync[10928]: [SERV ] Service engine unloaded: corosync profile loading servic
Feb 20 15:02:04 host07 corosync[10928]: notice [SERV ] Service engine unloaded: corosync resource monito
Feb 20 15:02:04 host07 corosync[10928]: [SERV ] Service engine unloaded: corosync resource monitoring se
Feb 20 15:02:04 host07 corosync[10928]: notice [SERV ] Service engine unloaded: corosync watchdog servic
Feb 20 15:02:04 host07 corosync[10928]: [SERV ] Service engine unloaded: corosync watchdog service
Feb 20 15:02:04 host07 corosync[10928]: notice [MAIN ] Corosync Cluster Engine exiting normally
Feb 20 15:02:04 host07 corosync[10928]: [MAIN ] Corosync Cluster Engine exiting normally
Feb 20 15:02:04 host07 systemd[1]: Stopped Corosync Cluster Engine.
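The `Condition: ... ConditionPathExists=/etc/corosync/corosync.conf was not met` line explains the `Cannot initialize CMAP service` error: systemd refuses to start corosync because the config file is missing on host07. For orientation, a minimal corosync.conf has roughly this shape (illustrative values only; the real file to restore is the one on a healthy cluster node):

```
totem {
  version: 2
  cluster_name: mycluster    # illustrative name
  config_version: 7          # illustrative version counter
  interface {
    ringnumber: 0
  }
}

nodelist {
  node {
    name: host07
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 192.0.2.18   # illustrative address
  }
  # ...one node { } block per cluster member...
}

quorum {
  provider: corosync_votequorum
}
```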
 

I thought this was real bad, but in the end here is what I did to fix it:
1. cd /etc/corosync - saw that nothing was there.
2. ftp'd the files and directory over from another node.
3. Restarted corosync, and that worked.
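The recovery steps above amount to something like this (a sketch only; host08 stands in for any healthy cluster node, and scp is used in place of ftp):

```shell
# On the broken node (host07): confirm the corosync config is missing
ls -l /etc/corosync/

# Copy the config directory (corosync.conf and authkey) from a healthy node
scp -r root@host08:/etc/corosync/ /etc/

# Restart corosync and the pve cluster filesystem, then verify
systemctl restart corosync pve-cluster
pvecm status
```

The authkey file matters as much as corosync.conf: without it the restored node cannot authenticate to the rest of the cluster.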

Whew!

host07:/etc/pve# pvecm status
Quorum information
------------------
Date: Thu Feb 20 16:34:36 2020
Quorum provider: corosync_votequorum
Nodes: 5
Node ID: 0x00000001
Ring ID: 1/3708
Quorate: Yes

Votequorum information
----------------------
Expected votes: 6
Highest expected: 6
Total votes: 5
Quorum: 4
Flags: Quorate

Membership information
----------------------
Nodeid Votes Name
0x00000001 1 x.x.x.18 (local)
0x00000002 1 x.x.x.19
0x00000003 1 x.x.x.20
0x00000005 1 x.x.x.22
0x00000006 1 x.x.x.23

Now to try and add .21 back into the cluster.
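Re-joining is normally run from the node being added, pointed at an existing cluster member (a sketch; `<existing-node-ip>` is a placeholder, and `--force` may be needed if the node was previously in the cluster):

```shell
# Run on the node being (re-)added, e.g. the .21 host.
# <existing-node-ip> is a placeholder for any node already in the cluster.
pvecm add <existing-node-ip>

# If leftover state from the earlier membership blocks the join, --force
# overrides the checks (use with care on production systems).
pvecm add <existing-node-ip> --force
```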
 
But I now have 2 issues.

1. I can no longer get the cluster join info.
2. The node I want to remove is still present.
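For the second issue, the usual way to drop a dead node's entry is from any quorate node (a sketch; `<nodename>` is a placeholder, and the node being removed must already be shut down or detached from the cluster):

```shell
# Run on any quorate cluster node. <nodename> is a placeholder for the
# node that is still listed but should be removed.
pvecm delnode <nodename>

# If expected votes are still off afterwards, they can be adjusted
# (here 5 assumes a 5-node cluster remains).
pvecm expected 5
```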

This is supposed to be a 6-node cluster. What I see in the pvecm nodes list is the correct set of nodes minus one, but the cluster won't let me get a copy of the cluster join info.

I wish you guys would work on making this a lot easier. I know it's complex, and a cluster can get messy if this is not done right. But I need to be able to add the nodes back in WITHOUT REBUILDING THEM. I don't have the luxury or the ROOM to move things around. These are critical production servers.