Hello,
I had an issue in which an extended period of network problems on the cluster communication network caused the Proxmox cluster to break.
I have brought all VMs online on a new Proxmox cluster; however, the old broken cluster still has the Ceph cluster attached to it. Ceph itself is running fine, but I am no longer able to add, remove, or make any changes to it.
Is it possible to convert a Proxmox Ceph cluster to a standalone Ceph setup? Or is it possible to repair the cluster so I can continue to operate Ceph on it?
Checking "service pve-cluster status" outputs:
service pve-cluster status
● pve-cluster.service - The Proxmox VE cluster filesystem
Loaded: loaded (/lib/systemd/system/pve-cluster.service; enabled)
Active: active (running) since Fri 2017-03-24 09:57:20 GMT; 2 weeks 3 days ago
Main PID: 3104 (pmxcfs)
CGroup: /system.slice/pve-cluster.service
└─3104 /usr/bin/pmxcfs
Mar 28 06:27:53 sn7 pmxcfs[3104]: [status] notice: cpg_send_message retried 2 times
Mar 28 06:27:54 sn7 pmxcfs[3104]: [status] notice: cpg_send_message retried 2 times
Mar 28 06:27:54 sn7 pmxcfs[3104]: [status] notice: cpg_send_message retried 2 times
Mar 28 06:27:54 sn7 pmxcfs[3104]: [status] notice: cpg_send_message retried 2 times
Mar 28 06:27:54 sn7 pmxcfs[3104]: [status] notice: cpg_send_message retried 2 times
Mar 28 06:27:54 sn7 pmxcfs[3104]: [status] notice: cpg_send_message retry 30
Mar 28 06:27:54 sn7 pmxcfs[3104]: [status] notice: cpg_send_message retried 2 times
Mar 28 06:27:55 sn7 pmxcfs[3104]: [status] notice: cpg_send_message retried 2 times
Mar 28 06:27:55 sn7 pmxcfs[3104]: [status] notice: cpg_send_message retried 2 times
Mar 28 06:27:55 sn7 pmxcfs[3104]: [status] notice: cpg_send_message retried 35 times
The corosync service on every node is using one full CPU core, and pmxcfs is using two full CPU cores.
It seems the nodes tried to communicate on the day of the issue and have given up since. I can still ping between all the remaining nodes in the cluster. Each node's cluster filesystem seems to have gone read-only, which is why Ceph is still able to read the ceph.conf file.
Any cluster commands just hang and produce no output. I think I need to somehow take one copy of the cluster filesystem, replicate it to all nodes, and clear any pending messages that are stuck, then remove the four nodes that are now in the new cluster so that the old cluster comes back online.
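For reference, this is roughly the recovery sequence I'm considering on one of the surviving nodes. It's only a sketch, assuming pvecm and pmxcfs's local mode behave as documented; the node name below is a placeholder for one of the four departed nodes.

```shell
# On one surviving node: lower the expected vote count so the
# remaining nodes can regain quorum despite the departed nodes
pvecm expected 1

# Once quorate, remove each node that has moved to the new cluster
# ("oldnode1" is a placeholder; repeat for all four departed nodes)
pvecm delnode oldnode1

# If /etc/pve is still read-only after that, stop the cluster
# services and start pmxcfs in local mode (-l) to get a writable
# local copy of the cluster filesystem on this node
systemctl stop pve-cluster corosync
pmxcfs -l
```

Does that sound like the right order of operations, or is local mode needed first before pvecm will respond at all?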
Any help would be appreciated.