Corosync problems.

phunyguy

Aug 8, 2019
Greetings, I come here after many days of bashing my head into a wall with one of my new Proxmox 6 nodes. I am just a home user, so please go easy on my less-than-conventional setup when I explain it.

I have 3 Proxmox 6 nodes, each one running GlusterFS as well, for shared storage to be able to migrate VMs around. This all works great, even connected with just one interface for all of these nodes. Like I said, I am a home user, and only want to run a few VMs on this, so I am not all that worried about performance.

There is one wonky issue I am struggling with. Sometimes, be it once every few days or three times in an hour, the second node in the group goes out to lunch as far as corosync is concerned. It still responds on the network, but corosync can no longer communicate with the other two nodes. When that happens, node 1 and node 3 *both* reboot almost immediately. I am not sure whether the second node is causing that, but I assume it is some form of protection for the cluster. The reboot takes all the VMs with it. Sometimes corosync is still out to lunch when the other two nodes come back, and they refuse to start the VMs until I reboot the second node. I should also mention that pve2 (the second node) never hosts any VMs. It is there only for quorum purposes, since it is older hardware with no hardware VM extensions, though it supports the GlusterFS shared storage perfectly fine.
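To make the quorum reasoning concrete, here is a minimal sketch of the usual majority rule. It assumes one vote per node and no special votequorum options, neither of which is confirmed by the post; the function names are made up for illustration.

```python
def quorum_threshold(total_votes: int) -> int:
    """Votes needed for a strict majority (assumes default majority quorum)."""
    return total_votes // 2 + 1


def is_quorate(visible_votes: int, total_votes: int) -> bool:
    """True if a partition seeing `visible_votes` votes holds a majority."""
    return visible_votes >= quorum_threshold(total_votes)


TOTAL_VOTES = 3  # assumption: pve1, pve2 and pve3 each carry one vote

# pve2 on its own sees only its own vote.
print("pve2 alone quorate:", is_quorate(1, TOTAL_VOTES))

# pve1 and pve3 together, if they can still reach each other.
print("pve1 + pve3 quorate:", is_quorate(2, TOTAL_VOTES))
```

Under these assumptions, losing pve2 alone should leave nodes 1 and 3 quorate, so both of them rebooting may mean they also lost sight of each other at that moment. Comparing their logs for the same timestamps would be one way to check.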

Here is a syslog snippet from when this all goes south on pve2:

Aug 7 09:27:09 pve2 corosync[7344]: [KNET ] link: host: 3 link: 0 is down
Aug 7 09:27:09 pve2 corosync[7344]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 0)
Aug 7 09:27:09 pve2 corosync[7344]: [KNET ] host: host: 3 has no active links
Aug 7 09:27:10 pve2 corosync[7344]: [TOTEM ] Token has not been received in 61 ms
Aug 7 09:27:12 pve2 corosync[7344]: [TOTEM ] Token has not been received in 1243 ms
Aug 7 09:27:13 pve2 corosync[7344]: [TOTEM ] Token has not been received in 2894 ms
Aug 7 09:27:14 pve2 corosync[7344]: [TOTEM ] A new membership (2:88188) was formed. Members left: 1 3
Aug 7 09:27:14 pve2 corosync[7344]: [TOTEM ] Failed to receive the leave message. failed: 1 3
Aug 7 09:27:14 pve2 corosync[7344]: [CPG ] downlist left_list: 2 received
Aug 7 09:27:14 pve2 corosync[7344]: [QUORUM] This node is within the non-primary component and will NOT provide any services.
Aug 7 09:27:14 pve2 corosync[7344]: [QUORUM] Members[1]: 2
Aug 7 09:27:14 pve2 corosync[7344]: [MAIN ] Completed service synchronization, ready to provide service.
Aug 7 09:27:14 pve2 pmxcfs[7346]: [status] notice: node lost quorum
Aug 7 09:27:14 pve2 pmxcfs[7346]: [dcdb] notice: members: 2/7346
Aug 7 09:27:14 pve2 pmxcfs[7346]: [status] notice: members: 2/7346
Aug 7 09:27:14 pve2 pmxcfs[7346]: [dcdb] crit: received write while not quorate - trigger resync
Aug 7 09:27:14 pve2 pmxcfs[7346]: [dcdb] crit: leaving CPG group
Aug 7 09:27:15 pve2 pmxcfs[7346]: [dcdb] notice: start cluster connection
Aug 7 09:27:15 pve2 pmxcfs[7346]: [dcdb] crit: cpg_join failed: 14
Aug 7 09:27:15 pve2 pve-ha-crm[1614]: status change slave => wait_for_quorum
Aug 7 09:27:15 pve2 pve-ha-lrm[1860]: unable to write lrm status file - unable to open file '/etc/pve/nodes/pve2/lrm_status.tmp.1860' - Permission denied
Aug 7 09:27:15 pve2 pmxcfs[7346]: [dcdb] crit: can't initialize service
Aug 7 09:27:15 pve2 corosync[7344]: [TOTEM ] Token has not been received in 1289 ms
Aug 7 09:27:17 pve2 corosync[7344]: [TOTEM ] Token has not been received in 2940 ms
Aug 7 09:27:17 pve2 corosync[7344]: [TOTEM ] A new membership (2:88200) was formed. Members
Aug 7 09:27:17 pve2 corosync[7344]: [CPG ] downlist left_list: 0 received
Aug 7 09:27:17 pve2 corosync[7344]: [QUORUM] Members[1]: 2
Aug 7 09:27:17 pve2 corosync[7344]: [MAIN ] Completed service synchronization, ready to provide service.
Aug 7 09:27:19 pve2 corosync[7344]: [TOTEM ] Token has not been received in 1290 ms
Aug 7 09:27:20 pve2 corosync[7344]: [TOTEM ] Token has not been received in 2941 ms
Aug 7 09:27:21 pve2 corosync[7344]: [TOTEM ] A new membership (2:88212) was formed. Members
Aug 7 09:27:21 pve2 corosync[7344]: [CPG ] downlist left_list: 0 received
Aug 7 09:27:21 pve2 corosync[7344]: [QUORUM] Members[1]: 2
Aug 7 09:27:21 pve2 corosync[7344]: [MAIN ] Completed service synchronization, ready to provide service.
Aug 7 09:27:22 pve2 corosync[7344]: [TOTEM ] Token has not been received in 1246 ms
Aug 7 09:27:24 pve2 corosync[7344]: [TOTEM ] Token has not been received in 2897 ms
Aug 7 09:27:24 pve2 corosync[7344]: [TOTEM ] A new membership (2:88224) was formed. Members
Aug 7 09:27:24 pve2 corosync[7344]: [CPG ] downlist left_list: 0 received
Aug 7 09:27:24 pve2 corosync[7344]: [QUORUM] Members[1]: 2
Aug 7 09:27:24 pve2 corosync[7344]: [MAIN ] Completed service synchronization, ready to provide service.
Aug 7 09:27:24 pve2 pmxcfs[7346]: [dcdb] notice: members: 2/7346
Aug 7 09:27:24 pve2 pmxcfs[7346]: [dcdb] notice: all data is up to date
Aug 7 09:27:25 pve2 corosync[7344]: [TOTEM ] Token has not been received in 1291 ms
Aug 7 09:27:27 pve2 corosync[7344]: [TOTEM ] Token has not been received in 2942 ms
Aug 7 09:27:27 pve2 corosync[7344]: [TOTEM ] A new membership (2:88236) was formed. Members
Aug 7 09:27:27 pve2 corosync[7344]: [CPG ] downlist left_list: 0 received
Aug 7 09:27:27 pve2 corosync[7344]: [QUORUM] Members[1]: 2
Aug 7 09:27:27 pve2 corosync[7344]: [MAIN ] Completed service synchronization, ready to provide service.
Aug 7 09:27:29 pve2 corosync[7344]: [TOTEM ] Token has not been received in 1290 ms
Aug 7 09:27:30 pve2 corosync[7344]: [TOTEM ] Token has not been received in 2941 ms
Aug 7 09:27:31 pve2 corosync[7344]: [TOTEM ] A new membership (2:88248) was formed. Members
Aug 7 09:27:31 pve2 corosync[7344]: [CPG ] downlist left_list: 0 received
Aug 7 09:27:31 pve2 corosync[7344]: [QUORUM] Members[1]: 2
Aug 7 09:27:31 pve2 corosync[7344]: [MAIN ] Completed service synchronization, ready to provide service.
Aug 7 09:27:32 pve2 corosync[7344]: [TOTEM ] Token has not been received in 1290 ms
Aug 7 09:27:34 pve2 corosync[7344]: [TOTEM ] Token has not been received in 2941 ms
Aug 7 09:27:34 pve2 corosync[7344]: [TOTEM ] A new membership (2:88260) was formed. Members
Aug 7 09:27:34 pve2 corosync[7344]: [CPG ] downlist left_list: 0 received
Aug 7 09:27:34 pve2 corosync[7344]: [QUORUM] Members[1]: 2
Aug 7 09:27:34 pve2 corosync[7344]: [MAIN ] Completed service synchronization, ready to provide service.
Aug 7 09:27:35 pve2 corosync[7344]: [TOTEM ] Token has not been received in 1290 ms
Aug 7 09:27:37 pve2 corosync[7344]: [TOTEM ] Token has not been received in 2941 ms
Aug 7 09:27:38 pve2 corosync[7344]: [TOTEM ] A new membership (2:88272) was formed. Members
Aug 7 09:27:38 pve2 corosync[7344]: [CPG ] downlist left_list: 0 received
Aug 7 09:27:38 pve2 corosync[7344]: [QUORUM] Members[1]: 2
Aug 7 09:27:38 pve2 corosync[7344]: [MAIN ] Completed service synchronization, ready to provide service.
Aug 7 09:27:39 pve2 corosync[7344]: [TOTEM ] Token has not been received in 1291 ms
Aug 7 09:27:41 pve2 corosync[7344]: [TOTEM ] Token has not been received in 2942 ms
Aug 7 09:27:41 pve2 corosync[7344]: [TOTEM ] A new membership (2:88284) was formed. Members
Aug 7 09:27:41 pve2 corosync[7344]: [CPG ] downlist left_list: 0 received
Aug 7 09:27:41 pve2 corosync[7344]: [QUORUM] Members[1]: 2
Aug 7 09:27:41 pve2 corosync[7344]: [MAIN ] Completed service synchronization, ready to provide service.
Aug 7 09:27:42 pve2 corosync[7344]: [TOTEM ] Token has not been received in 1290 ms
Aug 7 09:27:44 pve2 corosync[7344]: [TOTEM ] Token has not been received in 2941 ms
Aug 7 09:27:44 pve2 corosync[7344]: [TOTEM ] A new membership (2:88296) was formed. Members
Aug 7 09:27:44 pve2 corosync[7344]: [CPG ] downlist left_list: 0 received
Aug 7 09:27:44 pve2 corosync[7344]: [QUORUM] Members[1]: 2
Aug 7 09:27:44 pve2 corosync[7344]: [MAIN ] Completed service synchronization, ready to provide service.
Aug 7 09:27:46 pve2 corosync[7344]: [TOTEM ] Token has not been received in 1290 ms
Aug 7 09:27:47 pve2 corosync[7344]: [TOTEM ] Token has not been received in 2941 ms
Aug 7 09:27:48 pve2 corosync[7344]: [TOTEM ] A new membership (2:88308) was formed. Members
Aug 7 09:27:48 pve2 corosync[7344]: [CPG ] downlist left_list: 0 received
Aug 7 09:27:48 pve2 corosync[7344]: [QUORUM] Members[1]: 2
Aug 7 09:27:48 pve2 corosync[7344]: [MAIN ] Completed service synchronization, ready to provide service.
Aug 7 09:27:49 pve2 corosync[7344]: [TOTEM ] Token has not been received in 1290 ms
Aug 7 09:27:51 pve2 corosync[7344]: [TOTEM ] Token has not been received in 2941 ms
Aug 7 09:27:51 pve2 corosync[7344]: [TOTEM ] A new membership (2:88320) was formed. Members
Aug 7 09:27:51 pve2 corosync[7344]: [CPG ] downlist left_list: 0 received
Aug 7 09:27:51 pve2 corosync[7344]: [QUORUM] Members[1]: 2
Aug 7 09:27:51 pve2 corosync[7344]: [MAIN ] Completed service synchronization, ready to provide service.
Aug 7 09:27:52 pve2 corosync[7344]: [TOTEM ] Token has not been received in 1290 ms
Aug 7 09:27:54 pve2 corosync[7344]: [TOTEM ] Token has not been received in 2941 ms
Aug 7 09:27:54 pve2 corosync[7344]: [TOTEM ] A new membership (2:88332) was formed. Members
Aug 7 09:27:54 pve2 corosync[7344]: [CPG ] downlist left_list: 0 received
Aug 7 09:27:54 pve2 corosync[7344]: [QUORUM] Members[1]: 2
Aug 7 09:27:54 pve2 corosync[7344]: [MAIN ] Completed service synchronization, ready to provide service.
Aug 7 09:27:56 pve2 corosync[7344]: [TOTEM ] Token has not been received in 1290 ms
Aug 7 09:27:57 pve2 corosync[7344]: [TOTEM ] Token has not been received in 2941 ms
Aug 7 09:27:58 pve2 corosync[7344]: [TOTEM ] A new membership (2:88344) was formed. Members
Aug 7 09:27:58 pve2 corosync[7344]: [CPG ] downlist left_list: 0 received
Aug 7 09:27:58 pve2 corosync[7344]: [QUORUM] Members[1]: 2
Aug 7 09:27:58 pve2 corosync[7344]: [MAIN ] Completed service synchronization, ready to provide service.
Aug 7 09:27:59 pve2 corosync[7344]: [TOTEM ] Token has not been received in 1291 ms
Aug 7 09:28:00 pve2 systemd[1]: Starting Proxmox VE replication runner...
Aug 7 09:28:01 pve2 corosync[7344]: [TOTEM ] Token has not been received in 2942 ms
Aug 7 09:28:01 pve2 pvesr[30856]: trying to acquire cfs lock 'file-replication_cfg' ...
Aug 7 09:28:01 pve2 corosync[7344]: [TOTEM ] A new membership (2:88356) was formed. Members
Aug 7 09:28:01 pve2 corosync[7344]: [CPG ] downlist left_list: 0 received
Aug 7 09:28:01 pve2 corosync[7344]: [QUORUM] Members[1]: 2
Aug 7 09:28:01 pve2 corosync[7344]: [MAIN ] Completed service synchronization, ready to provide service.
Aug 7 09:28:02 pve2 pvesr[30856]: trying to acquire cfs lock 'file-replication_cfg' ...
Aug 7 09:28:03 pve2 corosync[7344]: [TOTEM ] Token has not been received in 1289 ms
Aug 7 09:28:03 pve2 pvesr[30856]: trying to acquire cfs lock 'file-replication_cfg' ...
Aug 7 09:28:04 pve2 pvesr[30856]: trying to acquire cfs lock 'file-replication_cfg' ...
Aug 7 09:28:04 pve2 corosync[7344]: [TOTEM ] Token has not been received in 2941 ms
Aug 7 09:28:05 pve2 corosync[7344]: [TOTEM ] A new membership (2:88368) was formed. Members
Aug 7 09:28:05 pve2 corosync[7344]: [CPG ] downlist left_list: 0 received
Aug 7 09:28:05 pve2 corosync[7344]: [QUORUM] Members[1]: 2
Aug 7 09:28:05 pve2 corosync[7344]: [MAIN ] Completed service synchronization, ready to provide service.
Aug 7 09:28:05 pve2 pvesr[30856]: trying to acquire cfs lock 'file-replication_cfg' ...
Aug 7 09:28:06 pve2 corosync[7344]: [TOTEM ] Token has not been received in 1290 ms
Aug 7 09:28:06 pve2 pvesr[30856]: trying to acquire cfs lock 'file-replication_cfg' ...
Aug 7 09:28:07 pve2 pvesr[30856]: trying to acquire cfs lock 'file-replication_cfg' ...
Aug 7 09:28:08 pve2 corosync[7344]: [TOTEM ] Token has not been received in 2941 ms
Aug 7 09:28:08 pve2 pvesr[30856]: trying to acquire cfs lock 'file-replication_cfg' ...
Aug 7 09:28:08 pve2 corosync[7344]: [TOTEM ] A new membership (2:88380) was formed. Members
Aug 7 09:28:08 pve2 corosync[7344]: [CPG ] downlist left_list: 0 received
Aug 7 09:28:08 pve2 corosync[7344]: [QUORUM] Members[1]: 2
Aug 7 09:28:08 pve2 corosync[7344]: [MAIN ] Completed service synchronization, ready to provide service.
Aug 7 09:28:09 pve2 pvesr[30856]: trying to acquire cfs lock 'file-replication_cfg' ...
Aug 7 09:28:09 pve2 corosync[7344]: [TOTEM ] Token has not been received in 1290 ms
Aug 7 09:28:10 pve2 pvesr[30856]: error with cfs lock 'file-replication_cfg': no quorum!
Aug 7 09:28:10 pve2 systemd[1]: pvesr.service: Main process exited, code=exited, status=13/n/a
Aug 7 09:28:10 pve2 systemd[1]: pvesr.service: Failed with result 'exit-code'.
Aug 7 09:28:10 pve2 systemd[1]: Failed to start Proxmox VE replication runner.
Aug 7 09:28:11 pve2 corosync[7344]: [TOTEM ] Token has not been received in 2941 ms
Aug 7 09:28:13 pve2 pvestatd[1327]: got timeout
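A long excerpt like the one above can be easier to compare across nodes when it is boiled down to a few counts. The following is a rough sketch, not a Proxmox or corosync tool: the `summarize` helper and the pattern names are invented here, and the regular expressions are based only on the message texts visible in this excerpt.

```python
import re
from collections import Counter

# Patterns derived from the log lines in this post (assumed to be stable wording).
PATTERNS = {
    "link_down": re.compile(r"\[KNET\s*\] link: host: (\d+) link: (\d+) is down"),
    "token_timeout": re.compile(r"\[TOTEM\s*\] Token has not been received in (\d+) ms"),
    "new_membership": re.compile(r"\[TOTEM\s*\] A new membership \(([^)]+)\) was formed"),
    "lost_quorum": re.compile(r"notice: node lost quorum"),
}


def summarize(lines):
    """Count the event types above and track the longest reported token delay."""
    counts = Counter()
    max_token_ms = 0
    for line in lines:
        for name, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match is None:
                continue
            counts[name] += 1
            if name == "token_timeout":
                max_token_ms = max(max_token_ms, int(match.group(1)))
            break  # each line is counted under at most one event type
    return dict(counts), max_token_ms


# A few lines copied from the excerpt above.
sample = [
    "Aug 7 09:27:09 pve2 corosync[7344]: [KNET ] link: host: 3 link: 0 is down",
    "Aug 7 09:27:10 pve2 corosync[7344]: [TOTEM ] Token has not been received in 61 ms",
    "Aug 7 09:27:13 pve2 corosync[7344]: [TOTEM ] Token has not been received in 2894 ms",
    "Aug 7 09:27:14 pve2 corosync[7344]: [TOTEM ] A new membership (2:88188) was formed. Members left: 1 3",
    "Aug 7 09:27:14 pve2 pmxcfs[7346]: [status] notice: node lost quorum",
]

counts, max_token_ms = summarize(sample)
print(counts)
print("longest token delay (ms):", max_token_ms)
```

To run it against a real log, the `sample` list could be replaced with the lines of a saved syslog file, for example `open("syslog.txt")` where the filename is a placeholder.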

I have no idea what else to even check here, and like I said, the network is working perfectly fine, which is why I assume the reboots are happening.

One last tidbit, hopefully it leads to something: this problematic node wasn't installed like the rest (using the Proxmox install ISO), because I couldn't get the installer to agree with the onboard GPU on that hardware. Instead, I had to do a fresh Debian Buster install and then layer Proxmox over the top of that.

I found a forum post with someone explaining a similar issue, which can be found here, and I tried the solution suggested, which didn't help.
 
I just had the same problem and ran into your posting just now, and I just fixed it like this. I hope it works for you as well.

I opened a terminator window to all hosts, put them in broadcast mode so what I type is the same in each window, then did a

tail -f /var/log/daemon.log | grep corosync

And there was one host that produced suspicious messages like you pasted, and a lot more messages.

I rebooted that host and after that everything stabilized and was running fine again.
 
