Cluster failed

baldy

Active Member
Feb 9, 2017
Hi all,

today I was in the datacenter and brought a previously broken server back. I re-added it to the cluster with pvecm 10.0.2.110 --force

After this command the cluster was complete and working. Then I added a tagged VLAN on my switches, and that is when the chaos began. After seeing a lot of trouble I deleted the VLANs again, hoping things would return to normal.

But that did not happen. On node 1 and node 2 I get a lot of errors:

Mar 10 15:51:59 host01 systemd[1]: Stopping The Proxmox VE cluster filesystem...
Mar 10 15:52:00 host01 corosync[14350]: [TOTEM ] A new membership (10.0.2.110:127136) was formed. Members
Mar 10 15:52:00 host01 corosync[14350]: [QUORUM] Members[12]: 1 2 3 4 5 6 7 8 9 10 11 12
Mar 10 15:52:00 host01 corosync[14350]: [MAIN ] Completed service synchronization, ready to provide service.
Mar 10 15:52:01 host01 corosync[14350]: [TOTEM ] A new membership (10.0.2.110:127140) was formed. Members
Mar 10 15:52:01 host01 corosync[14350]: [QUORUM] Members[12]: 1 2 3 4 5 6 7 8 9 10 11 12
Mar 10 15:52:01 host01 corosync[14350]: [MAIN ] Completed service synchronization, ready to provide service.
Mar 10 15:52:04 host01 corosync[14350]: [TOTEM ] A new membership (10.0.2.110:127144) was formed. Members
Mar 10 15:52:04 host01 corosync[14350]: [QUORUM] Members[12]: 1 2 3 4 5 6 7 8 9 10 11 12
Mar 10 15:52:04 host01 corosync[14350]: [MAIN ] Completed service synchronization, ready to provide service.
Mar 10 15:52:10 host01 systemd[1]: pve-cluster.service stop-sigterm timed out. Killing.
Mar 10 15:52:10 host01 cron[2006]: (*system*vzdump) CAN'T OPEN SYMLINK (/etc/cron.d/vzdump)
Mar 10 15:52:10 host01 pve-ha-lrm[2128]: unable to write lrm status file - unable to open file '/etc/pve/nodes/host01/lrm_status.tmp.2128' - Transport endpoint is not connected
Mar 10 15:52:10 host01 systemd[1]: pve-cluster.service: main process exited, code=killed, status=9/KILL
Mar 10 15:52:10 host01 systemd[1]: Unit pve-cluster.service entered failed state.
Mar 10 15:52:10 host01 systemd[1]: Starting The Proxmox VE cluster filesystem...
Mar 10 15:52:10 host01 pmxcfs[23185]: [status] notice: update cluster info (cluster name fcse, version = 13)
Mar 10 15:52:10 host01 pmxcfs[23185]: [status] notice: node has quorum
Mar 10 15:52:10 host01 pmxcfs[23185]: [dcdb] notice: members: 1/23185, 3/1990, 4/1910, 5/22930, 6/1893, 7/2035, 8/1927, 9/1887, 10/1989, 11/1509, 12/2135
Mar 10 15:52:10 host01 pmxcfs[23185]: [dcdb] notice: starting data syncronisation
Mar 10 15:52:10 host01 pmxcfs[23185]: [dcdb] notice: received sync request (epoch 1/23185/00000001)
Mar 10 15:52:10 host01 pmxcfs[23185]: [status] notice: members: 1/23185, 3/1990, 4/1910, 5/22930, 6/1893, 7/2035, 8/1927, 9/1887, 10/1989, 11/1509, 12/2135
Mar 10 15:52:10 host01 pmxcfs[23185]: [status] notice: starting data syncronisation
Mar 10 15:52:10 host01 pmxcfs[23185]: [status] notice: received sync request (epoch 1/23185/00000001)
Mar 10 15:52:10 host01 pvestatd[31372]: ipcc_send_rec failed: Transport endpoint is not connected
Mar 10 15:52:10 host01 pvestatd[31372]: ipcc_send_rec failed: Connection refused
Mar 10 15:52:10 host01 pvestatd[31372]: ipcc_send_rec failed: Connection refused
Mar 10 15:52:10 host01 pvestatd[31372]: ipcc_send_rec failed: Connection refused
Mar 10 15:52:10 host01 pvestatd[31372]: status update time (35.230 seconds)
Mar 10 15:52:10 host01 pvestatd[31372]: ipcc_send_rec failed: Connection refused
Mar 10 15:52:10 host01 pvestatd[31372]: ipcc_send_rec failed: Connection refused
Mar 10 15:52:10 host01 pvestatd[31372]: ipcc_send_rec failed: Connection refused
Mar 10 15:52:10 host01 pvestatd[31372]: ipcc_send_rec failed: Connection refused
Mar 10 15:52:10 host01 pvestatd[31372]: ipcc_send_rec failed: Connection refused
Mar 10 15:52:10 host01 pvestatd[31372]: ipcc_send_rec failed: Connection refused
Mar 10 15:52:13 host01 corosync[14350]: [TOTEM ] A new membership (10.0.2.110:127148) was formed. Members
Mar 10 15:52:13 host01 corosync[14350]: [QUORUM] Members[12]: 1 2 3 4 5 6 7 8 9 10 11 12
Mar 10 15:52:13 host01 corosync[14350]: [MAIN ] Completed service synchronization, ready to provide service.
Mar 10 15:52:15 host01 pve-ha-lrm[2128]: loop take too long (45 seconds)
Mar 10 15:52:15 host01 pve-ha-crm[2115]: ipcc_send_rec failed: Transport endpoint is not connected
Mar 10 15:52:15 host01 pve-ha-lrm[2128]: ipcc_send_rec failed: Transport endpoint is not connected
Mar 10 15:52:22 host01 corosync[14350]: [TOTEM ] A new membership (10.0.2.110:127152) was formed. Members
Mar 10 15:52:22 host01 corosync[14350]: [QUORUM] Members[12]: 1 2 3 4 5 6 7 8 9 10 11 12
Mar 10 15:52:22 host01 corosync[14350]: [MAIN ] Completed service synchronization, ready to provide service.
Mar 10 15:52:25 host01 corosync[14350]: [TOTEM ] A new membership (10.0.2.110:127156) was formed. Members
Mar 10 15:52:25 host01 corosync[14350]: [QUORUM] Members[12]: 1 2 3 4 5 6 7 8 9 10 11 12
Mar 10 15:52:25 host01 corosync[14350]: [MAIN ] Completed service synchronization, ready to provide service.
Mar 10 15:52:27 host01 corosync[14350]: [TOTEM ] A new membership (10.0.2.110:127160) was formed. Members
Mar 10 15:52:27 host01 corosync[14350]: [QUORUM] Members[12]: 1 2 3 4 5 6 7 8 9 10 11 12
Mar 10 15:52:27 host01 corosync[14350]: [MAIN ] Completed service synchronization, ready to provide service.

I am not able to start or restart pve-cluster and pvestatd.
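What I tried is roughly the following (the pkill step is my assumption for clearing a stuck pmxcfs, since the log above shows pve-cluster timing out on SIGTERM):

```shell
# Plain restart attempts (both hang / fail):
systemctl restart pve-cluster
systemctl restart pvestatd
systemctl status pve-cluster pvestatd

# Assumption: if pmxcfs is stuck ("stop-sigterm timed out" in the log),
# force-kill it before starting the unit again:
systemctl stop pve-cluster
pkill -9 pmxcfs
systemctl start pve-cluster
```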

Multicast is still working; I tested it with omping across the whole cluster of 12 nodes.
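The omping test was started in parallel on every node, roughly like this (host names shortened here; in reality all 12 node names were listed):

```shell
# Run the same command on all 12 nodes at the same time:
# -c 600 probes, -i 1 second interval, -q quiet summary output
omping -c 600 -i 1 -q host01 host02 host03 host04
# ...plus the remaining node names
```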

Does anyone have any idea? I am not able to reinstall the cluster :-(

Cheers

Daniel
