Proxmox reboot after spanning tree issue

max.nolent

Aug 14, 2020
Hello Everyone,

I have a cluster of 5 Proxmox nodes, all in the same cluster (Datacenter). We have an issue whenever a spanning tree recalculation happens. This morning we lost a 10 Gb/s link on the switch connected to the Proxmox nodes; after the link went down, the switch recalculated spanning tree and we then lost 4 of our 5 nodes.

Every time there is a spanning tree recalculation, some of our nodes restart. Do you have any idea why?

Proxmox 5.4-3
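
My working theory, which I'd appreciate a sanity check on: a classic STP topology change can block the host-facing ports for around 30 seconds, which is much longer than corosync's default token timeout, so the nodes lose quorum while the switch reconverges. I'm wondering whether raising the token timeout in /etc/pve/corosync.conf would let the cluster ride it out. A rough sketch of what I mean (the value is only an illustration, not a tested recommendation):

totem {
  # ... keep the existing cluster_name, config_version, interface entries ...
  token: 10000    # example only: 10 s, long enough to survive a short reconvergence
}

(config_version has to be bumped as well so the change propagates to all nodes.)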
 
Yes, I have HA resources between my first and third node, but my first four nodes restarted. Is there any log showing that a node restarted because of a lost connection?
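
In the meantime, this is where I'm looking for traces of the restarts, assuming the journal is flushed to persistent storage (the unit names are the standard Proxmox ones):

journalctl -b -1 -u corosync -u pve-ha-lrm -u pve-ha-crm
journalctl -b -1 -u watchdog-mux

If a node was fenced by the watchdog it is a hard reset, so I'd expect the previous boot's log to simply stop rather than show a clean shutdown.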
 
From my logs:

Aug 14 11:14:56 proxmox3 corosync[3011]: [QUORUM] This node is within the non-primary component and will NOT provide any services.
Aug 14 11:14:56 proxmox3 corosync[3011]: [QUORUM] Members[1]: 1
Aug 14 11:14:56 proxmox3 corosync[3011]: [MAIN ] Completed service synchronization, ready to provide service.
Aug 14 11:14:56 proxmox3 pmxcfs[2961]: [status] notice: node lost quorum
Aug 14 11:14:56 proxmox3 pmxcfs[2961]: [dcdb] crit: received write while not quorate - trigger resync
Aug 14 11:14:56 proxmox3 pmxcfs[2961]: [dcdb] crit: leaving CPG group
Aug 14 11:14:57 proxmox3 pmxcfs[2961]: [dcdb] notice: start cluster connection
Aug 14 11:14:57 proxmox3 pmxcfs[2961]: [dcdb] notice: members: 1/2961
Aug 14 11:14:57 proxmox3 pmxcfs[2961]: [dcdb] notice: all data is up to date


From here, I lost quorum and my node was alone.
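
For what it's worth, this is how I confirm the partition on the isolated node while it lasts (standard tools, nothing exotic):

pvecm status
corosync-quorumtool -s

Both report Quorate: No and only the local vote during that window.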

Then, once the STP recalculation converged, about 30 seconds later:

Aug 14 11:15:24 proxmox3 corosync[3011]: notice [QUORUM] Members[5]: 1 2 3 5 4
Aug 14 11:15:24 proxmox3 corosync[3011]: notice [MAIN ] Completed service synchronization, ready to provide service.
Aug 14 11:15:24 proxmox3 pmxcfs[2961]: [dcdb] notice: starting data syncronisation
Aug 14 11:15:24 proxmox3 corosync[3011]: [QUORUM] Members[5]: 1 2 3 5 4
Aug 14 11:15:24 proxmox3 corosync[3011]: [MAIN ] Completed service synchronization, ready to provide service.

My quorum is now up to date.
After that, I lost quorum a second time, when my 10 Gb/s link came back up:

Aug 14 11:15:43 proxmox3 corosync[3011]: notice [TOTEM ] A processor failed, forming new configuration.
Aug 14 11:15:43 proxmox3 corosync[3011]: [TOTEM ] A processor failed, forming new configuration.
Aug 14 11:15:46 proxmox3 corosync[3011]: notice [TOTEM ] A new membership (100.101.20.2:9696) was formed. Members left: 2 3 5 4
Aug 14 11:15:46 proxmox3 corosync[3011]: notice [TOTEM ] Failed to receive the leave message. failed: 2 3 5 4
Aug 14 11:15:46 proxmox3 corosync[3011]: warning [CPG ] downlist left_list: 4 received
Aug 14 11:15:46 proxmox3 corosync[3011]: notice [QUORUM] This node is within the non-primary component and will NOT provide any services.
Aug 14 11:15:46 proxmox3 corosync[3011]: notice [QUORUM] Members[1]: 1

New STP recalculation:
Aug 14 11:16:29 proxmox3 corosync[3011]: notice [QUORUM] This node is within the primary component and will provide service.
Aug 14 11:16:29 proxmox3 corosync[3011]: notice [QUORUM] Members[5]: 1 2 3 5 4
Aug 14 11:16:29 proxmox3 corosync[3011]: notice [MAIN ] Completed service synchronization, ready to provide service.
Aug 14 11:16:29 proxmox3 pmxcfs[2961]: [dcdb] notice: starting data syncronisation
Aug 14 11:16:29 proxmox3 corosync[3011]: [QUORUM] This node is within the primary component and will provide service.
Aug 14 11:16:29 proxmox3 corosync[3011]: [QUORUM] Members[5]: 1 2 3 5 4
Aug 14 11:16:29 proxmox3 corosync[3011]: [MAIN ] Completed service synchronization, ready to provide service.


But the node restarted shortly after 11:17:00:
Aug 14 11:16:29 proxmox3 pmxcfs[2961]: [status] notice: received all states
Aug 14 11:16:29 proxmox3 pmxcfs[2961]: [status] notice: all data is up to date
Aug 14 11:16:29 proxmox3 pmxcfs[2961]: [status] notice: dfsm_deliver_queue: queue length 50
Aug 14 11:16:32 proxmox3 pve-ha-lrm[3150]: successfully acquired lock 'ha_agent_proxmox3_lock'
Aug 14 11:16:32 proxmox3 pve-ha-lrm[3150]: status change lost_agent_lock => active
Aug 14 11:16:37 proxmox3 pve-ha-crm[3107]: status change wait_for_quorum => slave
Aug 14 11:17:00 proxmox3 systemd[1]: Starting Proxmox VE replication runner...

Aug 14 11:20:30 proxmox3 systemd[1]: Starting Flush Journal to Persistent Storage...
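
My understanding of why only some nodes hard-reset, please correct me if I'm wrong: once pve-ha-lrm is 'active' for HA services, it keeps the node's watchdog armed. If the node then sits without quorum (lost_agent_lock) long enough, roughly 60 seconds, the watchdog is no longer renewed and the node self-fences with a hard reset, which would fit the silence between 11:17:00 and the journal flush at 11:20:30 after the reboot. To see which nodes actually carry HA services, and are therefore candidates for this, I check:

ha-manager status
cat /etc/pve/ha/resources.cfg

Nodes whose LRM stays 'idle' (no HA services) should not self-fence on a quorum loss.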
 
