[SOLVED] After a full shutdown, pve cluster not restarting.

Pigi_102

Hello all.
I have a strange situation. I had to shut down my cluster (3 nodes) because we moved our office, and now I cannot get the cluster to restart.
The behaviour is different on each of the 3 nodes.
On node 1, pveproxy hangs and does not start. pvecm says the cluster is up on all nodes and quorate:
Code:
Cluster information
-------------------
Name:             ProxMox-CL-LAB
Config Version:   3
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Wed May  7 10:17:28 2025
Quorum provider:  corosync_votequorum
Nodes:            3
Node ID:          0x00000001
Ring ID:          1.c8e8
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   3
Highest expected: 3
Total votes:      3
Quorum:           2
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 192.168.126.201 (local)
0x00000002          1 192.168.126.202
0x00000003          1 192.168.126.203
Every pvesm command (no matter which one) hangs. Strangely enough, every ssh connection to the other nodes hangs too.

On node 2 pveproxy is started. pvecm says everything is fine and all nodes are connected:
Code:
Cluster information
-------------------
Name:             ProxMox-CL-LAB
Config Version:   3
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Wed May  7 10:19:48 2025
Quorum provider:  corosync_votequorum
Nodes:            3
Node ID:          0x00000002
Ring ID:          1.c8e8
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   3
Highest expected: 3
Total votes:      3
Quorum:           2 
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 192.168.126.201
0x00000002          1 192.168.126.202 (local)
0x00000003          1 192.168.126.203
pvesm commands work fine. ssh to the other nodes works fine.


On node 3, pveproxy is started. pvecm says it is the only node in the cluster (the ring ID is also different):
Code:
Cluster information
-------------------
Name:             ProxMox-CL-LAB
Config Version:   3
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Wed May  7 10:24:12 2025
Quorum provider:  corosync_votequorum
Nodes:            1
Node ID:          0x00000003
Ring ID:          3.ce95
Quorate:          No

Votequorum information
----------------------
Expected votes:   3
Highest expected: 3
Total votes:      1
Quorum:           2 Activity blocked
Flags:           

Membership information
----------------------
    Nodeid      Votes Name
0x00000003          1 192.168.126.203 (local)
pvesm commands work fine (and show the correct volumes of the cluster from before the shutdown). ssh to the other nodes works too.
Before the shutdown everything was fine.

Can you suggest a way to troubleshoot this and get the cluster running again?

Thanks in advance.

Pigi_102
 
Check the network configuration and the corosync logs on all nodes. It sounds like something is messed up there.
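
As a starting point for that comparison, here is a minimal Python sketch that parses saved `pvecm status` output from each node and groups the nodes by the Ring ID they report. More than one group suggests the nodes do not all see each other. The function names, the `node1`/`node2`/`node3` labels and the `SAMPLES` dictionary are made up for illustration; the sample text is abridged from the output posted above. It does not talk to corosync itself, it only reads text you have already collected.

```python
import re
from collections import defaultdict


def parse_pvecm_status(text):
    """Extract ring ID, quorate flag and member addresses from `pvecm status` text."""
    ring = re.search(r"^Ring ID:\s*(\S+)", text, re.MULTILINE)
    quorate = re.search(r"^Quorate:\s*(\S+)", text, re.MULTILINE)
    # Membership rows look like: 0x00000001          1 192.168.126.201 (local)
    members = re.findall(r"^[ \t]*0x[0-9a-fA-F]+\s+\d+\s+(\S+)", text, re.MULTILINE)
    return {
        "ring_id": ring.group(1) if ring else None,
        "quorate": bool(quorate) and quorate.group(1) == "Yes",
        "members": sorted(members),
    }


def group_by_ring(outputs):
    """Map each reported ring ID to the list of hosts that reported it."""
    groups = defaultdict(list)
    for host, text in outputs.items():
        groups[parse_pvecm_status(text)["ring_id"]].append(host)
    return dict(groups)


# Abridged copies of the output posted in this thread (labels are hypothetical).
SAMPLES = {
    "node1": """Ring ID:          1.c8e8
Quorate:          Yes
0x00000001          1 192.168.126.201 (local)
0x00000002          1 192.168.126.202
0x00000003          1 192.168.126.203
""",
    "node2": """Ring ID:          1.c8e8
Quorate:          Yes
0x00000001          1 192.168.126.201
0x00000002          1 192.168.126.202 (local)
0x00000003          1 192.168.126.203
""",
    "node3": """Ring ID:          3.ce95
Quorate:          No
0x00000003          1 192.168.126.203 (local)
""",
}

if __name__ == "__main__":
    for ring_id, hosts in group_by_ring(SAMPLES).items():
        print(ring_id, hosts)
```

With the output from this thread, nodes 1 and 2 land in one group and node 3 in another, even though nodes 1 and 2 both list node 3 as a member. That kind of one-sided view points at the network between the nodes rather than at the Proxmox configuration.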
 
Very strange, but it turned out to be a misbehaving switch.
The hardware was not broken, but something was going wrong with the VLANs.
Moving everything to another switch got everything working again.

Thanks
 