3 Node cluster - Icons disappear - Can't change VM settings

Hi,

Once again, it seems that I messed up something in my config.
I have had instabilities with the corosync/cluster config in the past. One time it was a bad network cable, and another time the SSH keys were out of sync.

This time the problem seems to be different.

I can ping all 3 nodes.
Web UI login works on node1 and node3.
On node 2 I get this error:
Code:
"Login failed. Please try again"

The strange thing is that the icons on the left tree menu are only shown for node1.
1700812350863.png
I have seen this before, but the icons always came back after a few seconds. This time they stay missing.

I can access the different views and settings of all nodes, but on node 2 and node 3 I can't change anything from the GUI.
1700812533024.png

I checked the /etc/corosync/corosync.conf and /etc/pve/corosync.conf.
Both look fine and the same.

Here is my pvecm status:
Code:
Cluster information
-------------------
Name:             HomeCluster
Config Version:   13
Transport:        knet
Secure auth:      on


Quorum information
------------------
Date:             Fri Nov 24 08:57:02 2023
Quorum provider:  corosync_votequorum
Nodes:            3
Node ID:          0x00000001
Ring ID:          1.796
Quorate:          Yes


Votequorum information
----------------------
Expected votes:   3
Highest expected: 3
Total votes:      3
Quorum:           2
Flags:            Quorate


Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 10.1.254.254 (local)
0x00000002          1 10.1.254.253
0x00000003          1 10.1.254.251

Syslog shows nothing unusual, at least from my perspective.

Hope you can help.
Thanks.
 

Hi,
please share the journal since boot for all 3 nodes. You can generate it on each node by running journalctl -b > "$(hostname)-journal.txt". What was done before the issues appeared? Please try restarting pvestatd via systemctl restart pvestatd.service on each node and see if this has an effect. Also, what is the output of pvesh get /cluster/resources on all 3 nodes?
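For convenience, the three steps above could be wrapped in a small script and run on each node. This is only a rough sketch: the function names and the DRY_RUN and OUT_DIR variables are made up for illustration and are not part of Proxmox. By default it only prints what it would do.

```shell
#!/bin/sh
# Sketch of the suggested diagnostic steps for a single node.
# DRY_RUN and OUT_DIR are hypothetical helper variables, not Proxmox settings.
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them
OUT_DIR="${OUT_DIR:-.}"   # directory for the journal file

# Print a command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf 'would run: %s\n' "$*"
    else
        "$@"
    fi
}

collect_diagnostics() {
    journal_file="$OUT_DIR/$(hostname)-journal.txt"

    # 1. Journal since the last boot, one file per node.
    if [ "$DRY_RUN" = "1" ]; then
        printf 'would run: journalctl -b > %s\n' "$journal_file"
    else
        journalctl -b > "$journal_file"
    fi

    # 2. Restart pvestatd, as suggested above.
    run systemctl restart pvestatd.service

    # 3. Show what this node reports as cluster resources.
    run pvesh get /cluster/resources
}
```

After loading the functions, calling collect_diagnostics prints the planned commands. Setting DRY_RUN=0 before the call executes them as root on the node.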
 
Do you have multiple networks between the nodes? If so, check whether you can ping the nodes on each network should you run into the same issue. It might be possible that the network for Corosync is still working, but the Mgmt network isn't. In that case, access to the API would not work, which can look a lot like what you saw.
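A quick way to run such a ping check is to take the addresses from the membership list in pvecm status and ping each one. The sketch below assumes the output layout shown earlier in this thread; member_addresses and ping_members are hypothetical helper names, and the ping flags assume the Linux iputils version. Note that these are the addresses Corosync uses, so if the management network uses different addresses, feed those into ping_members instead.

```shell
#!/bin/sh
# Print the member addresses listed under "Membership information"
# in `pvecm status` output read from stdin.
member_addresses() {
    awk '
        /^Membership information/ { in_members = 1; next }
        # Member rows start with a hex node ID such as 0x00000001;
        # the third column is the address.
        in_members && $1 ~ /^0x[0-9a-fA-F]+$/ { print $3 }
    '
}

# Read one address per line from stdin, ping each once,
# and report whether it answered within 2 seconds.
ping_members() {
    while read -r addr; do
        if ping -c 1 -W 2 "$addr" >/dev/null 2>&1; then
            echo "$addr reachable"
        else
            echo "$addr NOT reachable"
        fi
    done
}
```

With both functions loaded, pvecm status | member_addresses | ping_members runs the check from the current node. Repeating it on every node shows whether one direction is broken.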
 
Hi Aaron,
Thank you again for your patience during training. xD (can recommend)

I was able to ping all nodes from each other.
In this case it is my home setup so no special networking. Just one NIC for each node and corosync and mgmt should run on the same network.

However, it works again, so maybe it was just a networking hiccup? Hard to say...

Before that, I had reinstalled node 2 because I wanted to switch from LVM to ZFS. The reinstall and the rejoin of the newly set up node went well, but after some days the icons disappeared without any further changes to the setup.