[SOLVED] Node not in cluster after Debian update

anone

Nov 24, 2022
Hello everyone,

First-time poster here, sorry if the formatting is wrong.

So I started at a company and took over a Proxmox cluster on Debian 11. After toying with Proxmox for a few months, I decided to tick off the last item on my predecessor's to-do list: "keep the system up to date".

Simple enough task, right? I checked a few guides and ended up running apt-get update && apt-get upgrade (I, of course, shut down all the VMs and LXCs beforehand).
The upgrade went through. I still had some "pve-kernel" updates pending, but I wanted to check things first. I reloaded my web UI, and there it was: node1 could only see node2 (I have 6 nodes in total), and it showed node2 as offline. I checked the logs, which said something along the lines of "can't find hostname". I edited my /etc/hosts and it worked again (though still no green icon on the node?). I repeated the process for the remaining nodes, but nothing changed.
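Since the first error was about the hostname, one thing worth checking on each node is what /etc/hosts actually maps the node name to. As far as I know, Proxmox VE expects a node's hostname to resolve to its real, non-loopback address. Below is a minimal sketch, not an official tool: `hosts_lookup` is a made-up helper name, and it only reads a hosts-format file (it does not query DNS).

```shell
# hosts_lookup NAME [FILE]
# Print the first address mapped to NAME in a hosts-format file.
# Returns 1 if there is no entry, 2 if the entry is a loopback address.
# (Hypothetical helper for illustration; FILE defaults to /etc/hosts.)
hosts_lookup() {
    name=$1
    file=${2:-/etc/hosts}
    addr=$(awk -v n="$name" '
        { sub(/#.*/, "") }    # drop comments before matching
        { for (i = 2; i <= NF; i++) if ($i == n) { print $1; exit } }
    ' "$file")
    if [ -z "$addr" ]; then
        echo "$name: no entry in $file"
        return 1
    fi
    case $addr in
        127.*|::1)
            echo "$name -> $addr (loopback)"
            return 2
            ;;
    esac
    echo "$name -> $addr"
}
```

On a node, `hosts_lookup "$(hostname)"` checks the local name; running it for every node name on every node shows whether the entries agree across the cluster.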

When I log in to my second node's web UI, all the nodes are there, but the first node has a red icon saying it's unavailable.

If I try to start a VM or LXC on node1, I get "cluster not ready - no quorum? (500)".

The .members file on node1 is basically empty, while the one on node2 looks fine. The corosync files are exactly the same on both.

pvecm status returns "Cannot initialize CMAP service".

On node2, the corosync status says that host 4 (node1) has no active links, even though all my nodes can ping one another.
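A successful ping only shows that ICMP gets through; corosync's links run over UDP, so the two can disagree. "Cannot initialize CMAP service" usually means the corosync daemon itself is not running on that node. Here is a hedged sketch of the checks that seem most useful for these symptoms, wrapped so it can be pasted on any host. `run_if_present` and `cluster_diag` are hypothetical helper names; the commands they call are the standard systemd, corosync and Proxmox tools.

```shell
# Run a command if it exists on this host; otherwise say so and carry on.
run_if_present() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "== $*"
        "$@" 2>&1 || echo "(exit status $?)"
    else
        echo "== $1: not installed on this host"
    fi
}

# Collect the basic cluster state of the local node in one go.
cluster_diag() {
    # Are both daemons actually running?
    run_if_present systemctl --no-pager status corosync pve-cluster
    # Per-link status as corosync itself sees it.
    run_if_present corosync-cfgtool -s
    # Quorum and membership from the Proxmox side.
    run_if_present pvecm status
    # Last corosync messages since boot.
    run_if_present journalctl -b -u corosync --no-pager -n 50
}
```

Running `cluster_diag` on node1 and on one healthy node and comparing the two outputs side by side is usually more telling than either output alone.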

I am on Proxmox 7.2-7 and haven't upgraded to 7.3 yet (I was planning to...).

You will find screenshots of everything with my post.

Thank you for your time and effort.

Friendly yours,

Anone
 

Attachments

  • corosync_status.PNG
  • node1_view.PNG
  • node2_corosync_status.PNG
  • node2_view.PNG
  • pve-cluster_status.PNG
  • pvecm_status.PNG
We strongly recommend against using apt update && apt upgrade on Proxmox VE; use apt update && apt dist-upgrade instead.

Can you please provide us with the Corosync config as well as the output of pveversion -v?
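The difference matters because a plain upgrade never removes installed packages (and apt-get upgrade does not install new ones either), so packages whose dependencies changed are kept back. That would fit the pending "pve-kernel" updates mentioned in the first post. A small sketch for listing what a plain upgrade would leave behind; `kept_back` is a hypothetical helper that only parses text, and it assumes English apt output:

```shell
# kept_back: read the output of a simulated upgrade on stdin and print,
# one per line, the packages apt reports as "kept back".
# (Hypothetical helper; relies on the English wording of apt's message.)
kept_back() {
    awk '
        /^The following packages have been kept back:/ { grab = 1; next }
        grab && /^  / { for (i = 1; i <= NF; i++) print $i; next }
        { grab = 0 }    # any other line ends the indented package list
    '
}
```

Typical use would be `LC_ALL=C apt-get -s upgrade | kept_back`, where `-s` only simulates and changes nothing. If the list is non-empty, dist-upgrade is needed to bring those packages up to date.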
 
Thanks for the quick response, I didn't know how you wanted it so I put the results in txt files.

Friendly yours,

Anone
 

Attachments

  • node1_corosync.txt
  • node1_pve-version.txt
  • node2_corosync.txt
  • node2_pve-version.txt
Hello,

Thank you for the files!

Can you please run apt update && apt dist-upgrade? If that does not help, please attach the syslog (/var/log/syslog).
 
Well, I clearly forgot to mention that I don't have internet access anymore. My workplace doesn't allow internet access on the servers, and I only had internet through a VM whose MAC address was allowed onto the web (don't ask me why, it's managed by another team). I tried apt dist-upgrade, but it can't reach the internet. I can try via the web UI though, since it shows these updates:

updates.PNG

Sorry I didn't mention this sooner.

Friendly yours,

Anone
 
Otherwise, I've been thinking of running:

pvecm expected 1;
pvecm add IP-OF-ANY-CLUSTER-NODE;

Would that work? And what about the pending updates?

Friendly yours,

Anone
 
Hello, (thank you again btw)

I don't think I was clear enough: I don't have internet access at all. Only the MAC of one VM's NIC was allowed internet access, but I can't start that VM anymore, so I can't run "apt install proxmox-offline-mirror".

You will find the syslog file attached. (I had to delete most of it because it was too big to upload; either way, it's the same error messages in a loop.)

When I check the corosync status on node2, for example, I get:
1669640975516.png
I don't understand why I get "host 4 has no active links", since I only ran the update and upgrade (I did not touch the network settings) and the nodes can still ping each other.


Friendly Yours,


Anone
 

Attachments

  • syslog.txt
  • 1669640948473.png
Edit: I just restarted corosync and pve-cluster on node2, and now it shows the same error as in my first post, meaning node2 only sees node1 (as offline) and doesn't see the other nodes anymore.
 
Second edit: I managed to finish the dist-upgrade, so I am now running Proxmox 7.3-3 (I had a hard time with pve-manager), but it still doesn't work.

Syslog still says: Cluster not quorate -- extending auth key lifetime!
 
Hello,

Can you please provide us with the full syslog since the upgrade?

After you upgraded to 7.3, did you try restarting the pve-cluster and corosync services?
 
Hello,

Yes, I restarted both services, which gave me the same results as before.

What I don't understand is that when I look at the corosync status on node3, it says that the host's link is down:

1669727867641.png

I'll attach the syslogs of today.

Friendly yours,

Anone
 

Attachments

  • syslog.txt
Yes, all the nodes can ping each other.

The command:

1669731156572.png

Repo:

1669731193940.png

Friendly yours,

Anone
 
I just checked the versions and no: node1 has 3.1.7 (which makes sense, since it's the one I updated), but the rest have 3.1.5 (which doesn't explain why node2 left the cluster too).

Concerning the NICs: when I joined the company everything was already set up, the VMs and LXCs were running, and I've been working on them for the past 3 months while still learning about Proxmox.

I am willing to try setting up a redundant ring once I get this back up!

Friendly yours,

Anone
 
Hi,

Consider upgrading all nodes to the same version, especially of Corosync, to narrow down the issue.
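To see at a glance which nodes are out of step, the installed versions can be grouped per version. A minimal sketch, assuming you collect one "node version" pair per line yourself (for example from `dpkg-query -W -f='${Version}\n' corosync` run on each node); `report_versions` is a hypothetical helper, not a Proxmox tool.

```shell
# report_versions: read "node version" pairs on stdin, print one line per
# distinct version followed by the nodes running it, and return non-zero
# when more than one version is present.
report_versions() {
    sort -k2,2 -k1,1 | awk '
        NF < 2 { next }    # skip blank or malformed lines
        $2 != prev {
            if (prev != "") print prev ":" line
            prev = $2; line = ""; n++
        }
        { line = line " " $1 }
        END {
            if (prev != "") print prev ":" line
            exit (n > 1)
        }
    '
}

# Example with the values mentioned in this thread:
#   printf '%s\n' 'node1 3.1.7' 'node2 3.1.5' 'node3 3.1.5' | report_versions
```

With the numbers reported above, node1 ends up alone on 3.1.7 while the other nodes share 3.1.5, and the function returns a non-zero status to signal the mismatch.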
 
OK, so just to make sure: you want me to run apt update && apt dist-upgrade on all my nodes?

Friendly yours,

Anone
 
