[SOLVED] Node not in cluster after Debian update

anone

New Member
Nov 24, 2022
22
1
3
I literally came back this morning and it's still hanging... but if I open another shell and type apt list --upgradable it says done.

If I try apt dist-upgrade though, it tells me that the dpkg frontend lock is held by apt. Is this normal? Should I reboot?

Friendly yours,

Anone
 

Moayad

Proxmox Staff Member
Staff member
Jan 2, 2020
2,140
173
68
29
Vienna
shop.maurer-it.com
This should not happen. Can you please check the terminal log `/var/log/apt/term.log` on the server that hangs during the upgrade?
 

anone

Well, one of my colleagues restarted it... the term.log file is empty.

If I try to re-run apt dist-upgrade it says:

1669794431929.png

So I then ran dpkg configure and, as you can see, it's setting up pve-manager, but I know it will just hang forever like it did on node1. The only fix I found was moving the pve-manager.postinst file out of the way and then running the dpkg configure step again.

I haven't updated the other nodes yet. Should I?

Friendly yours,

Anone
 

Moayad

Hi,

I would press Ctrl+C in the shell running the upgrade, then run the following:

Bash:
dpkg --configure -a
apt install -f
apt dist-upgrade -y

EDIT: Do you see anything in the Syslog when the node hangs?
 

anone

Here is the syslog while it's hanging:

1669795607605.png

Otherwise, after aborting dpkg configure I got:

1669795648016.png

Do you still want me to make a script of what you mentioned above?

Friendly yours,

Anone
 

fabian

Proxmox Staff Member
Staff member
Jan 7, 2016
8,412
1,675
174
what is the status of corosync.service on the failed nodes?
 

anone

Here is the status of the failed nodes:

node1:

1669797687377.png

node2:

1669797725215.png

Friendly yours,

Anone
 

fabian

your corosync config is invalid (if the nodes have never been rebooted or upgraded since before you started, this might be a leftover from your predecessor), and it only got noticed at the time of the upgrade because that restarted the corosync service.

prepare a correct corosync.conf with the following:
- version set to 2 instead of 3
- config_version increased by 1

distribute this corosync.conf to all nodes and put it into /etc/corosync/corosync.conf
now on each node run

systemctl reset-failed corosync; systemctl restart corosync

verify with corosync-cfgtool -s and pvecm status that the cluster is up and quorate again
cp the fixed config into /etc/pve/corosync.conf

now proceed with the updates
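
For anyone who wants to script the edit step above, here is a minimal sketch. It assumes the usual one-key-per-line layout of corosync.conf and that `version:` only appears in the totem section. The function name fix_corosync_conf and the example output path are made up for illustration, and the result should be reviewed by hand before it is copied anywhere.

```bash
# Hypothetical helper: write a corrected copy of a corosync.conf.
#   $1: path to the existing (broken) corosync.conf
#   $2: path for the corrected copy
fix_corosync_conf() {
    awk '
        # Exact key match, so ip_version and config_version are not touched here.
        $1 == "version:" {
            sub(/version:[ \t]*[0-9]+/, "version: 2")
        }
        # Bump config_version by one so the nodes accept the new file.
        $1 == "config_version:" {
            n = $2 + 1
            sub(/config_version:[ \t]*[0-9]+/, "config_version: " n)
        }
        { print }
    ' "$1" > "$2"
}

# Example (paths are illustrative):
#   fix_corosync_conf /etc/corosync/corosync.conf /root/corosync.conf.new
#   diff -u /etc/corosync/corosync.conf /root/corosync.conf.new
```

The helper only produces the corrected file. Distributing it, restarting corosync, and copying it into /etc/pve/corosync.conf are still the manual steps listed above.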
 

anone

You are a legend!

I did what you said and it totally fixed everything!

For other people having a similar problem: I'm guessing that updating the first node possibly changed the version to 3? But I don't understand how the other ones changed to 3 and still kept functioning while also being outdated.

Anyway, all is good and up to date now!

Big thanks!

Friendly yours,

Anone
 

fabian

no, nothing in PVE will change that part of the config on its own (it's pretty much hardcoded to "2" ;)).

someone with admin-level access must have done that at some point in the past (probably confusing version and config_version), which made the config no longer parseable. the next restart of corosync then triggers the error. until then, corosync just refuses to reload the faulty config and keeps running with the previously loaded one.
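
Since a running corosync hides this kind of mistake until its next restart, it can help to check the file before upgrading a node. Below is a rough sketch of such a check. It assumes one key per line and that the first `version:` key in the file is the totem one; the function name check_totem_version is made up for illustration.

```bash
# Hypothetical helper: print the totem "version" value found in a corosync.conf
# and return non-zero unless it is exactly 2.
#   $1: path to a corosync.conf
check_totem_version() {
    # Exact key match, so config_version and ip_version are ignored.
    _v=$(awk '$1 == "version:" { print $2; exit }' "$1")
    echo "totem version: ${_v:-missing}"
    [ "$_v" = "2" ]
}

# Example (path is illustrative):
#   check_totem_version /etc/corosync/corosync.conf || echo "fix corosync.conf before restarting corosync"
```

config_version is the counter that is supposed to be increased on every edit, so the check deliberately leaves it alone and only looks at version.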
 

anone

Oh okay, I had it all wrong then. Thanks for the explanation!
 
