Broken cluster after upgrade

decibel83

Hi,
today I ran "apt-get dist-upgrade" on my Proxmox hosts, but the upgrade did not finish cleanly and afterwards the cluster was broken.

The problem is the upgrade of pve-manager: its configuration step fails because pvestatd.service cannot be started:

Code:
root@node07:/home/mattia# apt-get install pve-manager
Reading package lists... Done
Building dependency tree       
Reading state information... Done
pve-manager is already the newest version (5.3-7).
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
1 not fully installed or removed.
After this operation, 0 B of additional disk space will be used.
Do you want to continue? [Y/n]
Setting up pve-manager (5.3-7) ...
Job for pvestatd.service failed because the control process exited with error code.
See "systemctl status pvestatd.service" and "journalctl -xe" for details.
dpkg: error processing package pve-manager (--configure):
 subprocess installed post-installation script returned error exit status 1
Errors were encountered while processing:
 pve-manager
E: Sub-process /usr/bin/dpkg returned an error code (1)

Code:
root@node07:/home/mattia# journalctl -xe

Jan 15 10:38:14 node07 pveproxy[25963]: worker exit
Jan 15 10:38:14 node07 pveproxy[5782]: worker 25963 finished
Jan 15 10:38:14 node07 pveproxy[5782]: starting 1 worker(s)
Jan 15 10:38:14 node07 pveproxy[5782]: worker 26075 started
Jan 15 10:38:14 node07 pveproxy[25964]: worker exit
Jan 15 10:38:14 node07 pveproxy[25965]: worker exit
Jan 15 10:38:14 node07 pveproxy[26075]: /etc/pve/local/pve-ssl.key: failed to load local private key (key_file or key) at /usr/share/perl5/PVE/APIS
Jan 15 10:38:14 node07 pveproxy[5782]: worker 25965 finished
Jan 15 10:38:14 node07 pveproxy[5782]: worker 25964 finished
Jan 15 10:38:14 node07 pveproxy[5782]: starting 2 worker(s)
Jan 15 10:38:14 node07 pveproxy[5782]: worker 26076 started
Jan 15 10:38:14 node07 pveproxy[5782]: worker 26077 started
Jan 15 10:38:14 node07 pveproxy[26076]: /etc/pve/local/pve-ssl.key: failed to load local private key (key_file or key) at /usr/share/perl5/PVE/APIS
Jan 15 10:38:14 node07 pveproxy[26077]: /etc/pve/local/pve-ssl.key: failed to load local private key (key_file or key) at /usr/share/perl5/PVE/APIS
Jan 15 10:38:15 node07 pvestatd[5279]: ipcc_send_rec[1] failed: Connection refused
Jan 15 10:38:15 node07 pvestatd[5279]: ipcc_send_rec[2] failed: Connection refused
Jan 15 10:38:15 node07 pvestatd[5279]: ipcc_send_rec[3] failed: Connection refused
Jan 15 10:38:15 node07 pvestatd[5279]: ipcc_send_rec[4] failed: Connection refused
Jan 15 10:38:15 node07 pvestatd[5279]: status update error: Connection refused

/etc/pve is not connected:

Code:
root@node07:/home/mattia# ls /etc/pve
ls: cannot access '/etc/pve': Transport endpoint is not connected
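
A minimal sketch for telling apart the two possible states of /etc/pve: a mount that is still listed but dead (which is what "Transport endpoint is not connected" usually indicates for a FUSE mount) and a directory that is not mounted at all. The function name check_mount and its optional second argument (an alternative mounts file, handy for testing) are hypothetical helpers for illustration, not Proxmox tools.

```shell
# Classify the state of a mount point such as /etc/pve.
# Prints one of: ok, stale, not-mounted
# $1 = directory to check
# $2 = mounts table to read (assumption: defaults to /proc/mounts on Linux)
check_mount() {
    dir="$1"
    mounts="${2:-/proc/mounts}"
    # The mount point is the second space-separated field of each line.
    if ! grep -qsF " $dir " "$mounts"; then
        echo "not-mounted"
    elif ls "$dir" >/dev/null 2>&1; then
        echo "ok"
    else
        # Listed as mounted but unreadable, e.g. the FUSE daemon has exited.
        echo "stale"
    fi
}
```

Calling `check_mount /etc/pve` on the affected node should show which of the two cases applies before restarting anything.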

I already tried restarting the pve-cluster service and running pvecm updatecerts, with no success.

In the web interface, the failed nodes are displayed with a red X icon.

Could you help me please?
 
Is pve-manager the package reported as "1 not fully installed or removed"?
Find out with
Code:
apt-get dist-upgrade
It should show what is to be upgraded and, hopefully, advice on how to fix it.
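
Another way to see which package is stuck is to filter the status column of `dpkg -l`. The sketch below assumes the usual `dpkg -l` layout (status abbreviation in the first column, package name in the second); the function name unfinished_packages is made up for this example.

```shell
# Read `dpkg -l` output on stdin and print every package that is selected
# for installation ("i...") but is not in the clean "ii" state,
# for example "iF" (half-configured) or "iU" (unpacked).
unfinished_packages() {
    awk '$1 ~ /^i/ && $1 != "ii" { print $1, $2 }'
}
```

Usage would be `dpkg -l | unfinished_packages`; `dpkg --audit` reports similar information in a more verbose form.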

Did the Proxmox subscription run out? Or is the /etc/pve directory missing completely?
The log complains about /etc/pve/local/pve-ssl.key: where is /etc/pve supposed to come from?
 
I managed to solve the problem by running:

Code:
root@node08:/# systemctl start pve-cluster
root@node08:/# pvecm updatecerts
(re)generate node files
merge authorized SSH keys and known hosts
root@node08:/# apt-get -f install
Reading package lists... Done
Building dependency tree       
Reading state information... Done
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
1 not fully installed or removed.
After this operation, 0 B of additional disk space will be used.
Setting up pve-manager (5.3-7) ...
W: APT had planned for dpkg to do more than it reported back (0 vs 4).
   Affected packages: pve-manager:amd64

/etc/pve was not mounted at all.
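
For anyone who has to repeat this on several nodes, the three commands above could be wrapped roughly as follows. This is only a sketch of the sequence that worked here, not an official procedure; run(), recover_node() and the DRY_RUN variable are hypothetical helpers added so the steps can be previewed without executing them.

```shell
#!/bin/sh
# Run a command, or only print it when DRY_RUN=1 (hypothetical helper).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Same order as in the post: bring up the cluster filesystem first,
# regenerate the node certificates, then let apt finish configuring
# pve-manager. Stop early if one of the first two steps fails.
recover_node() {
    run systemctl start pve-cluster || return 1
    run pvecm updatecerts || return 1
    run apt-get -f install
}
```

Setting `DRY_RUN=1` before calling `recover_node` just lists the commands; without it they are executed for real, so it should be run as root on the affected node only.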

Now everything seems to work.
Thank you!