Need help: Upgrade failed from 4.3-3/557191d3 to 4.3-9/f7c6f0cd

bladux

Well-Known Member
Nov 7, 2016
Hello,

I have been running a 7-node Proxmox cluster for a year, on version 4.3-3/557191d3.

I added 2 new nodes to the cluster last week. They were freshly installed, so they are running 4.3-9/f7c6f0cd (the latest to date).

Today I had an issue on my master node. Rebooting it did not help, so I ran a dist-upgrade to at least get it onto the latest version. During the upgrade, the pve-manager installation timed out and the upgrade failed.
I tried rebooting and running dist-upgrade again, with no luck: it seems to time out when restarting/stopping pve-manager.

I tried upgrading my node #2 and ran into the exact same issue.

Right now I have 2 nodes offline: my master node, and node #2.

Here is my pveversion -v for those 2 nodes:
proxmox-ve: not correctly installed (running kernel: 4.4.21-1-pve)
pve-manager: not correctly installed (running version: 4.3-9/f7c6f0cd)
pve-kernel-4.2.6-1-pve: 4.2.6-36
pve-kernel-4.4.8-1-pve: 4.4.8-52
pve-kernel-4.4.13-2-pve: 4.4.13-58
pve-kernel-4.4.21-1-pve: 4.4.21-71
pve-kernel-4.4.15-1-pve: 4.4.15-60
pve-kernel-4.2.8-1-pve: 4.2.8-41
pve-kernel-4.4.19-1-pve: 4.4.19-66
lvm2: 2.02.116-pve3
corosync-pve: 2.4.0-1
libqb0: 1.0-1
pve-cluster: 4.0-46
qemu-server: 4.0-92
pve-firmware: 1.1-10
libpve-common-perl: 4.0-79
libpve-access-control: 4.0-19
libpve-storage-perl: 4.0-68
pve-libspice-server1: 0.12.8-1
vncterm: 1.2-1
pve-docs: 4.3-12
pve-qemu-kvm: 2.7.0-4
pve-container: 1.0-80
pve-firewall: 2.0-31
pve-ha-manager: 1.0-35
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u2
lxc-pve: 2.0.5-1
lxcfs: 2.0.4-pve2
criu: 1.6.0-1
novnc-pve: 0.5-8
smartmontools: 6.5+svn4324-1~pve80
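
In a listing this long, the broken packages are easy to miss. As a minimal sketch, the output above could be filtered for the "not correctly installed" marker like this. The helper name `broken_packages` is hypothetical, and the filter assumes the `name: status` line layout shown above.

```shell
# Hypothetical helper: read `pveversion -v` output on stdin and print
# the names of packages flagged as "not correctly installed".
broken_packages() {
  # Split each line at ": " and print the package name (first field)
  # only for lines that carry the marker.
  awk -F': ' '/not correctly installed/ { print $1 }'
}
```

It would be used as `pveversion -v | broken_packages`; for the listing above it prints proxmox-ve and pve-manager.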

The current kernel:
Linux proxmox1 4.4.21-1-pve #1 SMP Thu Oct 27 09:31:44 CEST 2016 x86_64 GNU/Linux

I'm running out of ideas.

Many thanks!
 
I just tried rebooting into a non-PVE kernel and upgrading from there, with no luck. Same thing with an older PVE kernel.
 
I don't know how you are trying to upgrade, but even if a package install ended with errors you can simply run it again. BTW, did you upgrade the system from the Proxmox web UI?
Have you tried running the upgrade procedure from a shell (ssh)? I have never seen a timeout during a package upgrade.
 
Hello,

I did install from the GUI, but I have been working from the shell ever since the first timeout.

Some more debug:

root@proxmox1:~# systemctl status pvedaemon.service
pvedaemon.service - PVE API Daemon
   Loaded: loaded (/lib/systemd/system/pvedaemon.service; enabled)
   Active: active (running) since Mon 2016-11-07 11:41:23 CET; 13min ago
 Main PID: 1138 (pvedaemon)
   CGroup: /system.slice/pvedaemon.service
           ├─1138 pvedaemon
           ├─1139 pvedaemon worker
           ├─1140 pvedaemon worker
           ├─1141 pvedaemon worker
           └─control
             └─3364 /usr/bin/perl -T /usr/bin/pvedaemon restart

Nov 07 11:41:23 proxmox1 pvedaemon[1138]: starting server
Nov 07 11:41:23 proxmox1 pvedaemon[1138]: starting 3 worker(s)
Nov 07 11:41:23 proxmox1 pvedaemon[1138]: worker 1139 started
Nov 07 11:41:23 proxmox1 pvedaemon[1138]: worker 1140 started
Nov 07 11:41:23 proxmox1 pvedaemon[1138]: worker 1141 started
Nov 07 11:41:23 proxmox1 systemd[1]: Started PVE API Daemon.
Nov 07 11:43:20 proxmox1 pvedaemon[1139]: <root@pam> successful auth for user 'root@pam'
Nov 07 11:52:05 proxmox1 systemd[1]: Reloading PVE API Daemon.
Nov 07 11:53:35 proxmox1 systemd[1]: pvedaemon.service reload operation timed out. Stopping.
Nov 07 11:53:35 proxmox1 systemd[1]: Reload failed for PVE API Daemon.
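
In the log above, the reload is requested at 11:52:05 and systemd gives up at 11:53:35, 90 seconds later. A small filter like the following sketch could pull such timeouts out of a longer journal. The helper name `timed_out_units` is hypothetical, and the pattern assumes the exact message wording seen in this log.

```shell
# Hypothetical helper: read journal lines on stdin and print
# "<unit> <operation>" for every systemd "operation timed out" message.
timed_out_units() {
  # Capture the two words that precede "operation timed out":
  # the unit name and the operation (reload, start, stop, ...).
  sed -n 's/.*systemd\[[0-9]*\]: \([^ ]*\) \([^ ]*\) operation timed out.*/\1 \2/p'
}
```

It could be fed with `journalctl -u pvedaemon.service --no-pager | timed_out_units`.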
 
I finally found a way out of this mess.

I had to stop pve-cluster to be able to reinstall the 2 broken packages (proxmox-ve and pve-manager) correctly.

Then I had to hard reboot all my cluster nodes.
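
For anyone landing here with the same problem, the fix described above might look roughly like the sketch below. The exact commands were not posted, so everything here is an assumption: the package names are the two that pveversion reported as not correctly installed, `run` and `recover_packages` are hypothetical helpers, and the script only prints the commands unless DRY_RUN is set to 0.

```shell
# Dry-run by default: commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, else run it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Assumed recovery sequence; stops at the first failing step.
recover_packages() {
  # Stop the cluster filesystem service first, as described in the post.
  run systemctl stop pve-cluster &&
  # Finish configuring any half-installed packages.
  run dpkg --configure -a &&
  # Reinstall the two packages reported as "not correctly installed".
  run apt-get install --reinstall pve-manager proxmox-ve
}
```

Calling `recover_packages` as-is only shows the plan; `DRY_RUN=0 recover_packages` would actually execute it as root. The hard reboot of the remaining nodes is left out on purpose, since that is a manual decision.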