[SOLVED] Proxmox 4 Upgrade Hanging "Setting up pve-manager"

Jospeh Huber

Hi,

Today I am trying to upgrade from an older Proxmox 4 version to the current Proxmox 4 version in my cluster, using the no-subscription repo.
The first node had no problems.
The second node hangs in the configuration step of "pve-manager".
My starting version:
proxmox-ve: 4.4-93 (running kernel: 4.4.76-1-pve)
pve-manager: 4.4-17 (running version: 4.4-17/70a65945)
pve-kernel-4.4.6-1-pve: 4.4.6-48
pve-kernel-4.4.35-1-pve: 4.4.35-77
pve-kernel-4.4.35-2-pve: 4.4.35-79
pve-kernel-4.4.59-1-pve: 4.4.59-87
pve-kernel-4.4.16-1-pve: 4.4.16-64
pve-kernel-4.4.24-1-pve: 4.4.24-72
pve-kernel-4.4.19-1-pve: 4.4.19-66
pve-kernel-4.4.76-1-pve: 4.4.76-93
lvm2: 2.02.116-pve3
corosync-pve: 2.4.2-2~pve4+1
libqb0: 1.0.1-1
pve-cluster: 4.0-52
qemu-server: 4.0-111
pve-firmware: 1.1-11
libpve-common-perl: 4.0-96
libpve-access-control: 4.0-23
libpve-storage-perl: 4.0-76
pve-libspice-server1: 0.12.8-2
vncterm: 1.3-2
pve-docs: 4.4-4
pve-qemu-kvm: 2.7.1-4
pve-container: 1.0-101
pve-firewall: 2.0-33
pve-ha-manager: 1.0-41
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u3
lxc-pve: 2.0.7-4
lxcfs: 2.0.6-pve1
criu: 1.6.0-1
novnc-pve: 0.5-9
smartmontools: 6.5+svn4324-1~pve80
zfsutils: 0.6.5.9-pve15~bpo80
drbdmanage: not correctly installed
ceph: 10.2.9-1~bpo80+1


This is the last printout:
Setting up pve-manager (4.4-21) ...

This command is executed:
/usr/bin/dpkg --status-fd 22 --configure perl-modules:all perl:amd64 libssl1.0.0:amd64 libkrb5support0:amd64 libk5crypto3:amd64 libkrb5-3:amd64 libgssapi-krb5-2:amd64 libgssrpc4:amd64 libkadm5clnt-mit9:amd64 libkdb5-7:amd64 libkadm5srv-mit9:amd64 libkrad0:amd64 libxml2:amd64 libcups2:amd64 libcurl3:amd64 libcurl3-gnutls:amd64 libisc-export95:amd64 libdns-export100:amd64 libx11-data:all libx11-6:amd64 libgdk-pixbuf2.0-common:all libgdk-pixbuf2.0-0:amd64 libicu52:amd64 libisccfg-export90:amd64 libirs-export91:amd64 mysql-common:all libmysqlclient18:amd64 libnss3:amd64 libwbclient0:amd64 samba-common:all samba-libs:amd64 libsmbclient:amd64 smbclient:amd64 libx11-xcb1:amd64 libxfixes3:amd64 libxcursor1:amd64 libxi6:amd64 libxrandr2:amd64 libxtst6:amd64 rsync:amd64 openssl:amd64 pve-cluster:amd64 openssh-client:amd64 openssh-sftp-server:amd64 openssh-server:amd64 ssh:all wget:amd64 libisc95:amd64 libdns100:amd64 libisccc90:amd64 libisccfg90:amd64 libbind9-90:amd64 liblwres90:amd64 bind9-host:amd64 dnsutils:amd64 krb5-locales:all ncurses-term:all procmail:amd64 librados2:amd64 librbd1:amd64 libradosstriper1:amd64 librgw2:amd64 python-rados:amd64 libcephfs1:amd64 python-cephfs:amd64 python-rbd:amd64 ceph-common:amd64 ceph-base:amd64 ceph-osd:amd64 ceph-mon:amd64 ceph:amd64 python-ceph:amd64 libdbi1:amd64 libio-socket-ssl-perl:all libxml-libxml-perl:amd64 pve-kernel-4.4.98-3-pve:amd64 pve-qemu-kvm:amd64 qemu-server:amd64 pve-container:all pve-manager:amd64 proxmox-ve:all pve-kernel-4.4.76-1-pve:amd64 tcpdump:amd64


fuser -v /var/cache/debconf/config.dat
USER PID ACCESS COMMAND
/var/cache/debconf/config.dat:

This is the hanging command:
root 8163 0.0 0.0 63592 17616 ? S 18:28 0:00 /usr/bin/perl -w /usr/share/debconf/frontend /var/lib/dpkg/info/pve-manager.postinst configure 4.4-17


There is nothing special in the logs.
Any ideas?
 
Please post the whole process subtree containing the hanging command, as generated by "ps faxl".
 
Unfortunately I killed the command; I was too impatient ;-)

So everything in the dpkg --configure command after pve-manager is unconfigured now - including the kernel and proxmox-ve :(
"pve-manager:amd64 proxmox-ve:all pve-kernel-4.4.76-1-pve:amd64 tcpdump:amd64"
At the moment the system is broken.

But I can run the configure step again for pve-manager alone, with the same result: it hangs.
0 0 26518 19121 20 0 19020 4728 wait S+ pts/20 0:00 \_ /usr/bin/dpkg --configure pve-manager:amd64
0 0 26519 26518 20 0 63592 17716 wait S+ pts/20 0:00 \_ /usr/bin/perl -w /usr/share/debconf/frontend /var/lib/dpkg/info/pve-manager.postinst configure 4.4-17
0 0 26526 26519 20 0 13328 3084 wait S+ pts/20 0:00 \_ /bin/bash /var/lib/dpkg/info/pve-manager.postinst configure 4.4-17


The problem is in the script "/var/lib/dpkg/info/pve-manager.postinst" when it is called with "configure".
I added a "set -x" to the script, and the output is:
Setting up pve-manager (4.4-21) ...
+ set -e
+ . /usr/share/debconf/confmodule
++ '[' '!' '' ']'
++ PERL_DL_NONLAZY=1
++ export PERL_DL_NONLAZY
++ '[' '' ']'
++ exec /usr/share/debconf/frontend /var/lib/dpkg/info/pve-manager.postinst configure 4.4-17
+ set -e
+ . /usr/share/debconf/confmodule
++ '[' '!' 1 ']'
++ '[' -z '' ']'
++ exec
++ '[' '' ']'
++ exec
++ DEBCONF_REDIR=1
++ export DEBCONF_REDIR
+ db_stop
+ echo STOP
+ case "$1" in
+ mkdir /etc/pve
+ true
+ rm -rf /var/lib/pve-manager/apl-available
+ test -e /etc/cron.daily/pve
+ rm -f /etc/init.d/pvebanner
+ rm -f /etc/init.d/pvenetcommit
++ shuf -i 0-59 -n 1
+ MIN=44
++ shuf -i 2-5 -n 1
+ HOUR=3
+ cat
+ test '!' -e /var/lib/pve-manager/apl-info/download.proxmox.com
+ test -f /root/.forward
+ grep -q '|/usr/bin/pvemailforward' /root/.forward
+ test -f /etc/lsb-base-logging.sh
+ '[' -f /etc/systemd/system/ceph.service ']'
++ md5sum /etc/systemd/system/ceph.service
+ md5='f716952fcc5dda4ecdb153c02627da52 /etc/systemd/system/ceph.service'
+ [[ f716952fcc5dda4ecdb153c02627da52 /etc/systemd/system/ceph.service == \2\1\b\2\e\7\a\7\c\4\f\f\c\f\9\2\a\d\0\e\c\2\c\9\0\5\e\8\8\e\5\b\ \ \/\e\t\c\/\s\y\s\t\e\m\d\/\s\y\s\t\e\m\/\c\e\p\h\.\s\e\r\v\i\c\e ]]
+ systemctl --system daemon-reload
+ for service in pvedaemon pveproxy spiceproxy pvestatd pvebanner pvenetcommit pve-manager
+ deb-systemd-helper unmask pvedaemon.service
+ deb-systemd-helper --quiet was-enabled pvedaemon.service
+ deb-systemd-helper enable pvedaemon.service
+ for service in pvedaemon pveproxy spiceproxy pvestatd pvebanner pvenetcommit pve-manager
+ deb-systemd-helper unmask pveproxy.service
+ deb-systemd-helper --quiet was-enabled pveproxy.service
+ deb-systemd-helper enable pveproxy.service
+ for service in pvedaemon pveproxy spiceproxy pvestatd pvebanner pvenetcommit pve-manager
+ deb-systemd-helper unmask spiceproxy.service
+ deb-systemd-helper --quiet was-enabled spiceproxy.service
+ deb-systemd-helper enable spiceproxy.service
+ for service in pvedaemon pveproxy spiceproxy pvestatd pvebanner pvenetcommit pve-manager
+ deb-systemd-helper unmask pvestatd.service
+ deb-systemd-helper --quiet was-enabled pvestatd.service
+ deb-systemd-helper enable pvestatd.service
+ for service in pvedaemon pveproxy spiceproxy pvestatd pvebanner pvenetcommit pve-manager
+ deb-systemd-helper unmask pvebanner.service
+ deb-systemd-helper --quiet was-enabled pvebanner.service
+ deb-systemd-helper enable pvebanner.service
+ for service in pvedaemon pveproxy spiceproxy pvestatd pvebanner pvenetcommit pve-manager
+ deb-systemd-helper unmask pvenetcommit.service
+ deb-systemd-helper --quiet was-enabled pvenetcommit.service
+ deb-systemd-helper enable pvenetcommit.service
+ for service in pvedaemon pveproxy spiceproxy pvestatd pvebanner pvenetcommit pve-manager
+ deb-systemd-helper unmask pve-manager.service
+ deb-systemd-helper --quiet was-enabled pve-manager.service
+ deb-systemd-helper enable pve-manager.service
+ test '!' -e /proxmox_install_mode
+ for service in pvedaemon pveproxy spiceproxy pvestatd
+ deb-systemd-invoke reload-or-restart pvedaemon
+ for service in pvedaemon pveproxy spiceproxy pvestatd
+ deb-systemd-invoke reload-or-restart pveproxy
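
In a "set -x" trace like the one above, the last line printed is the command that never returned, here the reload-or-restart of pveproxy. A minimal sketch for pulling that line out of a saved trace; the helper name last_traced_command and the file name trace.log are made up for illustration:

```shell
# Print the last command from a "set -x" trace read on stdin,
# with the leading "+" nesting markers stripped.
# last_traced_command is a hypothetical helper name.
last_traced_command() {
    grep '^+' | tail -n 1 | sed 's/^[+]* //'
}

# Usage, assuming the trace was saved to a file called trace.log:
#   last_traced_command < trace.log
```

This only narrows down where the script stopped; it says nothing about why the command blocks.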



If you can help me get past this step, I think I can go on like this ...
dpkg --configure proxmox-ve:all pve-kernel-4.4.76-1-pve:amd64 tcpdump:amd64
or a
apt-get upgrade
should do the job
 
Please post the full output of "ps faxl".
 
Do you really need the full output? It is a running system, so the output would be about 1,100 lines...
Or is this extract sufficient?

4 0 21543 31254 20 0 82740 5848 poll_s Ss ? 0:00 \_ sshd: root@pts/27
4 0 21655 21543 20 0 24344 6340 wait Ss pts/27 0:00 | \_ -bash
0 0 11611 21655 20 0 19020 4716 wait S+ pts/27 0:00 | \_ /usr/bin/dpkg --configure pve-manager:amd64
0 0 11612 11611 20 0 63608 17592 wait S+ pts/27 0:00 | \_ /usr/bin/perl -w /usr/share/debconf/frontend /var/lib/dpkg/info/pve-manager.postinst configu
0 0 11619 11612 20 0 13328 3092 wait S+ pts/27 0:00 | \_ /bin/bash /var/lib/dpkg/info/pve-manager.postinst configure 4.4-17
4 0 11682 11619 20 0 22484 2564 poll_s S+ pts/27 0:00 | \_ /bin/systemctl reload-or-restart pveproxy
4 0 18747 31254 20 0 82740 5884 - Ss ? 0:00 \_ sshd: root@pts/20
4 0 19121 18747 20 0 24352 6348 wait Ss pts/20 0:00 \_ -bash
0 0 12880 19121 20 0 11204 2640 - R+ pts/20 0:00 \_ ps faxl
4 0 8239 1 20 0 22484 2548 poll_s S ? 0:00 /bin/systemctl reload-or-restart pveproxy
4 0 30723 1 0 -20 17960 5684 pause S<L ? 0:11 /usr/bin/atop -a -w /var/log/atop/atop_20180112 600
4 0 20100 1 20 0 22484 2556 poll_s S pts/20 0:00 /bin/systemctl reload-or-restart pveproxy
4 0 22971 1 20 0 22484 2560 poll_s S ? 0:00 /bin/systemctl reload-or-restart pveproxy
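
When the full process list is too long to post, one option is to cut out just the subtree below the hanging dpkg process. The following is a sketch under assumptions: proc_subtree is a hypothetical helper name, and it expects "pid ppid command" lines such as those printed by "ps -eo pid=,ppid=,args=".

```shell
# Print a process and all of its descendants from "pid ppid command"
# lines read on stdin. proc_subtree is a hypothetical helper name.
proc_subtree() {
    awk -v root="$1" '
        { pid[NR] = $1; ppid[NR] = $2; line[NR] = $0 }
        END {
            keep[root] = 1
            # Repeat until no new descendants turn up, so line order does not matter.
            do {
                changed = 0
                for (i = 1; i <= NR; i++)
                    if ((ppid[i] in keep) && !(pid[i] in keep)) {
                        keep[pid[i]] = 1
                        changed = 1
                    }
            } while (changed)
            for (i = 1; i <= NR; i++)
                if (pid[i] in keep) print line[i]
        }'
}

# Usage on a live system (11611 is the dpkg PID from the listing above):
#   ps -eo pid=,ppid=,args= | proc_subtree 11611
```

Note that this would not catch the stray "systemctl reload-or-restart pveproxy" processes in the listing, because they were re-parented to PID 1 and are no longer below dpkg.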

I killed all other restarts of "/bin/systemctl reload-or-restart pveproxy"... but it hangs again.
It looks like the problem is in the restart of pveproxy.

/bin/systemctl status pveproxy
● pveproxy.service - PVE API Proxy Server
Loaded: loaded (/lib/systemd/system/pveproxy.service; enabled)
Active: inactive (dead) since Fri 2018-01-12 06:25:06 CET; 6h ago
Main PID: 20008 (code=exited, status=0/SUCCESS)

Jan 12 06:25:02 vmhost2 systemd[1]: Stopping PVE API Proxy Server...
Jan 12 06:25:05 vmhost2 pveproxy[20008]: received signal TERM
Jan 12 06:25:05 vmhost2 pveproxy[20008]: server closing
Jan 12 06:25:05 vmhost2 pveproxy[15525]: worker exit
Jan 12 06:25:05 vmhost2 pveproxy[3584]: worker exit
Jan 12 06:25:05 vmhost2 pveproxy[20008]: worker 15525 finished
Jan 12 06:25:05 vmhost2 pveproxy[20008]: worker 3584 finished
Jan 12 06:25:05 vmhost2 pveproxy[20008]: worker 13075 finished
Jan 12 06:25:05 vmhost2 pveproxy[20008]: server stopped
Jan 12 06:25:06 vmhost2 pveproxy[22983]: worker exit
 
It looks like there is a hanging cron job that is also blocked in "/bin/systemctl restart pveproxy.service":

4 0 1648 1 20 0 27504 2840 hrtime Ss ? 0:56 /usr/sbin/cron -f
5 0 22364 1648 20 0 42240 2604 wait S ? 0:00 \_ /usr/sbin/CRON -f
4 0 22367 22364 20 0 4336 820 wait Ss ? 0:00 \_ /bin/sh -c test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.daily )
0 0 22368 22367 20 0 4224 668 poll_s S ? 0:00 \_ run-parts --report /etc/cron.daily
0 0 22530 22368 20 0 4336 808 wait S ? 0:00 \_ /bin/sh /etc/cron.daily/logrotate
4 0 22531 22530 20 0 29456 2792 wait S ? 0:00 \_ /usr/sbin/logrotate /etc/logrotate.conf
0 0 22571 22531 20 0 4336 728 wait S ? 0:00 \_ sh -c ??/etc/init.d/pveproxy restart > /dev/null ??/etc/init.d/spiceproxy restart > /dev/null logrotate_script /var/log/pveproxy/access.log
0 0 22572 22571 20 0 4336 1668 wait S ? 0:00 \_ /bin/sh /etc/init.d/pveproxy restart
4 0 22583 22572 20 0 22484 2632 poll_s S ? 0:00 \_ /bin/systemctl restart pveproxy.service


But even if I kill all these restart jobs, I cannot start or restart pveproxy manually on the command line.
"service pveproxy start" hangs ...
There is nothing about pveproxy in the logs (/var/log/syslog, messages).
In my opinion the problem is pveproxy itself, which cannot be started.
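
To probe a start command like this without losing the terminal to it, the call can be wrapped in a time limit. This is only a sketch: run_with_limit is a hypothetical helper name, it relies on the coreutils "timeout" command (which exits with status 124 when the limit is hit), and it merely stops the client from blocking. It does not fix whatever keeps the unit from starting.

```shell
# Run a command, but give up after a time limit instead of blocking forever.
# run_with_limit is a hypothetical helper name.
run_with_limit() {
    limit="$1"; shift
    _rc=0
    timeout "$limit" "$@" || _rc=$?
    if [ "$_rc" -eq 124 ]; then
        # 124 is the exit status timeout uses when the limit was reached.
        echo "timed out after $limit: $*" >&2
    fi
    return "$_rc"
}

# Usage:
#   run_with_limit 30 systemctl start pveproxy.service
```

If the call times out, "systemctl list-jobs", which prints the jobs systemd currently has queued, may show whether an older restart request is still pending.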

How can I solve this problem without rebooting?
... a reboot of the unconfigured kernel and system will fail...
 
In that case the question is why pveproxy does not restart. Are there still leftover pveproxy processes? If so, can you run "cat /proc/PID/stack" for each of their PIDs? What happens if you kill -9 them? Is there anything else in the logs that looks suspicious? Also, please provide the output of "pveversion -v".
 
No, there is no pveproxy process running.
ps waux | grep pveproxy

pveversion -v
proxmox-ve: not correctly installed (running kernel: 4.4.76-1-pve)
pve-manager: not correctly installed (running version: 4.4-21/e0dadcf8)
pve-kernel-4.4.6-1-pve: 4.4.6-48
pve-kernel-4.4.35-1-pve: 4.4.35-77
pve-kernel-4.4.98-3-pve: 4.4.98-103
pve-kernel-4.4.35-2-pve: 4.4.35-79
pve-kernel-4.4.15-1-pve: 4.4.15-60
pve-kernel-4.4.59-1-pve: 4.4.59-87
pve-kernel-4.4.16-1-pve: 4.4.16-64
pve-kernel-4.4.24-1-pve: 4.4.24-72
pve-kernel-4.4.19-1-pve: 4.4.19-66
pve-kernel-4.4.76-1-pve: 4.4.76-94
lvm2: 2.02.116-pve3
corosync-pve: 2.4.2-2~pve4+1
libqb0: 1.0.1-1
pve-cluster: 4.0-54
qemu-server: 4.0-114
pve-firmware: 1.1-11
libpve-common-perl: 4.0-96
libpve-access-control: 4.0-23
libpve-storage-perl: 4.0-76
pve-libspice-server1: 0.12.8-2
vncterm: 1.3-2
pve-docs: 4.4-4
pve-qemu-kvm: 2.9.1-5~pve4
pve-container: 1.0-104
pve-firewall: 2.0-33
pve-ha-manager: 1.0-41
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u3
lxc-pve: 2.0.7-4
lxcfs: 2.0.6-pve1
criu: 1.6.0-1
novnc-pve: 0.5-9
smartmontools: 6.5+svn4324-1~pve80
zfsutils: 0.6.5.9-pve15~bpo80
drbdmanage: not correctly installed
ceph: 10.2.10-1~bpo80+1


Is this the problem with the hanging clustered filesystem?
forum.proxmox.com/threads/pveproxy-crashes-unable-to-start-it.35144/
 
# this hangs
service pveproxy start

# ps tree
0 19121 18747 20 0 24376 6376 wait Ss pts/20 0:00 \_ -bash
4 0 2761 19121 20 0 22484 2540 poll_s S+ pts/20 0:00 \_ systemctl start pveproxy.service
0 0 2795 2761 20 0 13176 1544 poll_s S+ pts/20 0:00 \_ /bin/systemd-tty-ask-password-agent --watch


cat /proc/2761/stack
[<ffffffff812246c9>] poll_schedule_timeout+0x49/0x70
[<ffffffff81225d52>] do_sys_poll+0x442/0x560
[<ffffffff8122618d>] SyS_ppoll+0x17d/0x1b0
[<ffffffff81865d76>] entry_SYSCALL_64_fastpath+0x16/0x75
[<ffffffffffffffff>] 0xffffffffffffffff

cat /proc/2795/stack
[<ffffffff812246c9>] poll_schedule_timeout+0x49/0x70
[<ffffffff81225d52>] do_sys_poll+0x442/0x560
[<ffffffff81225f87>] SyS_poll+0x97/0x120
[<ffffffff81865d76>] entry_SYSCALL_64_fastpath+0x16/0x75
[<ffffffffffffffff>] 0xffffffffffffffff
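
Both stacks above end in poll, which means the processes are sleeping and waiting for an answer rather than spinning. A quick way to see the same thing without reading kernel stacks is the State field in /proc/PID/status. A minimal sketch, assuming a Linux /proc filesystem; proc_state is a hypothetical helper name:

```shell
# Print "PID STATE" for each given PID, using the one-letter state code
# from /proc/PID/status (for example S = sleeping, D = uninterruptible
# sleep, Z = zombie). proc_state is a hypothetical helper name.
proc_state() {
    for pid in "$@"; do
        if [ -r "/proc/$pid/status" ]; then
            printf '%s %s\n' "$pid" "$(awk '/^State:/ { print $2 }' "/proc/$pid/status")"
        else
            printf '%s gone\n' "$pid"
        fi
    done
}

# Usage with the two PIDs from this post:
#   proc_state 2761 2795
```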


Any ideas?
 
Is it possible and safe to reboot and then try the configure step again?

All packages except proxmox-ve and pve-manager are configured, also the kernel.

/usr/bin/dpkg --configure proxmox-ve:all
dpkg: dependency problems prevent configuration of proxmox-ve:
proxmox-ve depends on pve-manager; however:
Package pve-manager is not configured yet.

dpkg: error processing package proxmox-ve (--configure):
dependency problems - leaving unconfigured
Errors were encountered while processing:
proxmox-ve


dpkg -l:
iU proxmox-ve 4.4-103 all The Proxmox Virtual Environment
ii psmisc 22.21-2 amd64 utilities that use the proc file system
ii pve-cluster 4.0-54 amd64 Cluster Infrastructure for Proxmox Virtual Environment
ii pve-container 1.0-104 all Proxmox VE Container management tool
ii pve-docs 4.4-4 all Proxmox VE Documentation
ii pve-firewall 2.0-33 amd64 Proxmox VE Firewall
ii pve-firmware 1.1-11 all Binary firmware code for the pve-kernel
ii pve-ha-manager 1.0-41 amd64 Proxmox VE HA Manager
ii pve-kernel-4.4.15-1-pve 4.4.15-60 amd64 The Proxmox PVE Kernel Image
ii pve-kernel-4.4.16-1-pve 4.4.16-64 amd64 The Proxmox PVE Kernel Image
ii pve-kernel-4.4.19-1-pve 4.4.19-66 amd64 The Proxmox PVE Kernel Image
ii pve-kernel-4.4.24-1-pve 4.4.24-72 amd64 The Proxmox PVE Kernel Image
ii pve-kernel-4.4.35-1-pve 4.4.35-77 amd64 The Proxmox PVE Kernel Image
ii pve-kernel-4.4.35-2-pve 4.4.35-79 amd64 The Proxmox PVE Kernel Image
ii pve-kernel-4.4.59-1-pve 4.4.59-87 amd64 The Proxmox PVE Kernel Image
ii pve-kernel-4.4.6-1-pve 4.4.6-48 amd64 The Proxmox PVE Kernel Image
ii pve-kernel-4.4.76-1-pve 4.4.76-94 amd64 The Proxmox PVE Kernel Image
ii pve-kernel-4.4.98-3-pve 4.4.98-103 amd64 The Proxmox PVE Kernel Image
ii pve-libspice-server1 0.12.8-2 amd64 SPICE remote display system server library
iF pve-manager 4.4-21 amd64 The Proxmox Virtual Environment
ii pve-qemu-kvm 2.9.1-5~pve4 amd64 Full virtualization on x86 hardware
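
In the first column of "dpkg -l", the first letter is the desired state and the second the current state: "iU" means unpacked but not yet configured, and "iF" means half-configured, i.e. the configure step failed or was interrupted. On a long listing these are easy to miss, so here is a small filter as a sketch; not_fully_installed is a hypothetical helper name.

```shell
# From `dpkg -l` style lines on stdin, print state, name, and version of
# every package that should be installed (first letter "i") but is in any
# state other than fully installed (second letter not "i").
# not_fully_installed is a hypothetical helper name.
not_fully_installed() {
    awk '$1 ~ /^i[ncHUFWt]$/ { print $1, $2, $3 }'
}

# Usage:
#   dpkg -l | not_fully_installed
```

"dpkg --audit" reports similar information in prose form.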
 
what happens if you kill the "0 0 2795 2761 20 0 13176 1544 poll_s S+ pts/20 0:00 \_ /bin/systemd-tty-ask-password-agent --watch" process?
 
"kill" and "kill -9" ... nothing... I have tried some other "things" too ... I have absolutely no idea!

4 0 25072 24520 20 0 22484 2600 poll_s S+ pts/21 0:00 \_ systemctl start pveproxy.service
0 0 25093 25072 20 0 0 0 exit Z+ pts/21 0:00 \_ [systemd-tty-ask] <defunct>


What do you think: is it possible and safe to reboot and try the configure step again?
... I think there is no other option.
 
it probably won't make the situation any worse than it already is..
 
Strange, the reboot solved my problem.

After that I could configure the two unconfigured packages:
dpkg --configure proxmox-ve:all pve-manager:amd64

Until now, the problem has occurred on only one host, not on the 7 others ...

If somebody else has this issue, make sure that as many of the other packages as possible (ideally all of them) are configured before rebooting.
I manually removed the dpkg locks and ran the configure step for the packages that came after the first failed package:
dpkg --configure proxmox-ve:all pve-kernel-4.4.76-1-pve:amd64 tcpdump:amd64
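
For a long package list like the one in the original dpkg call, the packages that come after the failed one can be picked out mechanically rather than by eye. A sketch only; packages_after is a hypothetical helper name, and the package list has to be copied from your own dpkg command line.

```shell
# Print, one per line, the packages that follow the given package in the
# argument list. packages_after is a hypothetical helper name.
packages_after() {
    failed="$1"; shift
    seen=0
    for pkg in "$@"; do
        if [ "$seen" -eq 1 ]; then
            printf '%s\n' "$pkg"
        elif [ "$pkg" = "$failed" ]; then
            seen=1
        fi
    done
}

# Usage with the tail of the package list from this thread:
#   packages_after pve-manager:amd64 pve-container:all pve-manager:amd64 \
#       proxmox-ve:all pve-kernel-4.4.76-1-pve:amd64 tcpdump:amd64
```

Packages that depend on the failed one (here proxmox-ve) will still refuse to configure until it is fixed, as the dependency error earlier in the thread shows.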