[SOLVED] No GUI or SSH after upgrade 6.1 -> 6.3 - needs manual restart of services

Coffeeri

Hello folks,
I just upgraded my single PVE node from 6.1-3 to 6.3-2.
The process required running apt upgrade twice, so I went from 6.1-3 via 6.2-X to 6.3-2.

After the reboot, the GUI and SSH were not working and none of the VMs/LXCs were up, so I had to run these commands:

Bash:
systemctl restart pvestatd
systemctl restart pvedaemon
systemctl restart pveproxy
pvesh create /nodes/pve/startall

Everything seemed to work afterwards. I rebooted a second time just to be sure and noticed that the problem was not resolved.

How can I fix this so I do not have to run the commands above after every boot?

Thank you!

Some useful info:
Code:
root@pve:~# pveversion -v
proxmox-ve: 6.3-1 (running kernel: 5.4.73-1-pve)
pve-manager: 6.3-2 (running version: 6.3-2/22f57405)
pve-kernel-5.4: 6.3-1
pve-kernel-helper: 6.3-1
pve-kernel-5.3: 6.1-6
pve-kernel-5.4.73-1-pve: 5.4.73-1
pve-kernel-4.15: 5.4-7
pve-kernel-5.3.18-3-pve: 5.3.18-3
pve-kernel-5.3.13-1-pve: 5.3.13-1
pve-kernel-4.15.18-19-pve: 4.15.18-45
pve-kernel-4.15.18-12-pve: 4.15.18-36
ceph: 14.2.15-pve1
ceph-fuse: 14.2.15-pve1
corosync: 3.0.4-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: residual config
ifupdown2: 3.0.0-1+pve3
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.16-pve1
libproxmox-acme-perl: 1.0.5
libproxmox-backup-qemu0: 1.0.2-1
libpve-access-control: 6.1-3
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.2-6
libpve-guest-common-perl: 3.1-3
libpve-http-server-perl: 3.0-6
libpve-storage-perl: 6.3-1
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.3-1
lxcfs: 4.0.3-pve3
novnc-pve: 1.1.0-1
proxmox-backup-client: 1.0.5-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.4-3
pve-cluster: 6.2-1
pve-container: 3.3-1
pve-docs: 6.3-1
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-3
pve-firmware: 3.1-3
pve-ha-manager: 3.1-1
pve-i18n: 2.2-2
pve-qemu-kvm: 5.1.0-7
pve-xtermjs: 4.7.0-3
qemu-server: 6.3-1
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 0.8.5-pve1
 
Please always upgrade with `apt update && apt dist-upgrade`; see the FAQ [1].

Please post the output of `systemctl status` for the services pveproxy, pvestatd and pvedaemon.

[1] https://forum.proxmox.com/threads/proxmox-ve-6-3-available.79687/
 
Anything relevant in the logs? (`journalctl -b` yields the complete journal since boot)
 
Thank you for the fast response @Moayad @Stoiko Ivanov - you rock!
I've just done a fresh reboot to get the logs.

pvestatd
Code:
● pvestatd.service - PVE Status Daemon
   Loaded: loaded (/lib/systemd/system/pvestatd.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Thu 2020-11-26 13:31:04 CET; 3min 26s ago
  Process: 4954 ExecStart=/usr/bin/pvestatd start (code=exited, status=111)

Nov 26 13:31:04 pve pvestatd[4954]: ipcc_send_rec[1] failed: Connection refused
Nov 26 13:31:04 pve pvestatd[4954]: ipcc_send_rec[1] failed: Connection refused
Nov 26 13:31:04 pve pvestatd[4954]: ipcc_send_rec[2] failed: Connection refused
Nov 26 13:31:04 pve pvestatd[4954]: ipcc_send_rec[2] failed: Connection refused
Nov 26 13:31:04 pve pvestatd[4954]: ipcc_send_rec[3] failed: Connection refused
Nov 26 13:31:04 pve pvestatd[4954]: ipcc_send_rec[3] failed: Connection refused
Nov 26 13:31:04 pve pvestatd[4954]: Unable to load access control list: Connection refused
Nov 26 13:31:04 pve systemd[1]: pvestatd.service: Control process exited, code=exited, status=111/n/a
Nov 26 13:31:04 pve systemd[1]: pvestatd.service: Failed with result 'exit-code'.
Nov 26 13:31:04 pve systemd[1]: Failed to start PVE Status Daemon.

pvedaemon
Code:
● pvedaemon.service - PVE API Daemon
   Loaded: loaded (/lib/systemd/system/pvedaemon.service; enabled; vendor preset: enabled)
   Active: active (running) since Thu 2020-11-26 13:31:06 CET; 3min 42s ago
  Process: 5139 ExecStart=/usr/bin/pvedaemon start (code=exited, status=0/SUCCESS)
Main PID: 5531 (pvedaemon)
    Tasks: 4 (limit: 4915)
   Memory: 131.1M
   CGroup: /system.slice/pvedaemon.service
           ├─5531 pvedaemon
           ├─5532 pvedaemon worker
           ├─5534 pvedaemon worker
           └─5535 pvedaemon worker

Nov 26 13:31:04 pve systemd[1]: Starting PVE API Daemon...
Nov 26 13:31:06 pve pvedaemon[5531]: starting server
Nov 26 13:31:06 pve pvedaemon[5531]: starting 3 worker(s)
Nov 26 13:31:06 pve pvedaemon[5531]: worker 5532 started
Nov 26 13:31:06 pve pvedaemon[5531]: worker 5534 started
Nov 26 13:31:06 pve pvedaemon[5531]: worker 5535 started
Nov 26 13:31:06 pve systemd[1]: Started PVE API Daemon.

pveproxy
Code:
● pveproxy.service - PVE API Proxy Server
   Loaded: loaded (/lib/systemd/system/pveproxy.service; enabled; vendor preset: enabled)
   Active: active (running) since Thu 2020-11-26 13:31:08 CET; 2min 50s ago
  Process: 5537 ExecStartPre=/usr/bin/pvecm updatecerts --silent (code=exited, status=111)
  Process: 5925 ExecStart=/usr/bin/pveproxy start (code=exited, status=0/SUCCESS)
Main PID: 6499 (pveproxy)
    Tasks: 4 (limit: 4915)
   Memory: 135.1M
   CGroup: /system.slice/pveproxy.service
           ├─6499 pveproxy
           ├─9563 pveproxy worker
           ├─9564 pveproxy worker
           └─9565 pveproxy worker

Nov 26 13:33:54 pve pveproxy[6499]: starting 2 worker(s)
Nov 26 13:33:54 pve pveproxy[6499]: worker 9563 started
Nov 26 13:33:54 pve pveproxy[6499]: worker 9564 started
Nov 26 13:33:54 pve pveproxy[9563]: /etc/pve/local/pve-ssl.key: failed to load local private key (key_file or key) at /usr/share/perl5/PVE/APIServer/AnyEvent.pm line 1737.
Nov 26 13:33:54 pve pveproxy[9564]: /etc/pve/local/pve-ssl.key: failed to load local private key (key_file or key) at /usr/share/perl5/PVE/APIServer/AnyEvent.pm line 1737.
Nov 26 13:33:54 pve pveproxy[9452]: worker exit
Nov 26 13:33:54 pve pveproxy[6499]: worker 9452 finished
Nov 26 13:33:54 pve pveproxy[6499]: starting 1 worker(s)
Nov 26 13:33:54 pve pveproxy[6499]: worker 9565 started
Nov 26 13:33:54 pve pveproxy[9565]: /etc/pve/local/pve-ssl.key: failed to load local private key (key_file or key) at /usr/share/perl5/PVE/APIServer/AnyEvent.pm line 1737.

journalctl -b (only some relevant parts)
Code:
Nov 26 13:35:14 pve pveproxy[10711]: /etc/pve/local/pve-ssl.key: failed to load local private key (key_file or key) at /usr/share/perl5/PVE/APIServer/AnyEvent.pm line 1737.
Nov 26 13:35:14 pve pveproxy[10712]: /etc/pve/local/pve-ssl.key: failed to load local private key (key_file or key) at /usr/share/perl5/PVE/APIServer/AnyEvent.pm line 1737.
Nov 26 13:35:14 pve pveproxy[6499]: worker 10691 finished
Nov 26 13:35:14 pve pveproxy[6499]: starting 1 worker(s)
Nov 26 13:35:14 pve pveproxy[6499]: worker 10713 started
Nov 26 13:35:14 pve pveproxy[10713]: /etc/pve/local/pve-ssl.key: failed to load local private key (key_file or key) at /usr/share/perl5/PVE/APIServer/AnyEvent.pm line 1737.

So it seems that pvedaemon is fine, but pveproxy and pvestatd have some problems with the SSL keys?
Code:
root@pve:/etc/pve/local# ll /etc/pve/local/
total 2
drwxr-xr-x 2 root www-data    0 Jun  1  2019 ./
drwxr-xr-x 2 root www-data    0 Jun  1  2019 ../
-rw-r----- 1 root www-data    0 Oct 30 14:45 config
-rw-r----- 1 root www-data   86 Aug 30  2019 host.fw
-rw-r----- 1 root www-data   84 Nov 26 11:59 lrm_status
-rw-r----- 1 root www-data    0 Jan  5  2020 lrm_status.tmp.10779
-rw-r----- 1 root www-data    0 Jun 24  2019 lrm_status.tmp.3118
drwxr-xr-x 2 root www-data    0 Jun  1  2019 lxc/
drwxr-xr-x 2 root www-data    0 Jun  1  2019 openvz/
drwx------ 2 root www-data    0 Jun  1  2019 priv/
-rw-r----- 1 root www-data 1679 Jun  1  2019 pve-ssl.key
-rw-r----- 1 root www-data 1704 Jun  1  2019 pve-ssl.pem
drwxr-xr-x 2 root www-data    0 Jun  1  2019 qemu-server/

This seems weird, since I use the usual self-signed SSL key.
 
The log would have been a bit more helpful, but from the status output it seems that pve-cluster.service (which provides the cluster filesystem pmxcfs) did not start correctly. What's the output of `journalctl -u pve-cluster`?

Thanks!
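A quick way to test this theory is to check whether /etc/pve is mounted at all. If pmxcfs is not running, pveproxy cannot read /etc/pve/local/pve-ssl.key and pvestatd cannot reach pmxcfs over IPC, which would be consistent with the errors above. The helper below is a hypothetical sketch (`etc_pve_mounted` is not a Proxmox command); it assumes pmxcfs shows up in /proc/mounts as a FUSE mount on /etc/pve.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of Proxmox VE.
# Succeeds if the mounts table (default: /proc/mounts) lists /etc/pve
# with a FUSE filesystem type, which is how pmxcfs is assumed to appear.
etc_pve_mounted() {
  local mounts="${1:-/proc/mounts}"
  # Field 2 is the mount point, field 3 the filesystem type.
  awk '$2 == "/etc/pve" && $3 ~ /^fuse/ { found = 1 }
       END { exit (found ? 0 : 1) }' "$mounts"
}
```

On the node itself, `etc_pve_mounted && echo mounted || echo "not mounted, check journalctl -u pve-cluster"` gives a quick answer; `mountpoint -q /etc/pve` from util-linux is a simpler alternative that does not look at the filesystem type.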
 
It seems short and OK:
Code:
root@pve:~# journalctl -u pve-cluster
-- Logs begin at Thu 2020-11-26 13:30:47 CET, end at Thu 2020-11-26 13:57:00 CET. --
Nov 26 13:35:36 pve systemd[1]: Starting The Proxmox VE cluster filesystem...
Nov 26 13:35:37 pve systemd[1]: Started The Proxmox VE cluster filesystem.

I'm hesitant to post the full journalctl -b output, since it might contain sensitive information.
 
Do you have an entry for your host in /etc/hosts? If not, maybe the hostname does not yet resolve via DNS at boot, which causes pmxcfs not to start, which in turn causes the other services to fail. When you restart them later, DNS is reachable, pmxcfs starts and everything is fine. The full log of `journalctl -b -u pve-cluster -u pveproxy -u pvedaemon -u pvestatd` would help and should not contain any sensitive information.
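The suggested journalctl call only covers the four PVE units. To spot the relevant lines quickly, its output can be piped through a filter for the error messages seen in this thread. `pve_boot_errors` is a hypothetical helper, and its patterns are just the messages quoted in the status output, not an exhaustive list.

```shell
#!/usr/bin/env bash
# Hypothetical filter, not a Proxmox tool: reads journal text on stdin
# and keeps only lines matching the error messages quoted in this thread.
pve_boot_errors() {
  grep -E 'ipcc_send_rec\[[0-9]+\] failed|Unable to load access control list|failed to load local private key'
}
```

Usage: `journalctl -b -u pve-cluster -u pveproxy -u pvedaemon -u pvestatd | pve_boot_errors`. An empty result after a reboot would suggest the services came up cleanly.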
 
Thank you!
The log seems to be too long to post here, so here it is in a gist.

My /etc/hosts
Code:
# Your system has configured 'manage_etc_hosts' as True.
# As a result, if you wish for changes to this file to persist
# then you will need to either
# a.) make changes to the master file in /etc/cloud/templates/hosts.debian.tmpl
# b.) change or remove the value of 'manage_etc_hosts' in
#     /etc/cloud/cloud.cfg or cloud-config from user-data
#
127.0.0.1 localhost.leserver localhost
172.16.1.100 pve.leserver pve pvelocalhost
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts

I had a hiccup in the past where I installed cloud-init on the Proxmox host itself, but I have since fixed that.

I just noticed that 172.16.1.100 is no longer the current IP address; it is now 192.168.X.X.

My /etc/hostname is just pve.
 
Please change that entry to your node's current IP address, then check that `hostname --ip-address` returns the correct one.
Afterwards, restart the services and it should start working.
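A rough way to script this check is to look up which address /etc/hosts maps the node name to and then confirm that this address is actually assigned to one of the node's interfaces. `hosts_ip_for` and `addr_is_local` are hypothetical helpers; the second one assumes the iproute2 `ip -4 -o addr show` format, where the fourth field is ADDRESS/PREFIX.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for checking the /etc/hosts entry of a node.

# Print the address a hosts file maps NAME to (first non-comment match).
hosts_ip_for() {
  local name="$1" file="${2:-/etc/hosts}"
  awk -v n="$name" '
    /^[ \t]*#/ { next }   # skip comment lines
    { for (i = 2; i <= NF; i++) if ($i == n) { print $1; exit } }
  ' "$file"
}

# Succeed if ADDR appears in `ip -4 -o addr show` output read from stdin
# (field 4 has the form ADDRESS/PREFIX, e.g. 192.168.1.99/24).
addr_is_local() {
  local addr="$1"
  awk -v a="$addr" '{ split($4, p, "/"); if (p[1] == a) found = 1 }
                    END { exit (found ? 0 : 1) }'
}
```

For example: `ip -4 -o addr show | addr_is_local "$(hosts_ip_for "$(hostname)")" && echo OK || echo "stale /etc/hosts entry"`. With the hosts file posted above, this would have flagged 172.16.1.100 once the node had moved to a 192.168.X.X address.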
 
I did, and hostname --ip-address now returns the correct address.
However, the problem persists after a reboot I just did.
 
Maybe cloud-init overwrote it again? Could you check that? You might need to disable or remove it.

Do you see anything different in the journals/logs?
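One way to check this is to look for the header cloud-init writes into /etc/hosts (the one at the top of the hosts file posted above) and for an enabled `manage_etc_hosts` setting in /etc/cloud/cloud.cfg. Both helpers are hypothetical sketches; the header only shows that cloud-init generated the file at some point, not that it is still active.

```shell
#!/usr/bin/env bash
# Hypothetical checks, not Proxmox or cloud-init tools.

# Succeeds if FILE (default /etc/hosts) still carries the header that
# cloud-init writes when manage_etc_hosts is enabled.
hosts_has_cloudinit_header() {
  grep -sq "configured 'manage_etc_hosts' as True" "${1:-/etc/hosts}"
}

# Succeeds if a cloud-init config enables manage_etc_hosts on an
# uncommented line (true/True or localhost are treated as enabled).
cloudcfg_manages_hosts() {
  grep -Eisq '^[[:space:]]*manage_etc_hosts:[[:space:]]*(true|localhost)' \
    "${1:-/etc/cloud/cloud.cfg}"
}
```

Usage: `hosts_has_cloudinit_header && echo "header still present"` and `cloudcfg_manages_hosts && echo "cloud.cfg still manages /etc/hosts"`. A missing cloud.cfg simply makes the second check fail quietly.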
 
The guest additions are uninstalled: https://forum.proxmox.com/threads/a...up-now-lxc-etc-not-working.61970/#post-284104

Sadly, I do not see anything specific. I get some Ceph errors (although I do not use Ceph; this is a single node) and postfix errors (the email address in the config is currently wrong).

EDIT: I just did a couple of things to fix the errors above: I ran pveceph purge and removed my configuration from /etc/postfix/main.cf.
After another reboot, things seem to be fixed. Still, I am confused why this broke the PVE services, since those configurations had not changed in months and never caused any problems.

Thank you for everyone's help!
 
There might be an issue with Ceph's service ordering (we're investigating). Thanks for pointing us in that direction!
 
The issue should be fixed with Ceph Nautilus version 14.2.15-pve2 (available in all Proxmox Nautilus repositories).
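To check whether a node already has the fixed packages, the ceph version reported by `pveversion -v` can be compared with 14.2.15-pve2. `pveversion_of` and `version_ge` are hypothetical helpers; the parsing assumes the `name: version` line format shown in the pveversion output earlier in this thread, and the fallback to GNU `sort -V` only approximates Debian version ordering.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not Proxmox tools.

# Read `pveversion -v` style text on stdin and print the version for PKG
# (lines look like "ceph: 14.2.15-pve1").
pveversion_of() {
  awk -v p="$1" -F': ' '$1 == p { print $2; exit }'
}

# Succeed if version $1 >= version $2. Uses dpkg's Debian version ordering
# when dpkg is available, otherwise falls back to GNU `sort -V`.
version_ge() {
  if command -v dpkg >/dev/null 2>&1; then
    dpkg --compare-versions "$1" ge "$2"
  else
    # The smaller of the two sorts first; $1 >= $2 iff that one is $2.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
  fi
}
```

On a node: `version_ge "$(pveversion -v | pveversion_of ceph)" 14.2.15-pve2 && echo "fixed ceph packages installed"`. If ceph is not listed at all, `pveversion_of` prints nothing and the result should be treated as inconclusive.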
 
Big thanks for the fast fix (I noticed it was already available yesterday evening). It was indeed related to the systemd dependencies somehow: on slower machines everything started correctly, while on faster machines the outcome was random (multiple reboots helped).
 
EDIT: I realized I posted this in a discussion about a different problem that has already been solved. Please move it to a new topic if that is better!

Hi there, I am having the same issue: it won't show me that an updated package version is available.
sources.list:
Code:
deb http://deb.debian.org/debian buster main contrib

deb http://deb.debian.org/debian buster-updates main contrib

# security updates
deb http://security.debian.org buster/updates main contrib

pveversion -v
Code:
proxmox-ve: 6.2-1 (running kernel: 5.4.34-1-pve)
pve-manager: 6.2-4 (running version: 6.2-4/9824574a)
pve-kernel-5.4: 6.2-1
pve-kernel-helper: 6.2-1
pve-kernel-5.4.34-1-pve: 5.4.34-2
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.3-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.15-pve1
libproxmox-acme-perl: 1.0.3
libpve-access-control: 6.1-1
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.1-2
libpve-guest-common-perl: 3.0-10
libpve-http-server-perl: 3.0-5
libpve-storage-perl: 6.1-7
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.2-1
lxcfs: 4.0.3-pve2
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.2-1
pve-cluster: 6.1-8
pve-container: 3.1-5
pve-docs: 6.2-4
pve-edk2-firmware: 2.20200229-1
pve-firewall: 4.1-2
pve-firmware: 3.1-1
pve-ha-manager: 3.0-9
pve-i18n: 2.1-2
pve-qemu-kvm: 5.0.0-2
pve-xtermjs: 4.3.0-1
qemu-server: 6.2-2
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.3-pve1

Neither the GUI nor the command line finds anything; all packages are reported as up to date.

/etc/hosts
Code:
192.168.1.99 pve1.lan pve1

is also correct.

This is a single node, not a cluster. I'm not sure why the cluster service is running then. (I even tried stopping it to see whether anything changed; it didn't.)
Code:
pve-cluster.service - The Proxmox VE cluster filesystem
   Loaded: loaded (/lib/systemd/system/pve-cluster.service; disabled; vendor pre
   Active: active (running) since Sat 2020-11-28 07:17:51 CET; 5min ago
  Process: 31915 ExecStart=/usr/bin/pmxcfs (code=exited, status=0/SUCCESS)
Main PID: 31923 (pmxcfs)
    Tasks: 6 (limit: 4915)
   Memory: 27.2M
   CGroup: /system.slice/pve-cluster.service
           └─31923 /usr/bin/pmxcfs
 
