Upgraded to 6.2-15 Host crashes now

sourceminer

Active Member
Jan 7, 2015
Hello, I have a strange issue. I upgraded a host that had been in operation without incident for 3 years. I had to step through the documented upgrade process from 4.4 to 6.2.

Now, however, the host stops responding (cannot SSH, cannot reach the web admin). It seems to affect only one Windows Server VM; the other two VMs are still running (one Windows Server 2019 and a pfSense system). I have been going through the syslogs and don't see anything out of the ordinary, but I would be willing to send them to someone to inspect in case anything stands out. The only way to bring the system back is to reboot from the host console.
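Since the host only comes back after a forced reboot, the log lines written just before each boot are usually the most interesting part of the syslog. The following is a minimal sketch for pulling them out, not a definitive tool. It assumes a syslog-style text file (for example /var/log/syslog on a Debian-based host with rsyslog) and assumes every boot logs a kernel banner line containing "Linux version". The function name `lines_before_boot` is made up for this illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the N log lines that precede each kernel
# boot banner in a syslog-style file.
# Assumption: each boot writes a line containing "Linux version".
lines_before_boot() {
    file=$1
    count=${2:-20}   # number of lines to show before each banner
    awk -v n="$count" '
        /Linux version/ {
            print "---- lines before boot banner at line " NR " ----"
            start = NR - n
            if (start < 1) start = 1
            # Replay the buffered lines in their original order.
            for (i = start; i < NR; i++) print buf[i % n]
        }
        # Ring buffer holding the most recent n lines.
        { buf[NR % n] = $0 }
    ' "$file"
}
```

Usage would look like `lines_before_boot /var/log/syslog 50`. If the last lines before a hang show nothing unusual, that in itself hints at a hard lockup rather than a service failure.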
 
Yes, it crashed again this morning:

Linux pve 5.4.65-1-pve #1 SMP PVE 5.4.65-1 (Mon, 21 Sep 2020 15:40:22 +0200) x86_64

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
Last login: Wed Nov 18 08:39:33 2020
root@pve:~# pveversion -v
proxmox-ve: 6.2-2 (running kernel: 5.4.65-1-pve)
pve-manager: 6.2-15 (running version: 6.2-15/48bd51b6)
pve-kernel-5.4: 6.2-7
pve-kernel-helper: 6.2-7
pve-kernel-5.4.65-1-pve: 5.4.65-1
pve-kernel-4.15: 5.4-19
pve-kernel-4.15.18-30-pve: 4.15.18-58
pve-kernel-4.4.134-1-pve: 4.4.134-112
pve-kernel-4.4.35-2-pve: 4.4.35-79
pve-kernel-4.4.6-1-pve: 4.4.6-48
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.4-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.16-pve1
libproxmox-acme-perl: 1.0.5
libpve-access-control: 6.1-3
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.2-4
libpve-guest-common-perl: 3.1-3
libpve-http-server-perl: 3.0-6
libpve-storage-perl: 6.2-10
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.3-1
lxcfs: 4.0.3-pve3
novnc-pve: 1.1.0-1
proxmox-backup-client: 1.0.1-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.3-10
pve-cluster: 6.2-1
pve-container: 3.2-2
pve-docs: 6.2-6
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-3
pve-firmware: 3.1-3
pve-ha-manager: 3.1-1
pve-i18n: 2.2-2
pve-qemu-kvm: 5.1.0-6
pve-xtermjs: 4.7.0-2
qemu-server: 6.2-19
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 0.8.4-pve2


root@pve:~# ss -antlp
State Recv-Q Send-Q Local Address:Port Peer Address:Port
LISTEN 0 64 0.0.0.0:41917 0.0.0.0:*
LISTEN 0 128 0.0.0.0:8006 0.0.0.0:* users:(("pveproxy worker",pid=1966,fd=6),("pveproxy worker",pid=1965,fd=6),("pveproxy worker",pid=1964,fd=6),("pveproxy",pid=1963,fd=6))
LISTEN 0 128 0.0.0.0:111 0.0.0.0:* users:(("rpcbind",pid=733,fd=4),("systemd",pid=1,fd=33))
LISTEN 0 128 127.0.0.1:85 0.0.0.0:* users:(("pvedaemon worke",pid=1257,fd=6),("pvedaemon worke",pid=1256,fd=6),("pvedaemon worke",pid=1255,fd=6),("pvedaemon",pid=1254,fd=6))
LISTEN 0 128 0.0.0.0:22 0.0.0.0:* users:(("sshd",pid=972,fd=3))
LISTEN 0 128 0.0.0.0:3128 0.0.0.0:* users:(("spiceproxy work",pid=2524,fd=6),("spiceproxy",pid=2523,fd=6))
LISTEN 0 128 0.0.0.0:49433 0.0.0.0:* users:(("rpc.statd",pid=2998,fd=9))
LISTEN 0 100 127.0.0.1:25 0.0.0.0:* users:(("master",pid=1214,fd=13))
LISTEN 0 64 [::]:39935 [::]:*
LISTEN 0 128 [::]:111 [::]:* users:(("rpcbind",pid=733,fd=6),("systemd",pid=1,fd=35))
LISTEN 0 128 [::]:48273 [::]:* users:(("rpc.statd",pid=2998,fd=11))
LISTEN 0 128 [::]:22 [::]:* users:(("sshd",pid=972,fd=4))
LISTEN 0 100 [::1]:25 [::]:* users:(("master",pid=1214,fd=14))
 
Did you reboot the node after the upgrade?

but would be willing to send it to someone to inspect to see if anything stands out.
Yes, please send the log and post the output of the following commands:

Bash:
~ systemctl status pvedaemon.service
~ systemctl status pveproxy.service
~ systemctl status networking.service
~ curl -s -k https://localhost:8006 | grep title

EDIT:
LISTEN 0 128 0.0.0.0:8006 0.0.0.0:* users:(("pveproxy worker",pid=1966,fd=6),("pveproxy worker",pid=1965,fd=6),("pveproxy worker",pid=1964,fd=6),("pveproxy",pid=1963,fd=6))
Port 8006 is open, so you may need to update the node certificates by running this command: pvecm updatecerts --force
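The check made by eye above (is anything listening on 8006?) can be scripted so it is easy to repeat after each reboot. This is a small sketch under stated assumptions: it expects the column layout that `ss -antlp` printed in this thread (state in column 1, local address:port in column 4), and the function name `listening_on_port` is invented for the example.

```shell
#!/bin/sh
# Hypothetical helper: read `ss -antlp` style output on stdin and
# succeed (exit 0) if some socket is in LISTEN state on the given port.
# Assumption: column 1 is the state, column 4 is "local address:port".
listening_on_port() {
    port=$1
    awk -v p="$port" '
        $1 == "LISTEN" && $4 ~ (":" p "$") { found = 1 }
        END { exit found ? 0 : 1 }
    '
}
```

For example: `ss -antlp | listening_on_port 8006 && echo "pveproxy is listening"`. Note that this only proves a local listener exists; a socket bound to 127.0.0.1 (like pvedaemon on port 85 above) is not reachable from other machines, and a firewall can still block a socket bound to 0.0.0.0.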
 
So far we have had to force a reboot every morning. So yes, it has been rebooted several times.

Output of the requested commands:


root@pve:~# systemctl status pvedaemon.service
● pvedaemon.service - PVE API Daemon
Loaded: loaded (/lib/systemd/system/pvedaemon.service; enabled; vendor preset: enabled)
Active: active (running) since Wed 2020-11-18 08:38:00 PST; 1 day 1h ago
Process: 1222 ExecStart=/usr/bin/pvedaemon start (code=exited, status=0/SUCCESS)
Main PID: 1254 (pvedaemon)
Tasks: 4 (limit: 4915)
Memory: 87.2M
CGroup: /system.slice/pvedaemon.service
├─1254 pvedaemon
├─1255 pvedaemon worker
├─1256 pvedaemon worker
└─1257 pvedaemon worker

Nov 18 16:00:11 pve pvedaemon[1256]: <root@pam> successful auth for user 'root@pam'
Nov 18 16:15:12 pve pvedaemon[1256]: <root@pam> successful auth for user 'root@pam'
Nov 18 16:30:12 pve pvedaemon[1255]: <root@pam> successful auth for user 'root@pam'
Nov 18 16:45:13 pve pvedaemon[1255]: <root@pam> successful auth for user 'root@pam'
Nov 18 17:00:14 pve pvedaemon[1257]: <root@pam> successful auth for user 'root@pam'
Nov 18 17:15:15 pve pvedaemon[1255]: <root@pam> successful auth for user 'root@pam'
Nov 18 17:30:16 pve pvedaemon[1256]: <root@pam> successful auth for user 'root@pam'
Nov 18 18:16:31 pve pvedaemon[1255]: <root@pam> successful auth for user 'root@pam'
Nov 18 18:31:32 pve pvedaemon[1255]: <root@pam> successful auth for user 'root@pam'
Nov 18 18:46:32 pve pvedaemon[1257]: <root@pam> successful auth for user 'root@pam'

root@pve:~# systemctl status pveproxy.service
● pveproxy.service - PVE API Proxy Server
Loaded: loaded (/lib/systemd/system/pveproxy.service; enabled; vendor preset: enabled)
Active: active (running) since Wed 2020-11-18 08:38:05 PST; 1 day 1h ago
Process: 1258 ExecStartPre=/usr/bin/pvecm updatecerts --silent (code=exited, status=0/SUCCESS)
Process: 1293 ExecStart=/usr/bin/pveproxy start (code=exited, status=0/SUCCESS)
Process: 17677 ExecReload=/usr/bin/pveproxy restart (code=exited, status=0/SUCCESS)
Main PID: 1963 (pveproxy)
Tasks: 4 (limit: 4915)
Memory: 142.9M
CGroup: /system.slice/pveproxy.service
├─ 1963 pveproxy
├─19392 pveproxy worker
├─19393 pveproxy worker
└─19394 pveproxy worker

Nov 19 00:00:10 pve pveproxy[1963]: starting 3 worker(s)
Nov 19 00:00:10 pve pveproxy[1963]: worker 19392 started
Nov 19 00:00:10 pve pveproxy[1963]: worker 19393 started
Nov 19 00:00:10 pve pveproxy[1963]: worker 19394 started
Nov 19 00:00:15 pve pveproxy[13136]: worker exit
Nov 19 00:00:15 pve pveproxy[3823]: worker exit
Nov 19 00:00:15 pve pveproxy[13262]: worker exit
Nov 19 00:00:16 pve pveproxy[1963]: worker 3823 finished
Nov 19 00:00:16 pve pveproxy[1963]: worker 13136 finished
Nov 19 00:00:16 pve pveproxy[1963]: worker 13262 finished

root@pve:~# systemctl status networking.service
● networking.service - Raise network interfaces
Loaded: loaded (/lib/systemd/system/networking.service; enabled; vendor preset: enabled)
Active: active (exited) since Wed 2020-11-18 08:37:56 PST; 1 day 1h ago
Docs: man:interfaces(5)
Process: 749 ExecStart=/sbin/ifup -a --read-environment (code=exited, status=0/SUCCESS)
Main PID: 749 (code=exited, status=0/SUCCESS)

Nov 18 08:37:54 pve systemd[1]: Starting Raise network interfaces...
Nov 18 08:37:55 pve ifup[749]: Waiting for vmbr0 to get ready (MAXWAIT is 2 seconds).
Nov 18 08:37:55 pve ifup[749]: Waiting for vmbr1 to get ready (MAXWAIT is 2 seconds).
Nov 18 08:37:56 pve ifup[749]: Waiting for vmbr2 to get ready (MAXWAIT is 2 seconds).
Nov 18 08:37:56 pve systemd[1]: Started Raise network interfaces.

root@pve:~# curl -s -k https://localhost:8006 | grep title
<title>pve - Proxmox Virtual Environment</title>
 
root@pve:~# curl -s -k https://localhost:8006 | grep title
<title>pve - Proxmox Virtual Environment</title>
The web interface responds locally, which means you should check the firewall, if it is enabled.

Have you tried to generate a new certificate?
Bash:
pvecm updatecerts --force
 
Forcing the certificate update has not helped; the server still needs a forced reboot every day.
 
