Random reboots with PVE 8

pma66271

Hi community,

I updated my PVE node from version 7 to 8 two weeks ago.
The server had been running for years without major issues, but now I see random reboots (anywhere from every few days to every few hours).

I can see no reason for the reboots in the logs.
The machine is about 2.5 years old and Memtest passed. It only runs a few VMs (Debian, Windows, Home Assistant, all up to date) with low server load.

Does anyone have an idea how I can find a hint to the cause of the reboots?
Would it help to switch back to kernel 5.15, which I used before the upgrade, or is it not compatible with PVE 8?

The reboot happened today ~12:55. Left side: syslog from web management, right side: /var/log/syslog
SCR-20230905-qizc.png

pveversion -v
Code:
pve-manager/8.0.4/d258a813cfa6b390 (running kernel: 6.2.16-10-pve)
root@pve1:~# pveversion -v
proxmox-ve: 8.0.2 (running kernel: 6.2.16-10-pve)
pve-manager: 8.0.4 (running version: 8.0.4/d258a813cfa6b390)
proxmox-kernel-helper: 8.0.3
pve-kernel-5.15: 7.4-4
pve-kernel-5.13: 7.1-9
pve-kernel-5.11: 7.0-10
proxmox-kernel-6.2.16-10-pve: 6.2.16-10
proxmox-kernel-6.2: 6.2.16-10
proxmox-kernel-6.2.16-8-pve: 6.2.16-8
pve-kernel-5.15.108-1-pve: 5.15.108-2
pve-kernel-5.15.107-1-pve: 5.15.107-1
pve-kernel-5.15.104-1-pve: 5.15.104-2
pve-kernel-5.13.19-6-pve: 5.13.19-15
pve-kernel-5.13.19-1-pve: 5.13.19-3
pve-kernel-5.11.22-7-pve: 5.11.22-12
pve-kernel-5.11.22-1-pve: 5.11.22-2
ceph-fuse: 16.2.11+ds-2
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx4
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-4
libknet1: 1.25-pve1
libproxmox-acme-perl: 1.4.6
libproxmox-backup-qemu0: 1.4.0
libproxmox-rs-perl: 0.3.1
libpve-access-control: 8.0.5
libpve-apiclient-perl: 3.3.1
libpve-common-perl: 8.0.8
libpve-guest-common-perl: 5.0.4
libpve-http-server-perl: 5.0.4
libpve-rs-perl: 0.8.5
libpve-storage-perl: 8.0.2
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 5.0.2-4
lxcfs: 5.0.3-pve3
novnc-pve: 1.4.0-2
proxmox-backup-client: 3.0.2-1
proxmox-backup-file-restore: 3.0.2-1
proxmox-kernel-helper: 8.0.3
proxmox-mail-forward: 0.2.0
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.2
proxmox-widget-toolkit: 4.0.6
pve-cluster: 8.0.3
pve-container: 5.0.4
pve-docs: 8.0.4
pve-edk2-firmware: 3.20230228-4
pve-firewall: 5.0.3
pve-firmware: 3.7-1
pve-ha-manager: 4.0.2
pve-i18n: 3.0.5
pve-qemu-kvm: 8.0.2-5
pve-xtermjs: 4.16.0-3
qemu-server: 8.0.7
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.1.12-pve1

journalctl --list-boots
I upgraded on August 21. The reboots before that date were triggered manually (upgrades); the reboots after it happened randomly.
SCR-20230905-qnpl.png
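One way to narrow this down is to look at how the previous boot's journal ends: a clean shutdown or reboot normally leaves shutdown messages at the end of the log, while a hard reset or power loss simply stops mid-log. The following is only a minimal sketch, assuming a persistent journal. The function name `ended_cleanly` is made up for illustration, and the matched message strings are assumptions that may differ between systemd versions.

```shell
#!/bin/sh
# Reads journal text on stdin and reports whether its last lines contain
# typical shutdown messages. The patterns below are assumptions, not an
# exhaustive list of what systemd may log.
ended_cleanly() {
    if tail -n 50 | grep -Eq 'Journal stopped|systemd-shutdown|Reached target .*(Reboot|Power-Off|Shutdown)'; then
        echo "clean"
    else
        echo "abrupt"
    fi
}

# Example use on a live system (boot -1 is the previous boot):
#   journalctl -b -1 --no-pager | ended_cleanly
```

An "abrupt" result does not say why the machine went down. It only suggests that the kernel had no chance to log anything, which would fit a hard reset, a watchdog or a power problem better than an orderly reboot.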
 
Hi,
I deployed a new server in June (12 x Intel(R) Xeon(R) E-2386G CPU, 128GB RAM) and I see exactly the same behaviour.
My previous server ran for at least 3 years without a reboot...

root@adv1:~# pveversion -v
proxmox-ve: 8.0.2 (running kernel: 6.2.16-10-pve)
pve-manager: 8.0.4 (running version: 8.0.4/d258a813cfa6b390)
pve-kernel-6.2: 8.0.5
proxmox-kernel-helper: 8.0.3
pve-kernel-5.15: 7.4-4
proxmox-kernel-6.2.16-12-pve: 6.2.16-12
proxmox-kernel-6.2: 6.2.16-12
proxmox-kernel-6.2.16-10-pve: 6.2.16-10
pve-kernel-5.15.108-1-pve: 5.15.108-1
ceph-fuse: 16.2.11+ds-2
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx4
libjs-extjs: 7.0.0-4
libknet1: 1.25-pve1
libproxmox-acme-perl: 1.4.6
libproxmox-backup-qemu0: 1.4.0
libproxmox-rs-perl: 0.3.1
libpve-access-control: 8.0.5
libpve-apiclient-perl: 3.3.1
libpve-common-perl: 8.0.8
libpve-guest-common-perl: 5.0.4
libpve-http-server-perl: 5.0.4
libpve-rs-perl: 0.8.5
libpve-storage-perl: 8.0.2
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 5.0.2-4
lxcfs: 5.0.3-pve3
novnc-pve: 1.4.0-2
proxmox-backup-client: 3.0.2-1
proxmox-backup-file-restore: 3.0.2-1
proxmox-kernel-helper: 8.0.3
proxmox-mail-forward: 0.2.0
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.2
proxmox-widget-toolkit: 4.0.7
pve-cluster: 8.0.3
pve-container: 5.0.4
pve-docs: 8.0.4
pve-edk2-firmware: 3.20230228-4
pve-firewall: 5.0.3
pve-firmware: 3.8-2
pve-ha-manager: 4.0.2
pve-i18n: 3.0.5
pve-qemu-kvm: 8.0.2-5
pve-xtermjs: 4.16.0-3
qemu-server: 8.0.7
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.1.12-pve1

Rebooting randomly...

root@adv1:~# journalctl --list-boots
IDX BOOT ID FIRST ENTRY LAST ENTRY
-12 75d1d6a4aec4408f9614e4f5e732a162 Mon 2023-07-17 09:33:55 CEST Thu 2023-07-20 00:55:00 CEST
-11 bae052ce2e854c12ba1133ad87000b0b Thu 2023-07-20 00:59:16 CEST Fri 2023-07-21 05:16:24 CEST
-10 f9d75f7cda704177bfd5a73fcc74f870 Fri 2023-07-21 05:19:39 CEST Mon 2023-07-24 10:55:03 CEST
-9 97c1b58a33094c4fa98be84394e21380 Mon 2023-07-24 11:16:38 CEST Thu 2023-07-27 14:51:10 CEST
-8 3e892170d3db49a8b7fea16ec548fb54 Thu 2023-07-27 14:54:41 CEST Thu 2023-08-10 03:26:24 CEST
-7 6ee5c0a36da248a09ecacb9914ec6484 Thu 2023-08-10 03:29:50 CEST Fri 2023-08-11 22:29:33 CEST
-6 3b5d50267a0f4ba8a51654872fa6f22f Fri 2023-08-11 22:32:54 CEST Fri 2023-08-25 18:18:59 CEST
-5 165d66ba50054675bea0c82b2bc68a82 Fri 2023-08-25 18:22:23 CEST Fri 2023-08-25 21:47:49 CEST
-4 9fcb4d44491f45638100227caa12f44f Fri 2023-08-25 21:50:05 CEST Mon 2023-08-28 18:54:35 CEST
-3 92bdbb1b45c74443ae5c3e784ac413ac Mon 2023-08-28 18:59:02 CEST Sat 2023-09-02 09:30:02 CEST
-2 e55d97b640fa4a0abd319313ca0c7eb0 Sat 2023-09-02 09:33:28 CEST Mon 2023-09-04 02:34:11 CEST
-1 5f4e9baf1c23459f97ae1f98b23abd28 Mon 2023-09-04 02:49:58 CEST Mon 2023-09-04 20:37:12 CEST
0 48005d84f542443193a0505bf1f75272 Mon 2023-09-04 20:42:41 CEST Thu 2023-09-07 21:48:10 CEST
Is anyone else affected? Is there something to try?
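To see the pattern more easily, the `journalctl --list-boots` output can be turned into an uptime per boot. This is a rough sketch, assuming GNU `date` and the column layout shown above (index, boot ID, then weekday, date, time and time zone for the first and last entry). The function name `boot_uptimes` is hypothetical. The time zone column is ignored and both timestamps are treated as UTC, so a boot that spans a daylight saving change may be off by an hour.

```shell
#!/bin/sh
# Reads `journalctl --list-boots` output on stdin and prints the boot index
# followed by the whole hours between the first and last journal entry.
boot_uptimes() {
    while read -r idx _id _d1 date1 time1 _tz1 _d2 date2 time2 _tz2; do
        # Skip empty lines and the header line.
        case "$idx" in ''|IDX) continue ;; esac
        start=$(date -u -d "$date1 $time1" +%s) || continue
        end=$(date -u -d "$date2 $time2" +%s) || continue
        printf '%s %sh\n' "$idx" "$(( (end - start) / 3600 ))"
    done
}

# Example use on a live system:
#   journalctl --list-boots | boot_uptimes
```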
 
Hi,

I also updated to PVE8 some weeks ago. At first only one node out of 4 was rebooting now and then. I migrated all VMs and lxc to another node as a second one also started to reboot randomly. I do not suspect a hardware failure as two nodes are affected by the same problem. The idle node is now running without any problems. I don’t have enough time to investigate but I can do some tests over time if you wish…
 
Hi,

Same issue. I tried many things, including replacing the RAM and the SSD. (Another OS works fine...)

The final solution was to pin kernel 5.15 at boot. :(

Code:
proxmox-boot-tool kernel pin 5.15.126-1-pve
 
Hi! How did you get such an old kernel? I'm running PVE 8.0.4 and the completely random reboots are getting quite annoying. Sadly, my only available kernel versions are:

Code:
pve-firmware/stable 3.9-1 all [upgradable from: 3.8-2]
  Binary firmware code for the pve-kernel

pve-kernel-6.1/stable 7.3-4 all
  Latest Proxmox VE Kernel Image

pve-kernel-6.1.10-1-pve/stable 6.1.10-1 amd64
  Proxmox Kernel Image

pve-kernel-6.2/stable,now 8.0.5 all [installed]
  Proxmox Kernel Image for 6.2 series (transitional package)

pve-kernel-6.2.16-1-pve/stable 6.2.16-1 amd64
  Proxmox Kernel Image

pve-kernel-6.2.16-2-pve/stable 6.2.16-2 amd64
  Proxmox Kernel Image

pve-kernel-6.2.16-3-pve/stable,now 6.2.16-3 amd64 [installed]
  Proxmox Kernel Image

pve-kernel-6.2.16-4-pve/stable 6.2.16-5 amd64
  Proxmox Kernel Image

pve-kernel-6.2.16-5-pve/stable 6.2.16-6 amd64
  Proxmox Kernel Image
 
Hi Pierre,

- Open a shell window.

- First install the kernel:
Code:
apt install pve-kernel-5.15

- List installed kernels:
Code:
proxmox-boot-tool kernel list

- Pin startup kernel:
Code:
proxmox-boot-tool kernel pin 5.15.116-1-pve
or
Code:
proxmox-boot-tool kernel pin 5.15.126-1-pve
whichever is in your list.

- Update grub:
Code:
update-grub

- Check the pin is OK:
Code:
proxmox-boot-tool kernel list

The result should look something like this:

Manually selected kernels:
None.

Automatically selected kernels:
5.15.126-1-pve
6.2.16-19-pve
6.5.11-4-pve

Pinned kernel:
5.15.126-1-pve
- Reboot...

Good luck!
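If you do not want to type the version by hand, the pin step can be scripted. This is only a sketch, assuming the `proxmox-boot-tool kernel list` output looks like the listing above (one version per line). The helper name `newest_515` is made up for this example, and it relies on GNU `sort -V` for version ordering.

```shell
#!/bin/sh
# Reads `proxmox-boot-tool kernel list` output on stdin and prints the
# highest 5.15.x-y-pve version found, or nothing if none is listed.
newest_515() {
    sed 's/^[[:space:]]*//' \
        | grep -E '^5\.15\.[0-9]+-[0-9]+-pve$' \
        | sort -u -V \
        | tail -n 1
}

# Example use on a live system:
#   kver=$(proxmox-boot-tool kernel list | newest_515)
#   [ -n "$kver" ] && proxmox-boot-tool kernel pin "$kver"
```

If the function prints nothing, no 5.15 kernel is installed and there is nothing to pin yet.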
 
Theoretically, that would be great - sadly none of my apt sources provide that kernel version. Is that limited to the enterprise repository server?
I'm currently using

Code:
deb http://ftp.debian.org/debian bookworm main contrib
deb http://ftp.debian.org/debian bookworm-updates main contrib
deb http://security.debian.org/debian-security bookworm-security main contrib

deb http://download.proxmox.com/debian/pve bookworm pve-no-subscription

deb http://download.proxmox.com/debian/ceph-quincy bookworm no-subscription
 
Hi,
The latest kernel upgrade seems to have solved the issue: Linux 6.5.11-4-pve (2023-11-20T10:19Z)
I now have 10 days of uptime, which I had never reached since the server was deployed in July! I'm a little more confident now.
I hope the issue is solved.
 
Hi,

Neither 6.5.11-4 nor 6.5.11-7 fixed the problem for me.

I had to go back to version 5.15.126-1-pve. It is the last stable one on my system.