Random reboots with PVE 8

pma66271

Member
Oct 31, 2021
Hi community,

I just updated my PVE node from version 7 to 8 two weeks ago.
The server was running for years without major issues, but now I see some random reboots (every few days to every few hours).

I can see no reason for the reboots in the logs.
The machine is about 2.5 years old and Memtest passed. It only runs a few VMs (Debian, Windows, Home Assistant, all updated) with low server load.

Does anyone have an idea how I can find a hint about the cause of the reboots?
Would it help to switch back to kernel 5.15, which I used before the upgrade, or is it not compatible with PVE 8?
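
One way to narrow this down is to check whether the previous boot ended with a normal systemd shutdown sequence or whether the log simply stops, which would point toward a hard reset, power loss, or kernel hang rather than an orderly reboot. A minimal sketch, assuming a persistent journal; the function name `classify_boot_end` and the grep patterns are illustrative assumptions, not an exhaustive list of shutdown messages:

```shell
#!/bin/sh
# Classify how a boot ended, based on the last 50 journal lines read from stdin.
# Assumption: a clean shutdown leaves systemd shutdown messages near the end of
# the log. The patterns below are examples and may need adjusting per system.
classify_boot_end() {
    if tail -n 50 | grep -Eq 'systemd-shutdown|Journal stopped|Reached target.*(Reboot|Power-Off|Shutdown)'; then
        echo "clean"
    else
        echo "abrupt"
    fi
}

# Usage on a real node (-1 is the previous boot):
#   journalctl -b -1 --no-pager | classify_boot_end
```

If the result is "abrupt", the last lines of `journalctl -b -1 -e` are usually the most interesting place to look, although a hard lockup may leave nothing in the log at all.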

The reboot happened today ~12:55. Left side: syslog from web management, right side: /var/log/syslog
SCR-20230905-qizc.png

pveversion -v
Code:
pve-manager/8.0.4/d258a813cfa6b390 (running kernel: 6.2.16-10-pve)
root@pve1:~# pveversion -v
proxmox-ve: 8.0.2 (running kernel: 6.2.16-10-pve)
pve-manager: 8.0.4 (running version: 8.0.4/d258a813cfa6b390)
proxmox-kernel-helper: 8.0.3
pve-kernel-5.15: 7.4-4
pve-kernel-5.13: 7.1-9
pve-kernel-5.11: 7.0-10
proxmox-kernel-6.2.16-10-pve: 6.2.16-10
proxmox-kernel-6.2: 6.2.16-10
proxmox-kernel-6.2.16-8-pve: 6.2.16-8
pve-kernel-5.15.108-1-pve: 5.15.108-2
pve-kernel-5.15.107-1-pve: 5.15.107-1
pve-kernel-5.15.104-1-pve: 5.15.104-2
pve-kernel-5.13.19-6-pve: 5.13.19-15
pve-kernel-5.13.19-1-pve: 5.13.19-3
pve-kernel-5.11.22-7-pve: 5.11.22-12
pve-kernel-5.11.22-1-pve: 5.11.22-2
ceph-fuse: 16.2.11+ds-2
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx4
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-4
libknet1: 1.25-pve1
libproxmox-acme-perl: 1.4.6
libproxmox-backup-qemu0: 1.4.0
libproxmox-rs-perl: 0.3.1
libpve-access-control: 8.0.5
libpve-apiclient-perl: 3.3.1
libpve-common-perl: 8.0.8
libpve-guest-common-perl: 5.0.4
libpve-http-server-perl: 5.0.4
libpve-rs-perl: 0.8.5
libpve-storage-perl: 8.0.2
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 5.0.2-4
lxcfs: 5.0.3-pve3
novnc-pve: 1.4.0-2
proxmox-backup-client: 3.0.2-1
proxmox-backup-file-restore: 3.0.2-1
proxmox-kernel-helper: 8.0.3
proxmox-mail-forward: 0.2.0
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.2
proxmox-widget-toolkit: 4.0.6
pve-cluster: 8.0.3
pve-container: 5.0.4
pve-docs: 8.0.4
pve-edk2-firmware: 3.20230228-4
pve-firewall: 5.0.3
pve-firmware: 3.7-1
pve-ha-manager: 4.0.2
pve-i18n: 3.0.5
pve-qemu-kvm: 8.0.2-5
pve-xtermjs: 4.16.0-3
qemu-server: 8.0.7
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.1.12-pve1

journalctl --list-boots
I updated on August 21. The reboots before that date were triggered manually (upgrades); the reboots after it happened randomly.
SCR-20230905-qnpl.png
 
Hi,
I deployed a new server in June (12 x Intel(R) Xeon(R) E-2386G CPU, 128GB RAM) and I see exactly the same behaviour.
My previous server was running for at least 3 years without reboot...

root@adv1:~# pveversion -v
proxmox-ve: 8.0.2 (running kernel: 6.2.16-10-pve)
pve-manager: 8.0.4 (running version: 8.0.4/d258a813cfa6b390)
pve-kernel-6.2: 8.0.5
proxmox-kernel-helper: 8.0.3
pve-kernel-5.15: 7.4-4
proxmox-kernel-6.2.16-12-pve: 6.2.16-12
proxmox-kernel-6.2: 6.2.16-12
proxmox-kernel-6.2.16-10-pve: 6.2.16-10
pve-kernel-5.15.108-1-pve: 5.15.108-1
ceph-fuse: 16.2.11+ds-2
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx4
libjs-extjs: 7.0.0-4
libknet1: 1.25-pve1
libproxmox-acme-perl: 1.4.6
libproxmox-backup-qemu0: 1.4.0
libproxmox-rs-perl: 0.3.1
libpve-access-control: 8.0.5
libpve-apiclient-perl: 3.3.1
libpve-common-perl: 8.0.8
libpve-guest-common-perl: 5.0.4
libpve-http-server-perl: 5.0.4
libpve-rs-perl: 0.8.5
libpve-storage-perl: 8.0.2
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 5.0.2-4
lxcfs: 5.0.3-pve3
novnc-pve: 1.4.0-2
proxmox-backup-client: 3.0.2-1
proxmox-backup-file-restore: 3.0.2-1
proxmox-kernel-helper: 8.0.3
proxmox-mail-forward: 0.2.0
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.2
proxmox-widget-toolkit: 4.0.7
pve-cluster: 8.0.3
pve-container: 5.0.4
pve-docs: 8.0.4
pve-edk2-firmware: 3.20230228-4
pve-firewall: 5.0.3
pve-firmware: 3.8-2
pve-ha-manager: 4.0.2
pve-i18n: 3.0.5
pve-qemu-kvm: 8.0.2-5
pve-xtermjs: 4.16.0-3
qemu-server: 8.0.7
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.1.12-pve1

Rebooting randomly...

root@adv1:~# journalctl --list-boots
IDX BOOT ID FIRST ENTRY LAST ENTRY
-12 75d1d6a4aec4408f9614e4f5e732a162 Mon 2023-07-17 09:33:55 CEST Thu 2023-07-20 00:55:00 CEST
-11 bae052ce2e854c12ba1133ad87000b0b Thu 2023-07-20 00:59:16 CEST Fri 2023-07-21 05:16:24 CEST
-10 f9d75f7cda704177bfd5a73fcc74f870 Fri 2023-07-21 05:19:39 CEST Mon 2023-07-24 10:55:03 CEST
-9 97c1b58a33094c4fa98be84394e21380 Mon 2023-07-24 11:16:38 CEST Thu 2023-07-27 14:51:10 CEST
-8 3e892170d3db49a8b7fea16ec548fb54 Thu 2023-07-27 14:54:41 CEST Thu 2023-08-10 03:26:24 CEST
-7 6ee5c0a36da248a09ecacb9914ec6484 Thu 2023-08-10 03:29:50 CEST Fri 2023-08-11 22:29:33 CEST
-6 3b5d50267a0f4ba8a51654872fa6f22f Fri 2023-08-11 22:32:54 CEST Fri 2023-08-25 18:18:59 CEST
-5 165d66ba50054675bea0c82b2bc68a82 Fri 2023-08-25 18:22:23 CEST Fri 2023-08-25 21:47:49 CEST
-4 9fcb4d44491f45638100227caa12f44f Fri 2023-08-25 21:50:05 CEST Mon 2023-08-28 18:54:35 CEST
-3 92bdbb1b45c74443ae5c3e784ac413ac Mon 2023-08-28 18:59:02 CEST Sat 2023-09-02 09:30:02 CEST
-2 e55d97b640fa4a0abd319313ca0c7eb0 Sat 2023-09-02 09:33:28 CEST Mon 2023-09-04 02:34:11 CEST
-1 5f4e9baf1c23459f97ae1f98b23abd28 Mon 2023-09-04 02:49:58 CEST Mon 2023-09-04 20:37:12 CEST
0 48005d84f542443193a0505bf1f75272 Mon 2023-09-04 20:42:41 CEST Thu 2023-09-07 21:48:10 CEST
Anyone else? Anything to try?
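
To look for a pattern in when the boots ended (for example always around a backup or cron job), the end time of each earlier boot can be pulled out of a listing like the one above. A small sketch, assuming the 10-column layout shown here (index, boot ID, then weekday/date/time/zone for the first and last entry); `boot_end_times` is a hypothetical helper name:

```shell
#!/bin/sh
# Print index, date and time of the last log entry of every boot except the
# current one (index 0), i.e. roughly when each earlier boot ended.
# Assumption: column layout as in the listing above; other systemd versions
# may format this output differently.
boot_end_times() {
    awk '$1 ~ /^-[0-9]+$/ { print $1, $8, $9 }'
}

# Usage on a real node:
#   journalctl --list-boots --no-pager | boot_end_times
```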
 
Hi,

I also updated to PVE8 some weeks ago. At first only one node out of 4 was rebooting now and then. I migrated all VMs and lxc to another node as a second one also started to reboot randomly. I do not suspect a hardware failure as two nodes are affected by the same problem. The idle node is now running without any problems. I don’t have enough time to investigate but I can do some tests over time if you wish…
 
Hi,

Same issue. I tried many things. RAM, SSD replacement. (Another OS works fine...)

The final solution was to pin kernel 5.15 at boot. :(

Code:
proxmox-boot-tool kernel pin 5.15.126-1-pve
 
Hi! How did you get such an old kernel? I'm running PVE 8.0.4 and the completely random reboots are getting quite annoying. Sadly, my only available kernel versions are:

Code:
pve-firmware/stable 3.9-1 all [upgradable from: 3.8-2]
  Binary firmware code for the pve-kernel

pve-kernel-6.1/stable 7.3-4 all
  Latest Proxmox VE Kernel Image

pve-kernel-6.1.10-1-pve/stable 6.1.10-1 amd64
  Proxmox Kernel Image

pve-kernel-6.2/stable,now 8.0.5 all [installed]
  Proxmox Kernel Image for 6.2 series (transitional package)

pve-kernel-6.2.16-1-pve/stable 6.2.16-1 amd64
  Proxmox Kernel Image

pve-kernel-6.2.16-2-pve/stable 6.2.16-2 amd64
  Proxmox Kernel Image

pve-kernel-6.2.16-3-pve/stable,now 6.2.16-3 amd64 [installed]
  Proxmox Kernel Image

pve-kernel-6.2.16-4-pve/stable 6.2.16-5 amd64
  Proxmox Kernel Image

pve-kernel-6.2.16-5-pve/stable 6.2.16-6 amd64
  Proxmox Kernel Image
 
Hi Pierre,

- Open a shell window.

- First install the kernel:
Code:
apt install pve-kernel-5.15

- List installed kernels:
Code:
proxmox-boot-tool kernel list

- Pin startup kernel:
Code:
proxmox-boot-tool kernel pin 5.15.116-1-pve
or
Code:
proxmox-boot-tool kernel pin 5.15.126-1-pve
depending on which version is in your list.

- Update grub:
Code:
update-grub

- Check the pin is OK:
Code:
proxmox-boot-tool kernel list

The result should look something like this:

Manually selected kernels:
None.

Automatically selected kernels:
5.15.126-1-pve
6.2.16-19-pve
6.5.11-4-pve

Pinned kernel:
5.15.126-1-pve
- Reboot...

Good luck!
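
To double-check the pin from a script rather than by eye, the "Pinned kernel:" section of the `proxmox-boot-tool kernel list` output can be extracted and compared with the running kernel after the reboot. A rough sketch, assuming the output format shown above; `pinned_kernel` is a hypothetical helper name:

```shell
#!/bin/sh
# Print the line that follows "Pinned kernel:" in the list output read from stdin.
# Assumption: the output layout matches the example above.
pinned_kernel() {
    awk 'found { print; exit } /^Pinned kernel:/ { found = 1 }'
}

# Usage on a real node, after the reboot:
#   pinned=$(proxmox-boot-tool kernel list | pinned_kernel)
#   [ "$(uname -r)" = "$pinned" ] && echo "running the pinned kernel"
```

The pin can be removed again later with `proxmox-boot-tool kernel unpin`, for example to test a newer kernel.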
 
Theoretically, that would be great - sadly none of my apt sources provide that kernel version. Is that limited to the enterprise repository server?
I'm currently using

Code:
deb http://ftp.debian.org/debian bookworm main contrib
deb http://ftp.debian.org/debian bookworm-updates main contrib
deb http://security.debian.org/debian-security bookworm-security main contrib

deb http://download.proxmox.com/debian/pve bookworm pve-no-subscription

deb http://download.proxmox.com/debian/ceph-quincy bookworm no-subscription
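
A quick way to see whether any of the configured sources offers the package is `apt-cache policy`, which prints a "Candidate:" line for each known package. A small sketch, not a statement about which repository actually carries the 5.15 kernel; `has_candidate` is a hypothetical helper name:

```shell
#!/bin/sh
# Succeed if the `apt-cache policy` output on stdin names an installable candidate.
# "Candidate: (none)" or no Candidate line at all counts as not available.
has_candidate() {
    awk '$1 == "Candidate:" && $2 != "(none)" { ok = 1 } END { exit ok ? 0 : 1 }'
}

# Usage on a real node (run `apt update` first):
#   apt-cache policy pve-kernel-5.15 | has_candidate && echo "available" || echo "not available"
```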
 
Hi,
The latest kernel upgrade seems to have solved the issue: Linux 6.5.11-4-pve (2023-11-20T10:19Z)
I'm now at 10 days of uptime, which I had never reached since deploying the server in July! I'm a little more confident now.
Hopefully the issue is solved.
 
Hi,

Neither 6.5.11-4 nor 6.5.11-7 fixed the problem for me.

I had to go back to 5.15.126-1-pve. It is the last stable version on my system.
 
