[SOLVED] Proxmox VE crashes randomly since upgrade

ZyX

New Member
Aug 15, 2023
Hello,

Since I switched to Proxmox VE 8, my host crashes randomly (sometimes once a day, other times once every 3 days).

Initially, I upgraded in place by switching the repositories, following this procedure: https://pve.proxmox.com/wiki/Upgrade_from_7_to_8#In-place_upgrade
After 2-3 weeks of crashes I suspected I had done the upgrade wrong, so I completely reinstalled Proxmox VE 8 from the ISO. The crashes continued.

After this complete reinstallation I did not re-enable any optional features (automatic backups or GPU passthrough), to rule them out as the cause. The host still crashes, so they are evidently not responsible.

I'd also like to point out that during the crashes, the fans and LEDs are still working (so the PC hasn't completely shut down) and when I plug in a monitor, I get the proxmox CLI, but it's totally frozen (no interaction possible).

I also wrote a bash script, started with nohup, that loops forever and writes the time to a file every minute, to see whether any processes keep running after the crash. The script also stops at the moment of the crash.
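
For anyone who wants to reproduce this check, here is a minimal sketch of such a heartbeat script. It is not the exact script I ran: the function name `heartbeat`, its arguments, and the optional iteration count are assumptions made for illustration.

```shell
#!/bin/sh
# heartbeat FILE INTERVAL [COUNT]
# Append a timestamp to FILE every INTERVAL seconds.
# COUNT is optional (hypothetical convenience); when omitted or 0,
# the loop runs until the process is killed or the host freezes.
heartbeat() {
    file="$1"
    interval="$2"
    count="${3:-0}"
    i=0
    while :; do
        date '+%Y-%m-%d %H:%M:%S' >> "$file"
        # Flush to disk so the last entry survives a hard freeze.
        sync
        i=$((i + 1))
        if [ "$count" -gt 0 ] && [ "$i" -ge "$count" ]; then
            break
        fi
        sleep "$interval"
    done
}

# Example start (paths are placeholders), detached from the terminal:
#   nohup sh -c '. ./heartbeat.sh; heartbeat /root/heartbeat.log 60' >/dev/null 2>&1 &
```

The last timestamp in the file then gives the time of the freeze to within one interval.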

I have also attached the syslog file with the logs from the minute before the crash.

I hope someone can help me, thank you in advance :)
 

Attachments

  • syslog.txt
    71.7 KB
Hello,

Thank you for the syslog!

Can you please also provide us with the output of `pveversion -v`?

Have you booted from an older kernel?

I would also check whether any updates are available for the BIOS or device firmware.
 
Hello,

Here is the output of `pveversion -v`:

Code:
proxmox-ve: 8.0.2 (running kernel: 6.2.16-6-pve)
pve-manager: 8.0.4 (running version: 8.0.4/d258a813cfa6b390)
pve-kernel-6.2: 8.0.5
proxmox-kernel-helper: 8.0.3
proxmox-kernel-6.2.16-6-pve: 6.2.16-7
proxmox-kernel-6.2: 6.2.16-7
pve-kernel-6.2.16-3-pve: 6.2.16-3
ceph-fuse: 17.2.6-pve1+3
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-3
libknet1: 1.25-pve1
libproxmox-acme-perl: 1.4.6
libproxmox-backup-qemu0: 1.4.0
libproxmox-rs-perl: 0.3.1
libpve-access-control: 8.0.4
libpve-apiclient-perl: 3.3.1
libpve-common-perl: 8.0.7
libpve-guest-common-perl: 5.0.4
libpve-http-server-perl: 5.0.4
libpve-rs-perl: 0.8.5
libpve-storage-perl: 8.0.2
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 5.0.2-4
lxcfs: 5.0.3-pve3
novnc-pve: 1.4.0-2
proxmox-backup-client: 3.0.2-1
proxmox-backup-file-restore: 3.0.2-1
proxmox-kernel-helper: 8.0.3
proxmox-mail-forward: 0.2.0
proxmox-mini-journalreader: 1.4.0
proxmox-widget-toolkit: 4.0.6
pve-cluster: 8.0.3
pve-container: 5.0.4
pve-docs: 8.0.4
pve-edk2-firmware: 3.20230228-4
pve-firewall: 5.0.3
pve-firmware: 3.7-1
pve-ha-manager: 4.0.2
pve-i18n: 3.0.5
pve-qemu-kvm: 8.0.2-4
pve-xtermjs: 4.16.0-3
qemu-server: 8.0.6
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.1.12-pve1

I am booting kernel version 6.2.16-6:

Code:
Linux version 6.2.16-6-pve (fgruenbichler@yuna) (gcc (Debian 12.2.0-14) 12.2>

I don't have the latest version of the BIOS. I'll update it and keep you informed :)
 
Please try booting an older kernel if you can, and check whether a BIOS update is available.
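
As a rough sketch of how an older kernel could be selected: the helper below picks the newest installed version that sorts below the running one. The function name `pick_older_kernel` is hypothetical, and the example versions come from the `pveversion -v` output in this thread. It assumes GNU `sort -V` is available, which is the case on a standard Proxmox VE install.

```shell
#!/bin/sh
# pick_older_kernel RUNNING
# Read installed kernel versions (one per line) on stdin and print the
# newest one that sorts below RUNNING. Prints nothing if there is none.
pick_older_kernel() {
    running="$1"
    # Add the running version to the list so awk always finds it.
    { cat; printf '%s\n' "$running"; } | sort -V -u |
        awk -v cur="$running" '
            $0 == cur { exit }          # stop at the running kernel
            { prev = $0 }               # remember the last older version
            END { if (prev != "") print prev }'
}

# Possible usage on the host, as root (not executed here):
#   ls /boot/vmlinuz-* | sed "s|.*/vmlinuz-||" | pick_older_kernel "$(uname -r)"
#   proxmox-boot-tool kernel list
#   proxmox-boot-tool kernel pin 6.2.16-3-pve   # version printed above
#   reboot
#   proxmox-boot-tool kernel unpin              # return to the default later
```

Pinning keeps the chosen kernel across reboots, which helps when a crash may take several days to show up.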
 
I'm going to update the BIOS first and wait a few days to see if there are any more crashes. If there are still crashes, I'll try booting with an older version of the kernel.

I'll keep the topic updated if anything new comes up, thanks for the help! :)
 
Same here on an R740xd with an A2000. It never happened on Proxmox 7, but now it does: the VM suddenly hangs (no further logs) with the CPU pegged at 100%. A hard stop of the VM followed by a restart fixes the issue temporarily. It occurs daily.

The following can be observed in the host's logs at the time of the crash:

Code:
Aug 24 09:22:02 r740xd kernel: kvm_msr_ignored_check: 5 callbacks suppressed
Aug 24 09:22:02 r740xd kernel: kvm [2736995]: ignored rdmsr: 0x4e data 0x0
Aug 24 09:22:02 r740xd kernel: kvm [2736995]: ignored wrmsr: 0x4e data 0x2
Aug 24 09:22:02 r740xd kernel: kvm [2736995]: ignored rdmsr: 0x4e data 0x0
Aug 24 09:22:02 r740xd kernel: kvm [2736995]: ignored rdmsr: 0x1c9 data 0x0
Aug 24 09:22:02 r740xd kernel: kvm [2736995]: ignored wrmsr: 0x1c9 data 0x3
Aug 24 09:22:02 r740xd kernel: kvm [2736995]: ignored rdmsr: 0x1c9 data 0x0
Aug 24 09:22:02 r740xd kernel: kvm [2736995]: ignored rdmsr: 0x1a6 data 0x0
Aug 24 09:22:02 r740xd kernel: kvm [2736995]: ignored wrmsr: 0x1a6 data 0x11
Aug 24 09:22:02 r740xd kernel: kvm [2736995]: ignored rdmsr: 0x1a6 data 0x0
Aug 24 09:22:02 r740xd kernel: kvm [2736995]: ignored rdmsr: 0x1a7 data 0x0
Aug 24 09:22:09 r740xd kernel: kvm_msr_ignored_check: 17 callbacks suppressed
Aug 24 09:22:09 r740xd kernel: kvm [2736995]: ignored rdmsr: 0xc5 data 0x0
Aug 24 09:22:09 r740xd kernel: kvm [2736995]: ignored rdmsr: 0xc6 data 0x0
Aug 24 09:22:09 r740xd kernel: kvm [2736995]: ignored rdmsr: 0xc7 data 0x0
Aug 24 09:22:09 r740xd kernel: kvm [2736995]: ignored rdmsr: 0xc8 data 0x0
Aug 24 09:22:09 r740xd kernel: kvm [2736995]: ignored rdmsr: 0x18a data 0x0
Aug 24 09:22:09 r740xd kernel: kvm [2736995]: ignored rdmsr: 0x18b data 0x0
Aug 24 09:22:09 r740xd kernel: kvm [2736995]: ignored rdmsr: 0x18c data 0x0
Aug 24 09:22:09 r740xd kernel: kvm [2736995]: ignored rdmsr: 0x18d data 0x0
Aug 24 09:22:09 r740xd kernel: kvm [2736995]: ignored rdmsr: 0xb0 data 0x0
Aug 24 09:22:09 r740xd kernel: kvm [2736995]: ignored rdmsr: 0xb1 data 0x0

Code:
proxmox-ve: 8.0.2 (running kernel: 6.2.16-8-pve)
pve-manager: 8.0.4 (running version: 8.0.4/d258a813cfa6b390)
proxmox-kernel-helper: 8.0.3
pve-kernel-5.15: 7.4-4
proxmox-kernel-6.2.16-8-pve: 6.2.16-8
pve-kernel-5.15.108-1-pve: 5.15.108-2
pve-kernel-5.15.30-2-pve: 5.15.30-3
ceph-fuse: 16.2.11+ds-2
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-4
libknet1: 1.25-pve1
libproxmox-acme-perl: 1.4.6
libproxmox-backup-qemu0: 1.4.0
libproxmox-rs-perl: 0.3.1
libpve-access-control: 8.0.4
libpve-apiclient-perl: 3.3.1
libpve-common-perl: 8.0.7
libpve-guest-common-perl: 5.0.4
libpve-http-server-perl: 5.0.4
libpve-rs-perl: 0.8.5
libpve-storage-perl: 8.0.2
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 5.0.2-4
lxcfs: 5.0.3-pve3
novnc-pve: 1.4.0-2
proxmox-backup-client: 3.0.2-1
proxmox-backup-file-restore: 3.0.2-1
proxmox-kernel-helper: 8.0.3
proxmox-mail-forward: 0.2.0
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.2
proxmox-widget-toolkit: 4.0.6
pve-cluster: 8.0.3
pve-container: 5.0.4
pve-docs: 8.0.4
pve-edk2-firmware: 3.20230228-4
pve-firewall: 5.0.3
pve-firmware: 3.7-1
pve-ha-manager: 4.0.2
pve-i18n: 3.0.5
pve-qemu-kvm: 8.0.2-4
pve-xtermjs: 4.16.0-3
qemu-server: 8.0.6
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.1.12-pve1

Sadly, there are no kernel stack traces.
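
To compare the messages above between hangs, a small filter like the following could condense them. This is only a sketch: the function name `summarize_ignored_msrs` is made up, and it assumes the log lines have the same layout as the excerpt above. Whether these ignored MSR accesses are related to the hang at all is not established.

```shell
#!/bin/sh
# summarize_ignored_msrs
# Read kernel log lines on stdin and print "<count> <rdmsr|wrmsr> <msr>"
# for each ignored MSR access, most frequent first.
summarize_ignored_msrs() {
    awk '/ignored (rdmsr|wrmsr):/ {
        for (i = 1; i < NF; i++) {
            if ($i == "ignored") {
                key = $(i + 1) " " $(i + 2)   # e.g. "rdmsr: 0x4e"
                sub(":", "", key)              # -> "rdmsr 0x4e"
                seen[key]++
            }
        }
    }
    END { for (k in seen) print seen[k], k }' | sort -k1,1nr -k3,3
}

# Possible usage on the host (kernel messages of the current boot):
#   journalctl -k -b | summarize_ignored_msrs
```

The "callbacks suppressed" lines are skipped, so the counts only cover accesses that were actually logged.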
 
Hello,

The problem does not seem to have recurred after more than a week following the BIOS update. I will continue to monitor for 1 more week and mark the issue as resolved if there has been no further crash :)
 
