[SOLVED] KERNEL PANIC without any load

abhi11verma

New Member
May 23, 2024
System config:
I am using an old HP Pavilion laptop.

RAM: 8 GB
Processor: i7
HDD: 1 TB

Proxmox version: 8.2.2
Kernel: 6.8.4-3

After a restart the system runs fine for some time and then stops responding, with what looks like a kernel panic.

Is there a way to find out what is going wrong?
Where should I start debugging?

Only one LXC container is running, and it connects a Cloudflare tunnel.

`pveversion --verbose` output

Code:
proxmox-ve: 8.2.0 (running kernel: 6.8.4-3-pve)
pve-manager: 8.2.2 (running version: 8.2.2/9355359cd7afbae4)
proxmox-kernel-helper: 8.1.0
proxmox-kernel-6.8: 6.8.4-3
proxmox-kernel-6.8.4-3-pve-signed: 6.8.4-3
proxmox-kernel-6.8.4-2-pve-signed: 6.8.4-2
ceph-fuse: 17.2.7-pve3
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx8
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-4
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.5.1
libproxmox-backup-qemu0: 1.4.1
libproxmox-rs-perl: 0.3.3
libpve-access-control: 8.1.4
libpve-apiclient-perl: 3.3.2
libpve-cluster-api-perl: 8.0.6
libpve-cluster-perl: 8.0.6
libpve-common-perl: 8.2.1
libpve-guest-common-perl: 5.1.2
libpve-http-server-perl: 5.1.0
libpve-network-perl: 0.9.8
libpve-rs-perl: 0.8.8
libpve-storage-perl: 8.2.1
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 6.0.0-1
lxcfs: 6.0.0-pve2
novnc-pve: 1.4.0-3
proxmox-backup-client: 3.2.2-1
proxmox-backup-file-restore: 3.2.2-1
proxmox-kernel-helper: 8.1.0
proxmox-mail-forward: 0.2.3
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.6
proxmox-widget-toolkit: 4.2.3
pve-cluster: 8.0.6
pve-container: 5.1.10
pve-docs: 8.2.2
pve-edk2-firmware: 4.2023.08-4
pve-esxi-import-tools: 0.7.0
pve-firewall: 5.0.7
pve-firmware: 3.11-1
pve-ha-manager: 4.0.4
pve-i18n: 3.2.2
pve-qemu-kvm: 8.1.5-6
pve-xtermjs: 5.3.0-3
qemu-server: 8.2.1
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.3-pve2
 

Attachments:

  • 20240529_211119.jpg
  • IMG-20240530-WA0001.jpg
Start diagnosing by shutting down the LXC container (or better: disable Start on boot for the container and reboot the server) and see if you still get a kernel panic.
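In case it helps, here is a rough sketch of doing that from the host shell instead of the GUI. It assumes `pct` is available on the node; the `DRY_RUN` variable and the `disable_onboot` helper are names made up for this example. By default it only prints the commands it would run.

```shell
# Sketch: turn off "Start on boot" for every LXC container on this node.
# DRY_RUN and disable_onboot are hypothetical names used only in this example.
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = actually run them

disable_onboot() {
    # $1 = container ID as shown in the first column of `pct list`
    if [ "$DRY_RUN" = "1" ]; then
        echo "pct set $1 --onboot 0"
    else
        pct set "$1" --onboot 0
    fi
}

if command -v pct >/dev/null 2>&1; then
    # Skip the header line of `pct list` and take the VMID column.
    pct list | awk 'NR > 1 { print $1 }' | while read -r ctid; do
        disable_onboot "$ctid"
    done
fi
```

After a reboot the container should then stay stopped, so any panic that still happens is unlikely to come from the container workload itself.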
I checked this, and yes, it still crashed.
I left it overnight with just Proxmox running and it broke.
 
We've been running our test cluster for over 9 days without any crashes (after being affected about 3 weeks ago). Here are the details of our setup:
  • Kernel: Linux 6.8.4-3-pve
  • PVE Manager Version: pve-manager/8.2.2/9355359cd7afbae4
  • Hardware:
    • Nodes:
      • 2 nodes with AMD EPYC 7313P processors (IOMMU enabled)
      • 1 node with an Intel E5 1650v3 processor (IOMMU enabled)
  • Storage:
    • Ceph with 4 NVMe OSDs per node
    • NVMe storage via PCIe adapter with bifurcation
  • Other Details:
    • Partially using NVMe drives as boot disks
    • One AMD node is heavily utilizing PCIe passthrough
So far, this setup has proven to be stable and reliable.
 
Try pinning the kernel to 6.5 and retest the node. My hunch is that it will not panic. IMBW.

See wiki.
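A minimal sketch of the pinning step, assuming the node boots via `proxmox-boot-tool` and that a 6.5 kernel is already installed. The version `6.5.13-5-pve` below is only a placeholder; use whatever `proxmox-boot-tool kernel list` reports on your node. The `kernel_series` helper is a made-up name for this example.

```shell
# Sketch: show the running kernel series and list the kernels available for pinning.
kernel_series() {
    # "6.8.4-3-pve" -> "6.8" (hypothetical helper for this example)
    printf '%s\n' "$1" | cut -d. -f1,2
}

running="$(uname -r)"
echo "running kernel: $running (series $(kernel_series "$running"))"

if command -v proxmox-boot-tool >/dev/null 2>&1; then
    proxmox-boot-tool kernel list
    # After picking a version from the list above (placeholder shown here):
    #   proxmox-boot-tool kernel pin 6.5.13-5-pve
    # To go back to the default (newest) kernel later:
    #   proxmox-boot-tool kernel unpin
fi
```

The pin only takes effect at the next boot, so reboot and check `uname -r` before retesting.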

The kernel did panic under load; I will try once without load.

I am not sure whether this is a hardware issue, as the system is quite old and had not been used for a while.
memtest was fine.
The HDD SMART test also reports healthy.
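For anyone repeating the SMART check from the shell, something along these lines should work. It assumes the disk is `/dev/sda` (adjust `DISK`) and that it is a SATA/ATA drive, since other drive types word the health line differently. `smart_verdict` is a helper name invented for this example.

```shell
# Sketch: print the overall SMART verdict for one disk.
smart_verdict() {
    # Reads `smartctl -H` output on stdin and prints only the verdict (ATA wording).
    sed -n 's/^SMART overall-health self-assessment test result: *//p'
}

DISK="${DISK:-/dev/sda}"   # assumed device name, change as needed

if command -v smartctl >/dev/null 2>&1 && [ -b "$DISK" ]; then
    smartctl -H "$DISK" | smart_verdict
    # For a deeper check, start an extended self-test and read the full report later:
    #   smartctl -t long "$DISK"
    #   smartctl -a "$DISK"
fi
```

A PASSED verdict does not rule out a failing disk, so it may still be worth looking at attributes such as reallocated or pending sectors in the full report.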
 


Update: surprisingly, I was able to run a Windows VM for 8 hours straight without any kernel panic.
I sometimes observed a high I/O delay (> 25%), which could be due to the old spinning HDD. I will be replacing it with an SSD.

I will keep the thread updated.

Any pointers would be helpful.
 
Update:
The system has now been running for more than a day straight, even with some load:
1 container
1 VM
and a TrueNAS VM


Kernel: 6.5

Note: I was getting this error message before the kernel panic:
`usb 1-1.7: reset full-speed USB device number 4 using ehci-pci`

I have disabled autosuspend for this device, and since then it seems to be running completely fine. I still need to confirm whether this was the cause of the kernel panic.
The device is the Broadcom Bluetooth adapter, connected over USB.
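For reference, a sketch of one way to disable autosuspend for a single device through sysfs, using the bus/port ID from the log line above. `usb_id_from_log`, `control_path` and the `APPLY` switch are names made up for this example; without `APPLY=1` it only prints what it would do.

```shell
# Sketch: derive the sysfs power control file from the kernel log line
# and (optionally) switch autosuspend off for that one device.
usb_id_from_log() {
    # "usb 1-1.7: reset full-speed USB device ..." -> "1-1.7"
    printf '%s\n' "$1" | sed -n 's/^.*usb \([0-9][0-9.-]*\): reset .*$/\1/p'
}

control_path() {
    # $1 = bus-port ID such as 1-1.7
    echo "/sys/bus/usb/devices/$1/power/control"
}

line='usb 1-1.7: reset full-speed USB device number 4 using ehci-pci'
dev="$(usb_id_from_log "$line")"
path="$(control_path "$dev")"

if [ "${APPLY:-0}" = "1" ] && [ -w "$path" ]; then
    # "on" keeps the device always powered; "auto" allows autosuspend again
    echo on > "$path"
else
    echo "would write 'on' to $path"
fi
```

A write to sysfs like this does not survive a reboot. Making it permanent needs something like a udev rule, or the `usbcore.autosuspend=-1` kernel parameter, which disables autosuspend for all USB devices.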


Update:

I have replaced the spinning HDD with an SSD
and reinstalled Proxmox on the SSD.

Initially I was still getting kernel panics after this change, with both kernel 6.5 and 6.8.

-----
Finally, it seems to be working.

I have removed the HDD from Proxmox, though it is still connected to the SATA port in the laptop's drive bay.

Probable solution:
Disabling USB autosuspend for the Broadcom webcam and using kernel Linux 6.8.4-3-pve (replacing the possibly faulty HDD could also have solved the issue).

I will close the issue for now. Thanks for all the help!
 