Hi,
I have a recurring problem with my Proxmox server hitting a kernel panic and crashing. I have not been able to reproduce it consistently. Sometimes there are two weeks between crashes, sometimes less than 24 hours.
I have tried swapping hardware components such as the RAM, CPU, motherboard and so on. I think I have replaced every component except the RAID controller. I also did a fresh install of Proxmox and restored the containers and VMs from backup into the fresh install.
I tried to enable crash dumps, but after a crash nothing is written to /var/crash. I followed the setup and verification steps in this guide:
https://www.cyberciti.biz/faq/how-to-on-enable-kernel-crash-dump-on-debian-linux/
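For reference, a minimal sketch of the kind of post-boot check that should show whether a crash kernel is actually reserved and loaded. It assumes the standard /proc/cmdline and /sys/kernel/kexec_crash_* entries are present and that kdump-tools (which provides kdump-config) is installed as in the guide; the helper name crashkernel_param is made up for illustration.

```shell
#!/bin/sh
# Sketch only: report whether a crash kernel is reserved and loaded.
# crashkernel_param is a hypothetical helper, not part of any package.

# Print the value of crashkernel= from a kernel command line string.
crashkernel_param() {
    printf '%s\n' "$1" | tr ' ' '\n' | sed -n 's/^crashkernel=//p'
}

# The running kernel's command line (empty if /proc is not available).
cmdline=$(cat /proc/cmdline 2>/dev/null || true)
echo "crashkernel parameter: $(crashkernel_param "$cmdline")"

# 1 means a crash kernel is loaded; 0 means a panic will not produce a dump.
if [ -r /sys/kernel/kexec_crash_loaded ]; then
    echo "kexec_crash_loaded: $(cat /sys/kernel/kexec_crash_loaded)"
fi

# Bytes of memory reserved for the crash kernel; 0 means nothing was reserved.
if [ -r /sys/kernel/kexec_crash_size ]; then
    echo "kexec_crash_size: $(cat /sys/kernel/kexec_crash_size)"
fi

# Summary from kdump-tools, only if it is installed (may need root).
if command -v kdump-config >/dev/null 2>&1; then
    kdump-config show || true
fi
```

If kexec_crash_loaded reports 0 or kexec_crash_size reports 0, the crashkernel= reservation is probably not taking effect, which would explain an empty /var/crash.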
I have attached a photo of the kernel panic if anyone can make sense of it.
Any tips on how to debug this, or on how to get crash dumps to actually be written?
To me the journal doesn't show anything useful:
Code:
Sep 21 23:09:07 pve kernel: audit: type=1400 audit(1695330547.115:129): apparmor="DENIED" operation="mount" class="mount" info="failed perms check" error=-13>
Sep 21 23:17:01 pve CRON[810666]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Sep 21 23:17:01 pve CRON[810667]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Sep 21 23:17:01 pve CRON[810666]: pam_unix(cron:session): session closed for user root
Sep 21 23:38:15 pve kernel: mce: [Hardware Error]: Machine check events logged
Sep 21 23:39:50 pve audit[906578]: AVC apparmor="DENIED" operation="mount" class="mount" info="failed perms check" error=-13 profile="lxc-107_</var/lib/lxc>">
Sep 21 23:39:50 pve kernel: audit: type=1400 audit(1695332390.318:130): apparmor="DENIED" operation="mount" class="mount" info="failed perms check" error=-13>
Sep 22 00:00:07 pve systemd[1]: Starting dpkg-db-backup.service - Daily dpkg database backup service...
Sep 22 00:00:07 pve systemd[1]: Starting logrotate.service - Rotate log files...
Sep 22 00:00:07 pve systemd[1]: dpkg-db-backup.service: Deactivated successfully.
Sep 22 00:00:07 pve systemd[1]: Finished dpkg-db-backup.service - Daily dpkg database backup service.
Sep 22 00:00:07 pve systemd[1]: Reloading pveproxy.service - PVE API Proxy Server...
Sep 22 00:00:07 pve pveproxy[986078]: send HUP to 3421
Sep 22 00:00:07 pve pveproxy[3421]: received signal HUP
Sep 22 00:00:07 pve pveproxy[3421]: server closing
Sep 22 00:00:07 pve pveproxy[3421]: server shutdown (restart)
Sep 22 00:00:07 pve systemd[1]: Reloaded pveproxy.service - PVE API Proxy Server.
Sep 22 00:00:07 pve systemd[1]: Reloading spiceproxy.service - PVE SPICE Proxy Server...
Sep 22 00:00:07 pve spiceproxy[986085]: send HUP to 3435
Sep 22 00:00:07 pve spiceproxy[3435]: received signal HUP
Sep 22 00:00:07 pve spiceproxy[3435]: server closing
Sep 22 00:00:07 pve spiceproxy[3435]: server shutdown (restart)
Sep 22 00:00:07 pve systemd[1]: Reloaded spiceproxy.service - PVE SPICE Proxy Server.
Sep 22 00:00:07 pve pvefw-logger[2283]: received terminate request (signal)
Sep 22 00:00:07 pve systemd[1]: Stopping pvefw-logger.service - Proxmox VE firewall logger...
Sep 22 00:00:07 pve pvefw-logger[2283]: stopping pvefw logger
Sep 22 00:00:07 pve spiceproxy[3435]: restarting server
Sep 22 00:00:07 pve spiceproxy[3435]: starting 1 worker(s)
Sep 22 00:00:07 pve spiceproxy[3435]: worker 986094 started
Sep 22 00:00:07 pve pveproxy[3421]: restarting server
Sep 22 00:00:07 pve pveproxy[3421]: starting 3 worker(s)
Sep 22 00:00:07 pve pveproxy[3421]: worker 986095 started
Sep 22 00:00:07 pve pveproxy[3421]: worker 986096 started
Sep 22 00:00:07 pve pveproxy[3421]: worker 986097 started
Sep 22 00:00:08 pve systemd[1]: pvefw-logger.service: Deactivated successfully.
Sep 22 00:00:08 pve systemd[1]: Stopped pvefw-logger.service - Proxmox VE firewall logger.
Sep 22 00:00:08 pve systemd[1]: Starting pvefw-logger.service - Proxmox VE firewall logger...
Sep 22 00:00:08 pve pvefw-logger[986104]: starting pvefw logger
Sep 22 00:00:08 pve systemd[1]: Started pvefw-logger.service - Proxmox VE firewall logger.
Sep 22 00:00:08 pve systemd[1]: logrotate.service: Deactivated successfully.
Sep 22 00:00:08 pve systemd[1]: Finished logrotate.service - Rotate log files.
Sep 22 00:00:12 pve spiceproxy[3436]: worker exit
Sep 22 00:00:12 pve spiceproxy[3435]: worker 3436 finished
Sep 22 00:00:12 pve pveproxy[241796]: worker exit
Sep 22 00:00:12 pve pveproxy[231159]: worker exit
Sep 22 00:00:12 pve pveproxy[257281]: worker exit
Sep 22 00:00:12 pve pveproxy[3421]: worker 231159 finished
Sep 22 00:00:12 pve pveproxy[3421]: worker 257281 finished
Sep 22 00:00:12 pve pveproxy[3421]: worker 241796 finished
-- Boot d3ba03c45a0d4a08926f1f6a037c0185 --
Sep 22 05:59:29 pve kernel: Linux version 6.2.16-12-pve (build@proxmox) (gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #1 SMP PREEMPT>
Sep 22 05:59:29 pve kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-6.2.16-12-pve root=/dev/mapper/pve-root ro nomodeset crashkernel=384M-:128M
Sep 22 05:59:29 pve kernel: KERNEL supported cpus:
Sep 22 05:59:29 pve kernel: Intel GenuineIntel
Sep 22 05:59:29 pve kernel: AMD AuthenticAMD
Sep 22 05:59:29 pve kernel: Hygon HygonGenuine
Sep 22 05:59:29 pve kernel: Centaur CentaurHauls
Sep 22 05:59:29 pve kernel: zhaoxin Shanghai
Sep 22 05:59:29 pve kernel: x86/split lock detection: #AC: crashing the kernel on kernel split_locks and warning on user-space split_locks
Sep 22 05:59:29 pve kernel: BIOS-provided physical RAM map:
Sep 22 05:59:29 pve kernel: BIOS-e820: [mem 0x0000000000000000-0x000000000005dfff] usable
Thank you