Constant Kernel Panics on PVE 9 fresh install

pakyrs

Hi All,

I keep getting kernel panics and hangs with the latest PVE 9, freshly installed on my server.



1763133713208.png

Journalctl doesn't give me any info. This is the log from the last hour, although it crashed 10 minutes ago; I have omitted the logs from the new boot.

Bash:
journalctl --since "1 hour ago"
Nov 14 14:35:01 nibbler CRON[78122]: pam_unix(cron:session): session opened for user root(uid=0) by root(uid=0)
Nov 14 14:35:01 nibbler CRON[78124]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Nov 14 14:35:01 nibbler CRON[78122]: pam_unix(cron:session): session closed for user root
Nov 14 14:45:01 nibbler CRON[82544]: pam_unix(cron:session): session opened for user root(uid=0) by root(uid=0)
Nov 14 14:45:01 nibbler CRON[82546]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Nov 14 14:45:01 nibbler CRON[82544]: pam_unix(cron:session): session closed for user root
Nov 14 14:55:01 nibbler CRON[86916]: pam_unix(cron:session): session opened for user root(uid=0) by root(uid=0)
Nov 14 14:55:01 nibbler CRON[86918]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Nov 14 14:55:01 nibbler CRON[86916]: pam_unix(cron:session): session closed for user root
Nov 14 14:59:21 nibbler pvedaemon[3106]: <root@pam> successful auth for user 'root@pam'
Nov 14 14:59:22 nibbler pvedaemon[3106]: <root@pam> starting task UPID:nibbler:00015AF7:00106197:691743CA:vncproxy:107:root@pam:
Nov 14 14:59:22 nibbler pvedaemon[88823]: starting vnc proxy UPID:nibbler:00015AF7:00106197:691743CA:vncproxy:107:root@pam:
Nov 14 14:59:35 nibbler pvedaemon[3107]: worker exit
Nov 14 14:59:37 nibbler pvedaemon[3105]: worker 3107 finished
Nov 14 14:59:37 nibbler pvedaemon[3105]: starting 1 worker(s)
Nov 14 14:59:37 nibbler pvedaemon[3105]: worker 88919 started
Nov 14 15:00:12 nibbler pvedaemon[3108]: worker exit
Nov 14 15:00:12 nibbler pvedaemon[3105]: worker 3108 finished
Nov 14 15:00:12 nibbler pvedaemon[3105]: starting 1 worker(s)
Nov 14 15:00:12 nibbler pvedaemon[3105]: worker 89181 started
Nov 14 15:00:38 nibbler pvedaemon[3106]: worker exit
Nov 14 15:00:38 nibbler pvedaemon[3105]: worker 3106 finished
Nov 14 15:00:38 nibbler pvedaemon[3105]: starting 1 worker(s)
Nov 14 15:00:38 nibbler pvedaemon[3105]: worker 89355 started
Nov 14 15:00:39 nibbler smartd[2598]: Device: /dev/sde [SAT], SMART Usage Attribute: 204 Soft_ECC_Correction changed from 95 to 96
Nov 14 15:05:01 nibbler CRON[91763]: pam_unix(cron:session): session opened for user root(uid=0) by root(uid=0)
Nov 14 15:05:01 nibbler CRON[91765]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Nov 14 15:05:01 nibbler CRON[91763]: pam_unix(cron:session): session closed for user root
Nov 14 15:05:04 nibbler pvedaemon[88919]: <root@pam> update VM 107: -balloon 4096 -delete shares -memory 10240
Nov 14 15:05:04 nibbler pvedaemon[88919]: cannot delete 'shares' - not set in current configuration!
 
Hi!
Journalctl doesn't give me any info. This is the log from the last hour, although it crashed 10 minutes ago; I have omitted the logs from the new boot.
journalctl usually opens its output in a pager (typically less, or whatever PAGER is set to), so for journalctl --since="1 hour ago" you need to page down to reach the most recent entries. For a better view of the end of the previous boot's log, you can run journalctl -b -1 -e. What is the output there?
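To take the pager out of the picture entirely, a small wrapper along these lines might help. It is only a sketch: the function name last_boot_tail and the default of 200 lines are made up for illustration, while --no-pager, -b -1 and -n are regular journalctl options.

```shell
# Print the last N lines (default 200) of the previous boot's journal
# without opening a pager, so the output can be copied or redirected.
# The function name and the default are illustrative, not part of journalctl.
last_boot_tail() {
    local lines="${1:-200}"
    journalctl --no-pager -b -1 -n "$lines"
}
```

For example, last_boot_tail 500 > last-boot-tail.txt would write the final 500 lines of the previous boot to a file that can be attached to a post.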
 
Either way, I'd first check that your hardware is alright, e.g. that all cables and the RAM modules are seated correctly and aren't loose, that the BIOS configuration is reset (no overclocking, etc.), and so forth. If that doesn't fix the problem, then a full boot log would help in finding the cause.
 
Bash:
journalctl -b -1 -e
Nov 14 13:27:06 nibbler pveproxy[3121]: worker 47899 started
Nov 14 13:27:07 nibbler pveproxy[47898]: worker exit
Nov 14 13:27:46 nibbler pveproxy[3121]: worker 19895 finished
Nov 14 13:27:46 nibbler pveproxy[3121]: starting 1 worker(s)
Nov 14 13:27:46 nibbler pveproxy[3121]: worker 48239 started
Nov 14 13:27:49 nibbler pveproxy[48237]: got inotify poll request in wrong process - disabling inotify
Nov 14 13:30:39 nibbler smartd[2598]: Device: /dev/sdd [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 63 to 62
Nov 14 13:30:39 nibbler smartd[2598]: Device: /dev/sde [SAT], SMART Usage Attribute: 204 Soft_ECC_Correction changed from 93 to 94
Nov 14 13:32:41 nibbler pvedaemon[3108]: <root@pam> successful auth for user 'root@pam'
Nov 14 13:35:01 nibbler CRON[51405]: pam_unix(cron:session): session opened for user root(uid=0) by root(uid=0)
Nov 14 13:35:01 nibbler CRON[51407]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Nov 14 13:35:01 nibbler CRON[51405]: pam_unix(cron:session): session closed for user root
Nov 14 13:37:29 nibbler pveproxy[25196]: worker exit
Nov 14 13:37:29 nibbler pveproxy[3121]: worker 25196 finished
Nov 14 13:37:29 nibbler pveproxy[3121]: starting 1 worker(s)
Nov 14 13:37:29 nibbler pveproxy[3121]: worker 52516 started
Nov 14 13:44:40 nibbler pvedaemon[3106]: <root@pam> end task UPID:nibbler:000066ED:00040021:69172418:vncproxy:107:root@pam: OK
Nov 14 13:44:40 nibbler pveproxy[48237]: worker exit
Nov 14 13:45:01 nibbler CRON[55986]: pam_unix(cron:session): session opened for user root(uid=0) by root(uid=0)
Nov 14 13:45:01 nibbler CRON[55991]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Nov 14 13:45:01 nibbler CRON[55986]: pam_unix(cron:session): session closed for user root
Nov 14 13:55:01 nibbler CRON[60284]: pam_unix(cron:session): session opened for user root(uid=0) by root(uid=0)
Nov 14 13:55:01 nibbler CRON[60286]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Nov 14 13:55:01 nibbler CRON[60284]: pam_unix(cron:session): session closed for user root
Nov 14 14:00:39 nibbler smartd[2598]: Device: /dev/sdd [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 62 to 63
Nov 14 14:00:39 nibbler smartd[2598]: Device: /dev/sde [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 26 to 25
Nov 14 14:00:39 nibbler smartd[2598]: Device: /dev/sde [SAT], SMART Usage Attribute: 204 Soft_ECC_Correction changed from 94 to 95
Nov 14 14:05:01 nibbler CRON[64773]: pam_unix(cron:session): session opened for user root(uid=0) by root(uid=0)
Nov 14 14:05:01 nibbler CRON[64775]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Nov 14 14:05:01 nibbler CRON[64773]: pam_unix(cron:session): session closed for user root
Nov 14 14:15:01 nibbler CRON[69076]: pam_unix(cron:session): session opened for user root(uid=0) by root(uid=0)
Nov 14 14:15:01 nibbler CRON[69078]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Nov 14 14:15:01 nibbler CRON[69076]: pam_unix(cron:session): session closed for user root
Nov 14 14:17:01 nibbler CRON[69991]: pam_unix(cron:session): session opened for user root(uid=0) by root(uid=0)
Nov 14 14:17:01 nibbler CRON[69993]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Nov 14 14:17:01 nibbler CRON[69991]: pam_unix(cron:session): session closed for user root
Nov 14 14:25:01 nibbler CRON[73752]: pam_unix(cron:session): session opened for user root(uid=0) by root(uid=0)
Nov 14 14:25:01 nibbler CRON[73754]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Nov 14 14:25:01 nibbler CRON[73752]: pam_unix(cron:session): session closed for user root
Nov 14 14:35:01 nibbler CRON[78122]: pam_unix(cron:session): session opened for user root(uid=0) by root(uid=0)
Nov 14 14:35:01 nibbler CRON[78124]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Nov 14 14:35:01 nibbler CRON[78122]: pam_unix(cron:session): session closed for user root
Nov 14 14:45:01 nibbler CRON[82544]: pam_unix(cron:session): session opened for user root(uid=0) by root(uid=0)
Nov 14 14:45:01 nibbler CRON[82546]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Nov 14 14:45:01 nibbler CRON[82544]: pam_unix(cron:session): session closed for user root
Nov 14 14:55:01 nibbler CRON[86916]: pam_unix(cron:session): session opened for user root(uid=0) by root(uid=0)
Nov 14 14:55:01 nibbler CRON[86918]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Nov 14 14:55:01 nibbler CRON[86916]: pam_unix(cron:session): session closed for user root
Nov 14 14:59:21 nibbler pvedaemon[3106]: <root@pam> successful auth for user 'root@pam'
Nov 14 14:59:22 nibbler pvedaemon[3106]: <root@pam> starting task UPID:nibbler:00015AF7:00106197:691743CA:vncproxy:107:root@pam:
Nov 14 14:59:22 nibbler pvedaemon[88823]: starting vnc proxy UPID:nibbler:00015AF7:00106197:691743CA:vncproxy:107:root@pam:
Nov 14 14:59:35 nibbler pvedaemon[3107]: worker exit
Nov 14 14:59:37 nibbler pvedaemon[3105]: worker 3107 finished
Nov 14 14:59:37 nibbler pvedaemon[3105]: starting 1 worker(s)
Nov 14 14:59:37 nibbler pvedaemon[3105]: worker 88919 started
Nov 14 15:00:12 nibbler pvedaemon[3108]: worker exit
Nov 14 15:00:12 nibbler pvedaemon[3105]: worker 3108 finished
Nov 14 15:00:12 nibbler pvedaemon[3105]: starting 1 worker(s)
Nov 14 15:00:12 nibbler pvedaemon[3105]: worker 89181 started
Nov 14 15:00:38 nibbler pvedaemon[3106]: worker exit
Nov 14 15:00:38 nibbler pvedaemon[3105]: worker 3106 finished
Nov 14 15:00:38 nibbler pvedaemon[3105]: starting 1 worker(s)
Nov 14 15:00:38 nibbler pvedaemon[3105]: worker 89355 started
Nov 14 15:00:39 nibbler smartd[2598]: Device: /dev/sde [SAT], SMART Usage Attribute: 204 Soft_ECC_Correction changed from 95 to 96
Nov 14 15:05:01 nibbler CRON[91763]: pam_unix(cron:session): session opened for user root(uid=0) by root(uid=0)
Nov 14 15:05:01 nibbler CRON[91765]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Nov 14 15:05:01 nibbler CRON[91763]: pam_unix(cron:session): session closed for user root
Nov 14 15:05:04 nibbler pvedaemon[88919]: <root@pam> update VM 107: -balloon 4096 -delete shares -memory 10240
Nov 14 15:05:04 nibbler pvedaemon[88919]: cannot delete 'shares' - not set in current configuration!

Thanks for stepping in. I removed basically every device attached to the system that I thought could add a layer of problems. I also tested the RAM with memtest86 and it came back clean.

I don't see anything in the logs that points to a cause. Before I reinstalled Proxmox fresh, I had kdump files from a previous crash; they are rather large and I'm not sure how to share them.
 
Today it crashed with a different screen. Viewed through PiKVM it mostly just hangs, sometimes there is a purple kernel panic screen, and today there was a new one!

1763306972904.png
 
It would be helpful for the kernel stacktrace to be a bit longer, e.g. upload a whole boot and crash log (journalctl can also export a log from a whole boot). I can only see that ZFS seems involved, maybe some ZFS I/O worker panicked? Which CPU is installed on that system?
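As a rough sketch of how a whole boot could be exported to a file: the helper name export_boot_log and its argument order are assumptions made for illustration; -b accepts either a relative offset such as -2 or a boot ID, and -o short-precise adds microsecond timestamps.

```shell
# Write the complete journal of one boot to a text file.
# $1: boot selector as accepted by `journalctl -b` (an offset such as -2,
#     or a 32-character boot ID)
# $2: output file
# The helper name and argument order are illustrative only.
export_boot_log() {
    local boot="${1:?usage: export_boot_log <boot> <file>}"
    local out="${2:?usage: export_boot_log <boot> <file>}"
    journalctl --no-pager -o short-precise -b "$boot" > "$out"
}
```

For example, export_boot_log -2 boot-minus-2.log would dump the boot before the previous one.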
 
It would be helpful for the kernel stacktrace to be a bit longer, e.g. upload a whole boot and crash log (journalctl can also export a log from a whole boot). I can only see that ZFS seems involved, maybe some ZFS I/O worker panicked? Which CPU is installed on that system?
The screen was frozen, so I couldn't scroll and was only able to grab that one screen. The CPU is an AMD Ryzen 7 5700U with Radeon Graphics. Full boot journalctl below:

Bash:
journalctl --list-boots
IDX BOOT ID                          FIRST ENTRY                 LAST ENTRY
 -7 85dcc8a8ae044501943ca199c8eeb69c Fri 2025-11-14 11:14:04 GMT Fri 2025-11-14 11:54:47 GMT
 -6 c664f7d059de4e858ca07ec22e98a4d4 Fri 2025-11-14 11:55:13 GMT Fri 2025-11-14 11:57:07 GMT
 -5 8d52fad641f5441297dd4fb5b20658bf Fri 2025-11-14 12:00:32 GMT Fri 2025-11-14 15:05:04 GMT
 -4 7285d92a6c8046d28836c2e44763cc0c Fri 2025-11-14 15:17:30 GMT Sat 2025-11-15 01:01:39 GMT
 -3 e8945a02c8d74b8e893ae622ee1297b4 Sat 2025-11-15 07:00:24 GMT Sun 2025-11-16 01:01:43 GMT
 -2 4ba997d4d49c4298937fd85fb41c866a Sun 2025-11-16 07:00:24 GMT Sun 2025-11-16 08:09:18 GMT
 -1 d7af3dc083974ef8ac0f2a3a84e09726 Sun 2025-11-16 15:45:43 GMT Mon 2025-11-17 01:01:40 GMT
  0 f63db914583143f280d5f0f56ced8b82 Mon 2025-11-17 07:00:24 GMT Mon 2025-11-17 09:27:51 GMT

Based on the above, the multiple entries per day are from when I had to hard stop the server and start it again; those logs are attached here.
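For anyone wanting to eyeball how each of those boots ended, something like the following could work. It is a sketch only: the function names boot_offsets and boot_endings and the default of 5 lines are invented for this example, and the awk filter assumes the offset is the first column of the --list-boots output, as in the table above.

```shell
# List the boot offsets known to the journal (skips the header line, if any).
boot_offsets() {
    journalctl --list-boots --no-pager | awk '$1 ~ /^-?[0-9]+$/ { print $1 }'
}

# Show the final N journal lines (default 5) of every recorded boot, so an
# abrupt end of log can be compared with an orderly shutdown sequence.
# Function names and the default are illustrative only.
boot_endings() {
    local lines="${1:-5}" idx
    for idx in $(boot_offsets); do
        echo "=== boot ${idx} ==="
        journalctl --no-pager -b "$idx" -n "$lines"
    done
}
```

Running boot_endings 10 prints the last ten lines of each boot under a small separator line.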

Thanks for helping!

P
 


Hm, unfortunately there doesn't seem to be anything valuable in those logs either. I guess that 4ba997d4d49c4298937fd85fb41c866a was the boot which ended in the kernel stack trace above (at least judging from the timestamps and the abrupt end of the log)? As this seems block-related, you could try to set up netconsole, which is the easiest way to retrieve a stack trace that never gets written to disk. See [0] and [1] for more details.
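A minimal sketch of what a netconsole setup could look like, following the parameter format described in [1]. The helper name netconsole_param is made up, and every IP address, interface name and MAC address shown is a placeholder that has to be replaced with values from your own network; 6665 and 6666 are the default source and target ports from the kernel documentation.

```shell
# Build the parameter string for the netconsole kernel module:
#   [src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-mac]
# The helper name is illustrative; all arguments are placeholders.
netconsole_param() {
    local src_ip="$1" dev="$2" tgt_ip="$3" tgt_mac="$4"
    local src_port="${5:-6665}" tgt_port="${6:-6666}"
    printf '%s@%s/%s,%s@%s/%s\n' \
        "$src_port" "$src_ip" "$dev" "$tgt_port" "$tgt_ip" "$tgt_mac"
}

# On the crashing host (as root), something like:
#   dmesg -n 8    # let all kernel message levels reach the console
#   modprobe netconsole netconsole="$(netconsole_param 192.0.2.10 eno1 192.0.2.20 00:11:22:33:44:55)"
# On the receiving machine, listen for UDP on the target port, e.g.:
#   nc -u -l 6666     # some netcat variants need: nc -u -l -p 6666
```

The receiving machine should sit on the same network segment, since the target MAC address is used directly; anything the kernel prints during the crash then shows up on the listener even if nothing reaches the disk.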

I could only see that you run ZFS 2.3.3-pve1. Does this also happen with ZFS 2.3.4-pve1?
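To check which ZFS version is actually running, something like this could be used. It is a sketch under assumptions: the helper names zfs_versions and version_lt are invented here, the sed expression assumes that zfs version prints lines of the form zfs-<version> and zfs-kmod-<version>, and sort -V (GNU coreutils) only approximates Debian's version ordering.

```shell
# Print the userland and kernel-module versions reported by `zfs version`,
# stripping the "zfs-" / "zfs-kmod-" prefixes (first line: userland,
# second line: kernel module).
zfs_versions() {
    zfs version | sed -E 's/^zfs-(kmod-)?//'
}

# Return 0 if version $1 sorts strictly before version $2 according to
# GNU `sort -V`. Helper names are illustrative only.
version_lt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}
```

For example, version_lt "$(zfs_versions | head -n 1)" 2.3.4-pve1 && echo "older than 2.3.4-pve1" reports whether the userland tools are still on an older release.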

[0] https://pve.proxmox.com/wiki/Kernel_Crash_Trace_Log
[1] https://www.kernel.org/doc/Documentation/networking/netconsole.txt