Kernel panic related to storage?

jdruwe · Jul 3, 2020

Hey guys, I am experiencing a kernel panic yet again. This is what I was able to see on the screen that is directly connected to my NUC; I am not sure whether there is another way to get a full log of the event:

Output from pveversion -v:

Bash:

root@pve:~# pveversion -v
proxmox-ve: 6.2-1 (running kernel: 5.4.44-2-pve)
pve-manager: 6.2-6 (running version: 6.2-6/ee1d7754)
pve-kernel-5.4: 6.2-4
pve-kernel-helper: 6.2-4
pve-kernel-5.3: 6.1-6
pve-kernel-5.4.44-2-pve: 5.4.44-2
pve-kernel-5.4.41-1-pve: 5.4.41-1
pve-kernel-5.3.18-3-pve: 5.3.18-3
pve-kernel-5.3.10-1-pve: 5.3.10-1
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.4-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.16-pve1
libproxmox-acme-perl: 1.0.4
libpve-access-control: 6.1-1
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.1-3
libpve-guest-common-perl: 3.0-10
libpve-http-server-perl: 3.0-5
libpve-storage-perl: 6.1-8
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.2-1
lxcfs: 4.0.3-pve3
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.2-8
pve-cluster: 6.1-8
pve-container: 3.1-8
pve-docs: 6.2-4
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-2
pve-firmware: 3.1-1
pve-ha-manager: 3.0-9
pve-i18n: 2.1-3
pve-qemu-kvm: 5.0.0-4
pve-xtermjs: 4.3.0-1
qemu-server: 6.2-3
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.4-pve1

I am using a USB drive for backup, a Zigbee USB device in pass-through mode for home automation, and a Kingston A2000 M.2 NVMe SSD. I thought it might be related to this post: https://forum.proxmox.com/threads/nvme-ssd-driver-or-kernel-problem.31845/ but that seems to have been solved 3 years ago already. Can anyone help me keep my NUC from freezing?
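
On the question of getting a fuller log: a minimal sketch, assuming the host logs through systemd-journald. Kernel messages from the previous boot can only be read back if the journal is persistent, which with journald's default settings means that /var/log/journal exists. The function names `has_persistent_journal` and `show_previous_boot_kernel_log` are hypothetical helpers, not part of Proxmox VE. If the root disk itself is failing, the last messages before the panic may never reach the disk, so this may still come up empty.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if a persistent journal directory exists.
# $1: journal directory (defaults to /var/log/journal)
has_persistent_journal() {
    [ -d "${1:-/var/log/journal}" ]
}

# Hypothetical helper: print kernel messages from the previous boot,
# or explain why they are not available.
show_previous_boot_kernel_log() {
    if has_persistent_journal "$@"; then
        # -k: kernel messages only, -b -1: the boot before the current one
        journalctl -k -b -1 --no-pager
    else
        echo "no persistent journal; create /var/log/journal and restart systemd-journald" >&2
        return 1
    fi
}
```

After a freeze and reboot, running `show_previous_boot_kernel_log` would show whatever the kernel managed to log before the panic.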

t.lamprecht · Jul 3, 2020

jdruwe said:
Hey guys, I am experiencing a kernel panic yet again. This is what I was able to see on the screen that is directly connected to my NUC; I am not sure whether there is another way to get a full log of the event:

You have I/O errors on the dm-1 block device, which is probably your root device. The kernel panic is then probably just a result of those errors, not the underlying problem itself.

What are you using as the main disk? Is smartctl showing any errors or problems? Judging from the errors above, it seems pretty faulty.
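
To check whether dm-1 really is the root device, the device-mapper name behind it can be read from sysfs. This is only a sketch: `dm_name` is a hypothetical helper, and its optional second argument exists solely so that it can be pointed at a different sysfs root.

```shell
#!/bin/sh
# Hypothetical helper: print the device-mapper name behind a kernel name such as dm-1.
# $1: kernel device name (e.g. dm-1)
# $2: sysfs root (defaults to /sys)
dm_name() {
    cat "${2:-/sys}/block/$1/dm/name"
}
```

The result of `dm_name dm-1` can then be compared with the source device that `findmnt -no SOURCE /` reports for the root filesystem.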

jdruwe · Jul 3, 2020

t.lamprecht said:
You have I/O errors on the dm-1 block device, which is probably your root device. The kernel panic is then probably just a result of those errors, not the underlying problem itself.

What are you using as the main disk? Is smartctl showing any errors or problems? Judging from the errors above, it seems pretty faulty.

I have 2 disks:

/dev/nvme0n1 is used for my main storage:

Code:

root@pve:~# smartctl -a /dev/nvme0n1
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.4.44-2-pve] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       KINGSTON SA2000M8500G
Serial Number:                      50026B7282536DB4
Firmware Version:                   S5Z42105
PCI Vendor/Subsystem ID:            0x2646
IEEE OUI Identifier:                0x0026b7
Controller ID:                      1
Number of Namespaces:               1
Namespace 1 Size/Capacity:          500,107,862,016 [500 GB]
Namespace 1 Utilization:            44,674,641,920 [44.6 GB]
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            0026b7 282536db45
Local Time is:                      Fri Jul  3 10:54:24 2020 CEST
Firmware Updates (0x14):            2 Slots, no Reset required
Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test
Optional NVM Commands (0x005f):     Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Maximum Data Transfer Size:         32 Pages
Warning  Comp. Temp. Threshold:     75 Celsius
Critical Comp. Temp. Threshold:     80 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
0 +     9.00W       -        -    0  0  0  0        0       0
1 +     4.60W       -        -    1  1  1  1        0       0
2 +     3.80W       -        -    2  2  2  2        0       0
3 -   0.0450W       -        -    3  3  3  3     2000    2000
4 -   0.0040W       -        -    4  4  4  4    15000   15000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
0 +     512       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        23 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    0%
Data Units Read:                    4,545,172 [2.32 TB]
Data Units Written:                 1,149,848 [588 GB]
Host Read Commands:                 31,699,751
Host Write Commands:                50,846,125
Controller Busy Time:               2,074
Power Cycles:                       76
Power On Hours:                     2,387
Unsafe Shutdowns:                   40
Media and Data Integrity Errors:    0
Error Information Log Entries:      0
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0

Error Information (NVMe Log 0x01, max 256 entries)
No Errors Logged

There seem to be no errors for the disk.

/dev/sda is used for VM/container backups.
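
For comparing SMART values over time, single fields can be pulled out of the `smartctl -a` output with a small filter. A minimal sketch: `smart_field` is a hypothetical helper that assumes the "Name: value" layout shown in the NVMe output above and strips thousands separators.

```shell
#!/bin/sh
# Hypothetical helper: read `smartctl -a` output on stdin and print the value
# of the field named in $1, with thousands separators removed.
smart_field() {
    awk -F': *' -v key="$1" '$1 == key { gsub(/,/, "", $2); print $2; exit }'
}
```

For example, `smartctl -a /dev/nvme0n1 | smart_field "Unsafe Shutdowns"` and the same call with `"Power Cycles"` return the two counters that read 40 and 76 in the output above. A high share of unsafe shutdowns fits a machine that keeps freezing, although it does not by itself show that the SSD is at fault.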

MarvinE · Nov 21, 2020

Hello,

It seems to be a problem with the Kingston A2000 M.2 NVMe SSD. We would like to set up a homelab and have the same issues...
Problem found on PVE 6.2-15.

jdruwe · Nov 21, 2020

MarvinE said:
Hello,

It seems to be a problem with the Kingston A2000 M.2 NVMe SSD. We would like to set up a homelab and have the same issues...
Problem found on PVE 6.2-15.

Yes indeed, I replaced the SSD with another one from Crucial and haven't seen the issue since.
