Hello everyone,
I have been getting a few ZFS and NVMe errors over the last 3 days, ever since I built my server.
Here is my server config:
Here are a few log files:
Today's boot log:
Bash:
[...]
Oct 20 10:02:04 proxmox systemd[1]: Finished Helper to synchronize boot up for ifupdown.
Oct 20 10:02:04 proxmox systemd[1]: Finished Wait for udev To Complete Device Initialization.
Oct 20 10:02:04 proxmox systemd[1]: Starting Import ZFS pools by cache file...
Oct 20 10:02:04 proxmox systemd[1]: Condition check resulted in Import ZFS pools by device scanning being skipped.
Oct 20 10:02:04 proxmox systemd[1]: Starting Import ZFS pool nvmepool...
Oct 20 10:02:04 proxmox zpool[1223]: cannot import 'nvmepool': no such pool available
Oct 20 10:02:04 proxmox systemd[1]: zfs-import@nvmepool.service: Main process exited, code=exited, status=1/FAILURE
Oct 20 10:02:04 proxmox systemd[1]: zfs-import@nvmepool.service: Failed with result 'exit-code'.
Oct 20 10:02:04 proxmox systemd[1]: Failed to start Import ZFS pool nvmepool.
Oct 20 10:02:04 proxmox kernel: zd0: p1 p2 p3
Oct 20 10:02:04 proxmox kernel: zd16: p1 p2
Oct 20 10:02:04 proxmox kernel: zd32: p1 p2
Oct 20 10:02:04 proxmox kernel: zd48: p1 p2 < p5 >
Oct 20 10:02:04 proxmox kernel: zd64: p1 p2 < p5 >
Oct 20 10:02:04 proxmox systemd[1]: Created slice system-lvm2\x2dpvscan.slice.
Oct 20 10:02:04 proxmox systemd[1]: Starting LVM event activation on device 230:3...
Oct 20 10:02:04 proxmox systemd[1]: Finished Import ZFS pools by cache file.
Oct 20 10:02:04 proxmox lvm[1982]: pvscan[1982] /dev/zd0p3 excluded by filters: device is rejected by filter config.
Oct 20 10:02:04 proxmox systemd[1]: Reached target ZFS pool import target.
Oct 20 10:02:04 proxmox systemd[1]: Starting LVM event activation on device 230:18...
Oct 20 10:02:04 proxmox systemd[1]: Starting LVM event activation on device 230:34...
Oct 20 10:02:04 proxmox systemd[1]: Starting Mount ZFS filesystems...
Oct 20 10:02:04 proxmox systemd[1]: Starting Wait for ZFS Volume (zvol) links in /dev...
Oct 20 10:02:04 proxmox lvm[1983]: pvscan[1983] /dev/zd16p2 excluded by filters: device is rejected by filter config.
Oct 20 10:02:04 proxmox lvm[1984]: pvscan[1984] /dev/zd32p2 excluded by filters: device is rejected by filter config.
Oct 20 10:02:04 proxmox zvol_wait[1986]: Testing 5 zvol links
Oct 20 10:02:04 proxmox zvol_wait[1986]: All zvol links are now present.
Oct 20 10:02:04 proxmox systemd[1]: Finished Wait for ZFS Volume (zvol) links in /dev.
Oct 20 10:02:04 proxmox systemd[1]: Reached target ZFS volumes are ready.
Oct 20 10:02:04 proxmox systemd[1]: Finished Mount ZFS filesystems.
Oct 20 10:02:04 proxmox systemd[1]: Reached target Local File Systems.
[...]
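To make the relevant lines easier to find in a long boot log, here is a small sketch of how the failed import messages can be filtered out. The function name `zfs_import_failures` is just something made up for this post, and the patterns are taken only from the messages shown in the log above, so other failure messages would not be caught.

```shell
#!/usr/bin/env bash
# Print only those lines of a boot log (read from stdin) that mention a
# failed ZFS pool import. The patterns are copied from the messages in the
# log above; this is not an exhaustive list of ZFS error messages.
zfs_import_failures() {
    grep -E "cannot import|zfs-import@.*(FAILURE|Failed)|Failed to start Import ZFS pool"
}

# Possible usage on a systemd machine (assumes journalctl is available):
#   journalctl -b | zfs_import_failures
```

On the excerpt above this should leave only the four lines about `nvmepool` (the `zpool` message and the three `systemd` messages for `zfs-import@nvmepool.service`).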
General info about the NVMe drives:
Bash:
root@proxmox:~# nvme list
Node SN Model Namespace Usage Format FW Rev
---------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
/dev/nvme0n1 50026B7686069248 KINGSTON SKC3000S1024G 1 1.02 TB / 1.02 TB 512 B + 0 B EIFK31.6
/dev/nvme1n1 50026B7685EFFF01 KINGSTON SKC3000S1024G 1 1.02 TB / 1.02 TB 512 B + 0 B EIFK31.6
Some info about nvme0:
Bash:
root@proxmox:~# smartctl -a /dev/nvme0
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.30-2-pve] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Number: KINGSTON SKC3000S1024G
Serial Number: 50026B7686069248
Firmware Version: EIFK31.6
PCI Vendor/Subsystem ID: 0x2646
IEEE OUI Identifier: 0x0026b7
Total NVM Capacity: 1,024,209,543,168 [1.02 TB]
Unallocated NVM Capacity: 0
Controller ID: 1
NVMe Version: 1.4
Number of Namespaces: 1
Namespace 1 Size/Capacity: 1,024,209,543,168 [1.02 TB]
Namespace 1 Formatted LBA Size: 512
Namespace 1 IEEE EUI-64: 0026b7 6860692485
Local Time is: Wed Oct 19 14:41:46 2022 CEST
Firmware Updates (0x12): 1 Slot, no Reset required
Optional Admin Commands (0x0017): Security Format Frmw_DL Self_Test
Optional NVM Commands (0x005d): Comp DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Log Page Attributes (0x08): Telmtry_Lg
Maximum Data Transfer Size: 512 Pages
Warning Comp. Temp. Threshold: 84 Celsius
Critical Comp. Temp. Threshold: 89 Celsius
Supported Power States
St Op Max Active Idle RL RT WL WT Ent_Lat Ex_Lat
0 + 8.80W - - 0 0 0 0 0 0
1 + 7.10W - - 1 1 1 1 0 0
2 + 5.20W - - 2 2 2 2 0 0
3 - 0.0620W - - 3 3 3 3 2500 7500
4 - 0.0620W - - 4 4 4 4 2500 7500
Supported LBA Sizes (NSID 0x1)
Id Fmt Data Metadt Rel_Perf
0 + 512 0 2
1 - 4096 0 1
=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
SMART/Health Information (NVMe Log 0x02)
Critical Warning: 0x00
Temperature: 22 Celsius
Available Spare: 100%
Available Spare Threshold: 10%
Percentage Used: 0%
Data Units Read: 15,285 [7.82 GB]
Data Units Written: 348,449 [178 GB]
Host Read Commands: 699,273
Host Write Commands: 2,315,277
Controller Busy Time: 6
Power Cycles: 43
Power On Hours: 32
Unsafe Shutdowns: 36
Media and Data Integrity Errors: 0
Error Information Log Entries: 22
Warning Comp. Temperature Time: 0
Critical Comp. Temperature Time: 0
Temperature Sensor 2: 51 Celsius
Error Information (NVMe Log 0x01, 16 of 63 entries)
Num ErrCount SQId CmdId Status PELoc LBA NSID VS
0 22 0 0x2001 0x4004 - 0 0 -
1 21 0 0x1001 0x4004 0x028 0 0 -
nvme0 errors (nvme1 shows the same ones):
Bash:
root@proxmox:~# nvme error-log /dev/nvme0
Error Log Entries for device:nvme0 entries:63
.................
Entry[ 0]
.................
error_count : 24
sqid : 0
cmdid : 0x2011
status_field : 0x4004(INVALID_FIELD: A reserved coded value or an unsupported value in a defined field)
parm_err_loc : 0xffff
lba : 0
nsid : 0
vs : 0
trtype : The transport type is not indicated or the error is not transport related.
cs : 0
trtype_spec_info: 0
.................
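For anyone who wants to read the raw `status_field` value: as far as I understand the NVMe error log layout, bit 0 is the phase tag and the actual status sits in bits 15:1 (status code, status code type, command retry delay, "more" and "do not retry"). The following is only a sketch under that assumption, and `decode_nvme_status` is a made-up helper name, not part of nvme-cli.

```shell
#!/usr/bin/env bash
# Split a 16-bit NVMe error-log status field into its parts.
# Assumed layout: bit 0 = phase tag, bits 8:1 = status code (SC),
# bits 11:9 = status code type (SCT), bits 13:12 = command retry delay (CRD),
# bit 14 = more (M), bit 15 = do not retry (DNR).
decode_nvme_status() {
    local raw=$(( $1 ))                  # accepts hex input such as 0x4004
    local phase=$(( raw & 0x1 ))
    local sc=$(( (raw >> 1) & 0xff ))
    local sct=$(( (raw >> 9) & 0x7 ))
    local crd=$(( (raw >> 12) & 0x3 ))
    local more=$(( (raw >> 14) & 0x1 ))
    local dnr=$(( (raw >> 15) & 0x1 ))
    printf 'phase=%d sc=0x%02x sct=%d crd=%d more=%d dnr=%d\n' \
        "$phase" "$sc" "$sct" "$crd" "$more" "$dnr"
}

decode_nvme_status 0x4004
# prints: phase=0 sc=0x02 sct=0 crd=0 more=1 dnr=0
```

Status code 0x02 with status code type 0 is the generic "Invalid Field in Command", which matches the `INVALID_FIELD` text that nvme-cli prints for this entry.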
Some info about nvme1:
Bash:
root@proxmox:~# smartctl -a /dev/nvme1
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.30-2-pve] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Number: KINGSTON SKC3000S1024G
Serial Number: 50026B7685EFFF01
Firmware Version: EIFK31.6
PCI Vendor/Subsystem ID: 0x2646
IEEE OUI Identifier: 0x0026b7
Total NVM Capacity: 1,024,209,543,168 [1.02 TB]
Unallocated NVM Capacity: 0
Controller ID: 1
NVMe Version: 1.4
Number of Namespaces: 1
Namespace 1 Size/Capacity: 1,024,209,543,168 [1.02 TB]
Namespace 1 Formatted LBA Size: 512
Namespace 1 IEEE EUI-64: 0026b7 685efff015
Local Time is: Wed Oct 19 14:42:53 2022 CEST
Firmware Updates (0x12): 1 Slot, no Reset required
Optional Admin Commands (0x0017): Security Format Frmw_DL Self_Test
Optional NVM Commands (0x005d): Comp DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Log Page Attributes (0x08): Telmtry_Lg
Maximum Data Transfer Size: 512 Pages
Warning Comp. Temp. Threshold: 84 Celsius
Critical Comp. Temp. Threshold: 89 Celsius
Supported Power States
St Op Max Active Idle RL RT WL WT Ent_Lat Ex_Lat
0 + 8.80W - - 0 0 0 0 0 0
1 + 7.10W - - 1 1 1 1 0 0
2 + 5.20W - - 2 2 2 2 0 0
3 - 0.0620W - - 3 3 3 3 2500 7500
4 - 0.0620W - - 4 4 4 4 2500 7500
Supported LBA Sizes (NSID 0x1)
Id Fmt Data Metadt Rel_Perf
0 + 512 0 2
1 - 4096 0 1
=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
SMART/Health Information (NVMe Log 0x02)
Critical Warning: 0x00
Temperature: 22 Celsius
Available Spare: 100%
Available Spare Threshold: 10%
Percentage Used: 0%
Data Units Read: 15,415 [7.89 GB]
Data Units Written: 348,471 [178 GB]
Host Read Commands: 706,784
Host Write Commands: 2,309,367
Controller Busy Time: 6
Power Cycles: 20
Power On Hours: 32
Unsafe Shutdowns: 14
Media and Data Integrity Errors: 0
Error Information Log Entries: 21
Warning Comp. Temperature Time: 0
Critical Comp. Temperature Time: 0
Temperature Sensor 2: 47 Celsius
Error Information (NVMe Log 0x01, 16 of 63 entries)
Num ErrCount SQId CmdId Status PELoc LBA NSID VS
0 21 0 0x301d 0x4004 - 0 0 -
1 20 0 0x101d 0x4004 0x028 0 0 -
Do you need more logs or information?
I hope someone can help me out!
Thank you!