Anything newer than 6.8.4-2-pve breaks all my NVMe drives

deniax

New Member
Jan 23, 2023
Hi,

I tried 6.8.12-4-pve and 6.11.0-1-pve, but with either of them my NVMe drives are no longer detected for some reason.
Well, the controllers and the drives were detected, but the block devices were not.

Running modprobe nvme made 1 of the 4 NVMe drives show up, the WD Black 850X; the other three, 3x Samsung_SSD_990, were still nowhere to be seen.

Does anyone have similar problems?

Output from both 6.8.12-4-pve and 6.11.0-1-pve:

Code:
root@pve:~# dmesg | grep -i nvme
[ 183.266530] nvme nvme0: pci function 10000:e1:00.0
[ 183.266577] nvme 10000:e1:00.0: PCI INT A: no GSI
[ 183.295141] nvme nvme0: 18/0/0 default/read/poll queues
[ 183.303056] nvme0n1: p1
[ 222.613160] nvme0n1: p1

root@pve:~# lspci -nnk | grep Samsung

10000:e2:00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd NVMe SSD Controller S4LV008[Pascal] [144d:a80c]
	Subsystem: Samsung Electronics Co Ltd NVMe SSD Controller S4LV008[Pascal] [144d:a801]
10000:e3:00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd NVMe SSD Controller S4LV008[Pascal] [144d:a80c]
	Subsystem: Samsung Electronics Co Ltd NVMe SSD Controller S4LV008[Pascal] [144d:a801]
10000:e4:00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd NVMe SSD Controller PM9A1/PM9A3/980PRO [144d:a80a]
	Subsystem: Samsung Electronics Co Ltd NVMe SSD Controller PM9A1/PM9A3/980PRO (SSD 980 PRO) [144d:a801]

When I revert to 6.8.4-2-pve, everything works without any problems:

Code:
root@pve:/mnt/pve/iscsi# dmesg | grep -i nvme

[   11.368970] nvme nvme0: pci function 10000:e3:00.0
[   11.368984] nvme nvme2: pci function 10000:e1:00.0
[   11.368984] nvme 10000:e3:00.0: PCI INT A: no GSI
[   11.368987] nvme nvme1: pci function 10000:e2:00.0
[   11.368991] nvme nvme3: pci function 10000:e4:00.0
[   11.369011] nvme 10000:e1:00.0: PCI INT A: no GSI
[   11.369016] nvme 10000:e2:00.0: PCI INT A: no GSI
[   11.369019] nvme 10000:e4:00.0: PCI INT A: not connected
[   11.373580] nvme nvme0: Shutdown timeout set to 10 seconds
[   11.373591] nvme nvme1: Shutdown timeout set to 10 seconds
[   11.376289] nvme nvme0: 16/0/0 default/read/poll queues
[   11.376331] nvme nvme1: 16/0/0 default/read/poll queues
[   11.378850]  nvme0n1: p1
[   11.378977]  nvme1n1: p1
[   11.381763] nvme nvme3: Shutdown timeout set to 10 seconds
[   11.385780] nvme nvme3: 18/0/0 default/read/poll queues
[   11.388003]  nvme3n1: p1
[   11.394798] nvme nvme2: 18/0/0 default/read/poll queues
[   11.398594]  nvme2n1: p1
[   32.433475]  nvme0n1: p1
[   32.445581]  nvme1n1: p1
[   32.458656]  nvme3n1: p1
[   39.450691]  nvme2n1: p1
 
Hi!

Could you provide the full system boot log for both kernel boots (6.8.4-2-pve and 6.8.12-4-pve) and the output of lspci -nnk for both? I can see that there are some changes between those versions in the kernel's NVMe and PCI subsystems, but I couldn't pinpoint a regression that would cause your problem.
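
In case it helps, here is a minimal sketch for collecting both. Run it once after booting into each kernel. The output directory /tmp/nvme-debug and the file names are only assumptions for illustration, so adjust them as you like. It uses journalctl -b for the log of the current boot and falls back to dmesg if journalctl is not available.

```shell
#!/bin/sh
# Sketch: save the boot log and lspci output for the currently running kernel.
# The output directory and file names are arbitrary choices for illustration.
out_dir="${OUT_DIR:-/tmp/nvme-debug}"
kver="$(uname -r)"
mkdir -p "$out_dir"

# Full log of the current boot; fall back to the kernel ring buffer
# (dmesg) if journalctl is not installed.
if command -v journalctl >/dev/null 2>&1; then
    journalctl -b --no-pager > "$out_dir/boot-$kver.log" 2>&1
else
    dmesg > "$out_dir/boot-$kver.log" 2>&1
fi

# All PCI devices with numeric IDs and the kernel driver bound to each.
lspci -nnk > "$out_dir/lspci-$kver.txt" 2>&1

echo "Saved logs for $kver in $out_dir"
```

After running it under both kernels, the directory should contain one boot log and one lspci file per kernel version, which can then be attached here.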