Switched to pve-kernel-6.1 yesterday, coming from 5.15. Apparently, something in 6.1 messes with the device ID of one single storage device:
After the restart I noticed a delay while booting, so I looked at the console and saw a
"A start job is running for dev-disk-by-id /dev/disk/by-id/nvme-eui.6479a7311269019b-part4"
message from systemd, with a timeout of 1m30s. Once that expired, boot continued fine - but the delay shows up on every subsequent reboot with 6.1. Rebooting into the previous 5.15 kernel made it disappear immediately.
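In case the exact unit matters, this is roughly how I have been checking what systemd is actually waiting for (just a sketch, the grep targets are guesses on my side):
Code:
# While the 1m30s countdown is running, list the queued jobs and the device unit they wait on:
systemctl list-jobs

# After boot, look for the timeout message in the journal of the current boot:
journalctl -b | grep -i "timed out"

# Find anything that still references the old by-id path (fstab, systemd units, etc.):
grep -rs "nvme-eui.6479a7311269019b" /etc/fstab /etc/crypttab /etc/systemd /etc/default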
I have a ZFS root pool (rpool) consisting of two mirrored NVMe devices, and another SATA SSD-based storage pool. Both pools were set up with disk IDs, i.e. rpool has been using nvme-eui.0026b7683b8e8485-part4 and nvme-eui.6479a7311269019b-part4 so far. When looking up the mentioned disk IDs, I only see one of the two NVMe devices still showing up with the nvme-eui. syntax, with the other one all of a sudden appearing as nvme-nvme.<aveeeeerylongserial>:
Code:
» ls -l /dev/disk/by-id/nvme*
lrwxrwxrwx 1 root root 13 Dec 18 12:25 /dev/disk/by-id/nvme-KINGSTON_SA2000M8250G_50026B7683B8E848 -> ../../nvme0n1
lrwxrwxrwx 1 root root 15 Dec 18 12:25 /dev/disk/by-id/nvme-KINGSTON_SA2000M8250G_50026B7683B8E848-part1 -> ../../nvme0n1p1
lrwxrwxrwx 1 root root 15 Dec 18 12:25 /dev/disk/by-id/nvme-KINGSTON_SA2000M8250G_50026B7683B8E848-part2 -> ../../nvme0n1p2
lrwxrwxrwx 1 root root 15 Dec 18 12:25 /dev/disk/by-id/nvme-KINGSTON_SA2000M8250G_50026B7683B8E848-part3 -> ../../nvme0n1p3
lrwxrwxrwx 1 root root 15 Dec 18 12:25 /dev/disk/by-id/nvme-KINGSTON_SA2000M8250G_50026B7683B8E848-part4 -> ../../nvme0n1p4
lrwxrwxrwx 1 root root 13 Dec 18 12:25 /dev/disk/by-id/nvme-PNY_CS3030_250GB_SSD_PNY09200003790100411 -> ../../nvme1n1
lrwxrwxrwx 1 root root 15 Dec 18 12:25 /dev/disk/by-id/nvme-PNY_CS3030_250GB_SSD_PNY09200003790100411-part1 -> ../../nvme1n1p1
lrwxrwxrwx 1 root root 15 Dec 18 12:25 /dev/disk/by-id/nvme-PNY_CS3030_250GB_SSD_PNY09200003790100411-part2 -> ../../nvme1n1p2
lrwxrwxrwx 1 root root 15 Dec 18 12:25 /dev/disk/by-id/nvme-PNY_CS3030_250GB_SSD_PNY09200003790100411-part3 -> ../../nvme1n1p3
lrwxrwxrwx 1 root root 15 Dec 18 12:25 /dev/disk/by-id/nvme-PNY_CS3030_250GB_SSD_PNY09200003790100411-part4 -> ../../nvme1n1p4
lrwxrwxrwx 1 root root 13 Dec 18 12:25 /dev/disk/by-id/nvme-eui.0026b7683b8e8485 -> ../../nvme0n1
lrwxrwxrwx 1 root root 15 Dec 18 12:25 /dev/disk/by-id/nvme-eui.0026b7683b8e8485-part1 -> ../../nvme0n1p1
lrwxrwxrwx 1 root root 15 Dec 18 12:25 /dev/disk/by-id/nvme-eui.0026b7683b8e8485-part2 -> ../../nvme0n1p2
lrwxrwxrwx 1 root root 15 Dec 18 12:25 /dev/disk/by-id/nvme-eui.0026b7683b8e8485-part3 -> ../../nvme0n1p3
lrwxrwxrwx 1 root root 15 Dec 18 12:25 /dev/disk/by-id/nvme-eui.0026b7683b8e8485-part4 -> ../../nvme0n1p4
lrwxrwxrwx 1 root root 13 Dec 18 12:25 /dev/disk/by-id/nvme-nvme.1987-504e593039323030303033373930313030343131-504e592043533330333020323530474220535344-00000001 -> ../../nvme1n1
lrwxrwxrwx 1 root root 15 Dec 18 12:25 /dev/disk/by-id/nvme-nvme.1987-504e593039323030303033373930313030343131-504e592043533330333020323530474220535344-00000001-part1 -> ../../nvme1n1p1
lrwxrwxrwx 1 root root 15 Dec 18 12:25 /dev/disk/by-id/nvme-nvme.1987-504e593039323030303033373930313030343131-504e592043533330333020323530474220535344-00000001-part2 -> ../../nvme1n1p2
lrwxrwxrwx 1 root root 15 Dec 18 12:25 /dev/disk/by-id/nvme-nvme.1987-504e593039323030303033373930313030343131-504e592043533330333020323530474220535344-00000001-part3 -> ../../nvme1n1p3
lrwxrwxrwx 1 root root 15 Dec 18 12:25 /dev/disk/by-id/nvme-nvme.1987-504e593039323030303033373930313030343131-504e592043533330333020323530474220535344-00000001-part4 -> ../../nvme1n1p4
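As far as I understand, udev derives the nvme-eui. / nvme-nvme. links from the wwid attribute the kernel exposes in sysfs, so comparing that attribute under both kernels should show where the change originates (sketch, device names as in the listing above):
Code:
# wwid as reported by the kernel - "eui.xxxx" under 5.15, apparently the
# "nvme.<vendor>-<serial>-<model>-<nsid>" fallback under 6.1 for the PNY:
cat /sys/block/nvme0n1/wwid
cat /sys/block/nvme1n1/wwid

# udev's view of the device, i.e. the symlinks and properties derived from it:
udevadm info --query=symlink /dev/nvme1n1
udevadm info --query=property /dev/nvme1n1 | grep -i -e wwn -e serial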
As a result, the rpool is no longer imported using device IDs for both mirror members (under 5.15, both were imported as nvme-eui.xxx):
Code:
» zpool status rpool
  pool: rpool
 state: ONLINE
  scan: scrub repaired 0B in 00:00:17 with 0 errors on Sun Dec 11 00:24:18 2022
config:

        NAME                                 STATE     READ WRITE CKSUM
        rpool                                ONLINE       0     0     0
          mirror-0                           ONLINE       0     0     0
            nvme-eui.0026b7683b8e8485-part3  ONLINE       0     0     0
            nvme1n1p3                        ONLINE       0     0     0

errors: No known data errors
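For completeness, the full vdev paths (rather than the shortened names above) can be double-checked like this (sketch):
Code:
# Show full paths to the vdevs instead of the shortened device names:
zpool status -P rpool

# Cross-check which by-id links currently point at that partition:
ls -l /dev/disk/by-id/ | grep nvme1n1p3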
The other storage pool is behaving just fine. Any ideas a) why this one device changes its naming convention under 6.1, b1) how to potentially revert this, OR b2) how to fix the recurring boot delay by switching to other device IDs (e.g. the ones starting with nvme-KINGSTON_SA20... and nvme-PNY_CS30...), considering this is the root pool, which cannot easily be exported and re-imported?
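For b2), I assume something like the following detach/attach cycle would re-add the second mirror member under a stable by-id path, but since it temporarily degrades the mirror I'd rather confirm before touching the root pool (sketch only, paths taken from the listings above):
Code:
# Remove the member that is currently referenced by its kernel name ...
zpool detach rpool nvme1n1p3

# ... and re-attach the same partition via its model/serial-based ID; this
# resilvers against the remaining mirror member:
zpool attach rpool nvme-eui.0026b7683b8e8485-part3 \
    /dev/disk/by-id/nvme-PNY_CS3030_250GB_SSD_PNY09200003790100411-part3

# Wait for the resilver to finish before rebooting:
zpool status -v rpool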
Thanks and regards