With Proxmox 6 (Linux 5.0) the same command that used to work under Proxmox 5.4 is broken: trying to connect to an NVMe-oF RDMA target provided by SPDK fails.
lukas@pve-master:~$ sudo modprobe nvme-rdma
lukas@pve-master:~$ sudo nvme discover -t rdma -a 10.0.0.20 -s 4420
Discovery Log Number of Records 1, Generation counter 3
=====Discovery Log Entry 0======
trtype: rdma
adrfam: ipv4
subtype: nvme subsystem
treq: not specified
portid: 0
trsvcid: 4420
subnqn: nqn.2016-06.io.spdk:cnode1
traddr: 10.0.0.20
rdma_prtype: not specified
rdma_qptype: connected
rdma_cms: rdma-cm
rdma_pkey: 0x0000
lukas@pve-master:~$ sudo nvme connect -t rdma -n "nqn.2016-06.io.spdk:cnode1" -a 10.0.0.20 -s 4420
Killed
Dmesg:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000190
Probable cause:
If "nr-io-queues" is not set via the nvme CLI, the kernel uses the logical core count instead (the core count also defines the upper limit of nr-io-queues inside the kernel). This becomes a problem when the NVMe-oF target provides fewer I/O queues than there are logical cores on the initiator (PVE) system.
In my case the target was an Intel i7-6700 system, which has only 8 logical cores, and for best performance the number of queues is usually kept below or equal to that number (parameter p of nvmf_create_transport in SPDK).
With an old 4.15.18 kernel this is not a problem, as the "nr-io-queues" chosen by the kernel matches the number of queues minus one assigned to the SPDK target via the p parameter (nr-io-queues = p-1).
On Proxmox 6 this is not the case. Somehow a larger number than what SPDK provides must be chosen internally.
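To make the numbers concrete, here is a small user-space sketch of that logic (illustrative only, not kernel code; it just mirrors the "core count as default, -i capped at the core count" behaviour described above, with p=2 as on my target):

#include <stdio.h>
#include <unistd.h>

/* Illustrative sketch: how the host-side request for I/O queues relates
 * to what the SPDK target can actually serve.
 *
 * opt_i  - the value passed with "nvme connect ... -i <n>", 0 if -i was not given
 * spdk_p - the p parameter of nvmf_create_transport on the target
 */
static unsigned int host_nr_io_queues(unsigned int online_cpus, int opt_i)
{
	if (opt_i <= 0)			/* no -i given: default is the core count */
		return online_cpus;
	/* -i given: capped at the local core count, like the min_t() in fabrics.c */
	return (unsigned int)opt_i < online_cpus ? (unsigned int)opt_i : online_cpus;
}

int main(void)
{
	unsigned int cpus = (unsigned int)sysconf(_SC_NPROCESSORS_ONLN);
	unsigned int spdk_p = 2;	/* p=2, as used on the SPDK target here */

	printf("target serves        : %u I/O queue(s) (p-1)\n", spdk_p - 1);
	printf("host default (no -i) : %u queue(s)\n", host_nr_io_queues(cpus, 0));
	printf("host with -i 1       : %u queue(s)\n", host_nr_io_queues(cpus, 1));
	printf("affected setup (cores > p-1): %s\n", cpus > spdk_p - 1 ? "yes" : "no");
	return 0;
}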
Even though the logging prints the right number (SPDK has p=2, so the kernel would need to choose nr-io-queues=1), this happens:
sudo nvme connect -t rdma -n "nqn.2016-06.io.spdk:cnode1" -a 10.0.0.20 -s 4420
[ 4381.574059] nvme nvme0: creating 1 I/O queues.
[ 4381.624318] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
sudo nvme connect -t rdma -n "nqn.2016-06.io.spdk:cnode1" -a 10.0.0.20 -s 4420 -i 1
[ 4348.394350] nvme nvme0: creating 1 I/O queues.
[ 4348.444315] nvme nvme0: new ctrl: NQN "nqn.2016-06.io.spdk:cnode1", addr 10.0.0.20:4420
Theoretically, after looking at the logs, both commands should be equivalent, but they are not with a 5.0 kernel.
This causes a kernel NULL pointer dereference, and the network interface in use is left in an undefined state, which prevents a normal shutdown.
This can only be triggered when logical_cores_initiator > p-1 (i.e. the target's queue count minus one).
opts->nr_io_queues = min_t(unsigned int, num_online_cpus(), token);
https://elixir.bootlin.com/linux/v5.0.21/source/drivers/nvme/host/fabrics.c#L730
https://elixir.bootlin.com/linux/v5.0.21/source/drivers/nvme/host/fabrics.c#L636
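For context, the relevant spots in nvmf_parse_options() look roughly like this (abridged from memory of the 5.0-era source; the links above point at the exact lines):

/* default set at the top of nvmf_parse_options():
 * without -i, one I/O queue per online CPU is requested */
opts->nr_io_queues = num_online_cpus();

/* the NVMF_OPT_NR_IO_QUEUES case for an explicit -i:
 * the value is only capped by the local core count here */
case NVMF_OPT_NR_IO_QUEUES:
	if (match_int(args, &token)) {
		ret = -EINVAL;
		goto out;
	}
	/* ... range check elided ... */
	opts->nr_io_queues = min_t(unsigned int,
			num_online_cpus(), token);
	break;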