PVE 8.4.1 / nvmf-autoconnect.service / nvme connect-all fails at boot / timing issues

May 16, 2025
Hello,

I am attempting to add NVMe over TCP storage to a PVE 8.4.1 cluster.

I've seen two other threads in the forum regarding this issue, but neither has a resolution.

The basics:
1. Three node cluster running PVE 8.4.1
2. Storage appliance is a Dell Powerstore 1000T

I configured and mounted the storage following Dell's documentation for NVMe over TCP connectivity.
Once the configuration is complete, the storage is available and usable from PVE.
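
For reference, the setup boils down to roughly the following (a sketch of the sort of configuration involved, not Dell's literal steps; the discovery.conf entries are what nvme connect-all and nvmf-autoconnect later act on):

Code:
# load the NVMe/TCP transport now and on every boot
modprobe nvme_tcp
echo nvme_tcp > /etc/modules-load.d/nvme_tcp.conf

# register the discovery controllers so connect-all / nvmf-autoconnect know about them
cat >> /etc/nvme/discovery.conf <<EOF
--transport=tcp --traddr=10.10.0.80 --trsvcid=8009
--transport=tcp --traddr=10.10.1.80 --trsvcid=8009
EOF

# connect to everything the discovery controllers advertise
nvme connect-all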

After reboot, the storage fails to auto-mount.

Code:
root@pve03:~# nvme show-topology
root@pve03:~#

I can, however, run nvme connect-all manually and the storage will mount.

If I look through the logs:

Code:
journalctl -b --no-pager | grep nvm

Jun 02 10:17:45 pve03 systemd-modules-load[797]: Inserted module 'nvme_tcp'
Jun 02 10:17:49 pve03 systemd[1]: nvmefc-boot-connections.service - Auto-connect to subsystems on FC-NVME devices found during boot was skipped because of an unmet condition check (ConditionPathExists=/sys/class/fc/fc_udev_device/nvme_discovery).
Jun 02 10:18:13 pve03 systemd[1]: Starting nvmf-autoconnect.service - Connect NVMe-oF subsystems automatically during boot...
Jun 02 10:18:16 pve03 nvme[1902]: Failed to write to /dev/nvme-fabrics: Connection timed out
Jun 02 10:18:16 pve03 kernel: nvme nvme0: failed to connect socket: -110
Jun 02 10:18:19 pve03 kernel: nvme nvme0: failed to connect socket: -110
Jun 02 10:18:19 pve03 nvme[1902]: Failed to write to /dev/nvme-fabrics: Connection timed out
Jun 02 10:18:19 pve03 systemd[1]: nvmf-autoconnect.service: Deactivated successfully.
Jun 02 10:18:19 pve03 systemd[1]: Finished nvmf-autoconnect.service - Connect NVMe-oF subsystems automatically during boot.

Some additional hardware background:
  1. All hosts are Dell R630
  2. All hosts have eight (8) network interfaces as follows:
    1. Four (4) 1Gbps (eno[1-4]) - built-in Dell network card
      1. eno1 -> vmbr0
    2. Four (4) 10Gbps (enp130s0f0np[0-1], enp4s0f0np[0-1]), Intel X710 dual-port card(s)
      1. bond0 -> storage -> enp130s0f0np0, enp4s0f0np0
      2. bond1 -> networking -> enp130s0f0np1, enp4s0f0np1
  3. The cluster is using Ceph as well as the powerstore appliance on bond0 - not ideal, but this is dev/test
    1. Ceph is running on VLAN0405
    2. The powerstore is using VLAN1010, VLAN1020
  4. The cluster has Proxmox SDN configured for Virtual Machines
    1. The SDN has 2 Zones configured
      1. Type: Simple with 22 VNets
      2. Type: vxlan with 1 VNet
In one of the threads listed above, bbgeek17 suggested it could be a dependency/timing issue.
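
A quick way to see what nvmf-autoconnect actually orders itself after, and which unit was the slow link on the last boot (standard systemd queries, output omitted here):

Code:
systemctl show nvmf-autoconnect.service -p After -p Wants -p Requires
systemd-analyze critical-chain nvmf-autoconnect.service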

Some network details.
  1. Powerstore
    1. discovery address(es): 10.10.0.80, 10.10.1.80
    2. I/O address(es): 10.10.0.81,10.10.0.82,10.10.1.80,10.10.1.81
  2. Proxmox
    1. pve01, 10.10.0.133, 10.10.1.133
    2. pve02, 10.10.0.134, 10.10.1.134
    3. pve03, 10.10.0.135, 10.10.1.135

Let's add an ExecStartPre script to /lib/systemd/system/nvmf-autoconnect.service to capture some network state at the moment the service starts.
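
The hook is a single extra line in the unit's [Service] section (the same thing could be done with a drop-in override so it survives package updates; the script path below is simply where I saved the check script):

Code:
[Service]
ExecStartPre=/usr/local/sbin/prenvme-check.sh

Run systemctl daemon-reload after editing the unit. The script itself: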

Bash:
#!/bin/sh
# Log basic network state (and reachability of the PowerStore discovery
# addresses) at the moment nvmf-autoconnect starts.

LOGFILE=/tmp/prenvme.check

STORAGECNTRL0=10.10.0.80
STORAGECNTRL1=10.10.1.80

ECHO=/usr/bin/echo
PING=/usr/bin/ping
DATE=/usr/bin/date
IP=/usr/sbin/ip
LSMOD=/usr/sbin/lsmod

# Dump link/addr state for the given VLAN interface and ping the
# discovery address that lives on that VLAN.
check_prenvme() {
  IFACE=$1
  $ECHO -e "\n$IP link show dev $IFACE" >> $LOGFILE
  $IP link show dev $IFACE >> $LOGFILE
  $ECHO -e "\n$IP addr show dev $IFACE" >> $LOGFILE
  $IP addr show dev $IFACE >> $LOGFILE
  case "$IFACE" in
    vlan1010)
      $ECHO -e "\n$PING -c 3 $STORAGECNTRL0" >> $LOGFILE
      $PING -c 3 $STORAGECNTRL0 >> $LOGFILE
      ;;
    vlan1020)
      $ECHO -e "\n$PING -c 3 $STORAGECNTRL1" >> $LOGFILE
      $PING -c 3 $STORAGECNTRL1 >> $LOGFILE
      ;;
    *)
      ;;
  esac
}

$DATE > $LOGFILE
$ECHO "$LSMOD | grep nvme" >> $LOGFILE
$LSMOD | grep nvme >> $LOGFILE
check_prenvme vlan1010
check_prenvme vlan1020
# Always exit 0 so this debug hook never blocks the real service.
exit 0

Now I'll reboot the host.

What did the script output?

Code:
Mon Jun  2 11:08:13 AM EDT 2025
/usr/sbin/lsmod | grep nvme
nvme_tcp               53248  0
nvme_fabrics           36864  1 nvme_tcp
nvme_keyring           20480  2 nvme_tcp,nvme_fabrics
nvme_core             204800  2 nvme_tcp,nvme_fabrics
nvme_auth              24576  1 nvme_core

/usr/sbin/ip link show dev vlan1010
17: vlan1010@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 6c:fe:54:86:06:50 brd ff:ff:ff:ff:ff:ff

/usr/sbin/ip addr show dev vlan1010
17: vlan1010@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UP group default qlen 1000
    link/ether 6c:fe:54:86:06:50 brd ff:ff:ff:ff:ff:ff
    inet 10.10.0.135/24 scope global vlan1010
       valid_lft forever preferred_lft forever
    inet6 fe80::6efe:54ff:fe86:650/64 scope link
       valid_lft forever preferred_lft forever

/usr/bin/ping -c 3 10.10.0.80
PING 10.10.0.80 (10.10.0.80) 56(84) bytes of data.
From 10.10.0.135 icmp_seq=1 Destination Host Unreachable
From 10.10.0.135 icmp_seq=2 Destination Host Unreachable
From 10.10.0.135 icmp_seq=3 Destination Host Unreachable

--- 10.10.0.80 ping statistics ---
3 packets transmitted, 0 received, +3 errors, 100% packet loss, time 2084ms
pipe 3

/usr/sbin/ip link show dev vlan1020
18: vlan1020@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 6c:fe:54:86:06:50 brd ff:ff:ff:ff:ff:ff

/usr/sbin/ip addr show dev vlan1020
18: vlan1020@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UP group default qlen 1000
    link/ether 6c:fe:54:86:06:50 brd ff:ff:ff:ff:ff:ff
    inet 10.10.1.135/24 scope global vlan1020
       valid_lft forever preferred_lft forever
    inet6 fe80::6efe:54ff:fe86:650/64 scope link
       valid_lft forever preferred_lft forever

/usr/bin/ping -c 3 10.10.1.80
PING 10.10.1.80 (10.10.1.80) 56(84) bytes of data.
From 10.10.1.135 icmp_seq=1 Destination Host Unreachable
From 10.10.1.135 icmp_seq=2 Destination Host Unreachable
From 10.10.1.135 icmp_seq=3 Destination Host Unreachable

--- 10.10.1.80 ping statistics ---
3 packets transmitted, 0 received, +3 errors, 100% packet loss, time 2009ms
pipe 3

So, it looks like:
  1. The interfaces are up
  2. They have an assigned IP address
But the storage isn't reachable yet at this point in the boot process (when the ExecStartPre of nvmf-autoconnect.service runs).
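
A follow-up that could be added to the check script to narrow this down further, assuming the problem is either unresolved ARP or bond members that are not fully up yet (I have not captured this output):

Code:
ip neigh show dev vlan1010      # are the storage IPs stuck INCOMPLETE/FAILED?
ip neigh show dev vlan1020
cat /proc/net/bonding/bond0     # do both members report "MII Status: up"?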

If I check journalctl -b --no-pager | grep nvm,

Code:
Jun 02 11:07:45 pve03 systemd-modules-load[789]: Inserted module 'nvme_tcp'
Jun 02 11:07:49 pve03 systemd[1]: nvmefc-boot-connections.service - Auto-connect to subsystems on FC-NVME devices found during boot was skipped because of an unmet condition check (ConditionPathExists=/sys/class/fc/fc_udev_device/nvme_discovery).
Jun 02 11:08:13 pve03 systemd[1]: Starting nvmf-autoconnect.service - Connect NVMe-oF subsystems automatically during boot...
Jun 02 11:08:22 pve03 nvme[2085]: Failed to write to /dev/nvme-fabrics: Connection timed out
Jun 02 11:08:22 pve03 kernel: nvme nvme0: failed to connect socket: -110
Jun 02 11:08:24 pve03 kernel: nvme nvme0: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 10.10.1.80:8009, hostnqn: nqn.2014-08.org.nvmexpress:uuid:4c4c4544-0030-5a10-804d-c6c04f504432
Jun 02 11:08:24 pve03 kernel: nvme nvme1: creating 32 I/O queues.
Jun 02 11:08:24 pve03 kernel: nvme nvme1: mapped 32/0/0 default/read/poll queues.
Jun 02 11:08:25 pve03 kernel: nvme nvme1: new ctrl: NQN "nqn.1988-11.com.dell:powerstore:00:92252bdda3d64105ABB4", addr 10.10.1.82:4420, hostnqn: nqn.2014-08.org.nvmexpress:uuid:4c4c4544-0030-5a10-804d-c6c04f504432
Jun 02 11:08:25 pve03 kernel: nvme nvme2: creating 32 I/O queues.
Jun 02 11:08:25 pve03 kernel: nvme nvme2: mapped 32/0/0 default/read/poll queues.
Jun 02 11:08:25 pve03 kernel: nvme nvme2: new ctrl: NQN "nqn.1988-11.com.dell:powerstore:00:92252bdda3d64105ABB4", addr 10.10.1.81:4420, hostnqn: nqn.2014-08.org.nvmexpress:uuid:4c4c4544-0030-5a10-804d-c6c04f504432
Jun 02 11:08:25 pve03 kernel: nvme nvme0: Removing ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery"
Jun 02 11:08:25 pve03 systemd[1]: nvmf-autoconnect.service: Deactivated successfully.
Jun 02 11:08:25 pve03 systemd[1]: Finished nvmf-autoconnect.service - Connect NVMe-oF subsystems automatically during boot.
Jun 02 11:08:25 pve03 lvm[2095]: PV /dev/nvme1n1 online, VG nvme_vgcrawx is complete.
Jun 02 11:08:25 pve03 systemd[1]: Started lvm-activate-nvme_vgcrawx.service - /sbin/lvm vgchange -aay --autoactivation event nvme_vgcrawx.
Jun 02 11:08:25 pve03 lvm[2118]:   2 logical volume(s) in volume group "nvme_vgcrawx" now active
Jun 02 11:08:25 pve03 systemd[1]: lvm-activate-nvme_vgcrawx.service: Deactivated successfully.

That is interesting: one discovery attempt still timed out, but the other connected and the storage auto-connected this time!

What does the topology show?
Code:
root@pve03:~# nvme show-topology
nvme-subsys1 - NQN=nqn.1988-11.com.dell:powerstore:00:92252bdda3d64105ABB4
\
 +- ns 13
 \
  +- nvme1 tcp traddr=10.10.1.82,trsvcid=4420,src_addr=10.10.1.135 live non-optimized
  +- nvme2 tcp traddr=10.10.1.81,trsvcid=4420,src_addr=10.10.1.135 live optimized

This is looking more and more like a timing issue during boot. What happens if I increase the number of pings in the script to 5 (ping -c 5), which delays the service start a little longer?

Reboot, and:

NOTE: The first set of pings (vlan1010 in the script) still fails, but I do see a response from the last two pings on vlan1020.

Code:
root@pve03:~# nvme show-topology
nvme-subsys1 - NQN=nqn.1988-11.com.dell:powerstore:00:92252bdda3d64105ABB4
\
 +- ns 13
 \
  +- nvme1 tcp traddr=10.10.0.82,trsvcid=4420,src_addr=10.10.0.135 live non-optimized
  +- nvme2 tcp traddr=10.10.0.81,trsvcid=4420,src_addr=10.10.0.135 live optimized
  +- nvme3 tcp traddr=10.10.1.82,trsvcid=4420,src_addr=10.10.1.135 live non-optimized
  +- nvme4 tcp traddr=10.10.1.81,trsvcid=4420,src_addr=10.10.1.135 live optimized

I can also change the ExecStartPre line calling my script to instead be:

ExecStartPre=/usr/bin/sleep 10

After a reboot the storage now auto-mounts. I'll call this a 'hack', not a 'fix' for the issue.
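
A slightly less blunt variant of the same hack would be to wait until a discovery address actually answers instead of sleeping a fixed 10 seconds; an untested sketch (the address and the 30-second cap are my choices):

Code:
ExecStartPre=/bin/sh -c 'for i in $(seq 1 30); do ping -c 1 -W 1 10.10.0.80 >/dev/null 2>&1 && break; sleep 1; done'

Still a workaround, but at least it only waits as long as it has to.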

Questions/possibilities?
  1. Could there be too many interfaces being configured by ifupdown2 at boot, causing a delay?
    1. 8 physical
    2. 2 bond
    3. 5 vlans
    4. 2 bridges
    5. 2 Zones + 23 VNets
  2. A conflict between systemd and ifupdown2?
    1. nvmf-autoconnect has After=network-online.target, but is the network really up at that point? systemd claims it is before nvmf-autoconnect runs (see the check sketched after this list):
    2. Code:
      Jun 02 14:24:16 pve03 systemd[1]: Finished networking.service - Network initialization.
      Jun 02 14:24:16 pve03 systemd[1]: Reached target network.target - Network.
      Jun 02 14:24:16 pve03 systemd[1]: Reached target network-online.target - Network is Online.
      Jun 02 14:24:16 pve03 systemd[1]: Starting chrony.service - chrony, an NTP client/server...
      Jun 02 14:24:16 pve03 systemd[1]: Starting dnsmasq.service - dnsmasq - A lightweight DHCP and caching DNS server...
      Jun 02 14:24:16 pve03 systemd[1]: Started lxc-monitord.service - LXC Container Monitoring Daemon.
      Jun 02 14:24:16 pve03 systemd[1]: Starting lxc-net.service - LXC network bridge setup...
      Jun 02 14:24:16 pve03 systemd[1]: Starting nvmf-autoconnect.service - Connect NVMe-oF subsystems automatically during boot...
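
Worth noting: the log above shows network-online.target being reached in the same second that networking.service finishes, so "online" appears to mean little more than "ifupdown2 has run", not that the bond/VLAN interfaces are actually passing traffic yet. To see what gates that target on this host (standard systemd queries):

Code:
systemctl list-dependencies network-online.target
systemctl cat networking.service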

Any help/comments appreciated.

Thanks.
 
Thanks for running with the experiment I suggested, and for collecting comprehensive results from your test.

I would try this first: instead of pinging the storage, ping the gateway or another well-known node on the network. This will scope the issue to the network in general and exclude the storage.
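
For example, in your ExecStartPre check script, swap the ping target to something like this (the address is a placeholder for your VLAN's gateway or another always-on host):

Code:
$PING -c 3 10.10.0.1 >> $LOGFILE    # placeholder: vlan1010 gateway / known-good host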

My hunch is that this is VLAN/ARP/spanning-tree/switch related. Or it could be network-card specific, where the firmware may not be fully initialized by that point. Some users reported that disabling dual-personality mode helped with weird networking issues (IP/InfiniBand).

On RHEL-based systems there used to be a LINKDELAY variable in the ifcfg files. It was used for some buggy NICs that took too long to initialize.
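
From memory it looked something like this in the interface's config file (RHEL initscripts syntax; the value is just an example):

Code:
# /etc/sysconfig/network-scripts/ifcfg-eno1
LINKDELAY=10    # wait 10 seconds for the link before configuring the interface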

Frankly, none of this sounds like a "PVE Installation and configuration" issue. I'd ping the Dell server side for suggestions (if you are using a Dell NIC).


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 