Hello,
I am attempting to add NVMe over TCP storage to a PVE 8.4.1 cluster. I've seen two other threads in the forum regarding this:
The first thread, from user pgro:

Hi Everyone,
I am facing a strange nvmf-autoconnect.service issue. During system boot it seems that nvmf-autoconnect.service tries to start before the networking services, and thus it fails. After the system starts I can see the status below:
Code:
-- Boot 9cb70a47c64046a99ef1803e2975ce8c --
May 15 11:20:52 at-pve02 systemd[1]: Starting nvmf-autoconnect.service - Connect NVMe-oF subsystems automatically during boot...
May 15 11:20:55 at-pve02 nvme[2872]: Failed to write to /dev/nvme-fabrics: Connection timed out
May 15 11:20:55 at-pve02 systemd[1]: nvmf-autoconnect.service: Deactivated successfully...
The second thread, from user Ironhide526:

I have configured shared storage for Proxmox using a Dell PowerStore 1200T. The protocol in use is NVMe over TCP. Everything works fine, except the connection does not persist after a reboot.
discovery.conf Configuration:
--transport=tcp --traddr=192.168.5.5 --trsvcid=4420
--transport=tcp --traddr=192.168.5.6 --trsvcid=4420
--transport=tcp --traddr=192.168.6.5 --trsvcid=4420
--transport=tcp --traddr=192.168.6.6 --trsvcid=4420
nvmf-autoconnect.service Configuration:
[Unit]
Description=Connect NVMe-oF subsystems automatically during boot...
Neither has a resolution to the issue.
The basics:
1. Three node cluster running PVE 8.4.1
2. Storage appliance is a Dell Powerstore 1000T
I configured and mounted the storage following Dell's documentation for NVMe over TCP connectivity.
Once the configuration has been completed, the storage is available and usable with PVE.
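For reference, that configuration boils down to the usual nvme-cli flow, roughly like this (a sketch rather than the exact commands from Dell's doc; the discovery port 8009 comes from the journal output further down, and the discovery.conf contents here are an assumption):
Code:
# load the NVMe/TCP transport now and at every boot
modprobe nvme_tcp
echo nvme_tcp > /etc/modules-load.d/nvme_tcp.conf

# discover and connect to both PowerStore discovery controllers
nvme discover -t tcp -a 10.10.0.80 -s 8009
nvme connect-all -t tcp -a 10.10.0.80 -s 8009
nvme connect-all -t tcp -a 10.10.1.80 -s 8009

# persist the discovery controllers so nvmf-autoconnect.service reconnects at boot
cat > /etc/nvme/discovery.conf <<EOF
--transport=tcp --traddr=10.10.0.80 --trsvcid=8009
--transport=tcp --traddr=10.10.1.80 --trsvcid=8009
EOF
systemctl enable nvmf-autoconnect.service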
After reboot, the storage fails to auto-mount.
Code:
root@pve03:~# nvme show-topology
root@pve03:~#
I can, however, run nvme connect-all and the storage will mount. If I look through the logs:
Code:
journalctl -b --no-pager | grep nvm
Jun 02 10:17:45 pve03 systemd-modules-load[797]: Inserted module 'nvme_tcp'
Jun 02 10:17:49 pve03 systemd[1]: nvmefc-boot-connections.service - Auto-connect to subsystems on FC-NVME devices found during boot was skipped because of an unmet condition check (ConditionPathExists=/sys/class/fc/fc_udev_device/nvme_discovery).
Jun 02 10:18:13 pve03 systemd[1]: Starting nvmf-autoconnect.service - Connect NVMe-oF subsystems automatically during boot...
Jun 02 10:18:16 pve03 nvme[1902]: Failed to write to /dev/nvme-fabrics: Connection timed out
Jun 02 10:18:16 pve03 kernel: nvme nvme0: failed to connect socket: -110
Jun 02 10:18:19 pve03 kernel: nvme nvme0: failed to connect socket: -110
Jun 02 10:18:19 pve03 nvme[1902]: Failed to write to /dev/nvme-fabrics: Connection timed out
Jun 02 10:18:19 pve03 systemd[1]: nvmf-autoconnect.service: Deactivated successfully.
Jun 02 10:18:19 pve03 systemd[1]: Finished nvmf-autoconnect.service - Connect NVMe-oF subsystems automatically during boot.
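As a sanity check, the unit's ordering relative to the network can be inspected with the standard systemd tooling:
Code:
# show the unit file (plus any drop-ins) and its ordering dependencies
systemctl cat nvmf-autoconnect.service
systemctl list-dependencies --after nvmf-autoconnect.service

# show the chain of units this one waited on during the last boot
systemd-analyze critical-chain nvmf-autoconnect.service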
Some additional hardware background.
- All hosts are Dell R630
- All hosts have eight (8) network interfaces as follows
- Four (4) 1Gbps (eno[1-4]) - built-in Dell network card
- eno1 -> vmbr0
- Four (4) 10Gbps (enp130s0f0np[0-1], enp4s0f0np[0-1]), Intel X710 dual-port card(s)
- bond0 -> storage -> enp130s0f0np0, enp4s0f0np0
- bond1 -> networking -> enp130s0f0np1, enp4s0f0np1
- The cluster is using Ceph as well as the powerstore appliance on bond0 - not ideal, but this is dev/test
- Ceph is running on VLAN0405
- The powerstore is using VLAN1010, VLAN1020
- The cluster has Proxmox SDN configured for Virtual Machines
- The SDN has 2 Zones configured
- Type: Simple with 22 VNets
- Type: vxlan with 1 VNet
Some network details.
- Powerstore
- discovery address(es): 10.10.0.80, 10.10.1.80
- I/O address(es): 10.10.0.81,10.10.0.82,10.10.1.80,10.10.1.81
- Proxmox
- pve01, 10.10.0.133, 10.10.1.133
- pve02, 10.10.0.134, 10.10.1.134
- pve03, 10.10.0.135, 10.10.1.135
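For completeness, the two storage VLANs sit on top of bond0 with ifupdown2; on pve03 the stanzas look roughly like this (a reconstruction from the addresses and MTU shown in the output below, not a copy of the actual /etc/network/interfaces):
Code:
auto vlan1010
iface vlan1010 inet static
        address 10.10.0.135/24
        mtu 9000
        vlan-id 1010
        vlan-raw-device bond0

auto vlan1020
iface vlan1020 inet static
        address 10.10.1.135/24
        mtu 9000
        vlan-id 1020
        vlan-raw-device bond0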
Let's add an ExecStartPre script to /lib/systemd/system/nvmf-autoconnect.service:
Bash:
#!/bin/sh
# Log interface state and storage-controller reachability just before nvme connect-all runs.
LOGFILE=/tmp/prenvme.check
STORAGECNTRL0=10.10.0.80
STORAGECNTRL1=10.10.1.80
ECHO=/usr/bin/echo
PING=/usr/bin/ping
DATE=/usr/bin/date
IP=/usr/sbin/ip
LSMOD=/usr/sbin/lsmod

# Record link/address state for the given VLAN interface and ping its storage controller.
check_prenvme() {
    IFACE=$1
    $ECHO -e "\n$IP link show dev $IFACE" >> $LOGFILE
    $IP link show dev $IFACE >> $LOGFILE
    $ECHO -e "\n$IP addr show dev $IFACE" >> $LOGFILE
    $IP addr show dev $IFACE >> $LOGFILE
    case "$IFACE" in
        vlan1010)
            $ECHO -e "\n$PING -c 3 $STORAGECNTRL0" >> $LOGFILE
            $PING -c 3 $STORAGECNTRL0 >> $LOGFILE
            ;;
        vlan1020)
            $ECHO -e "\n$PING -c 3 $STORAGECNTRL1" >> $LOGFILE
            $PING -c 3 $STORAGECNTRL1 >> $LOGFILE
            ;;
        *)
            ;;
    esac
}

$DATE > $LOGFILE
$ECHO "$LSMOD | grep nvme" >> $LOGFILE
$LSMOD | grep nvme >> $LOGFILE
check_prenvme vlan1010
check_prenvme vlan1020
exit 0
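The script is wired in with an ExecStartPre line in the unit's [Service] section, followed by a systemctl daemon-reload. Roughly (the path here is just a placeholder for wherever the script is saved):
Code:
[Service]
# run the logging/ping check before the unit's own nvme connect-all
ExecStartPre=/usr/local/sbin/prenvme-check.sh
(A drop-in created with systemctl edit nvmf-autoconnect.service would survive nvme-cli package updates better than editing the file under /lib directly.)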
Now I'll reboot the host.
What did the script output?
Code:
Mon Jun 2 11:08:13 AM EDT 2025
/usr/sbin/lsmod | grep nvme
nvme_tcp 53248 0
nvme_fabrics 36864 1 nvme_tcp
nvme_keyring 20480 2 nvme_tcp,nvme_fabrics
nvme_core 204800 2 nvme_tcp,nvme_fabrics
nvme_auth 24576 1 nvme_core
/usr/sbin/ip link show dev vlan1010
17: vlan1010@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UP mode DEFAULT group default qlen 1000
link/ether 6c:fe:54:86:06:50 brd ff:ff:ff:ff:ff:ff
/usr/sbin/ip addr show dev vlan1010
17: vlan1010@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UP group default qlen 1000
link/ether 6c:fe:54:86:06:50 brd ff:ff:ff:ff:ff:ff
inet 10.10.0.135/24 scope global vlan1010
valid_lft forever preferred_lft forever
inet6 fe80::6efe:54ff:fe86:650/64 scope link
valid_lft forever preferred_lft forever
/usr/bin/ping -c 3 10.10.0.80
PING 10.10.0.80 (10.10.0.80) 56(84) bytes of data.
From 10.10.0.135 icmp_seq=1 Destination Host Unreachable
From 10.10.0.135 icmp_seq=2 Destination Host Unreachable
From 10.10.0.135 icmp_seq=3 Destination Host Unreachable
--- 10.10.0.80 ping statistics ---
3 packets transmitted, 0 received, +3 errors, 100% packet loss, time 2084ms
pipe 3
/usr/sbin/ip link show dev vlan1020
18: vlan1020@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UP mode DEFAULT group default qlen 1000
link/ether 6c:fe:54:86:06:50 brd ff:ff:ff:ff:ff:ff
/usr/sbin/ip addr show dev vlan1020
18: vlan1020@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UP group default qlen 1000
link/ether 6c:fe:54:86:06:50 brd ff:ff:ff:ff:ff:ff
inet 10.10.1.135/24 scope global vlan1020
valid_lft forever preferred_lft forever
inet6 fe80::6efe:54ff:fe86:650/64 scope link
valid_lft forever preferred_lft forever
/usr/bin/ping -c 3 10.10.1.80
PING 10.10.1.80 (10.10.1.80) 56(84) bytes of data.
From 10.10.1.135 icmp_seq=1 Destination Host Unreachable
From 10.10.1.135 icmp_seq=2 Destination Host Unreachable
From 10.10.1.135 icmp_seq=3 Destination Host Unreachable
--- 10.10.1.80 ping statistics ---
3 packets transmitted, 0 received, +3 errors, 100% packet loss, time 2009ms
pipe 3
So, it looks like:
- The interfaces are up
- They have an assigned IP address
- The pings to both storage controllers still fail (right before systemd runs nvmf-autoconnect.service)
If I check journalctl -b --no-pager | grep nvm:
Code:
Jun 02 11:07:45 pve03 systemd-modules-load[789]: Inserted module 'nvme_tcp'
Jun 02 11:07:49 pve03 systemd[1]: nvmefc-boot-connections.service - Auto-connect to subsystems on FC-NVME devices found during boot was skipped because of an unmet condition check (ConditionPathExists=/sys/class/fc/fc_udev_device/nvme_discovery).
Jun 02 11:08:13 pve03 systemd[1]: Starting nvmf-autoconnect.service - Connect NVMe-oF subsystems automatically during boot...
Jun 02 11:08:22 pve03 nvme[2085]: Failed to write to /dev/nvme-fabrics: Connection timed out
Jun 02 11:08:22 pve03 kernel: nvme nvme0: failed to connect socket: -110
Jun 02 11:08:24 pve03 kernel: nvme nvme0: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 10.10.1.80:8009, hostnqn: nqn.2014-08.org.nvmexpress:uuid:4c4c4544-0030-5a10-804d-c6c04f504432
Jun 02 11:08:24 pve03 kernel: nvme nvme1: creating 32 I/O queues.
Jun 02 11:08:24 pve03 kernel: nvme nvme1: mapped 32/0/0 default/read/poll queues.
Jun 02 11:08:25 pve03 kernel: nvme nvme1: new ctrl: NQN "nqn.1988-11.com.dell:powerstore:00:92252bdda3d64105ABB4", addr 10.10.1.82:4420, hostnqn: nqn.2014-08.org.nvmexpress:uuid:4c4c4544-0030-5a10-804d-c6c04f504432
Jun 02 11:08:25 pve03 kernel: nvme nvme2: creating 32 I/O queues.
Jun 02 11:08:25 pve03 kernel: nvme nvme2: mapped 32/0/0 default/read/poll queues.
Jun 02 11:08:25 pve03 kernel: nvme nvme2: new ctrl: NQN "nqn.1988-11.com.dell:powerstore:00:92252bdda3d64105ABB4", addr 10.10.1.81:4420, hostnqn: nqn.2014-08.org.nvmexpress:uuid:4c4c4544-0030-5a10-804d-c6c04f504432
Jun 02 11:08:25 pve03 kernel: nvme nvme0: Removing ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery"
Jun 02 11:08:25 pve03 systemd[1]: nvmf-autoconnect.service: Deactivated successfully.
Jun 02 11:08:25 pve03 systemd[1]: Finished nvmf-autoconnect.service - Connect NVMe-oF subsystems automatically during boot.
Jun 02 11:08:25 pve03 lvm[2095]: PV /dev/nvme1n1 online, VG nvme_vgcrawx is complete.
Jun 02 11:08:25 pve03 systemd[1]: Started lvm-activate-nvme_vgcrawx.service - /sbin/lvm vgchange -aay --autoactivation event nvme_vgcrawx.
Jun 02 11:08:25 pve03 lvm[2118]: 2 logical volume(s) in volume group "nvme_vgcrawx" now active
Jun 02 11:08:25 pve03 systemd[1]: lvm-activate-nvme_vgcrawx.service: Deactivated successfully.
That is interesting: something now auto-connected!
What does the topology show?
Code:
root@pve03:~# nvme show-topology
nvme-subsys1 - NQN=nqn.1988-11.com.dell:powerstore:00:92252bdda3d64105ABB4
\
+- ns 13
\
+- nvme1 tcp traddr=10.10.1.82,trsvcid=4420,src_addr=10.10.1.135 live non-optimized
+- nvme2 tcp traddr=10.10.1.81,trsvcid=4420,src_addr=10.10.1.135 live optimized
This is now looking more and more like a timing issue during boot. What happens if I increase the number of pings to 5 (ping -c 5)?
Reboot, and:
NOTE: The first set of pings (vlan1010 in the script) still fails, but I do see a response from the last two pings on vlan1020.
Code:
root@pve03:~# nvme show-topology
nvme-subsys1 - NQN=nqn.1988-11.com.dell:powerstore:00:92252bdda3d64105ABB4
\
+- ns 13
\
+- nvme1 tcp traddr=10.10.0.82,trsvcid=4420,src_addr=10.10.0.135 live non-optimized
+- nvme2 tcp traddr=10.10.0.81,trsvcid=4420,src_addr=10.10.0.135 live optimized
+- nvme3 tcp traddr=10.10.1.82,trsvcid=4420,src_addr=10.10.1.135 live non-optimized
+- nvme4 tcp traddr=10.10.1.81,trsvcid=4420,src_addr=10.10.1.135 live optimized
I can also change the ExecStartPre line calling my script to instead be ExecStartPre=/usr/bin/sleep 10. After a reboot the storage will now auto-mount. I'll call this a 'hack', not a 'fix' for the issue.
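A slightly less arbitrary variant of the same hack would be to wait until one of the discovery controllers actually answers instead of sleeping a fixed 10 seconds; something along these lines (untested sketch, same addresses as above):
Bash:
#!/bin/sh
# Wait up to ~30 seconds for either PowerStore discovery address to respond,
# then let nvmf-autoconnect.service carry on regardless.
for i in $(seq 1 30); do
    if ping -c 1 -W 1 10.10.0.80 >/dev/null 2>&1 || ping -c 1 -W 1 10.10.1.80 >/dev/null 2>&1; then
        exit 0
    fi
    sleep 1
done
exit 0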
Questions/possibilities?
- Could there be too many interfaces being configured by ifupdown2 on boot causing a delay?
- 8 physical
- 2 bond
- 5 vlans
- 2 bridges
- 2 Zones + 23 VNets
- A conflict between systemd and ifupdown2?
- nvmf-autoconnect has After=network-online.target, so is the network really up? systemd claims it is before nvmf-autoconnect runs (see the log excerpt and the timestamp check below):
Code:
Jun 02 14:24:16 pve03 systemd[1]: Finished networking.service - Network initialization.
Jun 02 14:24:16 pve03 systemd[1]: Reached target network.target - Network.
Jun 02 14:24:16 pve03 systemd[1]: Reached target network-online.target - Network is Online.
Jun 02 14:24:16 pve03 systemd[1]: Starting chrony.service - chrony, an NTP client/server...
Jun 02 14:24:16 pve03 systemd[1]: Starting dnsmasq.service - dnsmasq - A lightweight DHCP and caching DNS server.>
Jun 02 14:24:16 pve03 systemd[1]: Started lxc-monitord.service - LXC Container Monitoring Daemon.
Jun 02 14:24:16 pve03 systemd[1]: Starting lxc-net.service - LXC network bridge setup...
Jun 02 14:24:16 pve03 systemd[1]: Starting nvmf-autoconnect.service - Connect NVMe-oF subsystems automatically du
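One way to pin this down would be to compare precise timestamps of the two units on the same boot, e.g.:
Code:
journalctl -b -o short-precise --no-pager -u networking.service -u nvmf-autoconnect.service
If networking.service is marked finished while ifupdown2 is still bringing up the bond/VLAN interfaces (LACP negotiation, VLAN setup), that would line up with what the ping test shows.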
Any help/comments appreciated.
Thanks.