Proxmox 8.2.2 / nvmf-autoconnect.service not starting

pgro

Member
Oct 20, 2022
Hi Everyone,

I am facing a strange nvmf-autoconnect.service issue. During system boot, it seems that nvmf-autoconnect.service tries to start before the networking services and therefore fails. After the system starts, I can see the status below:

Code:
-- Boot 9cb70a47c64046a99ef1803e2975ce8c --
May 15 11:20:52 at-pve02 systemd[1]: Starting nvmf-autoconnect.service - Connect NVMe-oF subsystems automatically during boot...
May 15 11:20:55 at-pve02 nvme[2872]: Failed to write to /dev/nvme-fabrics: Connection timed out
May 15 11:20:55 at-pve02 systemd[1]: nvmf-autoconnect.service: Deactivated successfully.
May 15 11:20:55 at-pve02 systemd[1]: Finished nvmf-autoconnect.service - Connect NVMe-oF subsystems automatically during boot.
May 15 11:32:40 at-pve02 systemd[1]: Starting nvmf-autoconnect.service - Connect NVMe-oF subsystems automatically during boot...
May 15 11:32:40 at-pve02 systemd[1]: nvmf-autoconnect.service: Deactivated successfully.
May 15 11:32:40 at-pve02 systemd[1]: Finished nvmf-autoconnect.service - Connect NVMe-oF subsystems automatically during boot.
-- Boot f987796a4add4f14b9934b74737839e9 --

root@at-pve02:~# systemctl status nvmf-autoconnect.service
○ nvmf-autoconnect.service - Connect NVMe-oF subsystems automatically during boot
     Loaded: loaded (/lib/systemd/system/nvmf-autoconnect.service; enabled; preset: enabled)
     Active: inactive (dead) since Wed 2024-05-15 13:45:56 EEST; 9min ago
    Process: 2871 ExecStartPre=/sbin/modprobe nvme-fabrics (code=exited, status=0/SUCCESS)
    Process: 2880 ExecStart=/usr/sbin/nvme connect-all (code=exited, status=0/SUCCESS)
   Main PID: 2880 (code=exited, status=0/SUCCESS)
        CPU: 30ms

May 15 13:45:53 at-pve02 systemd[1]: Starting nvmf-autoconnect.service - Connect NVMe-oF subsystems automatically during boot...
May 15 13:45:56 at-pve02 nvme[2880]: Failed to write to /dev/nvme-fabrics: Connection timed out
May 15 13:45:56 at-pve02 systemd[1]: nvmf-autoconnect.service: Deactivated successfully.
May 15 13:45:56 at-pve02 systemd[1]: Finished nvmf-autoconnect.service - Connect NVMe-oF subsystems automatically during boot.

By typing:

Code:
systemctl list-dependencies

default.target
○ ├─display-manager.service
○ ├─nvmefc-boot-connections.service
○ ├─nvmf-autoconnect.service
● ├─rrdcached.service
○ ├─systemd-update-utmp-runlevel.service
● └─multi-user.target
●   ├─chrony.service
●   ├─console-setup.service
●   ├─corosync.service
●   ├─cron.service
●   ├─dbus.service
○   ├─e2scrub_reap.service
●   ├─ksmtuned.service
●   ├─lxc-monitord.service
●   ├─lxc-net.service
●   ├─lxc.service
●   ├─lxcfs.service
●   ├─networking.service
●   ├─postfix.service
○   ├─proxmox-boot-cleanup.service
●   ├─proxmox-firewall.service
●   ├─pve-cluster.service
●   ├─pve-firewall.service
●   ├─pve-guests.service
●   ├─pve-ha-crm.service
●   ├─pve-ha-lrm.service
●   ├─pve-lxc-syscalld.service
●   ├─pvedaemon.service
...

You will notice that nvmf-autoconnect.service is listed above networking.service,

Code:
root@at-pve02:~# systemctl list-dependencies network-online.target
network-online.target
● └─networking.service

while nfs-client.target, for example, looks like this:

Code:
root@at-pve02:~# systemctl list-dependencies nfs-client.target
nfs-client.target
○ ├─auth-rpcgss-module.service
● ├─rpc-statd-notify.service
● └─remote-fs-pre.target


Somehow I need to direct nvmf-autoconnect.service to start last, or after nfs-client.target.

For now, in order to work around this, I have to restart the service manually after the system boots.
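One standard systemd pattern for enforcing this kind of ordering is a drop-in override (a sketch, created e.g. with `systemctl edit nvmf-autoconnect.service`; per systemd's documentation, `After=network-online.target` should be paired with `Wants=network-online.target`, since `After=` alone only orders the units and does not actually pull the target into the boot transaction):

```ini
# /etc/systemd/system/nvmf-autoconnect.service.d/override.conf
# Pull in network-online.target (Wants=) in addition to ordering
# after it (After=); After= alone does not request the target.
[Unit]
Wants=network-online.target
After=network-online.target
```

After saving the drop-in, run `systemctl daemon-reload` so it takes effect at the next boot.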

Thanx
 
Hi @bbgeek17

I have already checked this but it's a little bit more complicated.

Code:
root@at-pve02:~# cat /lib/systemd/system/nvmf-autoconnect.service
[Unit]
Description=Connect NVMe-oF subsystems automatically during boot
ConditionPathExists=|!/etc/nvme/config.json
ConditionPathExists=|!/etc/nvme/discovery.conf
After=network-online.target
Before=remote-fs-pre.target

[Service]
Type=oneshot
ExecStartPre=/sbin/modprobe nvme-fabrics
ExecStart=/usr/sbin/nvme connect-all

[Install]
WantedBy=default.target

As you can see, After=network-online.target is already present in the .service file, BUT network-online.target is not a real service on this system as such; it only wraps networking.service, as you can see if you check:

Code:
root@at-pve02:~# systemctl list-dependencies network-online.target
network-online.target
● └─networking.service

root@at-pve02:~# systemctl list-units networking.service
  UNIT               LOAD   ACTIVE SUB    DESCRIPTION           
  networking.service loaded active exited Network initialization

LOAD   = Reflects whether the unit definition was properly loaded.
ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
SUB    = The low-level unit activation state, values depend on unit type.
1 loaded units listed. Pass --all to see loaded but inactive units, too.
To show all installed unit files use 'systemctl list-unit-files'.
root@at-pve02:~# systemctl list-units network-online.target
  UNIT                  LOAD   ACTIVE SUB    DESCRIPTION      
  network-online.target loaded active active Network is Online

LOAD   = Reflects whether the unit definition was properly loaded.
ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
SUB    = The low-level unit activation state, values depend on unit type.
1 loaded units listed. Pass --all to see loaded but inactive units, too.
To show all installed unit files use 'systemctl list-unit-files'.

So why doesn't it start after network initialization?
 
Hard to say without knowing all the details about your system. Maybe there is a delay in some of the components of the networking and so there is a race.

I'd advise you to keep playing with dependencies and delays and, perhaps, add some debug output.

Take a look here https://unix.stackexchange.com/ques...ork-interface-to-be-up-before-running-service

It seems that if you have multiple interfaces, there could be "gotchas" during start-up.

Good luck


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
Thank you,

I would really appreciate it if there were a way to troubleshoot this. The only tweak for now is to add ExecStartPre=/bin/sleep 30, but this is not the way I'd like it to work. Could it be a bug within Debian 12 Bookworm?
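As a somewhat less arbitrary alternative to a fixed sleep, the ExecStartPre= step could poll the storage portal until it answers (a sketch only; the script name, IP, and port below are placeholders, not values from this thread):

```shell
# Sketch: bounded wait for the NVMe-oF portal instead of "sleep 30".
cat > /tmp/wait-for-portal.sh <<'EOF'
#!/bin/bash
# Wait (up to 30s) for the NVMe-oF portal to accept TCP connections.
TARGET="192.0.2.10"   # placeholder: your discovery controller IP
PORT=4420             # default NVMe/TCP port
for i in $(seq 1 30); do
    # bash /dev/tcp probe: succeeds once the portal accepts a connection
    if timeout 1 bash -c "exec 3<>/dev/tcp/$TARGET/$PORT" 2>/dev/null; then
        echo "portal reachable after ${i}s"
        exit 0
    fi
    sleep 1
done
echo "portal still unreachable after 30s" >&2
exit 1
EOF
chmod +x /tmp/wait-for-portal.sh
bash -n /tmp/wait-for-portal.sh   # syntax check only; does not run the loop
```

Installed somewhere like /usr/local/bin, it could then be referenced from a drop-in as an additional ExecStartPre= line, so the service waits only as long as actually needed.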
 
It could be. This service is a relatively new addition. There could be an edge case specific to your setup that it does not expect.

Replace "sleep" with a bash script that collects data, i.e. the state of interfaces, IP connectivity, module load status, etc. Have it run in a loop every 5 seconds for 30 seconds. Perhaps you can spot a difference in state that might explain the behavior.
An obvious suspect is that while the system thinks the network is online, in reality it is not fully up.
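A minimal sketch of such a collection script (the file names, log path, and target IP are hypothetical placeholders, not an official reference):

```shell
# Sketch: sample network/module state every 5s for 30s while debugging.
cat > /tmp/nvmf-debug.sh <<'EOF'
#!/bin/bash
# Intended to replace "sleep 30" in ExecStartPre= while debugging.
LOG=/tmp/nvmf-debug.log          # hypothetical log location
TARGET="192.0.2.10"              # placeholder: your NVMe-oF portal IP
for i in 1 2 3 4 5 6; do
    {
        echo "=== sample $i: $(date) ==="
        ip -brief addr show      # interface state and addresses
        ip route show            # routing table
        lsmod | grep nvme        # nvme module load status
        if ping -c1 -W1 "$TARGET" >/dev/null 2>&1; then
            echo "portal reachable"
        else
            echo "portal NOT reachable"
        fi
    } >>"$LOG" 2>&1
    sleep 5
done
EOF
chmod +x /tmp/nvmf-debug.sh
bash -n /tmp/nvmf-debug.sh   # syntax check only; does not execute the loop
```

Comparing the first sample against the last one should show whether an interface, route, or module appears late, i.e. after systemd has already declared the network online.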

Good luck


 
Thank you @bbgeek17 , is there any reference I can see regarding this bash script?
 
