Issue on nodes pvestatd[616102]: storage 'PROXMOX_SSD_1' is not online

them00n

Hello.

We just installed a Proxmox cluster of 4 nodes and mounted NFS storage from a NetApp for use in Proxmox.
We launched some instances and now see that 2 of the 4 nodes have issues with the NFS storage; every 20-30 minutes we see these errors in journalctl -u pvestatd --no-pager -n 200:

Code:
Oct 16 17:41:40 proxmox03 pvestatd[5251]: status update time (9.203 seconds)
Oct 16 18:21:41 proxmox03 pvestatd[5251]: storage 'PROXMOX-SATA-1' is not online
Oct 16 18:21:51 proxmox03 pvestatd[5251]: storage 'PROXMOX_SSD_1' is not online
Oct 16 18:21:51 proxmox03 pvestatd[5251]: status update time (20.233 seconds)
Oct 16 18:21:57 proxmox03 pvestatd[5251]: got timeout
Oct 16 18:21:57 proxmox03 pvestatd[5251]: unable to activate storage 'PROXMOX_SSD_1' - directory '/mnt/pve/PROXMOX_SSD_1' does not exist or is unreachable
Oct 16 18:21:57 proxmox03 pvestatd[5251]: status update time (5.341 seconds)
Oct 16 18:22:03 proxmox03 pvestatd[5251]: got timeout
Oct 16 18:22:03 proxmox03 pvestatd[5251]: unable to activate storage 'PROXMOX_SSD_1' - directory '/mnt/pve/PROXMOX_SSD_1' does not exist or is unreachable
Oct 16 18:22:13 proxmox03 pvestatd[5251]: got timeout
Oct 16 18:22:13 proxmox03 pvestatd[5251]: unable to activate storage 'PROXMOX_SSD_1' - directory '/mnt/pve/PROXMOX_SSD_1' does not exist or is unreachable
Oct 16 18:37:01 proxmox03 pvestatd[5251]: storage 'PROXMOX_SSD_1' is not online
Oct 16 18:37:11 proxmox03 pvestatd[5251]: storage 'PROXMOX-SATA-1' is not online
Oct 16 18:37:12 proxmox03 pvestatd[5251]: status update time (20.242 seconds)
Oct 16 18:37:22 proxmox03 pvestatd[5251]: storage 'PROXMOX_SSD_1' is not online
Oct 16 18:37:22 proxmox03 pvestatd[5251]: status update time (10.256 seconds)
Oct 16 18:46:15 proxmox03 systemd[1]: Stopping pvestatd.service - PVE Status Daemon...
Oct 16 18:46:16 proxmox03 pvestatd[5251]: received signal TERM
Oct 16 18:46:16 proxmox03 pvestatd[5251]: server closing
Oct 16 18:46:16 proxmox03 pvestatd[5251]: server stopped
Oct 16 18:46:17 proxmox03 systemd[1]: pvestatd.service: Deactivated successfully.
Oct 16 18:46:17 proxmox03 systemd[1]: Stopped pvestatd.service - PVE Status Daemon.
Oct 16 18:46:17 proxmox03 systemd[1]: pvestatd.service: Consumed 34min 41.046s CPU time.
Oct 16 18:46:17 proxmox03 systemd[1]: Starting pvestatd.service - PVE Status Daemon...
Oct 16 18:46:17 proxmox03 pvestatd[616102]: starting server
Oct 16 18:46:18 proxmox03 systemd[1]: Started pvestatd.service - PVE Status Daemon.
Oct 16 18:57:37 proxmox03 pvestatd[616102]: storage 'PROXMOX_SSD_1' is not online
Oct 16 18:57:47 proxmox03 pvestatd[616102]: storage 'PROXMOX-SATA-1' is not online
Oct 16 18:57:47 proxmox03 pvestatd[616102]: status update time (20.256 seconds)
Oct 16 18:57:58 proxmox03 pvestatd[616102]: storage 'PROXMOX-SATA-1' is not online
Oct 16 18:58:07 proxmox03 pvestatd[616102]: got timeout
Oct 16 18:58:07 proxmox03 pvestatd[616102]: unable to activate storage 'PROXMOX_SSD_1' - directory '/mnt/pve/PROXMOX_SSD_1' does not exist or is unreachable
Oct 16 18:58:07 proxmox03 pvestatd[616102]: status update time (19.416 seconds)
Oct 16 18:58:09 proxmox03 pvestatd[616102]: got timeout
Oct 16 18:58:09 proxmox03 pvestatd[616102]: unable to activate storage 'PROXMOX_SSD_1' - directory '/mnt/pve/PROXMOX_SSD_1' does not exist or is unreachable
Oct 16 19:11:48 proxmox03 systemd[1]: Stopping pvestatd.service - PVE Status Daemon...
Oct 16 19:11:49 proxmox03 pvestatd[616102]: received signal TERM
Oct 16 19:11:49 proxmox03 pvestatd[616102]: server closing
Oct 16 19:11:49 proxmox03 pvestatd[616102]: server stopped
Oct 16 19:11:50 proxmox03 systemd[1]: pvestatd.service: Deactivated successfully.
Oct 16 19:11:50 proxmox03 systemd[1]: Stopped pvestatd.service - PVE Status Daemon.
Oct 16 19:11:50 proxmox03 systemd[1]: pvestatd.service: Consumed 35.951s CPU time.
Oct 16 19:11:50 proxmox03 systemd[1]: Starting pvestatd.service - PVE Status Daemon...
Oct 16 19:11:51 proxmox03 pvestatd[625844]: starting server
Oct 16 19:11:51 proxmox03 systemd[1]: Started pvestatd.service - PVE Status Daemon.
Oct 16 19:38:21 proxmox03 pvestatd[625844]: storage 'PROXMOX-SATA-1' is not online
Oct 16 19:38:31 proxmox03 pvestatd[625844]: storage 'PROXMOX_SSD_1' is not online
Oct 16 19:38:31 proxmox03 pvestatd[625844]: status update time (20.240 seconds)
Oct 16 19:38:38 proxmox03 pvestatd[625844]: got timeout
Oct 16 19:38:39 proxmox03 pvestatd[625844]: status update time (7.384 seconds)
Oct 16 19:38:43 proxmox03 pvestatd[625844]: got timeout

We use Intel 10G SFP+ network cards with DAC cables.
Servers: Dell R640.
We are using NFS v3 with the options vers=3,rsize=65536,wsize=65536,proto=tcp,timeo=600,retrans=3.
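For reference, these options live in the NFS storage definition in /etc/pve/storage.cfg. A sketch of what such an entry looks like (the server IP and export path below are placeholders, not our real values):
Code:
nfs: PROXMOX_SSD_1
        export /vol/proxmox_ssd_1
        path /mnt/pve/PROXMOX_SSD_1
        server 10.10.48.10
        content images
        options vers=3,rsize=65536,wsize=65536,proto=tcp,timeo=600,retrans=3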

Has anybody had this issue before? Ping shows packet loss during that time. It looks to me like an STP (Spanning Tree) issue where the port is blocked for several seconds from the switch side.

The NetApp storage is running at 10%-20% load and serves other Ubuntu and XenServer clients with no issues.
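To narrow down whether this is a network-level problem, here is a minimal set of checks we would run on an affected node while the errors occur (bond0 and NETAPP_IP below are placeholders; substitute the bond/VLAN and address actually carrying the NFS traffic):
Code:
# Error/drop counters on the bond carrying the NFS traffic
ip -s link show bond0
cat /proc/net/bonding/bond0

# NIC-level drops/errors (repeat for each slave interface)
ethtool -S ens3f0np0 | grep -iE 'drop|err|miss'

# NFS client RPC retransmissions
nfsstat -rc

# Sustained ping to the NetApp during the error window
NETAPP_IP=10.10.48.10   # placeholder, set to the real NFS server address
ping -i 0.2 -c 500 "$NETAPP_IP"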

Kernel information:
Linux proxmox01 6.8.12-9-pve #1 SMP PREEMPT_DYNAMIC PMX 6.8.12-9 (2025-03-16T19:18Z) x86_64 GNU/Linux

Proxmox 8.4.0




(Attachment: Screenshot 2025-10-18 at 18.23.12.jpg)
Hello.

Today we upgraded all firmware on the Dell servers via iDRAC and downloads.dell.com - the issue still persists.
We also replaced the DAC cables today. Same story.
Any help would be appreciated. Maybe somebody from the moderators can help us with this weird issue.


Our network config is the following:
Code:
auto lo
iface lo inet loopback

auto eno1np0
iface eno1np0 inet manual
    mtu 1500

auto eno2np1
iface eno2np1 inet manual
    mtu 1500

auto ens3f0np0
iface ens3f0np0 inet manual
    mtu 1500

auto ens3f1np1
iface ens3f1np1 inet manual
    mtu 1500

auto bond0
iface bond0 inet manual
    bond-slaves ens3f0np0 ens3f1np1
    bond-miimon 100
    bond-mode 802.3ad
    bond-xmit-hash-policy layer2+3
    mtu 1500

auto bond1
iface bond1 inet manual
    bond-slaves eno1np0 eno2np1
    bond-miimon 100
    bond-mode active-backup
    mtu 1500

# Main bridge for high-speed bond0: Handles VMs/containers with VLAN tagging
# Allowed VLANs: 81,86,805,806 (for VMs; host IPs moved to raw VLAN ifaces)
auto vmbr0
iface vmbr0 inet manual
    bridge-ports bond0
    bridge-stp off
    bridge-fd 0
    bridge-vlan-aware yes
    bridge-vids 81 86 805 806
    mtu 1500

# Secondary bridge for bond1: Dedicated to VLANs 807/808 for isolation (e.g., management VMs)
auto vmbr1
iface vmbr1 inet manual
    bridge-ports bond1
    bridge-stp off
    bridge-fd 0
    bridge-vlan-aware yes
    bridge-vids 807 808
    mtu 1500

# Standalone VLAN subinterface and bridge for VLAN 130 (isolated)
auto bond0.130
iface bond0.130 inet manual
    vlan-raw-device bond0
    vlan-id 130
    mtu 1500

auto vmbr130
iface vmbr130 inet manual
    bridge-ports bond0.130
    bridge-stp off
    bridge-fd 0
    mtu 1500

# Standalone VLAN subinterface and bridge for VLAN 1111 (isolated)
auto bond0.1111
iface bond0.1111 inet manual
    vlan-raw-device bond0
    vlan-id 1111
    mtu 1500

auto vmbr1111
iface vmbr1111 inet manual
    bridge-ports bond0.1111
    bridge-stp off
    bridge-fd 0
    mtu 1500

# Host management IPs: On raw VLAN interfaces (no bridges for host-only to avoid conflicts)

# VLAN 805: Main management with gateway
auto bond0.805
iface bond0.805 inet static
    vlan-raw-device bond0
    vlan-id 805
    address 10.10.48.64/24
    mtu 1500

# VLAN 86: Additional host IP
auto bond0.86
iface bond0.86 inet static
    vlan-raw-device bond0
    vlan-id 86
    address 10.81.86.64/24
    mtu 1500

# VLAN 81: Additional host IP
auto bond0.81
iface bond0.81 inet static
    vlan-raw-device bond0
    vlan-id 81
    address 10.81.81.154/24
    mtu 1500

# Host IPs on bond1 VLANs (raw, no bridges)
auto bond1.807
iface bond1.807 inet static
    vlan-raw-device bond1
    vlan-id 807
    address 10.11.0.191/24
    mtu 1500

auto bond1.808
iface bond1.808 inet static
    vlan-raw-device bond1
    vlan-id 808
    address 10.12.0.191/24
    mtu 1500
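To verify the bonds and the VLAN-aware bridges came up as intended, we check them like this (read-only, nothing is changed):
Code:
# LACP/partner state of the 10G bond and slave state of the active-backup bond
cat /proc/net/bonding/bond0
cat /proc/net/bonding/bond1

# VLANs actually programmed on the VLAN-aware bridge port
bridge vlan show dev bond0

# Host addresses on the VLAN subinterfaces
ip -br addr show | grep -E 'bond[01]\.'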
 
So the issue is like this:

1. When you install an empty node, it works.
2. If you add it to the cluster, it gets the ARP problems we described before (see the capture sketch below).
3. If you remove it from the cluster, it still gets the errors, because it was in the cluster before.
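A quick way to confirm the ARP side while a node is in this state (interface names follow our config above; adjust to yours):
Code:
# Watch ARP requests/replies on the affected bond while the storage errors occur
tcpdump -eni bond1 arp

# Check whether the neighbour entries on the storage VLAN go stale/FAILED
ip neigh show dev bond1.807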
 
I like this forum very much: nobody answers, not even the owners. Nice community. Anyway, we found a solution.
We are solving it like this (still needs more testing) - so far no more errors and no more drops.

1. Configure the NIC RX/TX ring buffers:
Code:
/usr/sbin/ethtool -G eno1np0 rx 4096 tx 4096
/usr/sbin/ethtool -G eno2np1 rx 4096 tx 4096
/usr/sbin/ethtool -G ens3f0np0 rx 4096 tx 4096
/usr/sbin/ethtool -G ens3f1np1 rx 4096 tx 4096
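These ring buffer settings do not survive a reboot on their own. One way to make them persistent with the ifupdown config above is a post-up line per physical interface in /etc/network/interfaces, for example:
Code:
auto ens3f0np0
iface ens3f0np0 inet manual
    mtu 1500
    post-up /usr/sbin/ethtool -G ens3f0np0 rx 4096 tx 4096 || true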
2. Modify bond1 to use ARP monitoring (ARP aging). Create /etc/network/if-up.d/bond1-arp:

Code:
#!/bin/bash
# Enable ARP monitoring on bond1 when ifupdown brings the interface up
if [ "$IFACE" = "bond1" ]; then
    # Probe the ARP targets every 1000 ms instead of relying only on link state
    echo 1000 > /sys/class/net/bond1/bonding/arp_interval
    # Validate ARP replies on all slaves
    echo all > /sys/class/net/bond1/bonding/arp_validate
    # Peers to probe (the + prefix appends a target)
    echo +10.11.0.180 > /sys/class/net/bond1/bonding/arp_ip_target
    echo +10.11.0.181 > /sys/class/net/bond1/bonding/arp_ip_target
fi

chmod +x /etc/network/if-up.d/bond1-arp
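After bringing bond1 up again (or rebooting), the ARP monitor settings can be verified like this:
Code:
# arp_interval, arp_validate and both ARP targets should show up here
cat /proc/net/bonding/bond1
grep . /sys/class/net/bond1/bonding/arp_interval \
       /sys/class/net/bond1/bonding/arp_validate \
       /sys/class/net/bond1/bonding/arp_ip_target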
 