On a multi-node cluster with an EVPN SDN zone, the zone status is permanently shown as pending on a single node, while all other nodes show available.
It appears that the root cause is pvestatd failing to parse the output of ifquery -a -c -o json, because on that one node ifquery aborts with:
error: main exception: cycle found involving iface dmz (indegree 1)
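The SDN configuration and the networking-related package versions (ifupdown2, libpve-network-perl, frr-pythontools) are identical across all nodes; the pveversion output below for prox1 and prox5 differs only in the list of installed kernels and the intel-microcode version. The only other difference I can find between the failing node and the working ones is the physical NIC name (enp0s31f6 on the failing node vs eno1 / nic0 on the working ones). The data plane is completely unaffected, so this appears to be cosmetic, but the pending status and the pvestatd log spam every 10 seconds are persistent and survive reboots.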
Environment
- Proxmox VE: 9.2.3
- ifupdown2: 3.3.0-1+pmx12 (identical on all nodes)
- FRRouting: 10.6.1
- 4-node cluster, EVPN SDN zone with an external EVPN gateway (VyOS) as the L3 gateway
prox1 is one of the working nodes:
root@prox1:~# pveversion -v
proxmox-ve: 9.2.0 (running kernel: 7.0.6-2-pve)
pve-manager: 9.2.3 (running version: 9.2.3/d0fde103346cf89a)
proxmox-kernel-helper: 9.2.0
proxmox-kernel-7.0: 7.0.6-2
proxmox-kernel-7.0.6-2-pve-signed: 7.0.6-2
proxmox-kernel-6.17: 6.17.13-13
proxmox-kernel-6.17.13-13-pve-signed: 6.17.13-13
proxmox-kernel-6.17.13-2-pve-signed: 6.17.13-2
proxmox-kernel-6.17.2-1-pve-signed: 6.17.2-1
ceph: 20.2.1-pve1
ceph-fuse: 20.2.1-pve1
corosync: 3.1.10-pve2
criu: 4.1.1-1
frr-pythontools: 10.6.1-1+pve2
ifupdown2: 3.3.0-1+pmx12
intel-microcode: 3.20260227.1
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-5
libproxmox-acme-perl: 1.7.1
libproxmox-backup-qemu0: 2.0.2
libproxmox-rs-perl: 0.4.1
libpve-access-control: 9.1.1
libpve-apiclient-perl: 3.4.2
libpve-cluster-api-perl: 9.1.6
libpve-cluster-perl: 9.1.6
libpve-common-perl: 9.1.13
libpve-guest-common-perl: 6.0.3
libpve-http-server-perl: 6.0.5
libpve-network-perl: 1.6.6
libpve-notify-perl: 9.1.6
libpve-rs-perl: 0.15.3
libpve-storage-perl: 9.1.5
libspice-server1: 0.15.2-1+b1
lvm2: 2.03.31-2+pmx1
lxc-pve: 7.0.0-2
lxcfs: 7.0.0-pve1
novnc-pve: 1.7.0-1
proxmox-backup-client: 4.2.1-1
proxmox-backup-file-restore: 4.2.1-1
proxmox-backup-restore-image: 1.0.0
proxmox-firewall: 1.2.3
proxmox-kernel-helper: 9.2.0
proxmox-mail-forward: 1.0.3
proxmox-mini-journalreader: 1.6
proxmox-offline-mirror-helper: 0.7.4
proxmox-widget-toolkit: 5.2.3
pve-cluster: 9.1.6
pve-container: 6.1.10
pve-docs: 9.2.2
pve-edk2-firmware: 4.2025.05-2
pve-esxi-import-tools: 1.0.1
pve-firewall: 6.0.4
pve-firmware: 3.18-4
pve-ha-manager: 5.2.4
pve-i18n: 3.7.5
pve-qemu-kvm: 11.0.0-4
pve-xtermjs: 6.0.0-1
qemu-server: 9.1.16
smartmontools: 7.5-pve2
spiceterm: 3.4.2
swtpm: 0.8.0+pve3
vncterm: 1.9.2
zfsutils-linux: 2.4.2-pve1
prox5 is the failing node:
root@prox5:~# pveversion -v
proxmox-ve: 9.2.0 (running kernel: 7.0.6-2-pve)
pve-manager: 9.2.3 (running version: 9.2.3/d0fde103346cf89a)
proxmox-kernel-helper: 9.2.0
proxmox-kernel-7.0: 7.0.6-2
proxmox-kernel-7.0.6-2-pve-signed: 7.0.6-2
proxmox-kernel-6.17: 6.17.13-13
proxmox-kernel-6.17.13-13-pve-signed: 6.17.13-13
proxmox-kernel-6.17.13-2-pve-signed: 6.17.13-2
proxmox-kernel-6.14: 6.14.11-9
proxmox-kernel-6.14.11-9-pve-signed: 6.14.11-9
proxmox-kernel-6.8: 6.8.12-15
proxmox-kernel-6.8.12-15-pve-signed: 6.8.12-15
proxmox-kernel-6.8.4-2-pve-signed: 6.8.4-2
ceph: 20.2.1-pve1
ceph-fuse: 20.2.1-pve1
corosync: 3.1.10-pve2
criu: 4.1.1-1
frr-pythontools: 10.6.1-1+pve2
ifupdown2: 3.3.0-1+pmx12
intel-microcode: 3.20240813.2
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-5
libproxmox-acme-perl: 1.7.1
libproxmox-backup-qemu0: 2.0.2
libproxmox-rs-perl: 0.4.1
libpve-access-control: 9.1.1
libpve-apiclient-perl: 3.4.2
libpve-cluster-api-perl: 9.1.6
libpve-cluster-perl: 9.1.6
libpve-common-perl: 9.1.13
libpve-guest-common-perl: 6.0.3
libpve-http-server-perl: 6.0.5
libpve-network-perl: 1.6.6
libpve-notify-perl: 9.1.6
libpve-rs-perl: 0.15.3
libpve-storage-perl: 9.1.5
libspice-server1: 0.15.2-1+b1
lvm2: 2.03.31-2+pmx1
lxc-pve: 7.0.0-2
lxcfs: 7.0.0-pve1
novnc-pve: 1.7.0-1
proxmox-backup-client: 4.2.1-1
proxmox-backup-file-restore: 4.2.1-1
proxmox-backup-restore-image: 1.0.0
proxmox-firewall: 1.2.3
proxmox-kernel-helper: 9.2.0
proxmox-mail-forward: 1.0.3
proxmox-mini-journalreader: 1.6
proxmox-offline-mirror-helper: 0.7.4
proxmox-widget-toolkit: 5.2.3
pve-cluster: 9.1.6
pve-container: 6.1.10
pve-docs: 9.2.2
pve-edk2-firmware: 4.2025.05-2
pve-esxi-import-tools: 1.0.1
pve-firewall: 6.0.4
pve-firmware: 3.18-4
pve-ha-manager: 5.2.4
pve-i18n: 3.7.5
pve-qemu-kvm: 11.0.0-4
pve-xtermjs: 6.0.0-1
qemu-server: 9.1.16
smartmontools: 7.5-pve2
spiceterm: 3.4.2
swtpm: 0.8.0+pve3
vncterm: 1.9.2
zfsutils-linux: 2.4.2-pve1
Symptom
In Datacenter → SDN, the public zone shows pending on one node (here prox5) and available on all others.
pvestatd logs the following every poll cycle, on the affected node only:
pvestatd[...]: sdn status update error: malformed JSON string, neither tag, array, object, number, string or atom, at character offset 0 (before "(end of string)") at /usr/share/perl5/PVE/Network/SDN/Zones.pm line 200.
pvestatd[...]: network status update error: malformed JSON string, neither tag, array, object, number, string or atom, at character offset 0 (before "(end of string)") at /usr/share/perl5/PVE/Network/SDN/Zones.pm line 200.
Zones.pm line 200 is the ifquery_check path, which runs ifquery -a -c -o json.
The "(end of string)" at offset 0 is an empty stdout — ifquery produced no JSON because it aborted.
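To illustrate why an aborted ifquery turns into a JSON error at offset 0, here is a minimal Python sketch. It is not the Perl code in Zones.pm, and the helper name parse_ifquery_json is made up for this example. It only shows that parsing an empty string fails at position 0, and how a caller could report stderr instead of a generic parse error.

```python
import json


def parse_ifquery_json(stdout, stderr=""):
    """Hypothetical helper: parse ifquery JSON, but fail readably on empty stdout."""
    if not stdout.strip():
        # Nothing to parse: surface whatever the command wrote to stderr instead.
        detail = stderr.strip() or "(stderr was empty too)"
        raise RuntimeError(f"ifquery produced no JSON: {detail}")
    return json.loads(stdout)


# Parsing an empty string fails at position 0, which matches the
# "character offset 0" part of the pvestatd message.
try:
    json.loads("")
except json.JSONDecodeError as exc:
    print("parse failed at offset", exc.pos)

# With the wrapper, the real reason is reported instead.
try:
    parse_ifquery_json("", "error: main exception: cycle found involving iface dmz (indegree 1)")
except RuntimeError as exc:
    print(exc)
```

The point is only that the "malformed JSON" message is a secondary effect; the interesting error is the one ifquery prints itself.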
The actual failure
Running the command manually on the affected node:
root@prox5:~# ifquery -a -c -o json
error: main exception: cycle found involving iface dmz (indegree 1)
On every other node the identical command returns valid JSON (truncated):
root@prox1:~# ifquery -a -c -o json | head
[
{
"name": "lo",
"addr_method": "loopback",
"addr_family": "inet",
"auto": true,
"config": {},
"config_status": {},
"status": "pass"
}
etc
Failing node (prox5, NIC = enp0s31f6):
root@prox5:~# ifquery --print-dependency=list dmz
lo : []
enp0s31f6 : []
vmbr0 : ['enp0s31f6']
vmbr21 : ['enp0s31f6.21']
vmbr20 : ['enp0s31f6.20']
vmbr12 : ['enp0s31f6.12']
vmbr10 : ['enp0s31f6.10']
vmbr11 : ['enp0s31f6.11']
vmbr14 : ['enp0s31f6.14']
dmz : ['vxlan_dmz']
vrf_public : ['dmz', 'vrfbr_public']
vrfbr_public : ['vrfvx_public']
vrfvx_public : []
vxlan_dmz : []
enp0s31f6.21 : ['enp0s31f6']
enp0s31f6.20 : ['enp0s31f6']
enp0s31f6.12 : ['enp0s31f6']
enp0s31f6.10 : ['enp0s31f6']
enp0s31f6.11 : ['enp0s31f6']
enp0s31f6.14 : ['enp0s31f6']
root@prox5:~# ip -d link show dmz | grep master
17: dmz: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master vrf_public state UP mode DEFAULT group default qlen 1000
Working node (prox1, NIC = nic0) — identical dependency relationships, only the NIC name differs:
root@prox1:~# ifquery --print-dependency=list dmz
lo : []
nic0 : []
wls2f3 : []
vmbr0 : ['nic0']
vmbr21 : ['nic0.21']
vmbr20 : ['nic0.20']
vmbr12 : ['nic0.12']
vmbr10 : ['nic0.10']
vmbr11 : ['nic0.11']
vmbr14 : ['nic0.14']
dmz : ['vxlan_dmz']
vrf_public : ['dmz', 'vrfbr_public']
vrfbr_public : ['vrfvx_public']
vrfvx_public : []
vxlan_dmz : []
nic0.21 : ['nic0']
nic0.20 : ['nic0']
nic0.12 : ['nic0']
nic0.10 : ['nic0']
nic0.11 : ['nic0']
nic0.14 : ['nic0']
root@prox1:~# ip -d link show dmz | grep master
18: dmz: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master vrf_public state UP mode DEFAULT group default qlen 1000
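As a sanity check on the printed dependency list, here is a small Python sketch that parses the "name : [deps]" lines and runs a plain Kahn-style topological sort over them. This is my own illustration and makes no claim to match ifupdown2's internal graph code; the function names are made up. It assumes the printed list is the whole graph, which may not be true for what ifquery -a -c actually builds.

```python
import ast
from collections import deque


def parse_dependency_list(text):
    """Parse lines like "vmbr0 : ['enp0s31f6']" into {iface: [lower, ...]}."""
    graph = {}
    for line in text.splitlines():
        if " : " not in line:
            continue  # skip prompts and blank lines
        name, deps = line.split(" : ", 1)
        graph[name.strip()] = ast.literal_eval(deps.strip())
    return graph


def find_unsorted(graph):
    """Kahn's algorithm: return ifaces left with a nonzero indegree (empty if acyclic)."""
    indegree = {name: 0 for name in graph}
    for deps in graph.values():
        for dep in deps:
            indegree[dep] = indegree.get(dep, 0) + 1
    queue = deque(name for name, deg in indegree.items() if deg == 0)
    while queue:
        node = queue.popleft()
        for dep in graph.get(node, []):
            indegree[dep] -= 1
            if indegree[dep] == 0:
                queue.append(dep)
    return {name: deg for name, deg in indegree.items() if deg > 0}


# The SDN-related part of the prox5 listing above.
PROX5 = """\
dmz : ['vxlan_dmz']
vrf_public : ['dmz', 'vrfbr_public']
vrfbr_public : ['vrfvx_public']
vrfvx_public : []
vxlan_dmz : []
"""

print(find_unsorted(parse_dependency_list(PROX5)))
```

For the prox5 lines this prints an empty dict, i.e. the graph as printed has no cycle. So whatever ifquery -a -c trips over does not seem to be visible in the --print-dependency=list output.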
The only thing I can see that is different is the name of the NIC.
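To double-check that comparison, here is a rough Python sketch that renames the physical NIC (and its VLAN sub-interfaces) to a placeholder on both nodes and then diffs the two graphs. The dicts are abbreviated copies of the listings above with only one VLAN bridge kept, and the normalize helper is made up for this example.

```python
def normalize(graph, nic):
    """Rename the physical NIC and its VLAN sub-interfaces to a common placeholder."""
    def rename(name):
        if name == nic or name.startswith(nic + "."):
            return "NIC" + name[len(nic):]
        return name
    return {rename(k): sorted(rename(d) for d in deps) for k, deps in graph.items()}


# Abbreviated from the listings above (assumption: the omitted VLAN bridges
# follow the same pattern as vmbr21).
prox5 = {
    "lo": [], "enp0s31f6": [], "vmbr0": ["enp0s31f6"],
    "vmbr21": ["enp0s31f6.21"], "enp0s31f6.21": ["enp0s31f6"],
    "dmz": ["vxlan_dmz"], "vrf_public": ["dmz", "vrfbr_public"],
    "vrfbr_public": ["vrfvx_public"], "vrfvx_public": [], "vxlan_dmz": [],
}
prox1 = {
    "lo": [], "nic0": [], "wls2f3": [], "vmbr0": ["nic0"],
    "vmbr21": ["nic0.21"], "nic0.21": ["nic0"],
    "dmz": ["vxlan_dmz"], "vrf_public": ["dmz", "vrfbr_public"],
    "vrfbr_public": ["vrfvx_public"], "vrfvx_public": [], "vxlan_dmz": [],
}

a = normalize(prox5, "enp0s31f6")
b = normalize(prox1, "nic0")
print("only on prox5:", sorted(a.keys() - b.keys()))
print("only on prox1:", sorted(b.keys() - a.keys()))
print("different deps:", sorted(k for k in a.keys() & b.keys() if a[k] != b[k]))
```

After normalizing, the dependency relationships match. The one extra entry is wls2f3, which exists only on prox1 and has no dependencies; I have no indication that it is related.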
Anyone have any suggestions? Thanks.