100G Mellanox NICs show as 'degraded'

stevehughes

Member
Apr 24, 2020
Hi,

I've noticed that all of the 100G Mellanox NICs in my hosts are shown as 'degraded' by networkctl, although they seem to be working just fine. I understand that a bond can be degraded, but a single NIC? And all of the 100G NICs? Could this be a driver limitation? The enp197... and the enp1... NICs are the 100G ones.

The enp67... NICs are Intel 10G and they display as expected.

root@ld-pve1:~# networkctl
IDX LINK TYPE OPERATIONAL SETUP
1 lo loopback carrier unmanaged
2 enp129s0f0 ether off unmanaged
3 enp129s0f1 ether off unmanaged
4 enp67s0f0 ether enslaved unmanaged
5 enp67s0f1 ether enslaved unmanaged
6 enp197s0f0np0 ether degraded unmanaged
7 enp197s0f1np1 ether degraded unmanaged
8 enxfecb34ba9faa ether off unmanaged
9 enp1s0f0np0 ether degraded unmanaged
10 enp1s0f1np1 ether degraded unmanaged
 
Could it be that these 100G cards are older PCIe 3.0 x16 cards and you have them in PCIe 4.0 x8 slots, so the card can't reach full bandwidth with only 8 lanes and you end up in a degraded mode?
 
Could it be that these 100G cards are older PCIe 3.0 x16 cards and you have them in PCIe 4.0 x8 slots, so the card can't reach full bandwidth with only 8 lanes and you end up in a degraded mode?
PCIe is backwards compatible, so that shouldn't be the issue. I do know where you might have got that idea from: I was reading an article suggesting PCIe 7.0 may not be backwards compatible, but the big wigs at PCI-SIG have since confirmed that even PCIe 7.0, aka ludicrous speed ("delivering 128 GT/s data rate and up to 512 GB/s bi-directionally via x16 configuration"), will be. So this wouldn't be the issue; I'd say it's something else...
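
For rough numbers (back-of-the-envelope, assuming PCIe 3.0 gives roughly 0.985 GB/s per lane after 128b/130b encoding):

PCIe 3.0 x8 ≈ 8 x 0.985 ≈ 7.9 GB/s
PCIe 3.0 x16 ≈ 16 x 0.985 ≈ 15.8 GB/s
100 Gbit/s line rate = 12.5 GB/s

So an x8 PCIe 3.0 link really would cap a 100G card's throughput, but that shows up as lower bandwidth in benchmarks, not as 'degraded' in networkctl.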

Also, I'm not sure whether you left the message out or it just didn't come up for you, but on Proxmox 8.2.2 I get the following:

root@VMSERVER01:~# networkctl
WARNING: systemd-networkd is not running, output will be incomplete. <- could be just me; my setup is fresh and I have one NIC passed through for the OPNsense WAN plus two 10G ports

IDX LINK TYPE OPERATIONAL SETUP
1 lo loopback - unmanaged
2 eno0 ether - unmanaged
4 ens3f0 ether - unmanaged
5 ens3f1 ether - unmanaged
6 vmbr0 bridge - unmanaged
7 tap100i0 ether - unmanaged
8 tap101i0 ether - unmanaged

But that warning, "systemd-networkd is not running, output will be incomplete", raises my eyebrows. I'm new to Proxmox though, so take it or leave it.
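
If you want a quick look at what the service is doing (as far as I know Proxmox manages the network itself via ifupdown2, so systemd-networkd being inactive is normally expected rather than a fault):

systemctl status systemd-networkd      # shows whether the service is active, plus recent log lines
systemctl is-enabled systemd-networkd  # shows whether it is set to start at boot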

I would try ethtool -i <interface>

See what ethtool says. The other thing would be heat: I would think those cards generate quite a bit of it, so try the lm-sensors package, run through its setup, and check the temperatures. Obviously that's sensors on the host.
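
Something along these lines, using one of the 100G interface names from the output above (adjust to whatever your ports are called):

ethtool -i enp197s0f0np0    # driver, firmware version and PCI bus-info for the port
ethtool enp197s0f0np0       # negotiated speed, duplex and whether link is detected
apt install lm-sensors
sensors-detect              # one-time interactive probe for available sensors
sensors                     # report the temperatures it found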

I have some ghost cards showing up.
 
Haha, yes, PCIe is backwards compatible, but old hardware is not speed forward compatible... :) We had Omni-Path (PCIe 3.0 x16) cards in PCIe 4.0 x8 slots running with degraded performance, and after moving them into PCIe 4.0 x16 slots they ran as expected.
 
Haha, yes, PCIe is backwards compatible, but old hardware is not speed forward compatible... :) We had Omni-Path (PCIe 3.0 x16) cards in PCIe 4.0 x8 slots running with degraded performance, and after moving them into PCIe 4.0 x16 slots they ran as expected.

Yes, that's expected; I should have added that, but didn't think about it as it's well known. One thing that really irritates me about board makers, ALL of them:

The first slot is PCIe 5.0, and it's shared with the NVMe stuff.

If you have a GPU or another PCIe 4.0 card plus a PCIe 4.0 NVMe drive, they share the lanes hanging off that PCIe 5.0 slot, so it's bifurcation fun time. Why does it have to cut the x16 PCIe 4.0 link down to x8 while the NVMe keeps its lanes? OK, fine, I get that, but I wish they had put some sort of downstream PCIe 5.0 switch in so both could run at full speed. But no, the lanes are wired directly, even though not much PCIe 5.0 hardware is out yet.

Personally, I would have paid extra for that function.

My solution was to pretend the NVMe port doesn't exist.

However, I found that another x8 slot was also shared, so I ended up with the same issue anyway: GPU x8 / NIC x8. The NIC is PCIe 3.0 and the GPU is PCIe 4.0, and they each run at their full PCIe version, just on 8 lanes, which was nice.

I would have easily paid extra for a PCIe 5.0 switch chip; the funny thing is, they did this in the past.

When I figure out my issue with only a pissant 2 Gbit/s on 10G, I'll reboot my workstation into Linux and see what it says about the 10G card that shares lanes with my GPU, but I doubt it will show up as degraded. I think the OP might have a hardware issue: those cards run hot, and so do the SFPs. As much as I love Debian, it has never shipped lm-sensors or any sensor tooling by default, so the OS relies on the BIOS for fan control. That's fine in some cases, but those fans are normally regulated by CPU temperature (case fans included), and since these are hardware NICs the CPU usage can stay low. So you can end up with high NIC traffic while the CPU barely hits 4%, the fans never spin up, and the NICs are roasting.

Plus, some people change their fan settings to silent or lower for whatever reason, especially in home labs.
 

Another thing you might want to run: lspci -s <PCI location> -vvnnn, for example lspci -s 0000:02:00.1 -vvnnn
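
Roughly like this, using the example address above (swap in your NIC's address, which ethtool -i reports as bus-info), to check whether the card negotiated its full PCIe speed and width:

lspci -s 0000:02:00.1 -vvnnn | grep -E 'LnkCap|LnkSta'
# LnkCap = the speed/width the card supports, LnkSta = what was actually negotiated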
 
Thanks all for your thoughts.

waltar - The cards are PCIe 4.0 x16 and are running in PCIe 4.0 x16 slots.

Blackice504 - yes, a good point. It's not just the CPU that is temperature sensitive. My fans run fast even with the CPU idle, so the transceivers stay cool; the noise level is unimportant since the servers are in a datacentre. Also, I have systemd-networkd running so that I get more detail.

Digging a bit further, I found this discussion, which fully explains what I'm seeing. The 100G NICs are on the storage network, which is not routable. They had IPv6 link-local addresses but no IPv6 connectivity.

I created a new .conf file in the /etc/sysctl.d/ directory with the content below.

net.ipv6.conf.enp197s0f0np0.disable_ipv6=1
net.ipv6.conf.enp197s0f1np1.disable_ipv6=1
net.ipv6.conf.enp1s0f0np0.disable_ipv6=1
net.ipv6.conf.enp1s0f1np1.disable_ipv6=1

This disables IPv6 on the storage NICs and fixes the issue: they now show as 'carrier' rather than 'degraded'. They're listed in white rather than green, but that's a whole lot better than 'degraded'.
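
In case it helps anyone else: assuming the file is something like /etc/sysctl.d/90-disable-ipv6-storage.conf (the name here is just an example), the settings can be applied without a reboot:

sysctl --system    # re-reads all sysctl drop-in files, including the new one
networkctl         # the 100G ports should now report 'carrier' instead of 'degraded'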
 
