Intel E810-C 100G + Dell S5232F-ON = Headaches

AllanM

Well-Known Member
Hello Proxmox Community!

We are attempting to upgrade our core networking from 10Gb to 100Gb, with a Dell S5232F at the core for the cluster and a Dell S5248F for all of our network access switches to uplink to, using 100Gb links between the switches. The switches run the open-source SONiC NOS and are configured with all of our VLANs tagged on the appropriate ports.

Info: a 6-node cluster of Supermicro 2113S-WTRT servers (EPYC 7402P) with the Dell DWNRF variant of the Intel E810-C 100GbE NIC. We had to change the FEC mode on the Dell switch to get link with these E810 cards, but we finally got link lights and then migrated the bridge-ports of all of our Linux bridges from the existing 10Gb infrastructure (Intel X710-T + Netgear XS748T) over to the new 100Gb interfaces.
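For reference, on the SONiC side the FEC mode is set per port with the config utility; a rough sketch of what that looks like is below (Ethernet0 is a placeholder port name, and exact syntax can vary between SONiC releases):

Code:
# force RS (Reed-Solomon) FEC on the 100G port facing the E810, persist, then verify
config interface fec Ethernet0 rs
config save -y
show interfaces status Ethernet0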

Everything appeared to be working fine at first, but the next day we started shutting down nodes one at a time for some housekeeping maintenance. Upon booting those machines again, many of the VMs had lost some or all network communication. DHCP responses would sometimes get through, and sometimes a blip of data here and there. Some bridges on some nodes continued to work. We spent the day trying to understand it and moving workloads around. Luckily we had two nodes that we hadn't yet shut down, so we moved all business-critical workloads to those nodes and survived that way while trying to fix it, hoping not to have to revert to the 10Gb networking.

Our logs are currently extremely noisy because of the "auth: unable to find a keyring on /etc/pve/priv/ceph.client.admin.keyring:" bug... I have read that this will probably be resolved in an update soon. I was able to find the following, which might be related.

After booting:
Code:
Mar 18 16:10:39 px3 kernel: tap10000i5: entered promiscuous mode
Mar 18 16:10:39 px3 kernel: vmbr104: port 2(tap10000i5) entered blocking state
Mar 18 16:10:39 px3 kernel: vmbr104: port 2(tap10000i5) entered disabled state
Mar 18 16:10:39 px3 kernel: tap10000i5: entered allmulticast mode
Mar 18 16:10:39 px3 kernel: vmbr104: port 2(tap10000i5) entered blocking state
Mar 18 16:10:39 px3 kernel: vmbr104: port 2(tap10000i5) entered forwarding state
(this pattern continues for all of the vmbr bridges)

A little while later:
Code:
Mar 18 16:20:56 px3 kernel: tap10000i5: left allmulticast mode
Mar 18 16:20:56 px3 kernel: vmbr104: port 2(tap10000i5) entered disabled state
(this repeats for SOME of the vmbr bridges)

I don't see anything obvious about why this is happening, only that it is.

This may be a red herring though, as we also lost connectivity on the containers running on the two servers that hadn't been shut down yet, while most VMs on those nodes continued to work. It's worth noting that VMs on the same node could still communicate with each other. VM-to-VM communication between different nodes is where we had major problems, and reaching those VMs from the rest of the network was also hit and miss depending on which node they were hosted on at the time. We reviewed our network configs many times and saw nothing to explain the problems we were seeing.

Proxmox had no problem maintaining the Ceph public, Ceph cluster, and both Corosync networks on the new 100Gb hardware throughout this whole debacle. Any directly assigned IPs for the Proxmox hosts on specific VLANs or bridges worked flawlessly as far as we could tell. The problem only showed up where the network had to reach a VM or container.
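For anyone following along, this is roughly how we've been inspecting the bridge side on a node; the tap/bridge names are just the ones from the log excerpt above:

Code:
# is the VM's tap port attached to the bridge and in forwarding state?
bridge link show dev tap10000i5
# which MACs has the bridge learned, and behind which port?
bridge fdb show br vmbr104
# detailed bridge and VLAN-device state (MTU, vlan id, raw device, etc.)
ip -d link show vmbr104
ip -d link show V104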

------------

Currently I'm leaning toward three possible areas (a few quick checks are sketched after the list).

1. SR-IOV is enabled in the motherboard BIOS and in the card-specific settings; however, I don't think we are using any SR-IOV features, unless Proxmox takes advantage of them automatically at the Linux bridge. Could having it enabled cause problems? I'm tempted to disable it in both the motherboard and card settings in the BIOS.
2. Does DCB (Data Center Bridging) support need to be working on the NIC for it to function properly under a Linux bridge? Reviewing the settings on these new 100Gb NICs, I believe DCB is currently being force-disabled by another setting we enabled while trying to get link lights to the Dell switches.
3. I've read a lot about people having problems with older Mellanox cards where VLAN offload and other capabilities don't work properly and cause issues like this. These E810 Intel cards have shown some parallels with Mellanox cards in other areas, for example needing a different FEC mode on the switch. Could something like that be related to our problem?
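The quick checks mentioned above, using one of our 100Gb interfaces as the example; nothing here changes any settings:

Code:
# 1. SR-IOV: how many VFs are actually instantiated (0 means enabled but unused)
cat /sys/class/net/enp67s0f1np1/device/sriov_numvfs
lspci | grep -i "Virtual Function"
# 2. DCB / firmware LLDP: private flags exposed by the ice driver
ethtool --show-priv-flags enp67s0f1np1
# 3. FEC: what the NIC is currently using/advertising
ethtool --show-fec enp67s0f1np1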

------------

Some more info...

Code:
43:00.1 Ethernet controller: Intel Corporation Ethernet Controller E810-C for QSFP (rev 02)
        Subsystem: Intel Corporation Ethernet 100G 2P E810-C Adapter
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 98
        IOMMU group: 32
        Region 0: Memory at 28078000000 (64-bit, prefetchable) [size=32M]
        Region 3: Memory at 2807e000000 (64-bit, prefetchable) [size=64K]
        Expansion ROM at b0300000 [disabled] [size=1M]
        Capabilities: [40] Power Management version 3
                Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME-
        Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
                Address: 0000000000000000  Data: 0000
                Masking: 00000000  Pending: 00000000
        Capabilities: [70] MSI-X: Enable+ Count=1024 Masked-
                Vector table: BAR=3 offset=00000000
                PBA: BAR=3 offset=00008000
        Capabilities: [a0] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
                        ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0W
                DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq-
                        RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop- FLReset-
                        MaxPayload 512 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr+ TransPend-
                LnkCap: Port #0, Speed 16GT/s, Width x16, ASPM not supported
                        ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
                LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 8GT/s (downgraded), Width x16
                        TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Range AB, TimeoutDis+ NROPrPrP- LTR-
                         10BitTagComp+ 10BitTagReq- OBFF Not Supported, ExtFmt+ EETLPPrefix+, MaxEETLPPrefixes 1
                         EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
                         FRS- TPHComp- ExtTPHComp-
                         AtomicOpsCap: 32bit- 64bit- 128bitCAS-
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR- 10BitTagReq- OBFF Disabled,
                         AtomicOpsCtl: ReqEn-
                LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete- EqualizationPhase1-
                         EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest-
                         Retimer- 2Retimers- CrosslinkRes: unsupported
        Capabilities: [e0] Vital Product Data
                Product Name: E810-C 100GbE Controller
                Read-only fields:
                        [V0] Vendor specific: FFV20.5.13\x00
                        [PN] Part number: DWNRF
                        [MN] Manufacture ID: 1028
                        [V1] Vendor specific: DSV1028VPDR.VER2.2
                        [V3] Vendor specific: DTINIC
                        [V4] Vendor specific: DCM1001FFFFFF2101FFFFFF
                        [VA] Vendor specific: \x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
                        [V5] Vendor specific: NPY2
                        [V6] Vendor specific: PMTD
                        [V7] Vendor specific: NMVIntel Corp
                        [V8] Vendor specific: L1D0
                        [V9] Vendor specific: LNK164163
                        [RV] Reserved: checksum good, 2 byte(s) reserved
                Read/write fields:
                        [Y0] System specific: CCF1
                End
        Capabilities: [100 v2] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
                AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
                        MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
                HeaderLog: 00000000 00000000 00000000 00000000
        Capabilities: [148 v1] Alternative Routing-ID Interpretation (ARI)
                ARICap: MFVC- ACS-, Next Function: 0
                ARICtl: MFVC- ACS-, Function Group: 0
        Capabilities: [150 v1] Device Serial Number 40-a6-b7-ff-ff-95-9d-08
        Capabilities: [160 v1] Single Root I/O Virtualization (SR-IOV)
                IOVCap: Migration- 10BitTagReq- Interrupt Message Number: 000
                IOVCtl: Enable- Migration- Interrupt- MSE- ARIHierarchy- 10BitTagReq-
                IOVSta: Migration-
                Initial VFs: 128, Total VFs: 128, Number of VFs: 0, Function Dependency Link: 01
                VF offset: 135, stride: 1, Device ID: 1889
                Supported Page Size: 00000553, System Page Size: 00000001
                Region 0: Memory at 000002807c000000 (64-bit, prefetchable)
                Region 3: Memory at 000002807e020000 (64-bit, prefetchable)
                VF Migration: offset: 00000000, BIR: 0
        Capabilities: [1a0 v1] Transaction Processing Hints
                Device specific mode supported
                No steering table available
        Capabilities: [1b0 v1] Access Control Services
                ACSCap: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
                ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
        Capabilities: [200 v1] Data Link Feature <?>
        Kernel driver in use: ice
        Kernel modules: ice

Code:
proxmox-ve: 8.1.0 (running kernel: 6.5.13-1-pve)
pve-manager: 8.1.4 (running version: 8.1.4/ec5affc9e41f1d79)
proxmox-kernel-helper: 8.1.0
pve-kernel-5.15: 7.4-4
proxmox-kernel-6.5.13-1-pve-signed: 6.5.13-1
proxmox-kernel-6.5: 6.5.13-1
proxmox-kernel-6.5.11-8-pve-signed: 6.5.11-8
proxmox-kernel-6.5.11-7-pve-signed: 6.5.11-7
proxmox-kernel-6.5.11-6-pve-signed: 6.5.11-6
proxmox-kernel-6.2.16-20-pve: 6.2.16-20
proxmox-kernel-6.2: 6.2.16-20
proxmox-kernel-6.2.16-19-pve: 6.2.16-19
proxmox-kernel-6.2.16-18-pve: 6.2.16-18
proxmox-kernel-6.2.16-15-pve: 6.2.16-15
proxmox-kernel-6.2.16-12-pve: 6.2.16-12
proxmox-kernel-6.2.16-6-pve: 6.2.16-7
pve-kernel-5.15.108-1-pve: 5.15.108-2
pve-kernel-5.15.107-2-pve: 5.15.107-2
ceph: 18.2.1-pve2
ceph-fuse: 18.2.1-pve2
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown: not correctly installed
ifupdown2: 3.2.0-1+pmx8
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-4
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.5.0
libproxmox-backup-qemu0: 1.4.1
libproxmox-rs-perl: 0.3.3
libpve-access-control: 8.1.2
libpve-apiclient-perl: 3.3.1
libpve-common-perl: 8.1.1
libpve-guest-common-perl: 5.0.6
libpve-http-server-perl: 5.0.5
libpve-network-perl: 0.9.5
libpve-rs-perl: 0.8.8
libpve-storage-perl: 8.1.0
libqb0: 1.0.5-1
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 5.0.2-4
lxcfs: 5.0.3-pve4
novnc-pve: 1.4.0-3
proxmox-backup-client: 3.1.4-1
proxmox-backup-file-restore: 3.1.4-1
proxmox-kernel-helper: 8.1.0
proxmox-mail-forward: 0.2.3
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.5
proxmox-widget-toolkit: 4.1.4
pve-cluster: 8.0.5
pve-container: 5.0.8
pve-docs: 8.1.4
pve-edk2-firmware: 4.2023.08-4
pve-firewall: 5.0.3
pve-firmware: 3.9-2
pve-ha-manager: 4.0.3
pve-i18n: 3.2.1
pve-qemu-kvm: 8.1.5-3
pve-xtermjs: 5.3.0-3
qemu-server: 8.0.10
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.2-pve2

Thanks for looking. If anyone has experience getting one of these E810 NICs working on Proxmox, I could use any tips or advice! I'm sure we'll figure it out eventually, but I'm hoping someone has a shortcut!


Regards,
-Eric
 
Please post your /etc/network/interfaces file and the output of ethtool INTERFACENAME and ethtool --show-priv-flags INTERFACENAME.
 
Hi jsterr,

The 15-character limit on bridge-port names forced us to use Linux VLAN interfaces for this config. We tested the two-digit VLANs bridged directly to the interface using the .XX naming and it made no difference. We also tested SDN and it made no difference either; same problems. I do not believe our interfaces file has anything to do with the problem.

This is a sanitized backup from when we were attempting to use the new 100Gb interfaces, which are enp67s0f0np0 / enp67s0f1np1. That has since been abandoned (we reverted back to the eno and ens interfaces).


Code:
# network interface settings; autogenerated
# Please do NOT modify this file directly, unless you know what
# you're doing.
#
# If you want to manage parts of the network configuration manually,
# please utilize the 'source' or 'source-directory' directives to do
# so.
# PVE will preserve these directives, but will NOT read its network
# configuration from sourced files, so do not attempt to move any of
# the PVE managed interfaces into external files!

source /etc/network/interfaces.d/*

auto lo
iface lo inet loopback

auto eno2np1
iface eno2np1 inet manual
#XXX

auto eno1np0
iface eno1np0 inet static
    address 10.XXX.XXX.XXX/24
#CORO1

auto enp67s0f0np0
iface enp67s0f0np0 inet manual
#FASTAF1

auto enp67s0f1np1
iface enp67s0f1np1 inet manual
#FASTAF2

iface ens8f1 inet manual

iface ens8f2 inet manual

iface ens8f3 inet manual

auto ens8f0
iface ens8f0 inet manual
#TEST

iface vmbr0 inet manual
    bridge-ports none
    bridge-stp off
    bridge-fd 0
#DIMENTIONAL PORTAL TO NOWHERE

auto vmbr1
iface vmbr1 inet static
    address 10.XXX.XXX.XXX/24
    gateway 10.XXX.XXX.XXX
    bridge-ports eno2np1
    bridge-stp off
    bridge-fd 0
#XXX

auto vmbr100
iface vmbr100 inet manual
    bridge-ports V100
    bridge-stp off
    bridge-fd 0
#XXXXX

auto vmbr101
iface vmbr101 inet manual
    bridge-ports V101
    bridge-stp off
    bridge-fd 0
#XXXXX

auto vmbr102
iface vmbr102 inet manual
    bridge-ports V102
    bridge-stp off
    bridge-fd 0
#XXXXX

auto vmbr103
iface vmbr103 inet manual
    bridge-ports V103
    bridge-stp off
    bridge-fd 0
#XXXXX

auto vmbr104
iface vmbr104 inet manual
    bridge-ports V104
    bridge-stp off
    bridge-fd 0
#XXXXXXXXXXX

auto vmbr105
iface vmbr105 inet manual
    bridge-ports V105
    bridge-stp off
    bridge-fd 0
#XXXXX

auto vmbr188
iface vmbr188 inet static
    address 10.XXX.XXX.XXX/24
    bridge-ports V188
    bridge-stp off
    bridge-fd 0
#XXXXXXXXX

auto vmbr201
iface vmbr201 inet manual
    bridge-ports V201
    bridge-stp off
    bridge-fd 0
#XXX

auto vmbr2010
iface vmbr2010 inet manual
    bridge-ports none
    bridge-stp off
    bridge-fd 0
    bridge_ageing 0
#XXX SPAN

auto vmbr1010
iface vmbr1010 inet manual
    bridge-ports V1010
    bridge-stp off
    bridge-fd 0
#XXX XXXX 1 IO

auto vmbr50
iface vmbr50 inet manual
    bridge-ports V50
    bridge-stp off
    bridge-fd 0
#XXXXXXX

auto vmbr202
iface vmbr202 inet manual
    bridge-ports V202
    bridge-stp off
    bridge-fd 0
#XXX

auto vmbr1000
iface vmbr1000 inet manual
    bridge-ports none
    bridge-stp off
    bridge-fd 0
    bridge_ageing 0
#XXX SPAN

auto vmbr191
iface vmbr191 inet manual
    bridge-ports V191
    bridge-stp off
    bridge-fd 0
#XXXXXX

auto vmbr1910
iface vmbr1910 inet manual
    bridge-ports none
    bridge-stp off
    bridge-fd 0
    bridge_ageing 0
#XXXXXXXXX SPAN

auto vmbr10
iface vmbr10 inet manual
    bridge-ports V10
    bridge-stp off
    bridge-fd 0
#XXX-XXXXXXXXXX

auto vmbr30
iface vmbr30 inet manual
    bridge-ports V30
    bridge-stp off
    bridge-fd 0
#XXX

auto vmbr301
iface vmbr301 inet manual
    bridge-ports V301
    bridge-stp off
    bridge-fd 0
#XXX Subnet 301

auto vmbr302
iface vmbr302 inet manual
    bridge-ports V302
    bridge-stp off
    bridge-fd 0
#XXX Subnet 302

auto vmbr1011
iface vmbr1011 inet manual
    bridge-ports V1011
    bridge-stp off
    bridge-fd 0
#XXX IO 1.1

auto vmbr1012
iface vmbr1012 inet manual
    bridge-ports V1012
    bridge-stp off
    bridge-fd 0
#XXX IO 1.2

auto vmbr1021
iface vmbr1021 inet manual
    bridge-ports V1021
    bridge-stp off
    bridge-fd 0
#XXX IO 2.1

auto vmbr1022
iface vmbr1022 inet manual
    bridge-ports V1022
    bridge-stp off
    bridge-fd 0
#XXX IO 2.2

auto vmbr1020
iface vmbr1020 inet manual
    bridge-ports V1020
    bridge-stp off
    bridge-fd 0
#XXX XXXX 2 IO

auto vmbr1111
iface vmbr1111 inet manual
    bridge-ports none
    bridge-stp off
    bridge-fd 0
    bridge_ageing 0
#XXXX SPAN

auto vmbr0000
iface vmbr0000 inet manual
    bridge-ports enp67s0f1np1
    bridge-stp off
    bridge-fd 0
    bridge-vlan-aware yes
    bridge-vids 2-4094
#SDN TEST

auto V104
iface V104 inet manual
    vlan-id 104
    vlan-raw-device enp67s0f1np1
#XXXXXXXXXXX

auto V10
iface V10 inet manual
    vlan-id 10
    vlan-raw-device enp67s0f1np1
#XXXXXXXXXXX

auto V50
iface V50 inet manual
    vlan-id 50
    vlan-raw-device enp67s0f1np1
#XXXXXX

auto V100
iface V100 inet manual
    vlan-id 100
    vlan-raw-device enp67s0f1np1
#XXXXX

auto V101
iface V101 inet manual
    vlan-id 101
    vlan-raw-device enp67s0f1np1
#XXXXX

auto V102
iface V102 inet manual
    vlan-id 102
    vlan-raw-device enp67s0f1np1
#XXXXX

auto V103
iface V103 inet manual
    vlan-id 103
    vlan-raw-device enp67s0f1np1
#XXXXX

auto V105
iface V105 inet manual
    vlan-id 105
    vlan-raw-device enp67s0f1np1
#XXXXXXX

auto V188
iface V188 inet manual
    vlan-id 188
    vlan-raw-device enp67s0f1np1
#XXX

auto V191
iface V191 inet manual
    vlan-id 191
    vlan-raw-device enp67s0f1np1
#XXXXXX

auto V201
iface V201 inet manual
    vlan-id 201
    vlan-raw-device enp67s0f1np1
#XXX

auto V202
iface V202 inet manual
    vlan-id 202
    vlan-raw-device enp67s0f1np1
#XXX

auto V301
iface V301 inet manual
    vlan-id 301
    vlan-raw-device enp67s0f1np1
#XXX XXX 301

auto V302
iface V302 inet manual
    vlan-id 302
    vlan-raw-device enp67s0f1np1
#XXX XXX 302

auto V1010
iface V1010 inet manual
    vlan-id 1010
    vlan-raw-device enp67s0f1np1
#XXX XXXX 1 IO

auto V1020
iface V1020 inet manual
    vlan-id 1020
    vlan-raw-device enp67s0f1np1
#XXX XXXX 2 IO

auto V1011
iface V1011 inet manual
    vlan-id 1011
    vlan-raw-device enp67s0f1np1
#XXX IO 1.1

auto V1012
iface V1012 inet manual
    vlan-id 1012
    vlan-raw-device enp67s0f1np1
#XXX IO 1.2

auto V1021
iface V1021 inet manual
    vlan-id 1021
    vlan-raw-device enp67s0f1np1
#XXX IO 2.1

auto V1022
iface V1022 inet manual
    vlan-id 1022
    vlan-raw-device enp67s0f1np1
#XXX IO 2.2

auto V12
iface V12 inet static
    address 10.XXX.XXX.XXX/24
    vlan-id 12
    vlan-raw-device enp67s0f1np1
#CORO2-FAF

auto V21
iface V21 inet static
    address 10.XXX.XXX.XXX/24
    vlan-id 21
    vlan-raw-device enp67s0f0np0
#CEPH1-FAF

auto V22
iface V22 inet static
    address 10.XXX.XXX.XXX/24
    vlan-id 22
    vlan-raw-device enp67s0f0np0
#CEPH2-FAF

auto V30
iface V30 inet manual
    vlan-id 30
    vlan-raw-device enp67s0f1np1
#XXX


Code:
ethtool enp67s0f1np1
Settings for enp67s0f1np1:
        Supported ports: [ FIBRE ]
        Supported link modes:   25000baseCR/Full
                                50000baseCR2/Full
                                100000baseCR4/Full
                                10000baseCR/Full
        Supported pause frame use: Symmetric
        Supports auto-negotiation: Yes
        Supported FEC modes: None        RS
        Advertised link modes:  25000baseCR/Full
                                50000baseCR2/Full
                                100000baseCR4/Full
                                10000baseCR/Full
        Advertised pause frame use: No
        Advertised auto-negotiation: Yes
        Advertised FEC modes: None       RS
        Speed: 100000Mb/s
        Duplex: Full
        Auto-negotiation: off
        Port: Direct Attach Copper
        PHYAD: 0
        Transceiver: internal
        Supports Wake-on: d
        Wake-on: d
        Current message level: 0x00000007 (7)
                               drv probe link
        Link detected: yes


Code:
ethtool --show-priv-flags enp67s0f1np1
Private flags for enp67s0f1np1:
link-down-on-close     : off
fw-lldp-agent          : on
vf-true-promisc-support: off
mdd-auto-reset-vf      : off
vf-vlan-pruning        : off
legacy-rx              : off

^^^
I have a hunch that the firmware LLDP agent being enabled may be the problem. I enabled it early on while attempting to get a link light. I didn't think it would matter, but after looking closer there's a note in the firmware settings about how enabling this feature will disable Data Center Bridging (DCB), which may be something the Linux bridging on Proxmox relies on.
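If this turns out to be the culprit, the firmware LLDP agent can be toggled per port through the same ethtool private flags (a sketch; we'd repeat it for both E810 ports):

Code:
ethtool --set-priv-flags enp67s0f1np1 fw-lldp-agent off
ethtool --set-priv-flags enp67s0f0np0 fw-lldp-agent off
# confirm the flag actually changed
ethtool --show-priv-flags enp67s0f1np1 | grep fw-lldp-agent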
 
Yes, I would recommend disabling the LLDP agent. LLDP and Intel NICs used to have problems back in the days of the X710 NICs.
 
Hi Jsterr,

We have disabled the LLDP agent on the E810s. We will make another attempt to move the networks over later this week.

Does anyone have any thoughts on FEC? Should we be using it (RS mode) or None? I wonder if that could cause issues here... We'll follow up later when we know more. Thanks!
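For the FEC question, ethtool can at least show what the NIC is doing now and force a mode to match the switch port (RS shown as an example; this card reports None and RS as supported):

Code:
ethtool --show-fec enp67s0f1np1
# force RS FEC to match what the switch port is configured for
ethtool --set-fec enp67s0f1np1 encoding rs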
 
Following up...

We disabled the LLDP agent and also disabled SR-IOV on both the motherboards and the cards, since we don't use that feature anyway, then moved the networks back over to the new E810s.
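In case the private flag does not survive a reboot or firmware update, it can also be re-applied on interface bring-up via a post-up hook in /etc/network/interfaces (a sketch using our interface name; we haven't confirmed whether this is strictly necessary):

Code:
auto enp67s0f1np1
iface enp67s0f1np1 inet manual
    post-up /usr/sbin/ethtool --set-priv-flags enp67s0f1np1 fw-lldp-agent off
#FASTAF2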

The cluster and all networking have now been working for several weeks, including through a normal monthly maintenance cycle (update and reboot of all nodes). I believe the LLDP agents were causing the problems.

We are enjoying much faster migration speeds, and Ceph is no longer network-bound anywhere. We've also observed a slight improvement in overall disk performance for VMs (faster booting/rebooting and service loading). We also created a new NVMe pool in Ceph and moved a few VMs over to it that could use more disk performance.

Seems like things are good!
 
Another follow-up.

All of the same problems returned a few weeks later: totally random loss of communication on some networks between nodes. Ceph and Corosync seem to work fine on the E810 + S5232F combination, but the VLANs are a disaster. Communication works randomly on some nodes and not on others; it's like there's another sysadmin operating a portal we can't see, randomly blocking traffic in the switch. We reverted all networking back to our trusty 10Gb configuration.

At this point we believe the issue more likely resides with the S5232F and the SONiC NOS we loaded on it. I don't believe that particular open-source project has anywhere near the level of attention and effort that Proxmox has.
 
