Hello Proxmox Community!
We are attempting to upgrade our core networking from 10Gb to 100Gb, with a Dell S5232F at the helm for the cluster and a Dell S5248F for all of our network access switches to uplink to, using 100Gb uplinks between the switches. The switches run the open-source SONiC NOS and are configured with all of our VLANs tagged on the appropriate ports.
Info: 6-node cluster of SuperMicro 2113S-WTRT servers with EPYC 7402P CPUs and DWNRF (Dell variant) Intel E810-C 100GbE NICs. We had to change the FEC mode on the Dell switches to work with these E810 cards, but finally got link lights, then migrated the bridge ports of all of our Linux bridges over from the existing 10Gb infrastructure (Intel X710-T + Netgear XS748T) to the new 100Gb interfaces.
Everything appeared to be working fine at first, but the next day we started shutting down nodes one at a time for some housekeeping maintenance. Upon booting those machines again, many of the VMs had lost some or all network communication: DHCP responses would sometimes get through, and sometimes a blip of data here and there. Some bridges on some nodes continued to work. We spent the day trying to understand it, moving workloads around. Luckily we had two nodes that we hadn't yet shut down, so we moved all business-critical workloads to those nodes and survived that way while we tried to fix it, hoping not to have to revert to the 10Gb networking.
Our logs are currently extremely noisy because of the "auth: unable to find a keyring on /etc/pve/priv/ceph.client.admin.keyring:" bug; I have read that this will probably be resolved in an upcoming update. I was able to find the following, which might be related.
After booting:
Code:
Mar 18 16:10:39 px3 kernel: tap10000i5: entered promiscuous mode
Mar 18 16:10:39 px3 kernel: vmbr104: port 2(tap10000i5) entered blocking state
Mar 18 16:10:39 px3 kernel: vmbr104: port 2(tap10000i5) entered disabled state
Mar 18 16:10:39 px3 kernel: tap10000i5: entered allmulticast mode
Mar 18 16:10:39 px3 kernel: vmbr104: port 2(tap10000i5) entered blocking state
Mar 18 16:10:39 px3 kernel: vmbr104: port 2(tap10000i5) entered forwarding state
(This pattern continues for all VMBRs.)
A little while later:
Code:
Mar 18 16:20:56 px3 kernel: tap10000i5: left allmulticast mode
Mar 18 16:20:56 px3 kernel: vmbr104: port 2(tap10000i5) entered disabled state
(This repeats for SOME VMBRs.)
I don't see anything obvious about why this is happening, only that it is.
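To pull every one of these bridge-port state changes out of the noisy journal at once, a kernel-log filter like the one below can help. This is just a sketch: it runs against a sample file built from the messages quoted above, and on a live node you would pipe `journalctl -k` into the same grep instead.

```shell
# Build a sample from the kernel messages quoted above; on a real node,
# replace the sample file with `journalctl -k` piped into grep.
cat <<'EOF' > /tmp/kern-sample.log
Mar 18 16:10:39 px3 kernel: tap10000i5: entered promiscuous mode
Mar 18 16:10:39 px3 kernel: vmbr104: port 2(tap10000i5) entered blocking state
Mar 18 16:10:39 px3 kernel: vmbr104: port 2(tap10000i5) entered disabled state
Mar 18 16:10:39 px3 kernel: tap10000i5: entered allmulticast mode
Mar 18 16:10:39 px3 kernel: vmbr104: port 2(tap10000i5) entered blocking state
Mar 18 16:10:39 px3 kernel: vmbr104: port 2(tap10000i5) entered forwarding state
Mar 18 16:20:56 px3 kernel: tap10000i5: left allmulticast mode
Mar 18 16:20:56 px3 kernel: vmbr104: port 2(tap10000i5) entered disabled state
EOF

# Keep only the bridge-port state transitions, so flapping ports stand out
grep -E 'vmbr[0-9]+: port [0-9]+\(.*\) entered (blocking|disabled|forwarding) state' /tmp/kern-sample.log
```

Seeing a port re-enter the "disabled" state long after boot (as in the 16:20:56 line) is the interesting event; the blocking/forwarding sequence at boot is normal bridge behavior.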
This may be a red herring, though, as we also lost connectivity on the containers running on the two servers that hadn't been shut down yet, while most VMs on those nodes continued to work. It's worth noting that VMs on the same node could still communicate with each other. VM-to-VM communication across different nodes is where we had major problems, and accessing those VMs from the rest of the network was hit-and-miss depending on which node they were hosted on at the time. We reviewed our network configs many times and saw nothing to explain the problems we were seeing.
Proxmox had no problem maintaining the Ceph public, Ceph cluster, and Corosync (x2) networks on the new 100Gb hardware throughout this whole debacle. Any directly assigned IPs for the Proxmox hosts on specific VLANs or bridges worked flawlessly, as far as we could tell. The problem only appeared where the network had to reach a VM or container.
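Since host IPs work but VM traffic across nodes does not, watching where a guest's tagged frames stop moving can narrow this down. The following is only a sketch of that approach: the interface names (`enp67s0f1` for the E810 uplink, `tap10000i5` for a guest port) and the `<guest-mac>` placeholder are examples, not our actual config.

```shell
# 1. On the source node: do the guest's frames reach the physical NIC,
#    and do they carry the VLAN tag you expect?
tcpdump -eni enp67s0f1 vlan and ether host <guest-mac>

# 2. On the destination node: do the tagged frames arrive on the wire at all?
tcpdump -eni enp67s0f1 vlan and ether host <guest-mac>

# 3. Still on the destination node: does the bridge forward them to the tap?
tcpdump -eni tap10000i5 ether host <guest-mac>
```

If frames show up in step 2 but never in step 3, the drop is happening inside the receiving host (bridge or NIC VLAN filtering) rather than on the switches.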
------------
Currently I'm looking at three possible areas:
1. SR-IOV is enabled in the motherboard BIOS and in the card-specific settings for these NICs; however, I don't think we are using any SR-IOV features, unless Proxmox attempts to take advantage of them automatically at the Linux bridge. Could having this enabled cause problems? I am tempted to disable it in both the motherboard and card settings in the BIOS.
2. Does DCB (Data Center Bridging) support need to be working on the NIC for a Linux bridge to function properly? Reviewing the settings on these new 100Gb NICs, I believe DCB is currently being force-disabled by another setting we enabled while trying to get link lights to the Dell switches.
3. I've read a lot about people having problems with older Mellanox cards where VLAN offload and other capabilities don't work properly and cause issues like this. We have seen parallels with Mellanox behavior in other areas with these Intel E810 cards, such as having to use a different FEC mode on the switch. Could something like this be related to our problem?
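For what it's worth, theories 1 and 3 can be checked from the shell without touching the BIOS. This is only a sketch (the interface name `enp67s0f1` is an example; substitute the actual E810 port), and toggling `rx-vlan-filter` is a guess worth testing, since receive-side VLAN filtering is the offload most often implicated in bridged-VLAN problems:

```shell
# Theory 1 -- SR-IOV: if no VFs are instantiated, the bridge is not
# using SR-IOV regardless of what the BIOS says
cat /sys/class/net/enp67s0f1/device/sriov_numvfs   # 0 = no VFs in use

# Theory 3 -- VLAN offloads: list the current feature state, then try
# disabling the receive-side VLAN filter (non-persistent; lost on reboot)
ethtool -k enp67s0f1 | grep -i vlan
ethtool -K enp67s0f1 rx-vlan-filter off
```

If disabling `rx-vlan-filter` restores VM traffic, it can be made persistent with a `post-up` line on the bridge port in /etc/network/interfaces.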
------------
Some more info: `lspci -vv` output for one of the E810 ports, and our package versions.
Code:
43:00.1 Ethernet controller: Intel Corporation Ethernet Controller E810-C for QSFP (rev 02)
Subsystem: Intel Corporation Ethernet 100G 2P E810-C Adapter
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 98
IOMMU group: 32
Region 0: Memory at 28078000000 (64-bit, prefetchable) [size=32M]
Region 3: Memory at 2807e000000 (64-bit, prefetchable) [size=64K]
Expansion ROM at b0300000 [disabled] [size=1M]
Capabilities: [40] Power Management version 3
Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME-
Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
Address: 0000000000000000 Data: 0000
Masking: 00000000 Pending: 00000000
Capabilities: [70] MSI-X: Enable+ Count=1024 Masked-
Vector table: BAR=3 offset=00000000
PBA: BAR=3 offset=00008000
Capabilities: [a0] Express (v2) Endpoint, MSI 00
DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0W
DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq-
RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop- FLReset-
MaxPayload 512 bytes, MaxReadReq 512 bytes
DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr+ TransPend-
LnkCap: Port #0, Speed 16GT/s, Width x16, ASPM not supported
ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 8GT/s (downgraded), Width x16
TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range AB, TimeoutDis+ NROPrPrP- LTR-
10BitTagComp+ 10BitTagReq- OBFF Not Supported, ExtFmt+ EETLPPrefix+, MaxEETLPPrefixes 1
EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
FRS- TPHComp- ExtTPHComp-
AtomicOpsCap: 32bit- 64bit- 128bitCAS-
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR- 10BitTagReq- OBFF Disabled,
AtomicOpsCtl: ReqEn-
LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete- EqualizationPhase1-
EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest-
Retimer- 2Retimers- CrosslinkRes: unsupported
Capabilities: [e0] Vital Product Data
Product Name: E810-C 100GbE Controller
Read-only fields:
[V0] Vendor specific: FFV20.5.13\x00
[PN] Part number: DWNRF
[MN] Manufacture ID: 1028
[V1] Vendor specific: DSV1028VPDR.VER2.2
[V3] Vendor specific: DTINIC
[V4] Vendor specific: DCM1001FFFFFF2101FFFFFF
[VA] Vendor specific: \x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
[V5] Vendor specific: NPY2
[V6] Vendor specific: PMTD
[V7] Vendor specific: NMVIntel Corp
[V8] Vendor specific: L1D0
[V9] Vendor specific: LNK164163
[RV] Reserved: checksum good, 2 byte(s) reserved
Read/write fields:
[Y0] System specific: CCF1
End
Capabilities: [100 v2] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
HeaderLog: 00000000 00000000 00000000 00000000
Capabilities: [148 v1] Alternative Routing-ID Interpretation (ARI)
ARICap: MFVC- ACS-, Next Function: 0
ARICtl: MFVC- ACS-, Function Group: 0
Capabilities: [150 v1] Device Serial Number 40-a6-b7-ff-ff-95-9d-08
Capabilities: [160 v1] Single Root I/O Virtualization (SR-IOV)
IOVCap: Migration- 10BitTagReq- Interrupt Message Number: 000
IOVCtl: Enable- Migration- Interrupt- MSE- ARIHierarchy- 10BitTagReq-
IOVSta: Migration-
Initial VFs: 128, Total VFs: 128, Number of VFs: 0, Function Dependency Link: 01
VF offset: 135, stride: 1, Device ID: 1889
Supported Page Size: 00000553, System Page Size: 00000001
Region 0: Memory at 000002807c000000 (64-bit, prefetchable)
Region 3: Memory at 000002807e020000 (64-bit, prefetchable)
VF Migration: offset: 00000000, BIR: 0
Capabilities: [1a0 v1] Transaction Processing Hints
Device specific mode supported
No steering table available
Capabilities: [1b0 v1] Access Control Services
ACSCap: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
Capabilities: [200 v1] Data Link Feature <?>
Kernel driver in use: ice
Kernel modules: ice
Code:
proxmox-ve: 8.1.0 (running kernel: 6.5.13-1-pve)
pve-manager: 8.1.4 (running version: 8.1.4/ec5affc9e41f1d79)
proxmox-kernel-helper: 8.1.0
pve-kernel-5.15: 7.4-4
proxmox-kernel-6.5.13-1-pve-signed: 6.5.13-1
proxmox-kernel-6.5: 6.5.13-1
proxmox-kernel-6.5.11-8-pve-signed: 6.5.11-8
proxmox-kernel-6.5.11-7-pve-signed: 6.5.11-7
proxmox-kernel-6.5.11-6-pve-signed: 6.5.11-6
proxmox-kernel-6.2.16-20-pve: 6.2.16-20
proxmox-kernel-6.2: 6.2.16-20
proxmox-kernel-6.2.16-19-pve: 6.2.16-19
proxmox-kernel-6.2.16-18-pve: 6.2.16-18
proxmox-kernel-6.2.16-15-pve: 6.2.16-15
proxmox-kernel-6.2.16-12-pve: 6.2.16-12
proxmox-kernel-6.2.16-6-pve: 6.2.16-7
pve-kernel-5.15.108-1-pve: 5.15.108-2
pve-kernel-5.15.107-2-pve: 5.15.107-2
ceph: 18.2.1-pve2
ceph-fuse: 18.2.1-pve2
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown: not correctly installed
ifupdown2: 3.2.0-1+pmx8
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-4
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.5.0
libproxmox-backup-qemu0: 1.4.1
libproxmox-rs-perl: 0.3.3
libpve-access-control: 8.1.2
libpve-apiclient-perl: 3.3.1
libpve-common-perl: 8.1.1
libpve-guest-common-perl: 5.0.6
libpve-http-server-perl: 5.0.5
libpve-network-perl: 0.9.5
libpve-rs-perl: 0.8.8
libpve-storage-perl: 8.1.0
libqb0: 1.0.5-1
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 5.0.2-4
lxcfs: 5.0.3-pve4
novnc-pve: 1.4.0-3
proxmox-backup-client: 3.1.4-1
proxmox-backup-file-restore: 3.1.4-1
proxmox-kernel-helper: 8.1.0
proxmox-mail-forward: 0.2.3
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.5
proxmox-widget-toolkit: 4.1.4
pve-cluster: 8.0.5
pve-container: 5.0.8
pve-docs: 8.1.4
pve-edk2-firmware: 4.2023.08-4
pve-firewall: 5.0.3
pve-firmware: 3.9-2
pve-ha-manager: 4.0.3
pve-i18n: 3.2.1
pve-qemu-kvm: 8.1.5-3
pve-xtermjs: 5.3.0-3
qemu-server: 8.0.10
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.2-pve2
Thanks for looking. If anyone has experience getting one of these E810 NICs working on Proxmox, I could use any tips or advice! I'm sure we'll figure it out eventually, but hoping someone may have a shortcut!
Regards,
-Eric