Onboard NIC resets continually after adding second NIC

disgustipated

New Member
Oct 29, 2023
I added a second NIC (an Intel 82599-based X520-DA1) to my server, which has an ASUS B650E-F motherboard, and now it seems to be doing some sort of register dump on the onboard NIC every few seconds. It starts right after reboot, as soon as the login prompt is shown. I posted this under Install and Configuration, but it's more of a config thing, since this system works without the new card in there. I'm starting to suspect something with the motherboard, or the driver used for the card, which shows as ixgbe, while the onboard one shows igc.

journalctl -b shows

Nov 30 05:36:47 tweedle-dee kernel: igc 0000:0b:00.0 eno1: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
Nov 30 05:36:47 tweedle-dee kernel: vmbr0: port 1(eno1) entered blocking state
Nov 30 05:36:47 tweedle-dee kernel: vmbr0: port 1(eno1) entered forwarding state
Nov 30 05:36:50 tweedle-dee pvestatd[1294]: storage 'TN1_NFS_vmtank' is not online
Nov 30 05:36:52 tweedle-dee kernel: igc 0000:0b:00.0 eno1: Register Dump
Nov 30 05:36:52 tweedle-dee kernel: igc 0000:0b:00.0 eno1: Register Name   Value
Nov 30 05:36:52 tweedle-dee kernel: igc 0000:0b:00.0 eno1: CTRL           081c0641
Nov 30 05:36:52 tweedle-dee kernel: igc 0000:0b:00.0 eno1: STATUS         40280683
Nov 30 05:36:52 tweedle-dee kernel: igc 0000:0b:00.0 eno1: CTRL_EXT       100000c0
Nov 30 05:36:52 tweedle-dee kernel: igc 0000:0b:00.0 eno1: MDIC           180a3800
Nov 30 05:36:52 tweedle-dee kernel: igc 0000:0b:00.0 eno1: ICR            00000085
Nov 30 05:36:52 tweedle-dee kernel: igc 0000:0b:00.0 eno1: RCTL           0440803a
Nov 30 05:36:52 tweedle-dee kernel: igc 0000:0b:00.0 eno1: RDLEN[0-3]     00001000 00001000 00001000 00001000
Nov 30 05:36:52 tweedle-dee kernel: igc 0000:0b:00.0 eno1: RDH[0-3]       0000003f 00000024 00000029 0000001e
Nov 30 05:36:52 tweedle-dee kernel: igc 0000:0b:00.0 eno1: RDT[0-3]       000000ff 000000ff 000000ff 000000ff
Nov 30 05:36:52 tweedle-dee kernel: igc 0000:0b:00.0 eno1: RXDCTL[0-3]    02040808 02040808 02040808 02040808
Nov 30 05:36:52 tweedle-dee kernel: igc 0000:0b:00.0 eno1: RDBAL[0-3]     ffffb000 ffffa000 ffff9000 ffff8000
Nov 30 05:36:52 tweedle-dee kernel: igc 0000:0b:00.0 eno1: RDBAH[0-3]     00000000 00000000 00000000 00000000
Nov 30 05:36:52 tweedle-dee kernel: igc 0000:0b:00.0 eno1: TCTL           a503f0fa
Nov 30 05:36:52 tweedle-dee kernel: igc 0000:0b:00.0 eno1: TDBAL[0-3]     fffff000 ffffe000 ffffd000 ffffc000
Nov 30 05:36:52 tweedle-dee kernel: igc 0000:0b:00.0 eno1: TDBAH[0-3]     00000000 00000000 00000000 00000000
Nov 30 05:36:52 tweedle-dee kernel: igc 0000:0b:00.0 eno1: TDLEN[0-3]     00001000 00001000 00001000 00001000
Nov 30 05:36:52 tweedle-dee kernel: igc 0000:0b:00.0 eno1: TDH[0-3]       00000000 00000001 00000000 00000001
Nov 30 05:36:52 tweedle-dee kernel: igc 0000:0b:00.0 eno1: TDT[0-3]       00000000 00000001 00000000 00000001
Nov 30 05:36:52 tweedle-dee kernel: igc 0000:0b:00.0 eno1: TXDCTL[0-3]    02100108 02100108 02100108 02100108
Nov 30 05:36:52 tweedle-dee kernel: igc 0000:0b:00.0 eno1: Reset adapter
Nov 30 05:36:52 tweedle-dee kernel: vmbr0: port 1(eno1) entered disabled state
Nov 30 05:36:56 tweedle-dee kernel: igc 0000:0b:00.0 eno1: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
Nov 30 05:36:56 tweedle-dee kernel: vmbr0: port 1(eno1) entered blocking state
Nov 30 05:36:56 tweedle-dee kernel: vmbr0: port 1(eno1) entered forwarding state

Then it starts over again.
eno1 is the original onboard NIC.
lspci shows both NICs recognized.
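(For reference, a check along these lines shows which kernel driver is bound to each card; the grep pattern is just an example and may need adjusting:)

lspci -nnk | grep -A3 -i ethernet   # lists each NIC with its "Kernel driver in use" line
ethtool -i eno1                     # driver/firmware details for the onboard NIC
ethtool -i enp8s0                   # same for the new card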
This is from the syslog:
Nov 30 05:35:25 tweedle-dee kernel: ixgbe 0000:08:00.0: Multiqueue Enabled: Rx Queue count = 24, Tx Queue count = 24 XDP Queue count = 0
Nov 30 05:35:25 tweedle-dee kernel: ixgbe 0000:08:00.0: 16.000 Gb/s available PCIe bandwidth, limited by 5.0 GT/s PCIe x4 link at 0000:07:00.0 (capable of 32.000 Gb/s with 5.0 GT/s PCIe x8 link)
Nov 30 05:35:25 tweedle-dee kernel: ixgbe 0000:08:00.0: MAC: 2, PHY: 14, SFP+: 3, PBA No: FFFFFF-0FF
Nov 30 05:35:25 tweedle-dee kernel: ixgbe 0000:08:00.0: a0:36:9f:78:63:ef
Nov 30 05:35:25 tweedle-dee kernel: ixgbe 0000:08:00.0: Intel(R) 10 Gigabit Network Connection

If I take the new card back out, then the onboard card is able to get an IP.

The new card does work in another box.

Any ideas on what to look at next?

interfaces - I added the iface enp8s0 stanza, and when I remove the card the onboard NIC shows up in the web interface and I'm able to access it. I added vmbr1 as well, but that was also after the onboard NIC started resetting. I can take them both back out and it does the same thing.
auto lo
iface lo inet loopback

iface eno1 inet manual

iface enp8s0 inet manual

auto vmbr0
iface vmbr0 inet static
        address 6.13.0.20/24
        gateway 6.13.0.10
        bridge-ports eno1
        bridge-stp off
        bridge-fd 0

iface wlp7s0 inet manual

auto vmbr10
iface vmbr10 inet manual
        bridge-ports none
        bridge-stp off
        bridge-fd 0
        bridge-vlan-aware yes
        bridge-vids 2-4094

iface vmbr1 inet static
        address 10.16.0.10/24
        bridge-ports enp8s0
        bridge-stp off
        bridge-fd 0
        bridge-vlan-aware yes
        bridge-vids 2-4094

ip a output when the card is installed; it's showing here as enp8s0:
[screenshot: ip a output with the card installed]

In the BIOS/UEFI interface I can see the card, and there are controls there that let you blink the LED on the new card; that does work.
 
Adding (or removing) PCI(e) devices can change the PCI ID of other devices (by 1). Use ip a to find the new name of your old network device and change /etc/network/interfaces accordingly. This often trips people up and there are many threads about it, which you can find once you know that the rename of the network controller is the problem.
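A minimal sketch of that check (the interface names here are just the ones from this thread; a renamed device would show up under something else):

ip -br link                          # compact list of all interfaces and their states
# if eno1 came back under a new name (e.g. enp9s0 -- a hypothetical name), update the bridge port:
#   iface vmbr0 inet static
#           bridge-ports enp9s0     # was eno1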

EDIT: Apparently, this is not the problem and I don't know how to fix it.
 
ip a shows eno1 still with the same MAC; it does have an altname listed, though, and it works fine when just removing the new card. The new NIC has another name, which is expected, and it isn't used in /etc/network/interfaces. I did try adding it to interfaces and that didn't help. I can shut down, take the new card out, get an IP, and not have the continual reset on the original NIC. I intend to keep using the original NIC, and use the new one as a direct link to another box. It's an SFP+ interface card.
 
While I don't own any devices with this particular card, a quick web search suggests that Intel NICs can be quite unreliable. Often, but not always, it is related to power management. The work-around would be to disable the low-power modes by adding pcie_port_pm=off pcie_aspm.policy=performance to the kernel command line.
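On a GRUB-booted Proxmox install that would look roughly like this (a sketch; systemd-boot setups edit /etc/kernel/cmdline and run proxmox-boot-tool refresh instead):

# /etc/default/grub -- append the options to the existing default line
GRUB_CMDLINE_LINUX_DEFAULT="quiet pcie_port_pm=off pcie_aspm.policy=performance"

# then regenerate the boot config and reboot
update-grub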

So, that could be something to try. Maybe you get lucky and that's the fix you need.

On the other hand, if your new NIC supports 10GigE, maybe you are OK with sharing a little bit of that bandwidth. I have decided to completely move towards VLANs for ease of maintenance. My Proxmox server connects to the switch with two bonded 10GigE NICs (not so much for bandwidth, but for reliability). I then set up several VLANs on the switch. Two of the VLANs connect to both of my ISPs, and another VLAN connects to my LAN. There are a small number of additional special-purpose VLANs, too.

This is much easier to manage than having a large number of physical NICs, but it might not be viable if you need the full 10GigE bandwidth between both of your machines at all times.
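For illustration, a bonded pair feeding a VLAN-aware bridge looks roughly like this in /etc/network/interfaces (a sketch only; the interface names, bond mode, and the 192.0.2.10 address are assumptions, not my actual config):

auto bond0
iface bond0 inet manual
        bond-slaves enp8s0 enp9s0
        bond-miimon 100
        bond-mode 802.3ad

auto vmbr0
iface vmbr0 inet static
        address 192.0.2.10/24
        bridge-ports bond0
        bridge-stp off
        bridge-fd 0
        bridge-vlan-aware yes
        bridge-vids 2-4094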
 
I've not seen anything suggesting it is power-management related, but I can try that later. It continually resets the onboard NIC, not when coming back from sleep. Later tonight I'm going to try disabling the offloading that I've seen mentioned in other threads; those seemed somewhat related but weren't the same symptoms. And yes, I do want the bandwidth between the two boxes. I have VLANs for other purposes, but those tags are on the VMs themselves.
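(Something like this is the usual test, in case it helps anyone following along; which feature names exist varies by driver, and the change does not survive a reboot:)

ethtool -k eno1                                         # show current offload settings
ethtool -K eno1 tso off gso off gro off rx off tx off   # disable the common ones for a test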
 
The way I read the reports about problems with this type of card is that the NIC will try to go to a lower-power mode whenever it isn't actively in use, and it then has trouble recovering from the state it put itself into. So this isn't necessarily tied to you putting your entire computer to sleep. It could strike at any time, depending on what your system BIOS decided to do. And that's what makes it difficult to give helpful suggestions. There usually isn't great documentation on what each particular BIOS does. In other words, give it a try. But no guarantee that it helps. Good luck.

And yes, I hear you about needing the extra bandwidth. If you do need it, then there isn't really much of an option. I just figured I'd raise the idea, as I found it simplified my network management tremendously. And in practice, the impact on available bandwidth is negligible for me. But that depends on lots of factors including the speed of your ISP.

Also, I got lucky and cheaply picked up a used 4-port SFP+ card on eBay. So I do have a lot more flexibility than you do if I find that I am bandwidth limited.

Your gut feeling about disabling some of the offloading is another good idea. I have had to do that in the past. It seems to be a mixed bag how well NIC manufacturers test their offloading features, and there definitely can be obscure and annoying bugs.
 
Why no auto vmbr1? vmbr1 is not up and therefore Proxmox has no IP on that virtual bridge and is not reachable via enp8s0.
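That is, the vmbr1 stanza from the config posted above would need an auto line so the bridge comes up at boot:

auto vmbr1
iface vmbr1 inet static
        address 10.16.0.10/24
        bridge-ports enp8s0
        bridge-stp off
        bridge-fd 0
        bridge-vlan-aware yes
        bridge-vids 2-4094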
There's not even anything plugged into that NIC yet; that's the new one, and I can't get the onboard NIC to work when that card is installed. The onboard NIC is what I want to keep using for the Proxmox web interface. The SFP+ card I'm planning to use as a direct connection to another box.
 
I ended up just getting a small PCIe x1 2.5Gb NIC and disabled the onboard one. I reconfigured the network interfaces to use that and moved on with my day, with both the 10Gb card and the 2.5Gb NIC working now. It seems to be either a conflict between the igc driver and this 10Gb Intel card, or between the motherboard and the 10Gb Intel card.
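(For anyone landing here later, the reconfiguration amounts to pointing vmbr0 at the replacement NIC; enp6s0 below is a placeholder name, check ip a for the real one:)

auto vmbr0
iface vmbr0 inet static
        address 6.13.0.20/24
        gateway 6.13.0.10
        bridge-ports enp6s0     # replacement 2.5Gb NIC instead of the disabled eno1
        bridge-stp off
        bridge-fd 0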
 
