Serious problem: Proxmox 6.0.9 renames NICs?

Mattias Hedman

Jan 19, 2019
I spent a fun weekend digging deep into Debian to find out why the only server updated to 6.0.9 can't connect to the network, even though it got an IP and says everything is up.
The server is an HP ProLiant DL380 G6 with 4 NICs.

On Thursday I ran an update on the system and got 6.0.9; this is when shit hit the fan.
None of the services running on this machine were on the network anymore. I wasn't able to ping the server, let alone SSH into it.
So into the chilly server room I went and logged in locally. ip a says everything is up, but I cannot ping anything.
After a lot of digging and trying, what I had to do was scrap my interfaces file and let the server boot with the original one.

This is not optimal at all, since I do need my bonds, so I thought: let's configure the network via the Proxmox GUI.
I added one network function after another, step by step, and everything worked until I had set up two bonds and restarted. Then I was back to square one: no network whatsoever.

So far it seems to be around enp3s0f* where the problems start.
So then I started to dig and discovered that enp3s0f0 was in the interfaces file but NOT in the system.
It has been renamed to enp4s0f0.
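
A check like the one described above can be scripted. The following is only a sketch: the function name missing_ifaces and its two parameters are made up for illustration, and it assumes an ifupdown-style file that lists physical NICs on bond-slaves / slaves / bridge-ports / bridge_ports lines. Note that a bond or bridge named as a port will also be reported if it has not been brought up yet.

```shell
# Sketch: list interfaces referenced as bond slaves or bridge ports in an
# ifupdown-style file that have no entry under /sys/class/net.
# Function name and both parameters are hypothetical, chosen for illustration.
missing_ifaces() (
    file=${1:-/etc/network/interfaces}
    sysdir=${2:-/sys/class/net}
    # Collect every word that follows one of the slave/port keywords
    awk '$1 ~ /^(bond-slaves|slaves|bridge-ports|bridge_ports)$/ {
             for (i = 2; i <= NF; i++) print $i
         }' "$file" | sort -u |
    while read -r name; do
        [ "$name" = "none" ] && continue      # "none" is a placeholder, not a NIC
        [ -e "$sysdir/$name" ] || echo "$name"
    done
)

# Usage: missing_ifaces /etc/network/interfaces
```

Any name it prints is referenced in the file but unknown to the kernel, which is the situation described here.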

What the f*** happened with 6.0.9?
That is far from ok in my book. And that really has hit my trust for Proxmox in a bad way.
 
This isn't Proxmox-specific, nor did we intend to change your network interface names with an update out of evil intent. This is called "predictable network interface names" and should in fact prevent such scenarios. Systemd network interface names have been used since Proxmox VE 5.0 and therefore aren't new. If your NIC changed from enp3s0f0 to enp4s0f0, this would mean that the NIC switched its PCI bus number from 3 to 4 and remained in slot 0. In theory, this should only happen when someone changes the physical location of that NIC. In practice, some people report that with certain hardware/BIOS combinations it happens even without touching the NIC, for example when adding or removing unrelated hardware; sometimes it is enough to change something in the BIOS, e.g. to update it.

In any case, please refer to your hardware supplier and make sure you have the latest BIOS for your server.
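
To make the naming rule concrete, here is a rough sketch of how a path-based name follows from a PCI address. The helper pci_to_ifname is hypothetical, and it assumes PCI domain 0, a multi-function Ethernet device (so the f<function> suffix is present), and that the path-based scheme is the one in effect on the system.

```shell
# Sketch: derive a name of the form enp<bus>s<slot>f<function> from a PCI
# address such as 0000:03:00.1. The helper name is hypothetical.
# Assumes PCI domain 0 and a multi-function device.
pci_to_ifname() (
    addr=$1
    rest=${addr#*:}      # drop the domain      -> "03:00.1"
    bus=${rest%%:*}      # bus number (hex)     -> "03"
    rest=${rest#*:}      #                      -> "00.1"
    slot=${rest%%.*}     # slot number (hex)    -> "00"
    func=${rest#*.}      # function number      -> "1"
    # Bus and slot are hexadecimal in the PCI address; the name uses decimal
    printf 'enp%ds%df%d\n' "0x$bus" "0x$slot" "$func"
)

pci_to_ifname 0000:03:00.1   # prints enp3s0f1
```

With this rule, a name only changes when the bus, slot, or function number that the firmware assigns to the device changes.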
 
Heya Tim!
Thanks for your answer. The only thing I have added to the server in the past 6 months is RAM.
No NICs have been moved or anything, and I was wrong about the numbers...
enp2s0f1 is named enp3s0f1 in systemd, so there is a clash between those in Proxmox, and as soon as I activate enp3s0f1 the network flips out.
This wasn't a problem until the update last week; maybe there was a Debian update as well that could have caused this?
 
Can you share the output of:

# lspci -nnk
# ip address

Without this, it's very hard to follow what you are referring to. (Make sure to mask any information you don't want to share publicly.)
 
It'll have to wait until I get home, the server isn't answering atm.
 
OK, both outputs are sane. They show all 4 NICs, and it seems that they are correctly labeled:

enp2s0f0
enp2s0f1
enp3s0f0
enp3s0f1

Do you still have any problems? Can you share the dmesg output?
 
Yes, I do. At the moment I do not dare to configure more than one NIC. How do I get the dmesg output?
 
# dmesg > dmesg.txt

And please share your /etc/network/interfaces as well.
 
OK, all NICs are correctly renamed on boot:
Code:
[    3.584230] bnx2: QLogic bnx2 Gigabit Ethernet Driver v2.2.6 (January 29, 2014)
[    3.584967] bnx2 0000:02:00.0 eth0: Broadcom NetXtreme II BCM5709 1000Base-T (C0) PCI Express found at mem f4000000, IRQ 16, node addr 78:e7:d1:7c:1c:88
[    3.585773] bnx2 0000:02:00.1 eth1: Broadcom NetXtreme II BCM5709 1000Base-T (C0) PCI Express found at mem f2000000, IRQ 17, node addr 78:e7:d1:7c:1c:8a
[    3.586465] bnx2 0000:03:00.0 eth2: Broadcom NetXtreme II BCM5709 1000Base-T (C0) PCI Express found at mem f8000000, IRQ 18, node addr 78:e7:d1:7c:1c:8c
[    3.587164] bnx2 0000:03:00.1 eth3: Broadcom NetXtreme II BCM5709 1000Base-T (C0) PCI Express found at mem f6000000, IRQ 19, node addr 78:e7:d1:7c:1c:8e
[    3.626358] bnx2 0000:02:00.0 enp2s0f0: renamed from eth0
[    3.676824] bnx2 0000:02:00.1 enp2s0f1: renamed from eth1
[    3.716462] bnx2 0000:03:00.0 enp3s0f0: renamed from eth2
[    3.748809] bnx2 0000:03:00.1 enp3s0f1: renamed from eth3
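
For a longer log, the old-to-new mapping can be pulled out with a small filter. This is only a sketch; the function name rename_map is made up here, and it assumes the "renamed from" message format shown in the log above.

```shell
# Sketch: read kernel log lines on stdin and print "old -> new" for every
# interface rename. The function name is hypothetical; the message format is
# assumed to match the dmesg lines quoted in this thread.
rename_map() {
    sed -n 's/.* \([^ ]*\): renamed from \([^ ]*\)$/\2 -> \1/p'
}

# Usage: dmesg | rename_map
```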

You wrote: "as soon as I activate enp3s0f1 the network flips out".
So let's focus on this. Please tell me how you activate enp3s0f1, or share the interfaces file with your desired changes.
 
In the Proxmox GUI, under PVE, I choose Network. There I create a Linux Bond and add enp2s0f1 and enp3s0f1, save, and restart the network via the CLI or restart the server. After that, all network connections are lost.

This is the interfaces file I used before 6.0.9, and it was working fine.
https://ufile.io/g1y6x4pr
 
Where did you get that configuration from, the old one? You have defined an IP address for the bond as well as for the bridge where the bond is used; this isn't a valid config AFAIK. From what I've seen so far, I doubt that this was working properly with the given config; maybe you changed it some time ago and haven't rebooted since.

Please take a look into our official admin guide on how to configure your network:
https://pve.proxmox.com/pve-docs/pve-admin-guide.html#sysadmin_network_configuration
 
That old one is a mix of GUI and CLI, leaning mostly towards GUI, but it is the exact one I used before 6.0.9; I picked it directly from the server before posting it here. But if you say it is a faulty config, then that might explain why it is not working.
I will read the guide again.
 
So @tim, after reading the manual once again I see where I made an error.

Part one of the bond section talks about a bond with a fixed IP; I must have mixed it up with the "Use a bond as bridge port" example.
I have now made a new interfaces file, https://ufile.io/5tjo7r2m, but haven't tested it yet, so I'd like you to take a look at it before I use it.

Feels good to see that a solution is on its way and that was probably my fault. :)
 
OK, let's start with bond0: you configured LACP (802.3ad), so make sure this is set up correctly on your switch as well. Why did you configure the bond delays, any particular reason? The MTU won't work as configured here: it needs to be set on the lowest interface in the chain, otherwise everything above it falls back to that interface's MTU. In your case, even though bond0 has mtu 9000, this will be ignored because both slaves default to 1500. Make sure the rest of your network supports jumbo frames as well.

As an example for bond0:
Code:
auto bond0
iface bond0 inet static
    ....
    post-up ip link set dev enp2s0f0 mtu 9000  && ip link set dev enp3s0f0 mtu 9000 && ip link set dev bond0 mtu 9000

Finally, bond1 has to be manual, not static: it has no IP configured and will be brought up by vmbr0.
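
To verify after a restart that the MTU really took effect on every link in the chain, something like the following could be used. It is a sketch only: check_mtu and the SYSDIR override are hypothetical names, and it simply reads the mtu attribute that the kernel exposes for each interface under /sys/class/net.

```shell
# Sketch: report every interface whose MTU differs from the expected value.
# check_mtu and the SYSDIR override are hypothetical names for illustration.
check_mtu() (
    want=$1; shift
    sysdir=${SYSDIR:-/sys/class/net}
    rc=0
    for ifc in "$@"; do
        if [ ! -r "$sysdir/$ifc/mtu" ]; then
            echo "$ifc: not found"; rc=1; continue
        fi
        mtu=$(cat "$sysdir/$ifc/mtu")
        if [ "$mtu" != "$want" ]; then
            echo "$ifc: mtu $mtu, expected $want"; rc=1
        fi
    done
    exit "$rc"   # leaves the subshell, so this becomes the function's status
)

# Usage: check_mtu 9000 enp2s0f0 enp3s0f0 bond0
```

No output and exit status 0 mean all listed interfaces carry the expected MTU.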
 
First off, jumbo frames are supported through my whole network.

As for the bonds: what config would you recommend for maximum throughput?

The delays came from somewhere; I need to check that again.
You're welcome to make an interfaces file that is correct.
 
I've linked the documentation, which has examples for each of your interfaces. Just use them if you want a working network config, or incorporate my suggestions into your current config. Defining configuration options for stuff you don't really need makes things way more complicated. I can't recommend a config for maximum throughput, because I don't know what you are doing with your server or anything else about your network. I would go with the simplest approach unless something indicates a performance bottleneck in the network layer.
 
Once again, @tim: I have redone my interfaces file to be much simpler and more to the point, using just 2 NICs, one for admin access and another as a virtual bridge for the VMs.
If need be in the future, I'll add another bridge or bond the existing one for load balancing, but that is not needed at the moment.

Thank you!