Serious problem: Proxmox 6.0.9 renames NICs?

Mattias Hedman

Well-Known Member
Jan 19, 2019
So, after a fun weekend of digging deep into Debian to see why the only server updated to 6.0.9 isn't able to connect to the network, even though it gets an IP and says everything is up, here is what I found.
The server is running on an HP ProLiant DL380 G6 with 4 NICs.

Thursday I ran an update on the system and got 6.0.9; this is when the shit hit the fan.
All services running on this machine were suddenly off the network. I wasn't able to ping the server, let alone SSH into it.
So, into the chilly server room and logging in locally: ip a says everything is up, but I cannot ping anything.
After a lot of digging and trying, the only thing that helped was to scrap my interfaces file and let the server boot with the original one.

This is not optimal at all, as I do need my bonds, so I thought: let's configure the network via the Proxmox GUI.
Step by step I added one network function after another, and it wasn't until I had set up two bonds and restarted that I was back to square one: no network whatsoever.

So far it seems the problems start around enp3s0f*.
I started to dig and discovered that enp3s0f0 was in the interfaces file but NOT in the system.
It had been renamed to enp4s0f0.
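A quick way to spot such a mismatch is to compare what the kernel currently calls the NICs with what the interfaces file still refers to, e.g. something like:
Code:
# ip -br link
# grep -E '^[[:space:]]*(auto|iface)' /etc/network/interfaces
Any name that shows up in the second output but not in the first is a stale reference.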

What the f*** happened with 6.0.9?
That is far from OK in my book, and it has really hit my trust in Proxmox in a bad way.
 
This isn't Proxmox specific, nor did we change your network interface names with an update out of evil intent. This is called "predictable network interface names" and should in fact prevent such scenarios. Systemd network interface names have been used since Proxmox VE 5.0 and therefore aren't new. If your NIC changed from enp3s0f* to enp4s0f0, that would mean the NIC switched from PCI bus 3 to bus 4 while remaining in slot 0. In theory this should only happen when someone changes the physical location of that NIC. In practice, some people report that with certain hardware/BIOS combinations it happens even without touching the NIC, for example after adding unrelated hardware to the server or removing something else; sometimes it is enough to change something in the BIOS, e.g. update it.

In any case, please refer to your hardware supplier and make sure you have the latest BIOS for your server.
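If the names keep shifting with BIOS or hardware changes, one (not Proxmox specific) way to pin them is a systemd .link file that matches each NIC by MAC address and gives it a fixed name, roughly like this (file name, MAC and name are only examples):
Code:
# /etc/systemd/network/10-lan0.link
[Match]
MACAddress=78:e7:d1:7c:1c:88

[Link]
Name=lan0
Afterwards it is usually necessary to refresh the initramfs (update-initramfs -u) and reboot so the rename is applied early during boot.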
 

Heya Tim!
Thanks for your answer - the only thing I have added to the server in the past 6 months is RAM.
No NICs have moved or anything, and I was wrong about the numbers...
enp2s0f1 is named enp3s0f1 by systemd, so there is a clash between those; as soon as I activate enp3s0f1 the network flips out.
This wasn't a problem before the update last week; maybe there was a Debian update as well that could cause this?
 
Can you share the output of:

# lspci -nnk
# ip address

Without this, it's very hard to follow what you are referring to. (Make sure to mask any information you don't want to share publicly.)
 

It'll have to wait until I get home, the server isn't answering atm.
 
OK, both outputs are sane; they show all 4 NICs and it seems that they are correctly labeled:

enp2s0f0
enp2s0f1
enp3s0f0
enp3s0f1

Do you still have any problems? Can you share the dmesg output?
 
Yes, I do. At the moment I do not dare to enable more than one NIC. How do I get the dmesg output?
 
# dmesg > dmesg.txt

And please share your /etc/network/interfaces as well.
 
OK, all NICs are correctly renamed on boot:
Code:
[    3.584230] bnx2: QLogic bnx2 Gigabit Ethernet Driver v2.2.6 (January 29, 2014)
[    3.584967] bnx2 0000:02:00.0 eth0: Broadcom NetXtreme II BCM5709 1000Base-T (C0) PCI Express found at mem f4000000, IRQ 16, node addr 78:e7:d1:7c:1c:88
[    3.585773] bnx2 0000:02:00.1 eth1: Broadcom NetXtreme II BCM5709 1000Base-T (C0) PCI Express found at mem f2000000, IRQ 17, node addr 78:e7:d1:7c:1c:8a
[    3.586465] bnx2 0000:03:00.0 eth2: Broadcom NetXtreme II BCM5709 1000Base-T (C0) PCI Express found at mem f8000000, IRQ 18, node addr 78:e7:d1:7c:1c:8c
[    3.587164] bnx2 0000:03:00.1 eth3: Broadcom NetXtreme II BCM5709 1000Base-T (C0) PCI Express found at mem f6000000, IRQ 19, node addr 78:e7:d1:7c:1c:8e
[    3.626358] bnx2 0000:02:00.0 enp2s0f0: renamed from eth0
[    3.676824] bnx2 0000:02:00.1 enp2s0f1: renamed from eth1
[    3.716462] bnx2 0000:03:00.0 enp3s0f0: renamed from eth2
[    3.748809] bnx2 0000:03:00.1 enp3s0f1: renamed from eth3

so there is a clash between those; as soon as I activate enp3s0f1 the network flips out
So let's focus on this: please tell me how you activate enp3s0f1, or share the interfaces file with your desired changes.
 

In the Proxmox GUI, under the PVE node, I choose Network; there I make a Linux Bond and add enp2s0f1 and enp3s0f1, save, and restart the network via the CLI or restart the server. After that, all network connections are lost.

This is the interfaces file I used before 6.0.9, and it was working fine.
https://ufile.io/g1y6x4pr
 
Where did you get that configuration from, the old one? You have defined an IP address for the bond as well as for the bridge where the bond is used; this isn't a valid config, AFAIK. From what I've seen so far, I doubt that this was working properly with the given config; maybe you changed it some time ago and haven't rebooted since then.

Please take a look into our official admin guide on how to configure your network:
https://pve.proxmox.com/pve-docs/pve-admin-guide.html#sysadmin_network_configuration
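The usual pattern there is that the bond itself stays on the manual method and only the bridge on top of it carries the IP, roughly along these lines (addresses are placeholders; adjust slaves and bond options to your setup):
Code:
auto bond0
iface bond0 inet manual
    bond-slaves enp2s0f1 enp3s0f1
    bond-miimon 100
    bond-mode 802.3ad

auto vmbr0
iface vmbr0 inet static
    address 192.168.1.10/24
    gateway 192.168.1.1
    bridge-ports bond0
    bridge-stp off
    bridge-fd 0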
 
That old one is a mix of GUI and CLI, leaning mostly towards the GUI, but it is the exact one I used before 6.0.9; I picked it directly from the server before posting it here. But if you say it is a faulty config, then that might explain why it is not working.
I will read the guide again.
 
So @tim, after reading the manual once again, I see where I made my error.

The first part of the bond section talks about a bond with a fixed IP; I must have mixed it up with the "Use bond as bridge port" example.
I have now made a new interfaces file, https://ufile.io/5tjo7r2m, but haven't tested it yet, so I'd like you to take a look at it before I use it.

Feels good to see that a solution is on its way and that it was probably my own fault. :)
 
OK, let's start with bond0: you configured LACP (802.3ad), so make sure this is correctly set up on your switch as well. Why did you configure bond-delays, any particular reason? The MTU here won't work as written; it needs to be configured on the lowest interfaces in the chain, otherwise everything above them defaults to their MTU. In your case, even though bond0 has MTU 9000, it will be ignored because both slaves default to 1500. Make sure the rest of your network supports jumbo frames as well.

As an example for bond0:
Code:
auto bond0
iface bond0 inet static
    ....
    post-up ip link set dev enp2s0f0 mtu 9000  && ip link set dev enp3s0f0 mtu 9000 && ip link set dev bond0 mtu 9000

Finally, bond1 has to use the manual method, not static; it has no IP configured and will be brought up by vmbr0.
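In other words, roughly (only the method line matters here; the slaves are guessed from your naming scheme):
Code:
auto bond1
iface bond1 inet manual
    bond-slaves enp2s0f1 enp3s0f1
while vmbr0 keeps the IP address and lists bond1 under bridge-ports.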
 
First off, jumbo frames are supported throughout my whole network.

Bonds... for maximum throughput, what config would you recommend?

The delays came from somewhere; I need to check that again.
You're welcome to make an interfaces file that is correct.
 
I've linked the documentation, which has examples for each of your interfaces; just use them if you want a working network config, or incorporate my suggestions into your current config. Defining configuration options for stuff you don't really need makes things way more complicated. I can't recommend a config for maximum throughput, because I don't know what you are doing with your server or anything else about your network. I would go with the simplest approach if nothing else indicates a performance bottleneck in the network layer.
 
Once again, @tim, I have redone my interfaces file to be much simpler and more to the point, using just 2 NICs: one for admin access and another as a virtual bridge for the VMs.
If need be, in the future I'll add another bridge or bond the existing one for load balancing, but that is not needed at the current state.
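For the record, the layout is roughly along these lines (a sketch only; addresses are placeholders, not the real ones):
Code:
# management NIC
auto enp2s0f0
iface enp2s0f0 inet static
    address 192.168.1.5/24
    gateway 192.168.1.1

# VM bridge on the second NIC, no host IP needed here
auto vmbr0
iface vmbr0 inet manual
    bridge-ports enp2s0f1
    bridge-stp off
    bridge-fd 0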

Thank you!
 
