[SOLVED] Proxmox 6.8.x Kernels Break Networking

mattlach

Renowned Member
Mar 23, 2016
Boston, MA
Hi Everyone,

So the Proxmox server got itself a reboot today for the first time in a while, and when it came back up, there was no network connectivity.

The PCI device appeared in lspci, and the network device appeared in "ip link show". ifconfig showed only the vmbr networks, and they were in a state of "UNKNOWN".
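Symptoms like these (the NIC visible in lspci and ip link, but the bridges stuck in UNKNOWN) are consistent with /etc/network/interfaces referring to a name the kernel no longer uses. Here is a minimal sketch of a check for that, assuming the usual ifupdown-style keywords (`auto`, `allow-hotplug`, `iface`, `bridge-ports`). The function names are hypothetical, and this is not a Proxmox tool:

```python
import os


def referenced_interfaces(config_text: str) -> set[str]:
    """Collect interface names that appear on auto/iface lines or in bridge ports."""
    names: set[str] = set()
    for raw_line in config_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and indentation
        if not line:
            continue
        keyword, *args = line.split()
        if keyword == "iface" and args:
            names.add(args[0])
        elif keyword in ("auto", "allow-hotplug"):
            names.update(args)
        elif keyword in ("bridge-ports", "bridge_ports"):
            names.update(a for a in args if a != "none")
    return names


def missing_interfaces(config_text: str, present: set[str]) -> list[str]:
    """Names the config refers to that the running kernel does not expose."""
    return sorted(referenced_interfaces(config_text) - present)


def check_host(config_path: str = "/etc/network/interfaces",
               sys_net: str = "/sys/class/net") -> list[str]:
    """Compare the config against the live system (Linux only)."""
    with open(config_path, encoding="utf-8") as f:
        text = f.read()
    return missing_interfaces(text, set(os.listdir(sys_net)))
```

Calling `check_host()` on an affected node should list the old, unsuffixed names. A name that appears in the config but not under /sys/class/net is exactly the situation in which the bridge has no port to come up on.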

After some troubleshooting, I tried rolling back. I can confirm I have the issue on both 6.8.12 and 6.8.8, but everything works as expected on 6.5.13.

The NIC is an Intel XL710 (40 Gbit QSFP+), which uses the i40e kernel module.

I have this same NIC in another machine running Ubuntu Kernel 6.8.0-45, and it appears to be working just fine.

Anyone know what is going on here? Is there a known kernel issue?

Appreciate any assistance!

--Matt
 
Hi!

Check breaking changes for the latest kernel: https://pve.proxmox.com/wiki/Roadmap#Proxmox_VE_8.2

Kernel: Change in Network Interface Names

Upgrading kernels always carries the risk of network interface names changing, which can lead to invalid network configurations after a reboot. In this case, you must either update the network configuration to reflect the name changes, or pin the network interface to its name beforehand.

See the reference documentation on how to pin the interface names based on MAC Addresses.

Currently, the following models are known to be affected at higher rates:

  • Models using i40e. Their names can get an additional port suffix like p0 added.

Boot with the newest kernel and check your interface names. If p0 has been added to the end of an interface name, update that name in /etc/network/interfaces (back up this file before making the change!).
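The edit to /etc/network/interfaces amounts to a whole-word rename. This is a sketch of how to do it safely, assuming you already know the old-to-new mapping. The helper names and the `.bak` backup suffix are illustrative choices and not part of Proxmox:

```python
import re
import shutil


def rename_interfaces(config_text: str, renames: dict[str, str]) -> str:
    """Rewrite whole-word interface names in a single pass.

    The \\b anchors stop "enp133s0f0" from matching inside an already-renamed
    "enp133s0f0np0", while a VLAN name like "enp133s0f0.100" is still updated.
    One regex pass also means chained renames (a->b, b->c) never apply twice.
    """
    if not renames:
        return config_text
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, renames)) + r")\b")
    return pattern.sub(lambda m: renames[m.group(1)], config_text)


def rename_in_file(path: str, renames: dict[str, str]) -> None:
    """Copy the file to path + '.bak' before rewriting it in place."""
    shutil.copy2(path, path + ".bak")
    with open(path, encoding="utf-8") as f:
        text = f.read()
    with open(path, "w", encoding="utf-8") as f:
        f.write(rename_interfaces(text, renames))
```

Running the rename a second time leaves the file unchanged, so accidentally applying it twice is harmless.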
 

Thank you for that. I will take a look at that.

It seems mind-boggling that they would mess with network interface names at this point.

Almost as if we learned nothing from the transition away from eth0, eth1, eth2, etc. to consistent network naming, what, a decade ago now? More? I can't remember.

I feel sorry for everyone with a remote server and no out-of-band management who now has to take a road trip because of this one. What were they thinking?
 

That turned out to be the case.

My two network interfaces on this server, enp133s0f0 and enp133s0f1, were renamed to enp133s0f0np0 and enp133s0f1np1 respectively.
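The pattern here is that the old name plus an `np<N>` suffix equals the new name. Under that assumption, the mapping can be inferred mechanically. In this sketch, the function name is hypothetical. It deliberately skips any name with zero or several candidates, because those cases need a human decision:

```python
import re


def infer_port_suffix_renames(missing: list[str], present: set[str]) -> dict[str, str]:
    """Map each missing name to the unique present name equal to it plus 'np<N>'.

    Names with zero or several matching candidates are left out.
    """
    renames: dict[str, str] = {}
    for old in missing:
        suffixed = re.compile(re.escape(old) + r"np\d+")
        candidates = [name for name in present if suffixed.fullmatch(name)]
        if len(candidates) == 1:
            renames[old] = candidates[0]
    return renames
```

For this server, `infer_port_suffix_renames(["enp133s0f0", "enp133s0f1"], {"enp133s0f0np0", "enp133s0f1np1", "vmbr0"})` yields the two renames described above.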

This makes me a little frustrated with whoever on the kernel team is responsible for this driver. I hope they at least had a really good reason for doing something they knew would break running systems and possibly leave people with remote systems and no out-of-band management stranded.

Side note:

I'm actually amazed we can still configure the network in /etc/network/interfaces. I remember being told (years ago) that this was the ifupdown way of doing things and that it was going away in favor of netplan, which had a whole new syntax using something called YAML (which to me looks just like a version of XML; insert link to that XKCD comic about standards).

Actually, I'm just going to do it:
https://xkcd.com/927/

I remember being told the reason for this was that the /etc/network/interfaces way of doing things made life needlessly difficult for projects with frontends, and that the new netplan approach solved that.

Based on this, I kind of would have expected that Proxmox and its PVE frontend would have benefited from this change and been an early adopter, yet here we are in 2024 still using /etc/network/interfaces.

Not that I'm complaining. I prefer the old way of doing things compared to netplan. I'm just surprised :p
 
It is worth checking the "breaking changes" section of the Roadmap every time a new major/minor version of Proxmox is released, in case issues like this one come up.

In this case, a workaround is to pin network interface names based on their MAC addresses: https://pve.proxmox.com/pve-docs/pve-admin-guide.html#network_override_device_names

This way you make sure your interface name never changes again :)
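MAC-based pinning is done with systemd `.link` files (`[Match] MACAddress=` plus `[Link] Name=`). Here is a minimal sketch that builds one from the MAC currently in sysfs. The target name `nic0` and a location like `/etc/systemd/network/10-nic0.link` are hypothetical examples. Follow the linked admin guide section for the exact naming advice and any follow-up steps:

```python
import re
from pathlib import Path

MAC_RE = re.compile(r"[0-9a-f]{2}(:[0-9a-f]{2}){5}")


def read_mac(iface: str, sys_net: Path = Path("/sys/class/net")) -> str:
    """Read an interface's current MAC address from sysfs (Linux only)."""
    return (sys_net / iface / "address").read_text(encoding="ascii").strip().lower()


def link_file(mac: str, new_name: str) -> str:
    """Build a systemd .link file giving the NIC with this MAC a fixed name."""
    mac = mac.lower()
    if not MAC_RE.fullmatch(mac):
        raise ValueError(f"not a MAC address: {mac!r}")
    return (
        "[Match]\n"
        f"MACAddress={mac}\n"
        "Type=ether\n"  # restrict the match to plain Ethernet devices
        "\n"
        "[Link]\n"
        f"Name={new_name}\n"
    )
```

For example, `link_file(read_mac("enp133s0f0np0"), "nic0")` would produce the file contents, and /etc/network/interfaces would then refer to `nic0`. Because the new name is chosen by you, later kernel naming changes should no longer affect it.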

That is true, but I don't think the 6.8 kernel came in as a new major/minor update of Proxmox; it arrived as a regular package update.

So the node that has the i40e NIC in it had an uptime of 180+ days, and got several kernel updates through regular updates in that time. Didn't realize there was a problem until I powered down for some electrical work in my main panel, and it didn't come back up again afterwards.

(Notably, this was a huge PITA, as my main firewall/router is OPNsense running as a guest on another node. Despite that node coming up properly, it wouldn't start any guests, citing a lack of quorum, since the node with the i40e NIC was offline. I'll need to read up on why this happened and how to override it at some point, as it was unexpected and caused me some issues.)
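The quorum behaviour comes down to majority arithmetic. Here is a minimal sketch, assuming corosync's default of one vote per node and no QDevice. The thread doesn't say how many nodes this cluster has:

```python
def votes_needed(expected_votes: int) -> int:
    """Strict majority of the expected votes: integer half, plus one."""
    return expected_votes // 2 + 1


def is_quorate(online_votes: int, expected_votes: int) -> bool:
    """A partition is quorate only if it holds at least a strict majority."""
    return online_votes >= votes_needed(expected_votes)
```

In a two-node cluster, `votes_needed(2)` is 2, so a lone surviving node (1 vote) is not quorate and Proxmox refuses to start guests. With three nodes, two survivors are still quorate. The `pvecm expected` command can temporarily lower the expected vote count on the surviving node, and a QDevice is the usual way to give a two-node cluster a third vote. Both are covered in the Proxmox cluster manager documentation.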

Though this may be because I am a home lab user and can't justify spending on the enterprise subscription.

I used to, back in 2016 when I first switched to Proxmox (I had previously used an unlicensed version of ESXi), but back then Proxmox cost only €64.90 per socket per year, and I could justify the cost. Now even the Community subscription costs twice as much as it did in 2016, and there is a Premium subscription at over €1000 per socket per year, which is just nuts. It feels like Proxmox, as they have become more popular, decided to emulate VMware's pricing model :p

Maybe I'll have to bite the bullet and sign up for it, though. But that probably means scaling back to just one node. I can't justify spending that much per socket on multiple CPUs, especially since one of my secondary machines is an old dual-socket board...
 
This change affected enterprise users too, so even with a subscription you would have been in the same situation. In fact, the change is not Proxmox-related but kernel-related.

I get your point, maybe a warning before upgrading in case of breaking changes could be useful.
 
Or maybe it would be sane to pin NIC names to MAC by default?

Possibly, but I still fault the kernel driver team behind the i40e driver, not Proxmox.