Proxmox 8.2, Kernel 6.8 and i40e driver

Joe Botha

Hi

We have a few Proxmox 8.2 hosts pinned to kernel 6.5 because it seems the i40e driver gets creative with network interface naming in the upgrade to kernel 6.8.

I'm considering just staying on the 6.5 kernel and waiting for this problem to go away. Is that an option?

Are there plans for a newer kernel soon, and are there plans to keep the interface naming stable (as it was in kernel version 6.5)?

Personally, as a paying Proxmox 'stable / production' version customer, I'm a bit shocked that you would knowingly make this breaking change. Does Kernel 6.8 have enough extra value to justify creating this problem? I doubt it.

ps. The documentation about 'net.naming-scheme=v252' is of no help, and possibly just plain wrong. Embarrassing.

pps. Why not just stick to longterm stable kernels like v6.6? or offer people the option. I have no ambitions to try the latest kernels. Very happy to run the most tested mainstream ones with the LEAST SURPRISES.
 
I'm considering just staying on the 6.5 kernel and waiting for this problem to go away. Is that an option?
It won't go away by waiting - there was a change in the kernel/driver that causes the naming to change. It happens from time to time.


Are there plans for a newer kernel soon, and are there plans to keep the interface naming stable (as it was in kernel version 6.5)?
We regularly release newer (opt-in) Kernels and usually a newer Kernel version with every minor point release (8.1, 8.2, ...).

Interface names can change when upgrading kernels. There is no guarantee that they stay the same, and there never has been. For instance, a change in how the kernel enumerates PCI devices can change the ID of PCIe devices, which in turn changes the network interface names. There are ways of overriding network interface names and making them stable [1].

We've discussed introducing a mechanism that automatically overrides network interface names during updates, and that might be a way for us to address this issue.



pps. Why not just stick to longterm stable kernels like v6.6? or offer people the option. I have no ambitions to try the latest kernels. Very happy to run the most tested mainstream ones with the LEAST SURPRISES.
This would also happen with stable kernels, just at a later point in time. Stable doesn't mean 'network interface names can never change'.


[1] https://pve.proxmox.com/pve-docs/pve-admin-guide.html#network_override_device_names
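For anyone who wants to try the override described in [1]: as far as I understand it, the mechanism is a systemd .link file that matches the NIC by MAC address and assigns a fixed name. Below is a minimal Python sketch that only renders the text of such a file. The function name, the example MAC, and the name "enlan0" are made up for illustration, and the check for an "en"/"eth" prefix reflects my reading of the guide's recommendation, so verify it against the guide before relying on it.

```python
import re

# Six colon-separated hex octets, lower case.
MAC_RE = re.compile(r"^[0-9a-f]{2}(:[0-9a-f]{2}){5}$")


def render_mac_link(mac: str, name: str) -> str:
    """Return the text of a systemd .link file that pins `name` to `mac`.

    Hypothetical helper for illustration; it writes nothing to disk.
    """
    mac = mac.strip().lower()
    if not MAC_RE.match(mac):
        raise ValueError(f"not a MAC address: {mac!r}")
    # Linux interface names are limited to 15 characters.
    if not 0 < len(name) <= 15:
        raise ValueError(f"interface name too long or empty: {name!r}")
    # Assumption: names should start with 'en' or 'eth' so the host
    # tooling treats the interface as a physical NIC.
    if not name.startswith(("en", "eth")):
        raise ValueError(f"name should start with 'en' or 'eth': {name!r}")
    return (
        "[Match]\n"
        f"MACAddress={mac}\n"
        "Type=ether\n"
        "\n"
        "[Link]\n"
        f"Name={name}\n"
    )


if __name__ == "__main__":
    print(render_mac_link("AA:BB:CC:DD:EE:FF", "enlan0"))
```

The rendered text would typically be saved under /etc/systemd/network/ with a .link extension, and the new name then has to be used in /etc/network/interfaces as well. Check the guide for the exact steps on your version.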
 
Hi, do you know in which naming scheme version the change was introduced which caused the i40e driver to change?

I can't spot it. Which makes me think the i40e change is a bug.

From:
https://support.xilinx.com/s/article/000034471?language=en_US

'Currently network device drivers for the Linux kernel typically use an ‘npX’ suffix to specify the port name when the NIC has more than one network port on the same PCI function. Removing the suffix is now the best practice for cards with one network port per PCI function.'

I don't see why 'eno7' needs to become 'eno7np2'.

Do you think the i40e driver people could be in the wrong?
 
This is not necessarily related to a change in naming scheme versions, it could also be that the i40e driver now exposes information that wasn't available before, which then leads to different names.

I'd have to look more closely into what actually caused the issue before I could say something definitive here - I'm speculating a bit to be honest. Support for devlink was added to the i40e driver between those kernel versions, so that looks like a possible culprit.
 
Hi,

Please ask the person who deals with your kernel choices to investigate what happened with the i40e driver.

Static MAC-based naming seems like a rather ugly solution. It's a problem waiting to happen as soon as a NIC gets swapped.

We also use the ixgbe and mellanox drivers in some of our servers - and there were no problems with those.

ps. We made a choice to avoid intel NICs a while back, mostly because of their drivers, but we're still stuck with the onboard ports.
 
I'd say it's a decent enough solution if used correctly. Usually you're aware that you're changing a network card and can adjust the override accordingly beforehand.

This issue can also happen with ixgbe and mellanox drivers at some point. It can also happen when you update the firmware of your motherboard and it suddenly changes how PCIe devices are reported. It can also happen when you add a new PCIe device which changes the numbering of interfaces. There are a multitude of factors at play here - there's no silver bullet.
 
for i40e, the change in question was likely this (in kernel 6.7)

https://lore.kernel.org/lkml/20231013170755.2367410-1-ivecera@redhat.com/

which exposed more information about the device to userspace, which in turn means more detailed names. but like Stefan said - this can happen with any driver, or other kernel changes, or systemd/udev upgrades, or firmware upgrades.
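To illustrate why extra information from the driver produces a longer name, here is a simplified Python model of how an onboard interface name is put together according to the documented systemd naming scheme (prefix "en", "o" plus the onboard index, then an optional "n" plus phys_port_name or "d" plus dev_port). This is only a sketch of the documented pattern, not udev's actual code, and the function name is made up. It also assumes that the i40e port now reports a phys_port_name such as "p2", which matches the names reported in this thread.

```python
from typing import Optional


def onboard_name(index: int,
                 phys_port_name: Optional[str] = None,
                 dev_port: int = 0) -> str:
    """Compose an onboard ethernet name: eno<index>[n<phys_port_name>|d<dev_port>].

    Simplified illustration of the documented pattern only.
    """
    name = f"eno{index}"
    if phys_port_name:
        # The driver exposes a physical port name, so it gets appended.
        name += f"n{phys_port_name}"
    elif dev_port:
        # Fallback suffix for a non-zero dev_port.
        name += f"d{dev_port}"
    return name


if __name__ == "__main__":
    print(onboard_name(7))                       # eno7
    print(onboard_name(7, phys_port_name="p2"))  # eno7np2
```

The same onboard index yields "eno7" while the driver reports no port name and "eno7np2" once it does, without any change to the naming scheme version.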
 
well, the new naming scheme is not so new anymore ;) but if you mean the more specific names for i40e devices, then yes, I doubt that will get reverted. but the point made earlier still stands - if you want stable names, you need to pin them on something stable. the most stable aspect of a NIC is its MAC, but of course, that is tied to the hardware itself. you can decide yourself which input parameters you want to use to assign human readable names, and whether assigning them based on hardware topology (same name on replacement, but possible risk of unrelated changes affecting the topology somehow) or hardware (rule needs adaptation on replacement, but otherwise 100% stable) is the better option.
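A rough sketch of the topology-based variant, for comparison: a systemd .link file can also match on the device's persistent path (the udev property ID_PATH) via Path= instead of the MAC. The Python helper below only renders such a file. The helper name, the example path "pci-0000:02:00.0", and the name "enlan0" are hypothetical, and the regular expression assumes a plain PCI path, which will not cover every device.

```python
import re

# Assumption: a plain PCI ID_PATH such as 'pci-0000:02:00.0'.
PCI_PATH_RE = re.compile(r"^pci-[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]$")


def render_path_link(id_path: str, name: str) -> str:
    """Return .link file text that pins `name` to a PCI location.

    The name survives a NIC swap in the same slot, but breaks if the
    PCI topology changes (firmware update, added card, ...).
    """
    if not PCI_PATH_RE.match(id_path):
        raise ValueError(f"unexpected ID_PATH format: {id_path!r}")
    # Linux interface names are limited to 15 characters.
    if not 0 < len(name) <= 15:
        raise ValueError(f"interface name too long or empty: {name!r}")
    return (
        "[Match]\n"
        f"Path={id_path}\n"
        "\n"
        "[Link]\n"
        f"Name={name}\n"
    )


if __name__ == "__main__":
    print(render_path_link("pci-0000:02:00.0", "enlan0"))
```

The current ID_PATH of an interface can be read with "udevadm info /sys/class/net/<interface>". Whether Path= or MACAddress= is the better match key is exactly the trade-off described above.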
 
