Opt-in Linux 6.8 Kernel for Proxmox VE 8 available on test & no-subscription

Hello, I have an issue with the 6.8.4-2-pve kernel; there is no such problem with kernel 6.5.13-5-pve.
I have an old Supermicro home lab server with an X8DTH-iF motherboard and a cheap Marvell HBA card.
With this kernel it fails to detect the SATA links, so all disks on the HBA are inaccessible. Errors from dmesg:
Code:
[Tue Apr 30 13:29:06 2024] ata7: link is slow to respond, please be patient (ready=0)
[Tue Apr 30 13:29:07 2024] ata9.00: qc timeout after 5000 msecs (cmd 0xec)
[Tue Apr 30 13:29:07 2024] ata9.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[Tue Apr 30 13:29:07 2024] ata14.00: qc timeout after 5000 msecs (cmd 0xa1)
[Tue Apr 30 13:29:07 2024] ata14.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[Tue Apr 30 13:29:07 2024] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[Tue Apr 30 13:29:07 2024] ata9: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[Tue Apr 30 13:29:16 2024] ata7: link is slow to respond, please be patient (ready=0)
[Tue Apr 30 13:29:17 2024] ata14.00: qc timeout after 10000 msecs (cmd 0xa1)
[Tue Apr 30 13:29:17 2024] ata14.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[Tue Apr 30 13:29:17 2024] ata9.00: qc timeout after 10000 msecs (cmd 0xec)
[Tue Apr 30 13:29:17 2024] ata9.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[Tue Apr 30 13:29:17 2024] ata9: limiting SATA link speed to 3.0 Gbps
[Tue Apr 30 13:29:18 2024] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[Tue Apr 30 13:29:18 2024] ata9: SATA link up 6.0 Gbps (SStatus 133 SControl 320)
[Tue Apr 30 13:29:26 2024] ata7: link is slow to respond, please be patient (ready=0)
[Tue Apr 30 13:29:48 2024] ata9.00: qc timeout after 30000 msecs (cmd 0xec)
[Tue Apr 30 13:29:48 2024] ata9.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[Tue Apr 30 13:29:48 2024] ata14.00: qc timeout after 30000 msecs (cmd 0xa1)
[Tue Apr 30 13:29:48 2024] ata14.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[Tue Apr 30 13:29:48 2024] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[Tue Apr 30 13:29:48 2024] ata9: SATA link up 6.0 Gbps (SStatus 133 SControl 320)
[Tue Apr 30 13:29:56 2024] ata7: limiting SATA link speed to 3.0 Gbps
[Tue Apr 30 13:30:01 2024] ata7: hardreset failed
[Tue Apr 30 13:30:01 2024] ata7: reset failed, giving up

HBA card info from lspci -v
Code:
03:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9230 PCIe 2.0 x2 4-port SATA 6 Gb/s RAID Controller (rev 11) (prog-if 01 [AHCI 1.0])
        Subsystem: Marvell Technology Group Ltd. 88SE9230 PCIe 2.0 x2 4-port SATA 6 Gb/s RAID Controller
        Flags: bus master, fast devsel, latency 0, IRQ 49, NUMA node 0, IOMMU group 43
        I/O ports at c000 [size=8]
        I/O ports at bc00 [size=4]
        I/O ports at cc00 [size=8]
        I/O ports at c800 [size=4]
        I/O ports at c400 [size=32]
        Memory at fbcee000 (32-bit, non-prefetchable) [size=2K]
        Expansion ROM at fbcf0000 [disabled] [size=64K]
        Capabilities: [40] Power Management version 3
        Capabilities: [50] MSI: Enable+ Count=1/1 Maskable- 64bit-
        Capabilities: [70] Express Legacy Endpoint, MSI 00
        Capabilities: [e0] SATA HBA v0.0
        Capabilities: [100] Advanced Error Reporting
        Kernel driver in use: ahci
        Kernel modules: ahci
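
Until this is sorted out, the host can be kept on the known-good kernel. A minimal sketch using proxmox-boot-tool (available on current PVE 8 installs; the version string is the one mentioned above):
Code:
# list the kernels proxmox-boot-tool knows about
proxmox-boot-tool kernel list

# pin the kernel that still detects the HBA, then reboot
proxmox-boot-tool kernel pin 6.5.13-5-pve

# later, once a fixed 6.8 kernel is out, return to the default (newest) kernel
proxmox-boot-tool kernel unpin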
 
I needed to migrate to SSDs, so I did a fresh install on my R320 with 6.8. As far as I can tell, everything seems to be running much better than the last upgrade attempt. I bypassed the RAID and hooked the backplane to SATA to move to ZFS. I see no unusual errors in dmesg.
 
For those having problems with Broadcom NICs not coming up automatically after the upgrade to 8.2, see: https://forum.proxmox.com/threads/broadcom-nics-down-after-pve-8-2-kernel-6-8.146185/

(This also fixed my problems with a ZFS pool on PBS 3.2 - maybe because udev did not work properly due to the Broadcom NIC.)

1. Update to the latest firmware from Broadcom
2. Blacklist the InfiniBand driver (see the sketch after the output below)
3. Reboot; the network comes up and the ZFS pool backup runs again
3.1 ZFS had failed because udev failed, since the Broadcom InfiniBand driver was not working

Code:
root@HCI-BAK01-BER4:~# systemctl list-units --state=failed
  UNIT                          LOAD   ACTIVE SUB    DESCRIPTION
● systemd-udev-settle.service   loaded failed failed Wait for udev To Complete Device Initialization

LOAD   = Reflects whether the unit definition was properly loaded.
ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
SUB    = The low-level unit activation state, values depend on unit type.

1 loaded units listed.
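
For reference, blacklisting a module on PVE/Debian is a modprobe.d entry plus an initramfs rebuild. A minimal sketch, assuming the InfiniBand/RDMA module for these Broadcom NICs is bnxt_re (check with lsmod; the module name and the file name below are just examples):
Code:
# /etc/modprobe.d/blacklist-bnxt-re.conf  (example file name)
blacklist bnxt_re

# rebuild the initramfs for all installed kernels so the blacklist also applies at early boot
update-initramfs -u -k all
reboot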
 
Looks like my problem is not tied to the Intel X710 adapter; it seems 1:1 mapping is completely broken on the AMD EPYC M11SDV-8C-LN4F board.
Trying to map an on-board interface gives the same problem:

Code:
2024-05-01T15:00:02.304416+02:00 epyc kernel: [  782.498806] igb 0000:04:00.1: removed PHC on eno2
2024-05-01T15:00:02.603142+02:00 epyc systemd[1]: Created slice qemu.slice - Slice /qemu.
2024-05-01T15:00:02.618646+02:00 epyc systemd[1]: Started 150.scope.
2024-05-01T15:00:02.717428+02:00 epyc kernel: [  782.911177] vfio-pci 0000:04:00.1: Firmware has requested this device have a 1:1 IOMMU mapping, rejecting configuring the device without a 1:1 mapping. Contact your platform vendor.
2024-05-01T15:00:02.757235+02:00 epyc systemd[1]: 150.scope: Deactivated successfully.
2024-05-01T15:00:02.758944+02:00 epyc pvedaemon[3831]: start failed: QEMU exited with code 1
 
Aren't there BIOS updates for this board? Maybe that helps.
 
Hi everyone, I just updated two PVE servers to kernel 6.8.4-2-pve, and one of them has encountered a USB device issue:
Code:
# pveversion
pve-manager/8.2.2/9355359cd7afbae4 (running kernel: 6.8.4-2-pve)

# dmesg |grep "usb 2-1.6"
[    3.474356] usb 2-1.6: new full-speed USB device number 5 using ehci-pci
[   19.006357] usb 2-1.6: device descriptor read/64, error -110
[   34.622361] usb 2-1.6: device descriptor read/64, error -110
[   34.810362] usb 2-1.6: new full-speed USB device number 6 using ehci-pci
[   50.242264] usb 2-1.6: device descriptor read/64, error -110
[   65.853891] usb 2-1.6: device descriptor read/64, error -110
[   66.565871] usb 2-1.6: new full-speed USB device number 7 using ehci-pci
[   77.245618] usb 2-1.6: device not accepting address 7, error -110
[   77.325609] usb 2-1.6: new full-speed USB device number 8 using ehci-pci
[   87.997358] usb 2-1.6: device not accepting address 8, error -110

This issue caused my remote KVM console to lose keyboard and mouse. I can still use the keyboard after logging in to the PVE server via SSH, but I cannot use the keyboard and mouse after logging in via the remote KVM console. This is very strange and I am not sure why or how it happened. These two PVE servers have exactly the same hardware, yet I notice there are 2 USB PCI devices in this PVE server but 3 USB PCI devices in the other PVE server:

Code:
# lspci |grep USB
00:1a.0 USB controller: Intel Corporation C610/X99 series chipset USB Enhanced Host Controller #2 (rev 05)
00:1d.0 USB controller: Intel Corporation C610/X99 series chipset USB Enhanced Host Controller #1 (rev 05)

# lspci |grep USB
00:14.0 USB controller: Intel Corporation C610/X99 series chipset USB xHCI Host Controller (rev 05)
00:1a.0 USB controller: Intel Corporation C610/X99 series chipset USB Enhanced Host Controller #2 (rev 05)
00:1d.0 USB controller: Intel Corporation C610/X99 series chipset USB Enhanced Host Controller #1 (rev 05)

I tried booting into an older kernel, but that did not help:
Code:
# pveversion
pve-manager/8.2.2/9355359cd7afbae4 (running kernel: 6.2.16-20-pve)

I also have other PVE servers on older versions with the same hardware, and they all have two USB PCI devices; therefore, I guess the new Debian or PVE update may have caused this change.
 
My logs are spammed with this (kernel 6.8). I've stopped the service for now and see if Grafana will get angry with me.
This is easy to fix; there is already a merged PR on their GitHub:
https://github.com/net-snmp/net-snmp/issues/786

But there is still no new version released, so I simply backported the changes from this PR to my old net-snmp version and compiled/installed it.
I suggest you compile it yourself too, because they only release new versions every 6 months or so.
If you're lucky, maybe Ubuntu 24.04 already has a fixed version of net-snmp in its repositories.
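
For reference, here is a rough outline of rebuilding the Debian package with the upstream fix applied; the patch path is a placeholder and this assumes the stock Debian net-snmp packaging, with deb-src entries enabled in your APT sources:
Code:
apt update
apt install -y build-essential devscripts
apt build-dep -y snmpd                    # build dependencies of the net-snmp source package
apt source snmpd                          # fetch and unpack the Debian source
cd net-snmp-*/
patch -p1 < /path/to/net-snmp-fix.patch   # the change set from the merged PR (placeholder path)
debuild -us -uc                           # build unsigned .deb packages
dpkg -i ../libsnmp*.deb ../snmpd_*.deb    # install the rebuilt packages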

Cheers
 
That seems very strange.

Why does enabling IOMMU for PCIe passthrough kill the rpool import?
I do not have an answer to this question, but this already happened during the upgrade from kernel 5.13 to 5.15, when the IOMMU became enabled by default.
This only affects this Dell T140 and none of the other hardware I use (e.g. Dell T30 / Dell T40, misc. Fujitsu...).
Maybe it is related to the SATA host controller of this server:

RAID bus controller: Broadcom / LSI MegaRAID SAS-3 3008 [Fury] (rev 02)
 
Regarding that ethernet interface renaming that takes place with kernel 6.8 as opposed to kernel 6.5 ...

I understand that you can configure a mapping via /etc/systemd/network/. The documentation suggests using the prefix en or eth.

Currently my interfaces are labelled enp1s0 to enp6s0 (six Intel I210 with igb driver).

Is it possible to - reliably - configure the /etc/systemd/network/ mappings in a way that the naming stays the same?

My PVE runs headless, passes three of the six NICs via PCI passthrough to an OPNsense VM, and has the other three configured as a bridge.

I really fear the kernel upgrade messing with all of this. I would love it if I could just configure the *.link files in a way that hard-wires the MAC addresses to their current interface names.

Would that work?
 
Hi,
Is it possible to - reliably - configure the /etc/systemd/network/ mappings in a way that the naming stays the same?
yes, please see: https://pve.proxmox.com/pve-docs/pve-admin-guide.html#network_override_device_names

Sorry, I was a bit too quick to answer. But you can also choose the current name to fixate it, though there is a chance that it will clash with a new device in the future:
It is recommended to assign a name starting with en or eth so that Proxmox VE recognizes the interface as a physical network device which can then be configured via the GUI. Also, you should ensure that the name will not clash with other interface names in the future. One possibility is to assign a name that does not match any name pattern that systemd uses for network interfaces (see above), such as enwan0 in the example above.
 
Hi,

yes, please see: https://pve.proxmox.com/pve-docs/pve-admin-guide.html#network_override_device_names

Sorry, I was a bit too quick to answer. But you can also choose the current name to fixate it, though there is a chance that it will clash with a new device in the future:
But that's exactly what I'd like to know: whether I can do

Code:
[Match]
MACAddress=aa:bb:cc:dd:ee:ff

[Link]
Name=enp1s0

For an interface that is already called enp1s0 right now or whether that will result in some clash or race condition.
 
From the above doc:
It is recommended to assign a name starting with en or eth so that Proxmox VE recognizes the interface as a physical network device which can then be configured via the GUI. Also, you should ensure that the name will not clash with other interface names in the future. One possibility is to assign a name that does not match any name pattern that systemd uses for network interfaces (see above), such as enwan0 in the example above.
So I guess you should not use a current name, for this reason alone.
 
For an interface that is already called enp1s0 right now or whether that will result in some clash or race condition.
If you fixate it for all devices, there will be no clash right now. But there might be one when you add a new device in the future. That's why the recommendation to use a name that doesn't match the systemd patterns is there.
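
For completeness, a minimal .link file following the documented approach, using a name that does not collide with systemd's own patterns; the file name and MAC address below are placeholders:
Code:
# /etc/systemd/network/10-enwan0.link  (example file name)
[Match]
MACAddress=aa:bb:cc:dd:ee:ff

[Link]
Name=enwan0

If I recall the linked guide correctly, you also need to adapt /etc/network/interfaces to the new name and rebuild the initramfs (update-initramfs -u -k all) before rebooting, so the rename already applies at early boot.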
 
EDIT: false alarm, it was as I suspected: the card in the 13th gen system wasn't flashed properly to HBA330 and was still flashed as an H330 Mini. Identical hardware but different firmware. It now works correctly!

Original message:

I have two different systems running Proxmox VE 8.2.2, a Dell PowerEdge 13th gen server and a Dell PowerEdge 14th gen server.
One is not able to pass through the card, the other is.

Both have a Dell HBA330 and Mellanox ConnectX-3 card that are configured for PCI Passthrough.

This is the relevant error on the Dell 13th gen server:

dmesg | grep -e DMAR -e IOMMU

[ 0.010869] ACPI: DMAR 0x000000007BAFE000 0000A0 (v01 DELL PE_SC3 00000001 DELL 00000001)
[ 0.010913] ACPI: Reserving DMAR table memory at [mem 0x7bafe000-0x7bafe09f]
[ 0.076946] DMAR: IOMMU enabled
[ 0.215487] DMAR: Host address width 46
[ 0.215489] DMAR: DRHD base: 0x000000fbffc000 flags: 0x1
[ 0.215501] DMAR: dmar0: reg_base_addr fbffc000 ver 1:0 cap 8d2078c106f0466 ecap f020df
[ 0.215504] DMAR: RMRR base: 0x00000069f0e000 end: 0x00000071f15fff
[ 0.215509] DMAR: ATSR flags: 0x0
[ 0.215512] DMAR-IR: IOAPIC id 8 under DRHD base 0xfbffc000 IOMMU 0
[ 0.215514] DMAR-IR: IOAPIC id 9 under DRHD base 0xfbffc000 IOMMU 0
[ 0.215516] DMAR-IR: HPET id 0 under DRHD base 0xfbffc000
[ 0.215517] DMAR-IR: Queued invalidation will be enabled to support x2apic and Intr-remapping.
[ 0.215788] DMAR-IR: IRQ remapping was enabled on dmar0 but we are not in kdump mode
[ 0.215883] DMAR-IR: Enabled IRQ remapping in x2apic mode

[ 0.696318] DMAR: [Firmware Bug]: RMRR entry for device 02:00.0 is broken - applying workaround
[ 0.696344] DMAR: No SATC found
[ 0.696346] DMAR: dmar0: Using Queued invalidation
[ 0.700717] DMAR: Intel(R) Virtualization Technology for Directed I/O
[ 52.218990] vfio-pci 0000:02:00.0: Firmware has requested this device have a 1:1 IOMMU mapping, rejecting configuring the device without a 1:1 mapping. Contact your platform vendor.



lspci for the 2 devices being passed through

02:00.0 RAID bus controller: Broadcom / LSI MegaRAID SAS-3 3008 [Fury] (rev 02)
03:00.0 Ethernet controller: Mellanox Technologies MT27500 Family [ConnectX-3]


This is the corresponding output on the Dell 14th gen server:

dmesg | grep -e DMAR -e IOMMU

[ 0.010285] ACPI: DMAR 0x000000006FC14000 000108 (v01 DELL PE_SC3 00000001 DELL 00000001)
[ 0.010332] ACPI: Reserving DMAR table memory at [mem 0x6fc14000-0x6fc14107]
[ 0.226339] DMAR: IOMMU enabled
[ 0.623707] DMAR: Host address width 46
[ 0.623708] DMAR: DRHD base: 0x000000c5ffc000 flags: 0x0
[ 0.623720] DMAR: dmar0: reg_base_addr c5ffc000 ver 1:0 cap 8d2078c106f0466 ecap f020df
[ 0.623723] DMAR: DRHD base: 0x000000e0ffc000 flags: 0x0
[ 0.623728] DMAR: dmar1: reg_base_addr e0ffc000 ver 1:0 cap 8d2078c106f0466 ecap f020df
[ 0.623730] DMAR: DRHD base: 0x000000fbffc000 flags: 0x0
[ 0.623735] DMAR: dmar2: reg_base_addr fbffc000 ver 1:0 cap 8d2078c106f0466 ecap f020df
[ 0.623737] DMAR: DRHD base: 0x000000aaffc000 flags: 0x1
[ 0.623741] DMAR: dmar3: reg_base_addr aaffc000 ver 1:0 cap 8d2078c106f0466 ecap f020df
[ 0.623743] DMAR: RMRR base: 0x0000006f75f000 end: 0x0000006f761fff
[ 0.623748] DMAR: ATSR flags: 0x0
[ 0.623754] DMAR-IR: IOAPIC id 12 under DRHD base 0xfbffc000 IOMMU 2
[ 0.623756] DMAR-IR: IOAPIC id 11 under DRHD base 0xe0ffc000 IOMMU 1
[ 0.623757] DMAR-IR: IOAPIC id 10 under DRHD base 0xc5ffc000 IOMMU 0
[ 0.623759] DMAR-IR: IOAPIC id 8 under DRHD base 0xaaffc000 IOMMU 3
[ 0.623761] DMAR-IR: IOAPIC id 9 under DRHD base 0xaaffc000 IOMMU 3
[ 0.623762] DMAR-IR: HPET id 0 under DRHD base 0xaaffc000
[ 0.623763] DMAR-IR: Queued invalidation will be enabled to support x2apic and Intr-remapping.
[ 0.624731] DMAR-IR: Enabled IRQ remapping in x2apic mode

[ 1.185064] DMAR: No SATC found
[ 1.185068] DMAR: dmar1: Using Queued invalidation
[ 1.185076] DMAR: dmar0: Using Queued invalidation
[ 1.185080] DMAR: dmar3: Using Queued invalidation
[ 1.192447] DMAR: Intel(R) Virtualization Technology for Directed I/O

lspci for the 2 devices being passed through

18:00.0 Serial Attached SCSI controller: Broadcom / LSI SAS3008 PCI-Express Fusion-MPT SAS-3 (rev 02)
65:00.0 Ethernet controller: Mellanox Technologies MT27500 Family [ConnectX-3]

As I'm typing this, I'm seeing that the HBA330 card on the 13th gen system is identified as an H330 Mini, so I'm going to try to reflash it to HBA330 to see if that fixes the issue. It has worked on this system before using ESXi 7.0.
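
Unrelated to the firmware flash itself, but when comparing the two hosts it can help to confirm how each device is bound and which IOMMU group it lands in. A small read-only sketch (the PCI address is the 13th gen HBA from above):
Code:
# driver currently bound to the device, plus available kernel modules
lspci -nnk -s 0000:02:00.0

# IOMMU group the device was placed in
readlink /sys/bus/pci/devices/0000:02:00.0/iommu_group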
 
Forgot to mention: both use UEFI, but the 13th gen uses ZFS for local storage while the 14th gen uses LVM.

Dell 13th gen:
/etc/default/grub: GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt"
/etc/kernel/cmdline(because of ZFS): root=ZFS=rpool/ROOT/pve-1 boot=zfs quiet intel_iommu=on iommu=pt

Dell 14th gen:
/etc/default/grub: GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt"
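
As a side note, changes to these files only take effect after regenerating the boot configuration; which command applies depends on the bootloader (assumption here: the LVM install boots via GRUB, the ZFS install via proxmox-boot-tool/systemd-boot):
Code:
# GRUB-booted system: after editing /etc/default/grub
update-grub

# proxmox-boot-tool managed system (typical for ZFS-on-root with UEFI): after editing /etc/kernel/cmdline
proxmox-boot-tool refresh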
 
