Opt-in Linux 6.8 Kernel for Proxmox VE 8 available on test & no-subscription

I'm running a Supermicro M11SDV-8C-LN4F with an Intel X710-DA2 card, using SR-IOV.
With kernel 6.8, the interfaces got renamed (I took care of that and added npX everywhere), but my VMs, all have mapped VFs, will not start anymore:

Code:
[Thu Apr 25 08:18:46 2024] vfio-pci 0000:06:02.0: Firmware has requested this device have a 1:1 IOMMU mapping, rejecting configuring the device without a 1:1 mapping. Contact your platform vendor.
[Thu Apr 25 08:18:47 2024] vfio-pci 0000:06:02.1: Firmware has requested this device have a 1:1 IOMMU mapping, rejecting configuring the device without a 1:1 mapping. Contact your platform vendor.
[Thu Apr 25 08:18:48 2024] vfio-pci 0000:06:0a.2: Firmware has requested this device have a 1:1 IOMMU mapping, rejecting configuring the device without a 1:1 mapping. Contact your platform vendor.

I cannot compile the driver anymore, either.

Code:
root@epyc:~/intel-driver/i40e-2.24.6/src# make install
filtering include/net/flow_keys.h out
filtering include/linux/jump_label_type.h out
filtering include/linux/jump_label_type.h out
*** The target kernel has CONFIG_MODULE_SIG_ALL enabled, but
*** the signing key cannot be found. Module signing has been
*** disabled for this build.
make[1]: Entering directory '/usr/src/linux-headers-6.8.4-2-pve'
  CC [M]  /root/intel-driver/i40e-2.24.6/src/i40e_main.o
/root/intel-driver/i40e-2.24.6/src/i40e_main.c: In function ‘i40e_send_version’:
/root/intel-driver/i40e-2.24.6/src/i40e_main.c:11530:9: error: implicit declaration of function ‘strlcpy’; did you mean ‘strscpy’? [-Werror=implicit-function-declaration]
11530 |         strlcpy(dv.driver_string, DRV_VERSION, sizeof(dv.driver_string));
      |         ^~~~~~~
      |         strscpy
cc1: some warnings being treated as errors
make[3]: *** [scripts/Makefile.build:243: /root/intel-driver/i40e-2.24.6/src/i40e_main.o] Error 1
make[2]: *** [/usr/src/linux-headers-6.8.4-2-pve/Makefile:1926: /root/intel-driver/i40e-2.24.6/src] Error 2
make[1]: *** [Makefile:240: __sub-make] Error 2
make[1]: Leaving directory '/usr/src/linux-headers-6.8.4-2-pve'
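For what it's worth, the build failure is because strlcpy() was removed from the mainline kernel in 6.8, so the 2.24.6 sources no longer compile against its headers. A quick, untested workaround sketch is substituting strscpy(), which takes the same three arguments at this call site (call sites that check the return value would need more care):

Bash:
# Sketch: back up the sources, then swap strlcpy for strscpy and rebuild.
cd ~/intel-driver/i40e-2.24.6/src
cp i40e_main.c i40e_main.c.bak
sed -i 's/\bstrlcpy\b/strscpy/g' *.c *.h
make && make install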
Had to revert to 6.5 for now. Any hints?
You may want to try the new driver from Intel: https://sourceforge.net/projects/e1000/files/i40e stable/

Looks like it was just posted a few hours ago.
 
  • Like
Reactions: athurdent
You may want to try the new driver from Intel: https://sourceforge.net/projects/e1000/files/i40e stable/

Looks like it was just posted a few hours ago.
Thanks a bunch! While that one builds fine on 6.8 (and renames the interfaces back to the 6.5 naming scheme, heads-up!), my VMs still won't start.

Code:
2024-04-26T10:01:06.615661+02:00 epyc pvedaemon[2472]: start VM 150: UPID:epyc:000009A8:0000865F:662B5F42:qmstart:150:root@pam:
2024-04-26T10:01:06.616051+02:00 epyc pvedaemon[1199]: <root@pam> starting task UPID:epyc:000009A8:0000865F:662B5F42:qmstart:150:root@pam:
2024-04-26T10:01:06.869810+02:00 epyc systemd[1]: Created slice qemu.slice - Slice /qemu.
2024-04-26T10:01:06.877831+02:00 epyc systemd[1]: Started 150.scope.
2024-04-26T10:01:06.974713+02:00 epyc kernel: [  344.971738] vfio-pci 0000:06:0a.2: Firmware has requested this device have a 1:1 IOMMU mapping, rejecting configuring the device without a 1:1 mapping. Contact your platform vendor.
2024-04-26T10:01:07.030297+02:00 epyc systemd[1]: 150.scope: Deactivated successfully.
2024-04-26T10:01:07.032053+02:00 epyc pvedaemon[2472]: start failed: QEMU exited with code 1
2024-04-26T10:01:07.040383+02:00 epyc pvedaemon[1199]: <root@pam> end task UPID:epyc:000009A8:0000865F:662B5F42:qmstart:150:root@pam: start failed: QEMU exited with code 1

FWIW, this is my config for that VF:
Code:
/usr/bin/ip link set enp6s0f1 vf 2 mac ba:dd:51:de:00:02
/usr/bin/ip link set enp6s0f1 vf 2 vlan 666
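(A quick sketch to confirm those settings actually applied; the PF's link output lists each VF with its MAC and VLAN:)

Bash:
# Sketch: should print a "vf 2" line with ba:dd:51:de:00:02 and vlan 666.
ip link show enp6s0f1 | grep 'vf 2'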

Hi @husseyg, nice to meet you here, too! The above is my speed test server for all my UI testing...
 
  • Like
Reactions: husseyg
The amdgpu driver on kernel versions 6.8.1-1 and 6.8.4-2 crashes my AMD Radeon RX570, while the earlier kernels worked fine. It still works fine with the RX6950XT, and this issue can be worked around by blacklisting amdgpu, but then I cannot use the RX570 for the Proxmox host console after VM shutdown (or run a backup). Anyone else having similar issues?
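(For reference, a minimal sketch of that blacklist workaround; the modprobe.d file name is arbitrary, and the initramfs rebuild makes it stick across reboots:)

Bash:
# Sketch: keep the host from loading amdgpu, then rebuild the initramfs.
echo "blacklist amdgpu" > /etc/modprobe.d/blacklist-amdgpu.conf
update-initramfs -u -k all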

Part of the (very long) crash log from journalctl:
Code:
apr 26 12:11:12 toro kernel: vfio-pci 0000:0b:00.0: vgaarb: VGA decodes changed: olddecodes=none,decodes=io+mem:owns=none
apr 26 12:11:13 toro kernel: amdgpu 0000:0b:00.0: enabling device (0400 -> 0403)
apr 26 12:11:13 toro kernel: [drm] initializing kernel modesetting (POLARIS10 0x1002:0x67DF 0x1DA2:0xE343 0xEF).
apr 26 12:11:13 toro kernel: [drm] register mmio base: 0xFCE00000
apr 26 12:11:13 toro kernel: [drm] register mmio size: 262144
apr 26 12:11:13 toro kernel: [drm] add ip block number 0 <vi_common>
apr 26 12:11:13 toro kernel: [drm] add ip block number 1 <gmc_v8_0>
apr 26 12:11:13 toro kernel: [drm] add ip block number 2 <tonga_ih>
apr 26 12:11:13 toro kernel: [drm] add ip block number 3 <gfx_v8_0>
apr 26 12:11:13 toro kernel: [drm] add ip block number 4 <sdma_v3_0>
apr 26 12:11:13 toro kernel: [drm] add ip block number 5 <powerplay>
apr 26 12:11:13 toro kernel: [drm] add ip block number 6 <dm>
apr 26 12:11:13 toro kernel: [drm] add ip block number 7 <uvd_v6_0>
apr 26 12:11:13 toro kernel: [drm] add ip block number 8 <vce_v3_0>
apr 26 12:11:13 toro kernel: amdgpu 0000:0b:00.0: amdgpu: Fetched VBIOS from VFCT
apr 26 12:11:13 toro kernel: amdgpu: ATOM BIOS: 113-D00034-L01
apr 26 12:11:13 toro kernel: [drm] UVD is enabled in VM mode
apr 26 12:11:13 toro kernel: [drm] UVD ENC is enabled in VM mode
apr 26 12:11:13 toro kernel: [drm] VCE enabled in VM mode
apr 26 12:11:13 toro kernel: amdgpu 0000:0b:00.0: vgaarb: deactivate vga console
apr 26 12:11:13 toro kernel: amdgpu 0000:0b:00.0: amdgpu: Trusted Memory Zone (TMZ) feature not supported
apr 26 12:11:13 toro kernel: [drm] GPU posting now...
apr 26 12:11:13 toro kernel: [drm] vm size is 256 GB, 2 levels, block size is 10-bit, fragment size is 9-bit
apr 26 12:11:13 toro kernel: amdgpu 0000:0b:00.0: amdgpu: VRAM: 4096M 0x000000F400000000 - 0x000000F4FFFFFFFF (4096M used)
apr 26 12:11:13 toro kernel: amdgpu 0000:0b:00.0: amdgpu: GART: 256M 0x000000FF00000000 - 0x000000FF0FFFFFFF
apr 26 12:11:13 toro kernel: [drm] Detected VRAM RAM=4096M, BAR=256M
apr 26 12:11:13 toro kernel: [drm] RAM width 256bits GDDR5
apr 26 12:11:13 toro kernel: [drm] amdgpu: 4096M of VRAM memory ready
apr 26 12:11:13 toro kernel: [drm] amdgpu: 32113M of GTT memory ready.
apr 26 12:11:13 toro kernel: [drm] GART: num cpu pages 65536, num gpu pages 65536
apr 26 12:11:13 toro kernel: [drm] PCIE GART of 256M enabled (table at 0x000000F400000000).
apr 26 12:11:13 toro kernel: [drm] Chained IB support enabled!
apr 26 12:11:13 toro kernel: amdgpu: hwmgr_sw_init smu backed is polaris10_smu
apr 26 12:11:13 toro kernel: [drm] Found UVD firmware Version: 1.130 Family ID: 16
apr 26 12:11:13 toro kernel: [drm] Found VCE firmware Version: 53.26 Binary ID: 3
apr 26 12:11:14 toro kernel: amdgpu 0000:0b:00.0: amdgpu:
                             last message was failed ret is 65535
apr 26 12:11:14 toro kernel: amdgpu 0000:0b:00.0: amdgpu:
                             last message was failed ret is 65535
[... the same two-line message repeats 13 more times ...]
apr 26 12:11:14 toro kernel: [drm:resource_construct [amdgpu]] *ERROR* DC: unexpected audio fuse!
apr 26 12:11:14 toro kernel: [drm] Display Core v3.2.266 initialized on DCE 11.2
apr 26 12:11:14 toro kernel: amdgpu 0000:0b:00.0: [drm] *ERROR* No EDID read.
apr 26 12:11:14 toro kernel: amdgpu 0000:0b:00.0: [drm] *ERROR* No EDID read.
apr 26 12:11:14 toro kernel: amdgpu 0000:0b:00.0: [drm] *ERROR* No EDID read.
apr 26 12:11:14 toro kernel: amdgpu 0000:0b:00.0: amdgpu:
                             last message was failed ret is 65535
apr 26 12:11:14 toro kernel: ------------[ cut here ]------------
apr 26 12:11:14 toro kernel: WARNING: CPU: 2 PID: 5101 at drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c:1112 uvd_v6_0_ring_insert_nop+0xf8/0x130 [amdgpu]
apr 26 12:11:14 toro kernel: Modules linked in: cfg80211 veth ebt_arp ebtable_filter ebtables ip6table_raw ip6t_REJECT nf_reject_ipv6 ip6table_filter ip6_tables iptable_raw xt_mac ipt_REJECT nf_reject_ipv4 xt_mark xt_set xt_physdev xt_addrtype xt_comment xt_multiport xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_tcpudp iptable_filter ip_set_hash_net ip_set nf_tables sunrpc binfmt_misc 8021q garp mrp bonding tls nfnetlink_log nfnetlink intel_rapl_msr intel_rapl_common amd64_edac edac_mce_amd kvm_amd snd_hda_intel kvm snd_intel_dspcfg snd_intel_sdw_acpi vhost_net snd_hda_codec vhost crct10dif_pclmul vhost_iotlb polyval_clmulni snd_hda_core tap polyval_generic ghash_clmulni_intel sha256_ssse3 snd_hwdep nct6775 sha1_ssse3 nct6775_core snd_pcm aesni_intel hwmon_vid snd_timer crypto_simd cryptd input_leds snd rapl soundcore apple_mfi_fastcharge pcspkr wmi_bmof mxm_wmi k10temp ccp joydev mac_hid amdgpu amdxcp drm_exec gpu_sched drm_buddy drm_suballoc_helper drm_ttm_helper ttm drm_display_helper cec rc_core video efi_pstore
apr 26 12:11:14 toro kernel:  dmi_sysfs ip_tables x_tables autofs4 zfs(PO) spl(O) btrfs blake2b_generic xor raid6_pq libcrc32c usbmouse hid_generic usbkbd usbhid hid xhci_pci vfio_pci vendor_reset(OE) vfio_pci_core irqbypass vfio_iommu_type1 vfio iommufd nvme xhci_pci_renesas igb crc32_pclmul e1000e i2c_piix4 i2c_algo_bit xhci_hcd ahci nvme_core dca libahci nvme_auth wmi gpio_amdpt
apr 26 12:11:14 toro kernel: CPU: 2 PID: 5101 Comm: desktop.sh Tainted: P           OE      6.8.1-1-pve #1
apr 26 12:11:14 toro kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X470 Master SLI, BIOS P4.90 11/01/2022
apr 26 12:11:14 toro kernel: RIP: 0010:uvd_v6_0_ring_insert_nop+0xf8/0x130 [amdgpu]
apr 26 12:11:14 toro kernel: Code: 01 00 00 83 e8 01 89 83 e0 01 00 00 45 39 ec 74 24 85 c0 0f 8f 5b ff ff ff 48 c7 c7 f8 e0 c6 c1 e8 7d 78 14 ed e9 4a ff ff ff <0f> 0b 41 d1 ec 0f 85 31 ff ff ff 5b 41 5c 41 5d 5d 31 c0 31 d2 31
apr 26 12:11:14 toro kernel: RSP: 0018:ffffadf0550e7a60 EFLAGS: 00010202
apr 26 12:11:14 toro kernel: RAX: ffffffffc129f2c0 RBX: ffff8b3a7c02bb08 RCX: 0000000000000010
apr 26 12:11:14 toro kernel: RDX: 000000000000000f RSI: 000000000000000f RDI: ffff8b3a7c02bb08
apr 26 12:11:14 toro kernel: RBP: ffffadf0550e7a78 R08: 0000000000000000 R09: 0000000000000000
apr 26 12:11:14 toro kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 000000000000000f
apr 26 12:11:14 toro kernel: R13: 0000000000000000 R14: 0000000000000007 R15: 0000000000000000
apr 26 12:11:14 toro kernel: FS:  0000765497095740(0000) GS:ffff8b46be300000(0000) knlGS:0000000000000000
apr 26 12:11:14 toro kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
apr 26 12:11:14 toro kernel: CR2: 00007f91d77fde00 CR3: 0000000163264000 CR4: 00000000003506f0
apr 26 12:11:14 toro kernel: Call Trace:
apr 26 12:11:14 toro kernel:  <TASK>
apr 26 12:11:14 toro kernel:  ? show_regs+0x6d/0x80
apr 26 12:11:14 toro kernel:  ? __warn+0x89/0x160
apr 26 12:11:14 toro kernel:  ? uvd_v6_0_ring_insert_nop+0xf8/0x130 [amdgpu]
apr 26 12:11:14 toro kernel:  ? report_bug+0x17e/0x1b0
apr 26 12:11:14 toro kernel:  ? handle_bug+0x46/0x90
apr 26 12:11:14 toro kernel:  ? exc_invalid_op+0x18/0x80
apr 26 12:11:14 toro kernel:  ? asm_exc_invalid_op+0x1b/0x20
apr 26 12:11:14 toro kernel:  ? __pfx_uvd_v6_0_ring_insert_nop+0x10/0x10 [amdgpu]
apr 26 12:11:14 toro kernel:  ? uvd_v6_0_ring_insert_nop+0xf8/0x130 [amdgpu]
apr 26 12:11:14 toro kernel:  amdgpu_ring_commit+0x39/0x80 [amdgpu]
apr 26 12:11:14 toro kernel:  uvd_v6_0_ring_test_ring+0xf6/0x180 [amdgpu]
apr 26 12:11:14 toro kernel:  amdgpu_ring_test_helper+0x21/0x90 [amdgpu]
apr 26 12:11:14 toro kernel:  uvd_v6_0_hw_init+0x97/0x620 [amdgpu]
apr 26 12:11:14 toro kernel:  amdgpu_device_init+0x1fcc/0x26e0 [amdgpu]
apr 26 12:11:14 toro kernel:  amdgpu_driver_load_kms+0x1a/0x1c0 [amdgpu]
apr 26 12:11:14 toro kernel:  amdgpu_pci_probe+0x195/0x520 [amdgpu]
apr 26 12:11:14 toro kernel:  local_pci_probe+0x47/0xb0
apr 26 12:11:14 toro kernel:  pci_device_probe+0xc5/0x260
apr 26 12:11:14 toro kernel:  really_probe+0x1cc/0x430
apr 26 12:11:14 toro kernel:  __driver_probe_device+0x8c/0x190
apr 26 12:11:14 toro kernel:  device_driver_attach+0x55/0xd0
apr 26 12:11:14 toro kernel:  bind_store+0x77/0xd0
apr 26 12:11:14 toro kernel:  drv_attr_store+0x24/0x50
apr 26 12:11:14 toro kernel:  sysfs_kf_write+0x3e/0x60
apr 26 12:11:14 toro kernel:  kernfs_fop_write_iter+0x133/0x210
apr 26 12:11:14 toro kernel:  vfs_write+0x2a8/0x480
apr 26 12:11:14 toro kernel:  ksys_write+0x73/0x100
apr 26 12:11:14 toro kernel:  __x64_sys_write+0x19/0x30
apr 26 12:11:14 toro kernel:  do_syscall_64+0x87/0x180
apr 26 12:11:14 toro kernel:  ? srso_return_thunk+0x5/0x5f
apr 26 12:11:14 toro kernel:  ? filp_flush+0x57/0x90
apr 26 12:11:14 toro kernel:  ? srso_return_thunk+0x5/0x5f
apr 26 12:11:14 toro kernel:  ? syscall_exit_to_user_mode+0x86/0x260
apr 26 12:11:14 toro kernel:  ? srso_return_thunk+0x5/0x5f
apr 26 12:11:14 toro kernel:  ? do_syscall_64+0x93/0x180
apr 26 12:11:14 toro kernel:  entry_SYSCALL_64_after_hwframe+0x6e/0x76
apr 26 12:11:14 toro kernel: RIP: 0033:0x765497190240
apr 26 12:11:14 toro kernel: Code: 40 00 48 8b 15 c1 9b 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 80 3d a1 23 0e 00 00 74 17 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 48 83 ec 28 48 89
apr 26 12:11:14 toro kernel: RSP: 002b:00007fffb8504818 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
apr 26 12:11:14 toro kernel: RAX: ffffffffffffffda RBX: 000000000000000d RCX: 0000765497190240
apr 26 12:11:14 toro kernel: RDX: 000000000000000d RSI: 000058999e267990 RDI: 0000000000000001
apr 26 12:11:14 toro kernel: RBP: 000058999e267990 R08: 0000000000000007 R09: 0000000000000073
apr 26 12:11:14 toro kernel: R10: 0000000000000000 R11: 0000000000000202 R12: 000000000000000d
apr 26 12:11:14 toro kernel: R13: 000076549726b760 R14: 000000000000000d R15: 00007654972669e0
apr 26 12:11:14 toro kernel:  </TASK>
apr 26 12:11:14 toro kernel: ---[ end trace 0000000000000000 ]---
apr 26 12:11:14 toro kernel: amdgpu 0000:0b:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring uvd test failed (-110)
apr 26 12:11:14 toro kernel: [drm:amdgpu_device_init [amdgpu]] *ERROR* hw_init of IP block <uvd_v6_0> failed -110
apr 26 12:11:14 toro kernel: amdgpu 0000:0b:00.0: amdgpu: amdgpu_device_ip_init failed
apr 26 12:11:14 toro kernel: amdgpu 0000:0b:00.0: amdgpu: Fatal error during GPU init
apr 26 12:11:14 toro kernel: amdgpu 0000:0b:00.0: amdgpu: amdgpu: finishing device.

The RX570 still works fine when passed through to a Linux VM (Linux kernel 5.15 or so).
 
Hi all,
also on a Supermicro-based server here. I use both onboard NICs, one for the VMs and one for accessing the web interface. I also have a 4-port PCI NIC card installed, but it's not used yet. Today I restarted the host, and the onboard NICs were just gone with kernel 6.8. After the heart attack, I rebooted into kernel 6.5 and all was good again. Is Supermicro such an exotic board?
 
Hi all,
also on a Supermicro-based server here. I use both onboard NICs, one for the VMs and one for accessing the web interface. I also have a 4-port PCI NIC card installed, but it's not used yet. Today I restarted the host, and the onboard NICs were just gone with kernel 6.8. After the heart attack, I rebooted into kernel 6.5 and all was good again. Is Supermicro such an exotic board?
They may have gotten a new name, see here: https://pve.proxmox.com/wiki/Roadmap#8.2-known-issues
ifconfig -a should show the new name.
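A quick sketch for finding where the names went (matching by MAC helps):

Bash:
# Sketch: list all interfaces with state and MAC, then check the kernel
# log for the old-name -> new-name mapping from this boot.
ip -br link
dmesg | grep -i 'renamed from'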
 
They may have gotten a new name, see here: https://pve.proxmox.com/wiki/Roadmap#8.2-known-issues
ifconfig -a should show the new name.
Thanks for the fast reply. Yep, I read this, but mine were just gone. I only had the 4 interfaces of the PCI card visible with ip a. To be honest, I didn't check ifconfig -a. Hmmm... next time I'll do the restart with a bigger maintenance window and investigate a little more.
 
Thanks for the fast reply. Yep, I read this, but mine were just gone. I only had the 4 interfaces of the PCI card visible with ip a. To be honest, I didn't check ifconfig -a. Hmmm... next time I'll do the restart with a bigger maintenance window and investigate a little more.
May I ask which NIC?
I'm just asking because a Mellanox card, for example, could have been switched to InfiniBand mode (so you don't see the usual LAN interfaces), but you can switch it back to Ethernet mode with the Mellanox utility, as sketched below.
But that's just Mellanox.
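For reference, a rough sketch with the Mellanox tools (assumes the MFT package is installed; the /dev/mst device name below is a placeholder and varies per card):

Bash:
# Sketch: query the port type and switch it back to Ethernet.
# LINK_TYPE values: 1 = InfiniBand, 2 = Ethernet.
mst start
mlxconfig -d /dev/mst/mt4121_pciconf0 query | grep LINK_TYPE
mlxconfig -d /dev/mst/mt4121_pciconf0 set LINK_TYPE_P1=2 LINK_TYPE_P2=2
# a cold reboot/power cycle is needed for the new port type to apply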

I'm wondering because I have Intel E810 / Mellanox ConnectX-4/5 / X550 cards and all are working without issues on kernel 6.8.x.
I gathered from this thread that most people with renaming or other issues have either a Broadcom NIC or an Intel 700-series NIC, and sadly I don't have any of those in any server.

Cheers
 
May I ask which NIC?
I'm just asking because a Mellanox card, for example, could have been switched to InfiniBand mode (so you don't see the usual LAN interfaces), but you can switch it back to Ethernet mode with the Mellanox utility.
But that's just Mellanox.

I'm wondering because I have Intel E810 / Mellanox ConnectX-4/5 / X550 cards and all are working without issues on kernel 6.8.x.
I gathered from this thread that most people with renaming or other issues have either a Broadcom NIC or an Intel 700-series NIC, and sadly I don't have any of those in any server.

Cheers
Yes, it's an Intel X722. The one that stayed is an Intel I350.
 
Yes, it's an Intel X722. The one that stayed is an Intel I350.
Code:
;X722
BEGIN DEVICE
DEVICENAME: X722
VENDOR: 8086
DEVICE: 37D0
NVM IMAGE: LBG_B2_6p20_CF_2x10G.bin
EEPID: 80003D82
REPLACES: 80001571 80001A3C 80001DEF 8000207F 800023C3 8000265A 8000275D 80002A29 80002E3F 80003327 800035CB 800039EC
OROM IMAGE: BootIMG.FLB
EEPROM MAP: iSCSI.txt
RESET TYPE: REBOOT
END DEVICE

BEGIN DEVICE
DEVICENAME: X722
VENDOR: 8086
DEVICE: 37D0
NVM IMAGE: LBG_B2_6p20_CF_FH4x10G.bin
EEPID: 80003D8D
REPLACES: 8000156F 80001A3D 80001DF0 80002080 800023C4 8000265B 8000275E 80002AC1 80002E3A 80003338 800035D5 800039F5
OROM IMAGE: BootIMG.FLB
EEPROM MAP: iSCSI.txt
RESET TYPE: REBOOT
END DEVICE

BEGIN DEVICE
DEVICENAME: X722
VENDOR: 8086
DEVICE: 37D0
NVM IMAGE: LBG_B2_6p20_CF_LP4x10G.bin
EEPID: 80003D95
REPLACES: 80001570 80001A3E 80001DF1 80002087 800023C5 8000265C 8000275F 80002AC9 80002E40 80003340 800035E1 800039FB
OROM IMAGE: BootIMG.FLB
EEPROM MAP: iSCSI.txt
RESET TYPE: REBOOT
END DEVICE
https://www.intel.com/content/www/u...ntel-ethernet-network-adapter-700-series.html

Intel firmware 9.40 for the X710 adapters includes some updates (above) for the X722.
I would try that at least; I update all my Intel cards from time to time as well.
You can simply download the package, extract the "700Series_NVMUpdatePackage_v9_40_Linux.tar.gz" directly on Proxmox, then chmod +x nvmupdate64e and execute it without arguments.
It will check for compatible adapters and ask whether you want to upgrade if yours is compatible; a rough sketch of the steps is below.
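Roughly like this (the directory layout inside the archive is an assumption and may differ between releases):

Bash:
# Sketch of the update steps described above.
tar xzf 700Series_NVMUpdatePackage_v9_40_Linux.tar.gz
cd 700Series/Linux_x64
chmod +x nvmupdate64e
./nvmupdate64e   # interactive: inventories adapters, offers a firmware backup first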

1. Do not force an update onto an unsupported branded NIC. I did that once with an ASRock-branded Intel X550 and the second NIC port stopped working; only one port still worked. Thank god the Intel tool asks whether you want to back up your current firmware first, so I rolled back with the backup.
2. HP/Dell and other vendors of branded Intel NICs usually provide their own firmware; try to get that instead.
3. There is no harm in updating if nvmupdate64e allows it; just make a backup of the current firmware and don't force an update onto an unsupported branded NIC like I did with my X550. xD

Otherwise the process is very straightforward and simple. For that reason I always buy unbranded NICs if I can (same for Mellanox), to keep the ability to update the firmware. I've never had issues; most of the time I just got better SFP compatibility and no other benefit that I noticed. Nevertheless, if possible, it might be the right step for your problem.
Just try and see.
 
  • Like
Reactions: athurdent
When flashing a newer NVM, also make sure not to have any VMs running. They usually lose network, and e.g. an OPNsense VM refuses to shut down in a timely fashion afterwards. I accidentally flashed the latest one with a VM active, which I then had to shut down hard.
 
  • Like
Reactions: Ramalama
The amdgpu driver on kernel versions 6.8.1-1 and 6.8.4-2 crashes my AMD Radeon RX570, while the earlier kernels worked fine. It still works fine with the RX6950XT, and this issue can be worked around by blacklisting amdgpu, but then I cannot use the RX570 for the Proxmox host console after VM shutdown (or run a backup). Anyone else having similar issues?

...

The RX570 still works fine when passed through to a Linux VM (Linux kernel 5.15 or so).
Not exactly the same, but I run an LXC container with Kodi on an AMD 4800H box, and with a 6.8 kernel I get a GUI, but it's sluggish and crashes; it works without a problem on 6.5. I'm not seeing any crashes from the kernel, though; in fact, there are no errors.
 
  • Like
Reactions: leesteken
For my error message, this seems to be the relevant commit, but I am in the dark about what it means...

Code:
> +    /*
> +     * If the driver has requested IOMMU_RESV_DIRECT then we cannot allow
> +     * the blocking domain to be attached as it does not contain the
> +     * required 1:1 mapping. This test effectively exclusive the device from
> +     * being used with iommu_group_claim_dma_owner() which will block vfio
> +     * and iommufd as well.
> +     */
> +    if (dev->iommu->requires_direct &&
> +        (new_domain->type == IOMMU_DOMAIN_BLOCKED ||
> +         new_domain == group->blocking_domain)) {
> +        dev_warn(dev,
> +             "Firmware has requested this device have a 1:1 IOMMU mapping, rejecting configuring the device without a 1:1 mapping. Contact your platform vendor.\n");
> +        return -EINVAL;
> +    }
> +
https://lore.kernel.org/lkml/8cc1d69e-f86d-fd04-7737-914d967dc0f5@intel.com/

Perhaps someone who knows about this stuff can help?
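In the meantime, a quick check sketch: the firmware-reserved regions for a device's IOMMU group are visible in sysfs, and a region flagged "direct" is what trips the new 6.8 check (device addresses taken from the log above):

Bash:
# Sketch: print the firmware-reserved IOMMU regions for the affected VFs.
# A line ending in "direct" means the firmware demands a 1:1 mapping.
for dev in 0000:06:02.0 0000:06:02.1 0000:06:0a.2; do
    echo "== $dev =="
    cat "/sys/bus/pci/devices/$dev/iommu_group/reserved_regions"
done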
 
May I ask which BIOS version you have? I'm currently on 1.4, and Supermicro offers 1.6b within their firmware bundle. Had to blacklist the driver.
I also have BIOS version 1.4 and have also solved it with blacklisting.

I'm waiting for kernel 6.8.7, which has some bug fixes for bnxt_en, before I try a firmware update.
 
  • Like
Reactions: cwt
For my error message, this seems to be the relevant commit, but I am in the dark about what it means...

Code:
> +    /*
> +     * If the driver has requested IOMMU_RESV_DIRECT then we cannot allow
> +     * the blocking domain to be attached as it does not contain the
> +     * required 1:1 mapping. This test effectively exclusive the device from
> +     * being used with iommu_group_claim_dma_owner() which will block vfio
> +     * and iommufd as well.
> +     */
> +    if (dev->iommu->requires_direct &&
> +        (new_domain->type == IOMMU_DOMAIN_BLOCKED ||
> +         new_domain == group->blocking_domain)) {
> +        dev_warn(dev,
> +             "Firmware has requested this device have a 1:1 IOMMU mapping, rejecting configuring the device without a 1:1 mapping. Contact your platform vendor.\n");
> +        return -EINVAL;
> +    }
> +
https://lore.kernel.org/lkml/8cc1d69e-f86d-fd04-7737-914d967dc0f5@intel.com/

Perhaps someone who knows about this stuff can help?

This also seems to be my issue, but I can't get any clue as to what it is...
 
  • Like
Reactions: athurdent
Since kernel 6.8, my server crashes after one day (N5105-processor mini PC). With the previous kernel (6.5), more than 120 days without any crash.

Rebooted and back to 6.5, hoping it's only a kernel problem...
 
Same here, running a 2-node cluster on Supermicro. Both boxes run first-generation Intel Scalable processors. I may just hold off, as I'm about to rebuild the cluster regardless. However, I can't presently run on the 6.8 branch for more than a few hours. I have yet to isolate the errors, but quite a few components crash on a regular basis. No such issues existed before the new kernel release. I'm back on 6.5 and holding there until my reload.
 
I too am running a Supermicro cluster and have experienced issues with the 6.8 kernel. This was my process to get things back in working order for the time being:

Bash:
nano /etc/apt/preferences.d/proxmox-default-kernel

Code:
Package: proxmox-default-kernel
Pin: version 1.0.1
Pin-Priority: 1000

Bash:
apt install proxmox-kernel-6.5.13-5-pve

pve-efiboot-tool kernel list
pve-efiboot-tool kernel pin 6.5.13-5-pve

reboot now

dpkg -P proxmox-kernel-6.8 proxmox-kernel-6.8.4-2-pve-signed

reboot now

Your package versions may vary. The dpkg -P command will report errors, but if you rerun pve-efiboot-tool kernel list, you will see the kernel is no longer there.

I had to reconfigure /etc/network/interfaces to deal with the older NIC names; a sketch of that is below.
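A minimal sketch, assuming the bridge port just needs its old name back (eno1np0/eno1 are placeholder names; use whatever `ip -br link` reports, and review the file before applying):

Bash:
# Sketch: back up the config, swap the 6.8-era NIC name for the old one,
# then apply live; ifreload ships with ifupdown2, the Proxmox VE default.
cp /etc/network/interfaces /etc/network/interfaces.bak
sed -i 's/\beno1np0\b/eno1/g' /etc/network/interfaces
ifreload -a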

If you have issues with the network coming up after a fresh boot, run systemctl restart networking.service
 
Thanks; I'm hoping it's just the out-of-tree drivers. These boxes have run for 2 years; I've messed with them quite a bit, and they need a refresh. I'm adding a cluster node and converting from ZFS to Ceph, so a full overhaul. I don't use the onboard NICs; rather, I have Intel 500 series cards in all nodes, so I'm hoping they will work fine. I appreciate your insight, regardless.
 
I have tried a lot of combinations of IOMMU kernel parameters that I thought might remotely affect 1:1 mapping, to no avail.
I'm still completely at a loss as to why this happens on my Supermicro EPYC system:

Code:
Firmware has requested this device have a 1:1 IOMMU mapping, rejecting configuring the device without a 1:1 mapping. Contact your platform vendor.

@Stoiko Ivanov, any chance of backing out this patch:
Code:
> +    /*
> +     * If the driver has requested IOMMU_RESV_DIRECT then we cannot allow
> +     * the blocking domain to be attached as it does not contain the
> +     * required 1:1 mapping. This test effectively exclusive the device from
> +     * being used with iommu_group_claim_dma_owner() which will block vfio
> +     * and iommufd as well.
> +     */
> +    if (dev->iommu->requires_direct &&
> +        (new_domain->type == IOMMU_DOMAIN_BLOCKED ||
> +         new_domain == group->blocking_domain)) {
> +        dev_warn(dev,
> +             "Firmware has requested this device have a 1:1 IOMMU mapping, rejecting configuring the device without a 1:1 mapping. Contact your platform vendor.\n");
> +        return -EINVAL;
> +    }
> +
 
