Opt-in Linux 6.5 Kernel with ZFS 2.2 for Proxmox VE 8 available on test & no-subscription

Do I need to manually add this, or should I stay on 6.5.11-2 and wait for an update?
Either can work.
Interface renaming is a once-per-boot thing, so if it's currently working for you, it will stay that way, at least until the next reboot.
If you add those lines manually, you'd hedge against needing to reboot before an update is ready.

That said, I'll re-check this now a bit more closely and can also test it better for this regression, so there probably will be an update soon – but just to manage expectations: this is the test repo, it is the weekend, and I cannot promise any fixed timeline, as always.
 
I think I've got it.
The kernel is a red herring; the cause is the new systemd default link policy shipped by the pve-manager package that was bumped yesterday.

E.g., if you add the following two lines below the [Link] section in /usr/lib/systemd/network/98-proxmox-ve-default.link and reboot, the kernel should not matter anymore.

Code:
NamePolicy=keep kernel database onboard slot path
AlternativeNamesPolicy=database onboard slot path
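
To verify which .link file and name policy udev actually applied to an interface, something like this should work (a sketch; eno1 is a placeholder for your NIC):

Bash:
# show the .link file udev matched and the resulting name properties
udevadm info /sys/class/net/eno1 | grep -E 'ID_NET_LINK_FILE|ID_NET_NAME'
# dry-run the net_setup_link builtin to see the policy evaluation without changing anything
udevadm test-builtin net_setup_link /sys/class/net/eno1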

This also puts back some sense into why my kernel testing yesterday didn't observe this: I had only installed the new kernel manually, not yet pulling in the new default-link file (that was bumped only later).

We'll look into handling the default better, i.e., either go back to the 99-default.link.d snippet approach where all configs are merged, or take the above properties into ours – I'll need to recheck the discussion I had with a colleague (who favored the separate file a bit more).

Rebooting into the old kernel seems to make udev re-use the name that was previously assigned, so it seemingly fixed the issue and made it look like the kernel was at fault, while it really wasn't (that's my working theory; I'll focus on the fix before checking that more closely).

Managed to apply the fix using an Ubuntu Live USB – works perfectly, thanks!
 
Of course, thank you for taking a look at it as soon as you could. I have one other question: during my troubleshooting I ran proxmox-boot-tool reinit to see if it had anything to do with the boot config. This shouldn't cause any problems, should it? From what I understood, it just rebuilds the boot manager. I also was playing with sgdisk to check partitions and accidentally ran sgdisk -h, thinking it would show the help menu; that is instead linked to the --hybrid option. I did not list any devices in the command. This shouldn't hurt anything either, should it? From what I read, if you don't list a partition and device it should have no effect. If you would just confirm this for me.
 
Also, I understand testing is bound to have issues. I'm glad I could help identify one for the rest of us users. Proxmox has been a wonderful ride.
 
Of course, thank you for taking a look at it as soon as you could. I have one other question: during my troubleshooting I ran proxmox-boot-tool reinit to see if it had anything to do with the boot config. This shouldn't cause any problems, should it? From what I understood, it just rebuilds the boot manager.
I think this question would fit better into its own thread, as it's not really related to the new 6.5 opt-in kernel.
But no, proxmox-boot-tool reinit should not be destructive; it just copies the bootloader (GRUB or systemd-boot) over to the EFI system partition, if that is set up and used for early booting.
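
If you want to double-check what reinit worked with, proxmox-boot-tool has read-only subcommands (a sketch, assuming a proxmox-boot-tool managed setup):

Bash:
# list configured ESPs and which bootloader (grub or systemd-boot) is in use
proxmox-boot-tool status
# list the kernels currently synced onto the ESPs
proxmox-boot-tool kernel list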
I also was playing with sgdisk to check partitions and accidentally ran sgdisk -h, thinking it would show the help menu; that is instead linked to the --hybrid option. I did not list any devices in the command. This shouldn't hurt anything either, should it? From what I read, if you don't list a partition and device it should have no effect. If you would just confirm this for me.
Yeah, sgdisk is very feature-full, but not the easiest user experience; nowadays I mostly check the manual page first (e.g., via man sgdisk) to avoid fallout from my understanding of command-line interfaces clashing with the tool author's.
Anyhow, yes: if you did not pass any devices, sgdisk won't do anything.
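
For the record, read-only sgdisk invocations that are safe for checking partitions look like this (hedged; replace /dev/sdX with the actual disk):

Bash:
sgdisk --print /dev/sdX    # print the GPT partition table (read-only)
sgdisk --verify /dev/sdX   # check the table for problems without writing
man sgdisk                 # full option list; note -h is --hybrid, not help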
 
I re-visited the chat I had with a co-worker about the location and re-checked the systemd docs, and I think that for now we're better off extending the existing 99-default.link policy and adding our MACAddressPolicy override as a drop-in file there: those get merged, so the defaults we don't want to touch stay in effect. We originally chose not to do that to make it slightly easier for users to do a system-wide override, but after re-checking more closely, it should be easy enough to do so with drop-ins too.
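
A minimal sketch of what such a drop-in could look like (path, file name, and override value are illustrative, not necessarily what the package ships):

Code:
# /usr/lib/systemd/network/99-default.link.d/proxmox-ve.conf (hypothetical)
[Link]
MACAddressPolicy=none

systemd merges .link.d drop-ins into the main 99-default.link, so only the listed keys are overridden while all other defaults stay in effect.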

tl;dr: this should be fixed with pve-manager version 8.0.9, which was just uploaded to the pvetest repository (after triple-reboot testing).
 
The update from 6.5.11-1/pve-8.0.8 to kernel 6.5.11-3 and pve-manager/8.0.9/fd1a0ae1b385cdcd worked for me without any problems!
 
Kernel 6.5.3-1-pve (like the latest 6.2) does not boot reliably from a Lexar NM790 4TB SSD. Sometimes the kernel boots; sometimes the boot process aborts with "nvme nvme0: Device not ready; aborting initialisation, CSTS=0x0", dropping to BusyBox. A patch has been available in the mainline kernel since 6.5.5. If upstream does not upgrade or backport the patch in the near future, please consider adding it to the PVE build. Multiple users have already encountered this issue. Reports in this thread that this build fixed the issue are not correct (tested on a NUC7i5).
Does adding "rootdelay=60" (or even more) help with this issue? I need this on my setup too. Without it, I get a similar error message, something along the lines of "Device initialization timed out" – I don't remember exactly. My machine has needed this delay ever since the PVE 5.x kernels, though.
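
For anyone wanting to try the same workaround: rootdelay is a kernel command-line parameter, so roughly, depending on the bootloader (a sketch, not verified on the affected hardware):

Bash:
# systemd-boot (proxmox-boot-tool managed): append rootdelay=60 to /etc/kernel/cmdline, then
proxmox-boot-tool refresh
# GRUB: add rootdelay=60 to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, then
update-grub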
 
Seeing there was already an updated zfsutils-linux 2.2.0-pve3 in the no-subscription repo, with changelog:
Code:
 * avoid error from zfs-mount when /etc/exports.d does not exist (yet)

  * ensure vdev_stat struct layout compat between 2.1 and 2.2, avoiding
    false-positive detection of the non-allocating feature from 2.2 when the
    kernel still used the 2.1 module.

After a full upgrade from the current no-subscription repo (I first disabled pvetest), the issues I reported here and here are both solved.
Tested with the current kernels 6.2.16-19-pve and 6.5.11-3-pve. Thanks for fixing, @t.lamprecht and @Stoiko Ivanov.

I also really appreciate the Proxmox staff engaging and contributing upstream to the OpenZFS project, so others outside our nice Proxmox community may benefit as well :)
 
After a full upgrade from the current no-subscription repo (I first disabled pvetest), the issues I reported here and here are both solved.
Thanks for your feedback.
And yes, as those two issues could be fixed, we felt it's safe enough to move the packages over to pve-no-subscription.
The 6.5 kernel is still opt-in (for now), but no longer requires activating the test repo.
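
For reference, opting in should now just be a matter of installing the meta-package with the pve-no-subscription repository enabled (a sketch):

Bash:
apt update
apt install proxmox-kernel-6.5   # opt-in meta-package, pulls in the latest 6.5 kernel
# reboot afterwards to actually run the new kernel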
 
should be fixed with pve-manager version 8.0.9
Just for information.

I found a small problem after yesterday's updates from the "pvetest" repository.

Since then, completed backup jobs always send two emails, regardless of the notification setting. Even if no mail address is set at all (then only one mail is sent).

The menu item for switching the notifications from Mail to Target has also disappeared.

Have a nice Sunday ...
 
Since then, completed backup jobs always send two emails, regardless of the notification setting. Even if no mail address is set at all (then only one mail is sent).
This is due to the new notification framework; you can either drop the mail address from the jobs to use the new infrastructure, or edit/disable the default matcher under Datacenter -> Notifications.

For backup jobs we still need to do the full switch to the new infrastructure; it's a bit more intertwined there, and we did not want to rush that part. We'll look into making the upgrade a bit smoother in the future, so that ideally no duplicate mails are produced.
The menu item for switching the notifications from Mail to Target has also disappeared.
The old one that was there for a few weeks under Datacenter? Yeah, that got reworked into a more unified system, replacing the hard-coded list and the groups/filters with more flexible, general matcher-based notification routing.

Please also note that this is not dependent on the 6.5 kernel this thread is about; it would be great if you could open a new thread for any further input or questions on this topic. Still, thanks for your feedback!
 
Also ran into the network issue on Saturday night.
Rebooting into 6.5.3 instead of 6.5.11-3 worked for me.
It should not have, if I read this thread correctly.
 
Hello all,

Currently running Proxmox 8.0.4 with kernel 5.15.108 on 2 of my Proxmox nodes, as that is the last working kernel for them. The Atom C3758 CPU with the on-board X553 10G SFP+ x4 fails networking on all the Proxmox 6.2 kernels. Similar issues have been seen by others with the Intel X553 SFP+ adapter on VyOS and Ubuntu installs, and can be fixed with the latest Intel out-of-tree ixgbe module/driver.

Would like to test the latest 6.5 Proxmox kernel on one of these 2 nodes, but when I installed it on another node that does not have X553 networking, I noticed the older 5.15.108 kernel was automatically removed, leaving just the immediately prior 6.2 kernel and the new 6.5 kernel. Any way to install without removing the older 5.15 kernel, so that I have it as a fallback? Will ZFS 2.2 run fine with the 5.15 kernel, even if lacking new features?
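
To keep a specific old kernel around as a fallback, something along these lines should work (hedged; the exact package name may differ on your system, and pinning needs a recent proxmox-boot-tool):

Bash:
# reinstall the old kernel if it was removed, and keep apt from auto-removing it
apt install pve-kernel-5.15.108-1-pve   # check `apt list 'pve-kernel-5.15*'` for the exact name
apt-mark hold pve-kernel-5.15.108-1-pve
# optionally make it the default boot entry
proxmox-boot-tool kernel pin 5.15.108-1-pve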
 
Both of my VMs with PCIe passthrough (which run perfectly fine on all the 6.2 kernels) cannot start anymore on the 6.5 kernel. :(
One of them, as an example:
Bash:
Nov 20 06:08:29 pve2 pvedaemon[3220]: start VM 201: UPID:pve2:00000C94:000054EE:655AE9CD:qmstart:201:root@pam:
Nov 20 06:08:29 pve2 pvedaemon[1813]: <root@pam> starting task UPID:pve2:00000C94:000054EE:655AE9CD:qmstart:201:root@pam:
Nov 20 06:08:30 pve2 kernel: ------------[ cut here ]------------
Nov 20 06:08:30 pve2 kernel: kernel BUG at mm/migrate.c:654!
Nov 20 06:08:30 pve2 kernel: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
Nov 20 06:08:30 pve2 kernel: CPU: 10 PID: 3220 Comm: task UPID:pve2: Tainted: P           O       6.5.11-3-pve #1
Nov 20 06:08:30 pve2 kernel: Hardware name: Supermicro Super Server/X12SDV-8C-SP6F, BIOS 1.3a 07/11/2023
Nov 20 06:08:30 pve2 kernel: RIP: 0010:migrate_folio_extra+0x87/0x90
Nov 20 06:08:30 pve2 kernel: Code: 31 ff 45 31 c0 c3 cc cc cc cc e8 54 e1 ff ff 44 89 e8 5b 41 5c 41 5d 41 5e 5d 31 d2 31 c9 31 f6 31 ff 45 31 c0 c3 cc cc cc cc <0f> 0b 0f 1f 80 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90
Nov 20 06:08:30 pve2 kernel: RSP: 0018:ff57cb7f06dab768 EFLAGS: 00010282
Nov 20 06:08:30 pve2 kernel: RAX: 0017ffffc4008067 RBX: ffc2db578b1af200 RCX: 0000000000000002
Nov 20 06:08:30 pve2 kernel: RDX: ffc2db578b1af200 RSI: ffc2db578d183740 RDI: ff16271d8b653498
Nov 20 06:08:30 pve2 kernel: RBP: ff57cb7f06dab790 R08: 0000000000000000 R09: 0000000000000000
Nov 20 06:08:30 pve2 kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ff16271d8b653498
Nov 20 06:08:30 pve2 kernel: R13: 0000000000000002 R14: ffc2db578d183740 R15: ff57cb7f06dab95c
Nov 20 06:08:30 pve2 kernel: FS:  00007f4eed9e0b80(0000) GS:ff16275c00080000(0000) knlGS:0000000000000000
Nov 20 06:08:30 pve2 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 20 06:08:30 pve2 kernel: CR2: 00005645d3bb45f4 CR3: 0000000280fe2005 CR4: 0000000000771ee0
Nov 20 06:08:30 pve2 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Nov 20 06:08:30 pve2 kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Nov 20 06:08:30 pve2 kernel: PKRU: 55555554
Nov 20 06:08:30 pve2 kernel: Call Trace:
Nov 20 06:08:30 pve2 kernel:  <TASK>
Nov 20 06:08:30 pve2 kernel:  ? show_regs+0x6d/0x80
Nov 20 06:08:30 pve2 kernel:  ? die+0x37/0xa0
Nov 20 06:08:30 pve2 kernel:  ? do_trap+0xd4/0xf0
Nov 20 06:08:30 pve2 kernel:  ? do_error_trap+0x71/0xb0
Nov 20 06:08:30 pve2 kernel:  ? migrate_folio_extra+0x87/0x90
Nov 20 06:08:30 pve2 kernel:  ? exc_invalid_op+0x52/0x80
Nov 20 06:08:30 pve2 kernel:  ? migrate_folio_extra+0x87/0x90
Nov 20 06:08:30 pve2 kernel:  ? asm_exc_invalid_op+0x1b/0x20
Nov 20 06:08:30 pve2 kernel:  ? migrate_folio_extra+0x87/0x90
Nov 20 06:08:30 pve2 kernel:  ? move_to_new_folio+0x146/0x160
Nov 20 06:08:30 pve2 kernel:  migrate_pages_batch+0x856/0xbc0
Nov 20 06:08:30 pve2 kernel:  ? __pfx_remove_migration_pte+0x10/0x10
Nov 20 06:08:30 pve2 kernel:  ? __pfx_alloc_migration_target+0x10/0x10
Nov 20 06:08:30 pve2 kernel:  migrate_pages+0xbb6/0xd60
Nov 20 06:08:30 pve2 kernel:  ? __pfx_alloc_migration_target+0x10/0x10
Nov 20 06:08:30 pve2 kernel:  __alloc_contig_migrate_range+0xaf/0x1d0
Nov 20 06:08:30 pve2 kernel:  alloc_contig_range+0x153/0x280
Nov 20 06:08:30 pve2 kernel:  ? sysvec_apic_timer_interrupt+0xa6/0xd0
Nov 20 06:08:30 pve2 kernel:  alloc_contig_pages+0x204/0x260
Nov 20 06:08:30 pve2 kernel:  alloc_fresh_hugetlb_folio+0x70/0x1a0
Nov 20 06:08:30 pve2 kernel:  alloc_pool_huge_page+0x81/0x120
Nov 20 06:08:30 pve2 kernel:  __nr_hugepages_store_common+0x211/0x4d0
Nov 20 06:08:30 pve2 kernel:  nr_hugepages_store+0x92/0xa0
Nov 20 06:08:30 pve2 kernel:  kobj_attr_store+0xf/0x40
Nov 20 06:08:30 pve2 kernel:  sysfs_kf_write+0x3b/0x60
Nov 20 06:08:30 pve2 kernel:  kernfs_fop_write_iter+0x130/0x210
Nov 20 06:08:30 pve2 kernel:  vfs_write+0x251/0x440
Nov 20 06:08:30 pve2 kernel:  ksys_write+0x73/0x100
Nov 20 06:08:30 pve2 kernel:  __x64_sys_write+0x19/0x30
Nov 20 06:08:30 pve2 kernel:  do_syscall_64+0x58/0x90
Nov 20 06:08:30 pve2 kernel:  ? handle_mm_fault+0xad/0x360
Nov 20 06:08:30 pve2 kernel:  ? exit_to_user_mode_prepare+0x39/0x190
Nov 20 06:08:30 pve2 kernel:  ? irqentry_exit_to_user_mode+0x17/0x20
Nov 20 06:08:30 pve2 kernel:  ? irqentry_exit+0x43/0x50
Nov 20 06:08:30 pve2 kernel:  ? exc_page_fault+0x94/0x1b0
Nov 20 06:08:30 pve2 kernel:  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
Nov 20 06:08:30 pve2 kernel: RIP: 0033:0x7f4eedb16140
Nov 20 06:08:30 pve2 kernel: Code: 40 00 48 8b 15 c1 9c 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 80 3d a1 24 0e 00 00 74 17 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 48 83 ec 28 48 89
Nov 20 06:08:30 pve2 kernel: RSP: 002b:00007ffe48954218 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
Nov 20 06:08:30 pve2 kernel: RAX: ffffffffffffffda RBX: 00005645d26ae2a0 RCX: 00007f4eedb16140
Nov 20 06:08:30 pve2 kernel: RDX: 0000000000000003 RSI: 00005645da0fe890 RDI: 0000000000000011
Nov 20 06:08:30 pve2 kernel: RBP: 00005645da0fe890 R08: 0000000000000000 R09: 00007f4eedbf0d10
Nov 20 06:08:30 pve2 kernel: R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000003
Nov 20 06:08:30 pve2 kernel: R13: 00005645d26ae2a0 R14: 0000000000000011 R15: 00005645da0f98e0
Nov 20 06:08:30 pve2 kernel:  </TASK>
Nov 20 06:08:30 pve2 kernel: Modules linked in: ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables iptable_filter bpfilter sctp ip6_udp_tunnel udp_tunnel nf_tables sunrpc binfmt_misc bonding tls nfnetlink_log nfnetlink intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common i10nm_edac nfit x86_pkg_temp_thermal intel_powerclamp kvm_intel ipmi_ssif kvm nouveau snd_hda_intel snd_intel_dspcfg crct10dif_pclmul snd_intel_sdw_acpi polyval_clmulni snd_hda_codec polyval_generic irdma mxm_wmi ghash_clmulni_intel drm_ttm_helper aesni_intel snd_hda_core ttm snd_hwdep crypto_simd i40e drm_display_helper cryptd snd_pcm cmdlinepart rapl cec ib_uverbs snd_timer dax_hmem rc_core ast cxl_acpi spi_nor intel_cstate snd video intel_th_gth drm_shmem_helper mei_me isst_if_mmio isst_if_mbox_pci ib_core soundcore cxl_core wmi mtd pcspkr intel_th_pci drm_kms_helper isst_if_common mei intel_th acpi_ipmi ioatdma intel_vsec ipmi_si ipmi_devintf ipmi_msghandler acpi_pad acpi_power_meter joydev
Nov 20 06:08:30 pve2 kernel:  input_leds mac_hid vhost_net vhost vhost_iotlb tap coretemp drm efi_pstore dmi_sysfs ip_tables x_tables autofs4 zfs(PO) spl(O) btrfs blake2b_generic xor raid6_pq libcrc32c hid_generic usbmouse usbhid hid mpt3sas vfio_pci vfio_pci_core irqbypass vfio_iommu_type1 vfio ice iommufd xhci_pci sdhci_pci nvme raid_class xhci_pci_renesas crc32_pclmul scsi_transport_sas igb xhci_hcd cqhci spi_intel_pci gnss i2c_i801 nvme_core spi_intel sdhci i2c_smbus ahci i2c_algo_bit nvme_common i2c_ismt dca libahci pinctrl_cedarfork
Nov 20 06:08:30 pve2 kernel: ---[ end trace 0000000000000000 ]---
Nov 20 06:08:30 pve2 kernel: RIP: 0010:migrate_folio_extra+0x87/0x90
Nov 20 06:08:30 pve2 kernel: Code: 31 ff 45 31 c0 c3 cc cc cc cc e8 54 e1 ff ff 44 89 e8 5b 41 5c 41 5d 41 5e 5d 31 d2 31 c9 31 f6 31 ff 45 31 c0 c3 cc cc cc cc <0f> 0b 0f 1f 80 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90
Nov 20 06:08:30 pve2 kernel: RSP: 0018:ff57cb7f06dab768 EFLAGS: 00010282
Nov 20 06:08:30 pve2 kernel: RAX: 0017ffffc4008067 RBX: ffc2db578b1af200 RCX: 0000000000000002
Nov 20 06:08:30 pve2 kernel: RDX: ffc2db578b1af200 RSI: ffc2db578d183740 RDI: ff16271d8b653498
Nov 20 06:08:30 pve2 kernel: RBP: ff57cb7f06dab790 R08: 0000000000000000 R09: 0000000000000000
Nov 20 06:08:30 pve2 kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ff16271d8b653498
Nov 20 06:08:30 pve2 kernel: R13: 0000000000000002 R14: ffc2db578d183740 R15: ff57cb7f06dab95c
Nov 20 06:08:30 pve2 kernel: FS:  00007f4eed9e0b80(0000) GS:ff16275c00080000(0000) knlGS:0000000000000000
Nov 20 06:08:30 pve2 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 20 06:08:30 pve2 kernel: CR2: 00005645d3bb45f4 CR3: 0000000280fe2005 CR4: 0000000000771ee0
Nov 20 06:08:30 pve2 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Nov 20 06:08:30 pve2 kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Nov 20 06:08:30 pve2 kernel: PKRU: 55555554
Nov 20 06:08:30 pve2 pvedaemon[1813]: <root@pam> end task UPID:pve2:00000C94:000054EE:655AE9CD:qmstart:201:root@pam: unable to read tail (got 0 bytes)
Bash:
agent: 0
bios: ovmf
boot: order=scsi0;ide2;net0
cores: 8
cpu: host
efidisk0: local-zfs:vm-201-disk-0,efitype=4m,pre-enrolled-keys=1,size=1M
hostpci0: 0000:16:00,pcie=1
hugepages: 1024
ide2: none,media=cdrom
machine: q35
memory: 131072
meta: creation-qemu=7.0.0,ctime=1665967836
name: TrueNAS
net0: virtio=[...],bridge=vmbr0
numa: 1
ostype: other
scsi0: local-zfs:vm-201-disk-1,discard=on,iothread=1,size=16G,ssd=1
scsihw: virtio-scsi-single
smbios1: uuid=[...]
sockets: 1
startup: order=2,up=120
vmgenid: [...]
Bash:
16:00.0 Serial Attached SCSI controller [0107]: Broadcom / LSI Fusion-MPT 12GSAS/PCIe Secure SAS38xx [1000:00e6]
        Subsystem: Broadcom / LSI 9500-16i Tri-Mode HBA [1000:4050]
        Kernel driver in use: vfio-pci
        Kernel modules: mpt3sas
Bash:
softdep mpt3sas pre: vfio-pci
softdep nouveau pre: vfio-pci
softdep snd_hda_intel pre: vfio-pci
options vfio-pci ids=1000:00e6,10de:1fb0,10de:10fa
Bash:
proxmox-ve: 8.0.2 (running kernel: 6.5.11-3-pve)
pve-manager: 8.0.9 (running version: 8.0.9/fd1a0ae1b385cdcd)
proxmox-kernel-helper: 8.0.5
proxmox-kernel-6.5: 6.5.11-3
proxmox-kernel-6.5.11-3-pve: 6.5.11-3
proxmox-kernel-6.2.16-19-pve: 6.2.16-19
proxmox-kernel-6.2: 6.2.16-19
ceph-fuse: 18.2.0-pve2
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx6
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-4
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.5.0
libproxmox-backup-qemu0: 1.4.0
libproxmox-rs-perl: 0.3.1
libpve-access-control: 8.0.6
libpve-apiclient-perl: 3.3.1
libpve-common-perl: 8.0.10
libpve-guest-common-perl: 5.0.5
libpve-http-server-perl: 5.0.5
libpve-rs-perl: 0.8.7
libpve-storage-perl: 8.0.4
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 5.0.2-4
lxcfs: 5.0.3-pve3
novnc-pve: 1.4.0-2
proxmox-backup-client: 3.0.4-1
proxmox-backup-file-restore: 3.0.4-1
proxmox-kernel-helper: 8.0.5
proxmox-mail-forward: 0.2.1
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.2
proxmox-widget-toolkit: 4.1.1
pve-cluster: 8.0.5
pve-container: 5.0.5
pve-docs: 8.0.5
pve-edk2-firmware: 3.20230228-4
pve-firewall: 5.0.3
pve-firmware: 3.9-1
pve-ha-manager: 4.0.3
pve-i18n: 3.0.7
pve-qemu-kvm: 8.1.2-2
pve-xtermjs: 5.3.0-2
qemu-server: 8.0.8
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.0-pve3
 
Sorry for posting again, but I wanted to provide the info for the other VM too, in case it helps:
Bash:
Nov 20 05:49:31 pve2 pve-guests[1806]: start VM 202: UPID:pve2:0000070E:00000816:655AE55B:qmstart:202:root@pam:
Nov 20 05:49:31 pve2 pve-guests[1805]: <root@pam> starting task UPID:pve2:0000070E:00000816:655AE55B:qmstart:202:root@pam:
Nov 20 05:49:31 pve2 kernel: kvm[1807]: memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set
Nov 20 05:49:32 pve2 kernel: ------------[ cut here ]------------
Nov 20 05:49:32 pve2 kernel: kernel BUG at mm/migrate.c:654!
Nov 20 05:49:32 pve2 kernel: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
Nov 20 05:49:32 pve2 kernel: CPU: 4 PID: 1806 Comm: task UPID:pve2: Tainted: P           O       6.5.11-3-pve #1
Nov 20 05:49:32 pve2 kernel: Hardware name: Supermicro Super Server/X12SDV-8C-SP6F, BIOS 1.3a 07/11/2023
Nov 20 05:49:32 pve2 kernel: RIP: 0010:migrate_folio_extra+0x87/0x90
Nov 20 05:49:32 pve2 kernel: Code: 31 ff 45 31 c0 c3 cc cc cc cc e8 54 e1 ff ff 44 89 e8 5b 41 5c 41 5d 41 5e 5d 31 d2 31 c9 31 f6 31 ff 45 31 c0 c3 cc cc cc cc <0f> 0b 0f 1f 80 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90
Nov 20 05:49:32 pve2 kernel: RSP: 0018:ff4a1a51a070f7d8 EFLAGS: 00010282
Nov 20 05:49:32 pve2 kernel: RAX: 0017ffffc0008025 RBX: ff9a04388b015e40 RCX: 0000000000000002
Nov 20 05:49:32 pve2 kernel: RDX: ff9a04388b015e40 RSI: ff9a04388a69f680 RDI: ff2f9f333b25e5c0
Nov 20 05:49:32 pve2 kernel: RBP: ff4a1a51a070f800 R08: 0000000000000000 R09: 0000000000000000
Nov 20 05:49:32 pve2 kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ff2f9f333b25e5c0
Nov 20 05:49:32 pve2 kernel: R13: 0000000000000002 R14: ff9a04388a69f680 R15: ff4a1a51a070f9cc
Nov 20 05:49:32 pve2 kernel: FS:  00007fe5db4eeb80(0000) GS:ff2f9f717ff00000(0000) knlGS:0000000000000000
Nov 20 05:49:32 pve2 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 20 05:49:32 pve2 kernel: CR2: 000055c6329cf300 CR3: 000000028abce005 CR4: 0000000000771ee0
Nov 20 05:49:32 pve2 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Nov 20 05:49:32 pve2 kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Nov 20 05:49:32 pve2 kernel: PKRU: 55555554
Nov 20 05:49:32 pve2 kernel: Call Trace:
Nov 20 05:49:32 pve2 kernel:  <TASK>
Nov 20 05:49:32 pve2 kernel:  ? show_regs+0x6d/0x80
Nov 20 05:49:32 pve2 kernel:  ? die+0x37/0xa0
Nov 20 05:49:32 pve2 kernel:  ? do_trap+0xd4/0xf0
Nov 20 05:49:32 pve2 kernel:  ? do_error_trap+0x71/0xb0
Nov 20 05:49:32 pve2 kernel:  ? migrate_folio_extra+0x87/0x90
Nov 20 05:49:32 pve2 kernel:  ? exc_invalid_op+0x52/0x80
Nov 20 05:49:32 pve2 kernel:  ? migrate_folio_extra+0x87/0x90
Nov 20 05:49:32 pve2 kernel:  ? asm_exc_invalid_op+0x1b/0x20
Nov 20 05:49:32 pve2 kernel:  ? migrate_folio_extra+0x87/0x90
Nov 20 05:49:32 pve2 kernel:  ? move_to_new_folio+0x146/0x160
Nov 20 05:49:32 pve2 kernel:  migrate_pages_batch+0x856/0xbc0
Nov 20 05:49:32 pve2 kernel:  ? __pfx_remove_migration_pte+0x10/0x10
Nov 20 05:49:32 pve2 kernel:  ? __pfx_alloc_migration_target+0x10/0x10
Nov 20 05:49:32 pve2 kernel:  migrate_pages+0xbb6/0xd60
Nov 20 05:49:32 pve2 kernel:  ? __pfx_alloc_migration_target+0x10/0x10
Nov 20 05:49:32 pve2 kernel:  __alloc_contig_migrate_range+0xaf/0x1d0
Nov 20 05:49:32 pve2 kernel:  alloc_contig_range+0x153/0x280
Nov 20 05:49:32 pve2 kernel:  ? sysvec_apic_timer_interrupt+0xa6/0xd0
Nov 20 05:49:32 pve2 kernel:  alloc_contig_pages+0x204/0x260
Nov 20 05:49:32 pve2 kernel:  alloc_fresh_hugetlb_folio+0x70/0x1a0
Nov 20 05:49:32 pve2 kernel:  alloc_pool_huge_page+0x81/0x120
Nov 20 05:49:32 pve2 kernel:  __nr_hugepages_store_common+0x211/0x4d0
Nov 20 05:49:32 pve2 kernel:  nr_hugepages_store+0x92/0xa0
Nov 20 05:49:32 pve2 kernel:  kobj_attr_store+0xf/0x40
Nov 20 05:49:32 pve2 kernel:  sysfs_kf_write+0x3b/0x60
Nov 20 05:49:32 pve2 kernel:  kernfs_fop_write_iter+0x130/0x210
Nov 20 05:49:32 pve2 kernel:  vfs_write+0x251/0x440
Nov 20 05:49:32 pve2 kernel:  ksys_write+0x73/0x100
Nov 20 05:49:32 pve2 kernel:  __x64_sys_write+0x19/0x30
Nov 20 05:49:32 pve2 kernel:  do_syscall_64+0x58/0x90
Nov 20 05:49:32 pve2 kernel:  ? irqentry_exit+0x43/0x50
Nov 20 05:49:32 pve2 kernel:  ? exc_page_fault+0x94/0x1b0
Nov 20 05:49:32 pve2 kernel:  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
Nov 20 05:49:32 pve2 kernel: RIP: 0033:0x7fe5db624140
Nov 20 05:49:32 pve2 kernel: Code: 40 00 48 8b 15 c1 9c 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 80 3d a1 24 0e 00 00 74 17 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 48 83 ec 28 48 89
Nov 20 05:49:32 pve2 kernel: RSP: 002b:00007fffc59ed018 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
Nov 20 05:49:32 pve2 kernel: RAX: ffffffffffffffda RBX: 000055c62da552a0 RCX: 00007fe5db624140
Nov 20 05:49:32 pve2 kernel: RDX: 0000000000000001 RSI: 000055c634d59bf0 RDI: 0000000000000010
Nov 20 05:49:32 pve2 kernel: RBP: 000055c634d59bf0 R08: 0000000000000000 R09: 000000000000010f
Nov 20 05:49:32 pve2 kernel: R10: 03a94646a4066c37 R11: 0000000000000202 R12: 0000000000000001
Nov 20 05:49:32 pve2 kernel: R13: 000055c62da552a0 R14: 0000000000000010 R15: 000055c634d52c20
Nov 20 05:49:32 pve2 kernel:  </TASK>
Nov 20 05:49:32 pve2 kernel: Modules linked in: ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables iptable_filter bpfilter sctp ip6_udp_tunnel udp_tunnel nf_tables sunrpc binfmt_misc bonding tls nfnetlink_log nfnetlink intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common i10nm_edac nfit x86_pkg_temp_thermal intel_powerclamp kvm_intel ipmi_ssif kvm nouveau snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi irdma crct10dif_pclmul polyval_clmulni snd_hda_codec mxm_wmi drm_ttm_helper polyval_generic ttm ghash_clmulni_intel aesni_intel i40e snd_hda_core crypto_simd drm_display_helper cryptd snd_hwdep snd_pcm cec ib_uverbs cmdlinepart dax_hmem rc_core ast rapl snd_timer cxl_acpi video drm_shmem_helper intel_th_gth spi_nor intel_cstate snd cxl_core isst_if_mmio isst_if_mbox_pci ib_core pcspkr drm_kms_helper wmi mei_me soundcore mtd acpi_ipmi intel_th_pci isst_if_common mei intel_th ipmi_si intel_vsec ipmi_devintf ipmi_msghandler acpi_pad joydev input_leds ioatdma
Nov 20 05:49:32 pve2 kernel:  acpi_power_meter mac_hid vhost_net vhost vhost_iotlb tap coretemp drm efi_pstore dmi_sysfs ip_tables x_tables autofs4 zfs(PO) spl(O) btrfs blake2b_generic xor raid6_pq libcrc32c hid_generic usbmouse usbhid hid mpt3sas vfio_pci vfio_pci_core irqbypass vfio_iommu_type1 vfio xhci_pci sdhci_pci iommufd xhci_pci_renesas nvme ice crc32_pclmul igb raid_class cqhci i2c_i801 xhci_hcd nvme_core gnss scsi_transport_sas spi_intel_pci sdhci i2c_smbus ahci i2c_algo_bit spi_intel nvme_common i2c_ismt dca libahci pinctrl_cedarfork
Nov 20 05:49:32 pve2 kernel: ---[ end trace 0000000000000000 ]---
Nov 20 05:49:32 pve2 kernel: RIP: 0010:migrate_folio_extra+0x87/0x90
Nov 20 05:49:32 pve2 kernel: Code: 31 ff 45 31 c0 c3 cc cc cc cc e8 54 e1 ff ff 44 89 e8 5b 41 5c 41 5d 41 5e 5d 31 d2 31 c9 31 f6 31 ff 45 31 c0 c3 cc cc cc cc <0f> 0b 0f 1f 80 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90
Nov 20 05:49:32 pve2 kernel: RSP: 0018:ff4a1a51a070f7d8 EFLAGS: 00010282
Nov 20 05:49:32 pve2 kernel: RAX: 0017ffffc0008025 RBX: ff9a04388b015e40 RCX: 0000000000000002
Nov 20 05:49:32 pve2 kernel: RDX: ff9a04388b015e40 RSI: ff9a04388a69f680 RDI: ff2f9f333b25e5c0
Nov 20 05:49:32 pve2 kernel: RBP: ff4a1a51a070f800 R08: 0000000000000000 R09: 0000000000000000
Nov 20 05:49:32 pve2 kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ff2f9f333b25e5c0
Nov 20 05:49:32 pve2 kernel: R13: 0000000000000002 R14: ff9a04388a69f680 R15: ff4a1a51a070f9cc
Nov 20 05:49:32 pve2 kernel: FS:  00007fe5db4eeb80(0000) GS:ff2f9f717ff00000(0000) knlGS:0000000000000000
Nov 20 05:49:32 pve2 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 20 05:49:32 pve2 kernel: CR2: 000055c6329cf300 CR3: 000000028abce005 CR4: 0000000000771ee0
Nov 20 05:49:32 pve2 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Nov 20 05:49:32 pve2 kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Nov 20 05:49:32 pve2 kernel: PKRU: 55555554
Nov 20 05:49:32 pve2 pvesh[1804]: Starting VM 202 failed: unable to read tail (got 0 bytes)
Bash:
agent: 1
bios: ovmf
boot: order=scsi0;ide2;net0
cores: 8
cpu: host
efidisk0: local-zfs:vm-202-disk-0,efitype=4m,pre-enrolled-keys=1,size=1M
hostpci0: 0000:15:00,pcie=1
hugepages: 1024
ide2: none,media=cdrom
machine: q35
memory: 8192
meta: creation-qemu=8.0.2,ctime=1689743256
name: Jellyfin
net0: virtio=[...],bridge=vmbr0
numa: 1
ostype: l26
scsi0: local-zfs:vm-202-disk-1,discard=on,iothread=1,size=64G,ssd=1
scsihw: virtio-scsi-single
smbios1: uuid=[...]
sockets: 1
vmgenid: [...]
Bash:
15:00.0 VGA compatible controller [0300]: NVIDIA Corporation TU117GLM [Quadro T1000 Mobile] [10de:1fb0] (rev a1)
        Subsystem: NVIDIA Corporation TU117GLM [Quadro T1000 Mobile] [10de:12db]
        Kernel driver in use: vfio-pci
        Kernel modules: nvidiafb, nouveau
15:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:10fa] (rev a1)
        Subsystem: NVIDIA Corporation Device [10de:12db]
        Kernel driver in use: vfio-pci
        Kernel modules: snd_hda_intel
 
Both of my VMs with PCIe passthrough (which run perfectly fine on all the 6.2 kernels) cannot start anymore on the 6.5 kernel. :(
Can you maybe open a new thread? I'll see if I can reproduce it here.

EDIT:

Generally, passthrough works here (tested on a consumer AMD board and an older Intel server mainboard).

On a hunch: the error message looks more memory- than passthrough-related; could you maybe try without hugepages?
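
In case it helps, a quick test without hugepages could look like this (a sketch; 201 is the VMID from the posted config):

Bash:
# drop the hugepages option from the VM config, then try starting it again
qm set 201 --delete hugepages
qm start 201
# to restore the previous setting afterwards:
qm set 201 --hugepages 1024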
 

Created the thread:
https://forum.proxmox.com/threads/v...do-not-start-anymore-on-pve-kernel-6-5.136741

Will test later and give feedback in the separate thread.
Thank you so far. :)
 
A bit late in replying, but some feedback: been running this for a couple of weeks on two machines. No problems noticed whatsoever.

Machines are:
1x AMD EPYC 7282 and 1x AMD Ryzen 3600X

Storage is NVMe ZFS for the root partitions, with the EPYC running a storage cluster on platter drives (ZFS).

Cheers for the efforts!
 
