Opt-in Linux 6.14 Kernel for Proxmox VE 8 available on test & no-subscription

flo3393 · Apr 4, 2025

So, did I understand correctly: if 6.14 shows no problems over the coming weeks, 6.11 will be replaced by 6.14, and I have to opt in to 6.14?

Stoiko Ivanov · Apr 4, 2025

sash said:
proxmox-kernel-6.11 is running without issues; proxmox-kernel-6.14 is not able to boot a VM with a passed-through controller (TrueNAS).

If I can provide logs, please tell me what you need so I can help track down this possible bug.

The complete journal of a failing boot (`journalctl -b` after you tried to start the VM and it failed) would help to start debugging this.
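
A minimal sketch of capturing that journal into a file that can be attached to a reply. The function name `save_boot_journal` and the default file name are illustrative assumptions, not Proxmox tooling; `journalctl -b` with an offset of 0 (current boot) or -1 (previous boot, which requires a persistent journal) is standard systemd behaviour.

```shell
#!/bin/sh
# Sketch: save the journal of one boot to a file for attaching to a post.
# save_boot_journal and the default file name are hypothetical helpers.
save_boot_journal() {
    boot="${1:-0}"                  # 0 = current boot, -1 = previous boot
    out="${2:-journal-boot.txt}"    # assumed output path
    journalctl -b "$boot" --no-pager > "$out"
}

# Usage (as root, right after the VM failed to start):
#   save_boot_journal 0 journal-6.14-vm-fail.txt
```

Writing to a file rather than copying from the terminal keeps long lines intact, which matters for kernel messages that the pager would otherwise truncate.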

t.lamprecht · Apr 4, 2025

flo3393 said:
So, did I understand correctly: if 6.14 shows no problems over the coming weeks, 6.11 will be replaced by 6.14, and I have to opt in to 6.14?

Yes, to continue getting updates you will need to move from 6.11 to 6.14 sooner or later.
While we might release another 6.11 update, there is no active plan for that currently.

And for the record, the 6.8 kernel will continue to be supported for the lifetime of PVE 8, so if one uses that and sees no need for a newer kernel they can continue to use it just fine.
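
For readers unsure which series they are currently running, the check can be sketched as below. The helper name `kernel_series` is an assumption for illustration; the commented commands use the `proxmox-kernel-6.14` package named in this thread and `proxmox-boot-tool`, and the version passed to `pin` is only an example that must match an entry reported by `kernel list`.

```shell
#!/bin/sh
# kernel_series VERSION: print the major.minor series of a kernel release string.
# Hypothetical helper, not part of Proxmox VE.
kernel_series() {
    echo "$1" | cut -d. -f1,2
}

# Usage:
#   kernel_series "$(uname -r)"
#
# Opting in to the newer series, or staying on an installed one (run as root):
#   apt install proxmox-kernel-6.14
#   proxmox-boot-tool kernel list
#   proxmox-boot-tool kernel pin 6.8.12-9-pve   # example version, check the list first
```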

flo3393 · Apr 4, 2025

t.lamprecht said:
Yes, to continue getting updates you will need to move from 6.11 to 6.14 sooner or later.
While we might release another 6.11 update, there is no active plan for that currently.

And for the record, the 6.8 kernel will continue to be supported for the lifetime of PVE 8, so if one uses that and sees no need for a newer kernel they can continue to use it just fine.

Thanks! If a kernel update becomes available, will it be posted in this thread?

sash · Apr 4, 2025

Stoiko Ivanov said:
The complete journal of a failing boot (`journalctl -b` after you tried to start the VM and it failed) would help to start debugging this.

Sorry, in the meantime I've found a workaround.

My controller was passed through to the VM only as a "Raw Device"; that setup works unchanged with 6.8 and 6.11. Now I've created a mapped device and pass that mapped device through to the VM, and the VM starts normally.

SakuraChan00 · Apr 5, 2025

zenowl77 said:
I will keep looking, but so far it appears that, as with 6.8, changes in vfio (and likely other files) have broken compatibility, so we need a patch for 16.9/17.X to restore it. So far there is no discussion of it on the PolloLoco - NVIDIA vGPU Guide page, and I do not see anything from the producer of the 6.8 patch here: GreenDam

I will keep searching; hopefully they release something soon.

The latest driver supports 6.11 for 16.9/17.4, and I know of no plans to patch for 6.14. Maybe the new 18.x branch supports it, but getting around their changes seems unlikely to me. You'd need to check the release notes on the NVIDIA site; they tend to support only Ubuntu LTS kernels.

stanthewizzard2025 · Apr 5, 2025

sash said:
Sorry, in the meantime I've found a workaround.

My controller was passed through to the VM only as a "Raw Device"; that setup works unchanged with 6.8 and 6.11. Now I've created a mapped device and pass that mapped device through to the VM, and the VM starts normally.

I will be switching from ESXi to Proxmox for TrueNAS (with ESXi, passthrough of the LSI SAS controller works without issues).
How do you pass through your SAS controller to TrueNAS with Proxmox?
Thanks for any advice.

danmcq · Apr 5, 2025

I have an Epyc Genoa (Epyc 9554) system that won't boot with the 6.14 kernel. The system is only a few months old, and has run without issue on both the 6.8 and 6.11 kernels. Looking through the journalctl -b logs after a failed boot, I see this:

Code:

Apr 05 00:03:06 proxmox-host kernel: BERT: Error records from previous boot:
Apr 05 00:03:06 proxmox-host kernel: [Hardware Error]: event severity: fatal
Apr 05 00:03:06 proxmox-host kernel: [Hardware Error]:  Error 0, type: fatal
Apr 05 00:03:06 proxmox-host kernel: [Hardware Error]:  fru_text: ProcessorError
Apr 05 00:03:06 proxmox-host kernel: [Hardware Error]:   section_type: IA32/X64 processor error
Apr 05 00:03:06 proxmox-host kernel: [Hardware Error]:   Local APIC_ID: 0x1c
Apr 05 00:03:06 proxmox-host kernel: [Hardware Error]:   CPUID Info:
Apr 05 00:03:06 proxmox-host kernel: [Hardware Error]:   00000000: 00a10f11 00000000 1c800800 00000000
Apr 05 00:03:06 proxmox-host kernel: [Hardware Error]:   00000010: 76fa320b 00000000 178bfbff 00000000
Apr 05 00:03:06 proxmox-host kernel: [Hardware Error]:   00000020: 00000000 00000000 00000000 00000000
Apr 05 00:03:06 proxmox-host kernel: [Hardware Error]:   Error Information Structure 0:
Apr 05 00:03:06 proxmox-host kernel: [Hardware Error]:    Error Structure Type: cache error
Apr 05 00:03:06 proxmox-host kernel: [Hardware Error]:    Check Information: 0x000000000602001f
Apr 05 00:03:06 proxmox-host kernel: [Hardware Error]:     Transaction Type: 2, Generic
Apr 05 00:03:06 proxmox-host kernel: [Hardware Error]:     Operation: 0, generic error
Apr 05 00:03:06 proxmox-host kernel: [Hardware Error]:     Level: 0
Apr 05 00:03:06 proxmox-host kernel: [Hardware Error]:     Processor Context Corrupt: true
Apr 05 00:03:06 proxmox-host kernel: [Hardware Error]:     Uncorrected: true
Apr 05 00:03:06 proxmox-host kernel: [Hardware Error]:   Context Information Structure 0:
Apr 05 00:03:06 proxmox-host kernel: [Hardware Error]:    Register Context Type: MSR Registers (Machine Check and other MSRs)
Apr 05 00:03:06 proxmox-host kernel: [Hardware Error]:    Register Array Size: 0x0050
Apr 05 00:03:06 proxmox-host kernel: [Hardware Error]:    MSR Address: 0xc0002051
Apr 05 00:03:06 proxmox-host kernel: [Hardware Error]:   Context Information Structure 1:
Apr 05 00:03:06 proxmox-host kernel: [Hardware Error]:    Register Context Type: Unclassified Data
Apr 05 00:03:06 proxmox-host kernel: [Hardware Error]:    Register Array Size: 0x0030
Apr 05 00:03:06 proxmox-host kernel: [Hardware Error]:    Register Array:
Apr 05 00:03:06 proxmox-host kernel: [Hardware Error]:    00000000: 00000010 00000000 1c3010c0 fffffffe
Apr 05 00:03:06 proxmox-host kernel: [Hardware Error]:    00000010: 00000011 00000000 cb300024 00000000
Apr 05 00:03:06 proxmox-host kernel: [Hardware Error]:    00000020: 00000017 00000000 cb300024 00000000
Apr 05 00:03:06 proxmox-host kernel: BERT: Total records found: 1
Apr 05 00:03:06 proxmox-host kernel: mce: [Hardware Error]: Machine check events logged
Apr 05 00:03:06 proxmox-host kernel: PM:   Magic number: 5:744:8
Apr 05 00:03:06 proxmox-host kernel: mce: [Hardware Error]: CPU 54: Machine Check: 0 Bank 5: aea0000000000108
Apr 05 00:03:06 proxmox-host kernel: clockevents clockevent82: hash matches
Apr 05 00:03:06 proxmox-host kernel: mce: [Hardware Error]: TSC 0 ADDR 1ffffffc04dc0ee MISC d0140ff600000000 PPIN 2b0c17e59888012 SYND 4d000000 IPID 500>
Apr 05 00:03:06 proxmox-host kernel: memory memory52: hash matches
Apr 05 00:03:06 proxmox-host kernel: mce: [Hardware Error]: PROCESSOR 2:a10f11 TIME 1743825773 SOCKET 0 APIC 1c microcode a101148
Apr 05 00:03:06 proxmox-host kernel: RAS: Correctable Errors collector initialized.

I'm skeptical about this being a hardware issue with the CPU. After I saw these errors, I ran stress-ng while booted on the 6.11 kernel for 6+ hours on all cores without any problems. Here are some additional stats about my system that may be relevant:

Proxmox VE 8.3.5
Mirrored ZFS root on 2x SAMSUNG MZ7LH960
CPU: AMD EPYC 9554
Motherboard: ASRock Rack GENOAD8X-2T/BCM
Memory: 8x64GB MICRON DDR5 RDIMM
PCIe: LSI 9300-8I SAS3008, 2x RTX 4090

GreenDamTan · Apr 5, 2025

zenowl77 said:
I will keep looking, but so far it appears that, as with 6.8, changes in vfio (and likely other files) have broken compatibility, so we need a patch for 16.9/17.X to restore it. So far there is no discussion of it on the PolloLoco - NVIDIA vGPU Guide page, and I do not see anything from the producer of the 6.8 patch here: GreenDam

I will keep searching; hopefully they release something soon.

I made a patch for kernel 6.14.0-1-pve:
https://gitlab.com/GreenDamTan/vgpu....12_also_6.14/550.144.02.patch?ref_type=heads
It seems to work now.
If it also works on other Proxmox servers, I will open a pull request to the PolloLoco repo.

Jahara · Apr 5, 2025

I'm excited for the AMDXDNA Ryzen NPU driver that is in 6.14. I'll test later whether the necessary firmware was also updated to support it.

t.lamprecht · Apr 5, 2025

danmcq said:
BERT: Error records from previous boot:

BERT is an ACPI table and the kernel only reads it.
That said, the newer kernel could use instructions or code paths that trigger issues that were not present on the older kernel, so it might correlate.

I'd check for firmware updates and potentially talk with your system vendor. It might not be something that gets triggered by pure load like stress-ng, but something more subtle, if it's indeed the hardware.

FWIW, I have an EPYC 9475F Turin based test system here that has no BERT records triggered by booting 6.14. While it's a different CPU generation and so definitely not 1:1 comparable, it's at least not something that happens on recent EPYC generations in general.

danmcq · Apr 5, 2025

t.lamprecht said:
BERT is an ACPI table and the kernel only reads it.
That said, the newer kernel could use instructions or code paths that trigger issues that were not present on the older kernel, so it might correlate.

I'd check for firmware updates and potentially talk with your system vendor. It might not be something that gets triggered by pure load like stress-ng, but something more subtle, if it's indeed the hardware.

FWIW, I have an EPYC 9475F Turin based test system here that has no BERT records triggered by booting 6.14. While it's a different CPU generation and so definitely not 1:1 comparable, it's at least not something that happens on recent EPYC generations in general.

I appreciate the added context. I was able to boot successfully with 6.14 using the mce=off kernel command line parameter, but this isn't ideal, so I've reverted to 6.11 for now. I'll reach out to ASRock Rack this week and see if there might be a newer BIOS update available that isn't listed on their website.
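
For anyone reproducing this test, a quick way to confirm that a parameter such as mce=off actually reached the running kernel is sketched below. The helper name `has_kernel_param` is an assumption; `/proc/cmdline` is the standard kernel interface. How the parameter gets set depends on the bootloader: on Proxmox VE it is typically `/etc/kernel/cmdline` followed by `proxmox-boot-tool refresh` on systemd-boot setups, or `GRUB_CMDLINE_LINUX_DEFAULT` in `/etc/default/grub` followed by `update-grub`.

```shell
#!/bin/sh
# has_kernel_param PARAM [FILE]: succeed if PARAM appears as a whole word
# on the kernel command line. FILE defaults to /proc/cmdline.
# Hypothetical helper for illustration only.
has_kernel_param() {
    # Put each parameter on its own line, then require an exact, literal match.
    tr -s ' \t' '\n' < "${2:-/proc/cmdline}" | grep -qxF -- "$1"
}

# Usage:
#   if has_kernel_param mce=off; then echo "machine check handling is disabled"; fi
```

Matching whole words avoids false positives from parameters that merely contain the same substring.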

GreenDamTan · Apr 6, 2025

GreenDamTan said:
I made a patch for kernel 6.14.0-1-pve:
https://gitlab.com/GreenDamTan/vgpu....12_also_6.14/550.144.02.patch?ref_type=heads
It seems to work now.
If it also works on other Proxmox servers, I will open a pull request to the PolloLoco repo.

New branch for 17.5 and 16.9:
https://gitlab.com/GreenDamTan/vgpu-proxmox/-/tree/17.5_16.9_kernel_6.12_and_6.14?ref_type=heads

listhor · Apr 6, 2025

I have upgraded 2 hosts (Xeon E-2278G and N100) without graphics cards, and after 2 days all is OK. They run TrueNAS (mapped AHCI passthrough), OPNsense and various Linux distributions.

SakuraChan00 · Apr 6, 2025

GreenDamTan said:
new branch for 17.5 and 16.9
https://gitlab.com/GreenDamTan/vgpu-proxmox/-/tree/17.5_16.9_kernel_6.12_and_6.14?ref_type=heads

Thanks for your prompt work, congrats. And it only took a poke.

jauling · Apr 6, 2025

6.14.0-1-pve is reporting some kind of ECC errors (I'm not running ECC memory with N100 lol). My dmesg is completely filled with these errors:

Code:

# dmesg | grep igen6
[    3.775370] caller igen6_probe+0x1bc/0x8e0 [igen6_edac] mapping multiple BARs
[    3.775412] EDAC MC0: Giving out device to module igen6_edac controller Intel_client_SoC MC#0: DEV 0000:00:00.0 (POLLED)
[    3.775430] EDAC igen6 MC0: HANDLING IBECC MEMORY ERROR
[    3.775431] EDAC igen6 MC0: ADDR 0x7fffffffe0
[    3.775469] EDAC igen6: v2.5.1
[    4.824379] EDAC igen6 MC0: HANDLING IBECC MEMORY ERROR
[    4.824382] EDAC igen6 MC0: ADDR 0x7fffffffe0
[    5.852327] EDAC igen6 MC0: HANDLING IBECC MEMORY ERROR
[    5.852335] EDAC igen6 MC0: ADDR 0x7fffffffe0
[    6.872532] EDAC igen6 MC0: HANDLING IBECC MEMORY ERROR
[    6.872536] EDAC igen6 MC0: ADDR 0x7fffffffe0
[    7.896351] EDAC igen6 MC0: HANDLING IBECC MEMORY ERROR
[    7.896355] EDAC igen6 MC0: ADDR 0x7fffffffe0
[    8.920317] EDAC igen6 MC0: HANDLING IBECC MEMORY ERROR
[    8.920320] EDAC igen6 MC0: ADDR 0x7fffffffe0

I've blacklisted the igen6_edac module, but with this 6.14 kernel my idle CPU frequency is still higher than with 6.8, so I've reverted.

Not sure about anyone else, but when running 6.8.12-9-pve, the igen6_edac module also loads, but the only dmesg entries are these:

Code:

# dmesg | grep igen6
[    3.455293] caller igen6_probe+0x193/0x8b0 [igen6_edac] mapping multiple BARs
[    3.455337] EDAC MC0: Giving out device to module igen6_edac controller Intel_client_SoC MC#0: DEV 0000:00:00.0 (INTERRUPT)
[    3.455389] EDAC igen6 MC0: HANDLING IBECC MEMORY ERROR
[    3.455391] EDAC igen6 MC0: ADDR 0x7fffffffe0
[    3.455431] EDAC igen6: v2.5.1

t.lamprecht · Apr 6, 2025

jauling said:
6.14.0-1-pve is reporting some kind of ECC errors (I'm not running ECC memory with N100 lol). My dmesg is completely filled with these errors:

DDR5 has basic ECC built in, and the Intel N100 had an issue where memory error events were not handled correctly. This was fixed by a commit that landed in Linux 6.13, so it can indeed happen that 6.14 shows these messages while older kernels do not.

This does not necessarily mean that the HW must be broken; the reporting itself could be broken too. Some vendors sell Intel N100 based systems with memory already installed, and that memory is often cheap no-name stuff that might work fine but could have other issues. That said, there really could be a problem with your HW, e.g. the memory DIMMs, the motherboard or the CPU, and if that's the case then disabling the EDAC module only silences the messenger and the symptoms without really fixing anything.

FWIW, I have an Intel N100 based system running PVE, and it does not show these errors when booting our 6.14 kernel. I bought the memory separately and chose a reputable vendor, so at least it's not a general issue with Intel N100 based systems.

jauling · Apr 6, 2025

I also bought my DDR5 Corsair Vengeance SODIMM separately, albeit second hand. I put it through its paces with memtest86 and it gave me no issues. I suppose I can factory reset my BIOS and see what happens. Thanks for sharing, though I'm a bit annoyed now that it might be a hardware issue. I was looking for an excuse to upgrade to 32G from 16G though, so silver lining I guess.

gfngfn256 · Apr 6, 2025

jauling said:
excuse to upgrade to 32G from 16G

Intel N100 does not natively support more than 16GB - as shown here officially. Though possibly YMMV.

jauling · Apr 6, 2025

gfngfn256 said:
Intel N100 does not natively support more than 16GB - as shown here officially. Though possibly YMMV.

Yep, I know. I've seen some guys throw 48G SODIMMs at this same motherboard I have. Curious, does your N100, assuming you're running Linux, show correct manufacturer info when you run dmidecode --type memory? Mine doesn't. Though I see the part number is believable. CMSX16GX5M1A4800C40
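
For comparing results, the relevant fields can be pulled out of the dmidecode output as in this sketch. The function name `dimm_identity` and the output format are assumptions; it relies on the "Manufacturer:" and "Part Number:" fields that dmidecode prints for each memory device, with the manufacturer listed before the part number.

```shell
#!/bin/sh
# dimm_identity: read `dmidecode --type memory` output on stdin and print
# one "manufacturer | part number" line per memory device.
# Hypothetical helper for illustration only.
dimm_identity() {
    awk -F': *' '
        /^[ \t]*Manufacturer:/ { man = $2 }
        /^[ \t]*Part Number:/  {
            sub(/[ \t]+$/, "", $2)   # dmidecode often pads values with spaces
            print man " | " $2
        }
    '
}

# Usage (as root):
#   dmidecode --type memory | dimm_identity
```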
