Opt-in Linux 6.17 Kernel for Proxmox VE 9 available on test & no-subscription

On 6.17.2-1-pve my system appears to boot normally, but I see disk errors in dmesg and the system never comes online for network traffic (at least not for Proxmox; ping works fine, as does SSH, although SSH login generally fails because of the random disk errors). Console login only works intermittently; most of the time it is refused. Rebooting back into 6.14.11-4-pve, the system boots normally, shows no disk errors, and rejoins the Ceph cluster as if no hardware issue existed. Of note, I am also using Dell BOSS cards in my systems, as was reported earlier in this thread. I suspect something in the kernel is not playing nicely with that storage controller.
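For anyone trying to compare the two kernels, the symptoms above can be captured with standard tools; the grep pattern below is only a rough suggestion and may need adjusting for the controller in question:

uname -r                                                    # confirm which kernel actually booted
dmesg -T | grep -iE 'ata[0-9]+|i/o error|blk_update_request'   # disk/ATA errors since boot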


Edit:

I updated one of my other nodes to test. It is a Dell C6420 with the same BOSS S1 card as the other node, and it booted on the newer kernel just fine - no errors or issues. The server I had issues with is an R640. The R640 has Intel SSDs in the BOSS card, while the C6420 has SK hynix drives. I'm not sure where the issue lies at this point, since I noticed the dmesg errors I was seeing were for other SSDs connected to the HBA330. The C6420 uses an S140 controller.
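To compare the storage stacks of the two nodes (BOSS vs. HBA330 vs. S140), the controllers and the drives behind them can be listed with standard tools; the grep terms are only a guess at what will match on these systems:

lspci | grep -iE 'sata|sas|raid|non-volatile'   # storage controllers as seen by the kernel
lsblk -o NAME,MODEL,SERIAL,TRAN,SIZE            # drives, their models and transport type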
 
Has anyone tried this on PowerEdge R660 servers and verified that it works on this platform? Secure Boot and UEFI are enabled in our environment.
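For whoever tests this on an R660: the boot mode and Secure Boot state can be confirmed from the running host before rebooting into the new kernel (mokutil comes from the package of the same name):

ls /sys/firmware/efi >/dev/null 2>&1 && echo "UEFI boot" || echo "legacy BIOS boot"
mokutil --sb-state    # reports whether Secure Boot is enabled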
 
I'm also seeing the issue on R630s with legacy BIOS enabled: with kernel vmlinuz-6.17.2-1-pve the host just boot loops. A colleague of mine said that after changing the GRUB config to get more output and running "update-grub" and "proxmox-boot-tool refresh", the server was able to boot 6.17. This is happening on all of our R630s (about 60). We will keep looking into this issue.
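The post does not say exactly what was changed in the GRUB config; assuming it was the usual step of dropping 'quiet' from the default kernel command line to get more console output, the workaround would look roughly like this on a standard Debian/PVE install:

# in /etc/default/grub, change
#   GRUB_CMDLINE_LINUX_DEFAULT="quiet"
# to
#   GRUB_CMDLINE_LINUX_DEFAULT=""
update-grub                 # regenerate the GRUB configuration
proxmox-boot-tool refresh   # copy the updated config to all configured boot partitions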
hm - just for the record - my system which did not run into issues is booting with UEFI. From the error messages reported above it seems that others who ran into issues also use UEFI (but a confirmation would be appreciated).
How and where does your host boot-loop? Do you also get the message to check the SEL (as described above with screenshots)?
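In case it is easier than going through the iDRAC web UI, the SEL can also be read from the booted host via IPMI (ipmitool package, assuming a local BMC/iDRAC is present):

ipmitool sel elist    # dump the System Event Log with timestamps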


Today, I performed the update via the Proxmox web interface and then rebooted the server. After the reboot, Proxmox failed to boot. This concerns a Dell R730xd server. I had to boot into kernel 6.14.11-4-pve and pin the old kernel using the following command: proxmox-boot-tool kernel pin 6.14.11-4-pve
Is this issue already known?
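For reference, the pin is easy to review and undo later once a fixed kernel is out; a minimal sketch using the proxmox-boot-tool kernel subcommands:

proxmox-boot-tool kernel list               # show installed kernels and the current pin
proxmox-boot-tool kernel pin 6.14.11-4-pve  # keep booting the known-good kernel
proxmox-boot-tool kernel unpin              # later: return to booting the newest kernel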

The image shows the console shortly before Proxmox crashes:
Regarding the image - could you try removing 'quiet' from the end of the kernel command line (hit 'e' in the boot loader and look for the line which contains 'quiet'; see https://pve.proxmox.com/pve-docs/chapter-sysadmin.html#sysboot_edit_kernel_cmdline for a bit more background)? Maybe the kernel messages show a bit more of what's going on.

Thanks!
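If the host does manage to come up partially on 6.17, the kernel messages of that boot can sometimes also be pulled from the journal afterwards; this only works if persistent journaling is enabled (i.e. /var/log/journal exists):

journalctl -k -b -1 --no-pager | tail -n 100   # kernel messages from the previous boot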
 

Not using UEFI - legacy BIOS. Yes, I see the same boot errors as other users have reported. Here are the SEL log and the boot error; this is happening on multiple hosts. I get past the GRUB boot menu, get a black screen for about 10 seconds, and then the host reboots, until I choose a different kernel.
(screenshots attached: SEL log and boot error message)
 

Any chance you see something with 'quiet' removed from the kernel command line?

Else - the system I managed to boot has a newer BIOS version (this might well play a role here) - see this thread from the Dell forum (quite old):
https://www.dell.com/community/en/c...ont-lcd-panel/647f6e8ff4ccf8a8debe938a?page=1
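For comparing BIOS versions between affected and unaffected hosts, dmidecode (dmidecode package) reports them directly from the running system:

dmidecode -s bios-version        # system BIOS version as reported by SMBIOS
dmidecode -s bios-release-date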
 
So on the reboot I enabled SR-IOV Global Enable and I/OAT DMA, and it booted up.
hmm - that could very well also be the cause - at least from memory and from grepping dmesg, SR-IOV seems to be enabled on my test machine - not sure about / did not check I/OAT DMA.
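To check those two from a running system rather than from BIOS setup, something along these lines should give a hint (the lspci capability string is the standard SR-IOV capability name; ioatdma is the kernel module for Intel's I/OAT DMA engine):

lspci -vv 2>/dev/null | grep -i 'single root i/o virtualization'   # devices exposing SR-IOV
lsmod | grep ioatdma                                               # I/OAT DMA driver loaded?
dmesg | grep -i ioatdma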
 
Do you have any R660s running Proxmox that you upgraded from 8.4 to 9?
 
Our lab runs R730xds, so I'll check this in the morning. I don't believe we have SR-IOV enabled on those, but I will check.
 
I upgraded my Dell R730 dual-socket box using UEFI mode and had no issue. My "I/OAT DMA Engine" is disabled (default setting), and "SR-IOV Global Enable" is also disabled (default setting). BIOS 2.19.0, firmware 2.86.86.86. I upgraded from the previous PVE 9 version. I didn't watch the console, but it came back OK after the reboot. Let me know if there's anything you want me to check.