Issues after upgrading to 6.17.4-1-pve

I see there was a new kernel release… I don't think the changes will make a difference, but I will give it a test in a bit and report back.
 
Still having the same issue with the newer kernel. First it takes forever to import my ZFS pools etc., unlike 6.17.2, then I get a hard lockup on one of the CPUs with a complaint about an nvme module.. the error log is in the attached pictures of the boot screen, taken with my phone.

For now I just booted back to the pinned 6.17.2-2 kernel and everything is happy again.
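
For reference, pinning (and later unpinning) a kernel can be done with proxmox-boot-tool; the version string below is just an example and should match what the list command shows on your system.

Code:
# list the kernels the bootloader knows about
proxmox-boot-tool kernel list

# pin the known-good kernel (example version, use the one from the list)
proxmox-boot-tool kernel pin 6.17.2-2-pve

# later, to return to the newest installed kernel as default
proxmox-boot-tool kernel unpin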
 

Attachments

  • IMG_3825_preview.jpeg (photo of the boot screen)
If you are using an Intel-based system with Intel VMD (Volume Management Device), try disabling it in the BIOS of your mainboard / system. I have two Socket 1700 systems (one "production", one "playground") with a B760 chipset that were showing the same error with kernel 6.17.4. Disabling VMD on my playground system eliminated the problems I had.

So I suppose kernel 6.17.4 has a problem with these Intel VMDs, and disabling it fixes the boot issue.
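
If you want to check whether VMD is actually in play before touching the BIOS, the controller usually shows up in the PCI device list and the vmd kernel module gets loaded (exact output varies by platform):

Code:
# look for the Intel VMD controller in the PCI device list
lspci | grep -i "volume management"

# check whether the vmd kernel module is loaded
lsmod | grep -w vmd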

But beware: I did NOT test whether all my storage devices were working as before after I turned VMD off. If you have a setup where VMD is needed (pseudo hardware RAID?), you might not get out of this kerfuffle this way. More information about VMD here:
https://www.intel.com/content/www/u...anagement-device-intel-vmd-product-brief.html

EDIT: Another warning: I know from firsthand experience that turning VMD off will corrupt any Windows 11 installation on that machine. My playground machine had a drive with Windows on it, and I had to reinstall it after I messed around with the VMD setting.

EDIT2: Yet another warning: network devices might get shuffled around! When you disable VMD, your network devices might get renamed. This happened to me on my "production" system: the network devices were enp2s0 and enp3s0 before, and enp4s0 and enp5s0 after disabling it. If that happens, log into your Proxmox server via the local console, check the new interface names with "ip a", then go to /etc/network and edit the file "interfaces" or your files under "interfaces.d". After another reboot everything should be fine again.
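
A rough sketch of what that edit looks like; the interface name, bridge name and addresses below are examples only, use whatever "ip a" shows on your system:

Code:
# /etc/network/interfaces (excerpt, example values only)
auto lo
iface lo inet loopback

# old name was enp2s0, renamed to enp4s0 after disabling VMD
iface enp4s0 inet manual

auto vmbr0
iface vmbr0 inet static
        address 192.168.1.10/24
        gateway 192.168.1.1
        bridge-ports enp4s0
        bridge-stp off
        bridge-fd 0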
 
Thank you!!! This does seem to resolve my boot issue.. I was able to complete a boot and a reboot.. It still takes forever, but I am willing to live with that as this system never reboots unless I am doing updates.. Not going to unpin the other kernel just yet; I'll let it run for a few days with load and see how it goes..

As for the corruption warning: nope, no issue here..
 
Yeah, I didn't personally get that, but it would not have been an issue, as it would not have been the first time my network bonds got messed up lol..
 
I have been dealing with serious problems for 3 days after upgrading to Proxmox VE 9.1, which uses kernel 6.17.

Cluster issue

I have a 7-node Proxmox cluster.
On one node, after upgrading to Proxmox 9.1 (kernel 6.17):
  • ZFS modules fail to load
  • ZFS pool does not import
  • System does not load modules correctly
Because of this, I removed the node from the cluster and attempted a clean reinstall.
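
For reference, narrowing down whether it is the ZFS module or the pool import that fails usually looks roughly like this (the pool name is an example):

Code:
# is the zfs kernel module available and loaded for the running kernel?
modprobe zfs
lsmod | grep zfs

# what does the kernel log say about zfs / nvme during this boot?
journalctl -b -k | grep -iE 'zfs|nvme'

# which pools are visible but not yet imported? ("tank" is an example name)
zpool import
zpool status tank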

Clean install problems (Proxmox 9.1 ISO)

A clean installation of Proxmox 9.1 fails consistently:
  • Multiple errors during installation (dpkg, libfaketime, initramfs)
  • Errors like:
    • unable to sync file ... Input/output error
    • unable to install initramfs
  • Installer fails while generating initramfs for kernel 6.17
I tested:
  • Different USB flash drives
  • New SSD disk for the boot drive
  • Rufus (DD mode)
  • Balena Etcher
  • Different USB ports
Same errors every time.
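
One more thing worth ruling out when every tool and every stick produces the same errors is a bad ISO download; a quick sanity check is comparing the SHA256 sum against the one published on the Proxmox download page and writing the image raw (the filename is an example, /dev/sdX must be your actual USB device):

Code:
# verify the downloaded ISO against the published checksum (example filename)
sha256sum proxmox-ve_9.1-1.iso

# write it raw to the USB stick (double-check /dev/sdX first!)
dd if=proxmox-ve_9.1-1.iso of=/dev/sdX bs=1M status=progress oflag=sync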

Temporary workaround (not successful)

Out of desperation, I installed Proxmox 9.0 (kernel 6.14) first, which installed successfully, and then upgraded to 9.1.
After the upgrade:
  • The server crashes
  • Kernel modules do not load correctly (see the checks below)
  • System is unstable / unusable
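
When a node ends up in that state, comparing the running kernel against the installed kernel packages at least shows whether the upgrade actually completed:

Code:
# running kernel vs. installed kernel packages
uname -r
dpkg -l | grep -E 'proxmox-kernel|pve-kernel'

# overall package versions on the node
pveversion -v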

Hardware details

  • Server: HP ProLiant G8
  • Storage controller: SATA in AHCI mode
  • No RAID controller
This is the CLI output of the errors I get during a clean install of Proxmox 9.1 on the server.


This is by far the worst kernel experience I’ve had in 7 years as a Proxmox user.
The G8s always have a bit of a learning curve attached to them when you don't run them on RAID controllers.
This, however, looks like a classic USB flash issue. I guess you run your OS on a flash drive?

We have multiple G8s and they all work fine on the newest PVE/PBS version. Most of our G8s boot from RAID controllers, though.
The ones that don't boot from a RAID controller don't have the full OS on the flash drive; only GRUB (and /boot) lives on the USB drive, which then boots the OS directly from the disks.
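
A quick way to see whether a box is set up like that is to check which device /boot and the root filesystem actually live on (device names will differ per system):

Code:
# list block devices and their mount points
lsblk

# confirm which device /boot is on vs. the root filesystem
findmnt /boot
findmnt /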