Issues after upgrading to 6.17.4-1-pve

I see there was a new kernel release. I don’t think the changes will make a difference, but I will give it a test in a bit and report back.
 
Still having the same issue with the newer kernel. First, it takes forever to import my ZFS pools etc., unlike on 6.17.2. Then I get a hard lockup on one of the CPUs, with a complaint about an NVMe module. The error log follows; see the pictures of the boot screen taken with my phone.

For now I just booted back into the pinned 6.17.2-2 kernel and everything is happy again.
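
For anyone who wants to do the same, here is a minimal sketch of how the pin command could be put together. It only builds and prints the command line, it does not run anything. I am assuming the `proxmox-boot-tool kernel pin` subcommand and that the full kernel name carries the `-pve` suffix (as in the thread title); check `proxmox-boot-tool kernel list` on your own host for the exact name. The helper `pin_command` is a made-up name for illustration.

```python
import shlex


def pin_command(kernel_version: str) -> list[str]:
    """Build the argv for pinning a kernel. Nothing is executed here."""
    # Reject empty names or names containing whitespace before building the command.
    if not kernel_version or any(ch.isspace() for ch in kernel_version):
        raise ValueError("kernel version must be a single non-empty token")
    return ["proxmox-boot-tool", "kernel", "pin", kernel_version]


# Assumption: "6.17.2-2-pve" is the full name of the known-good kernel.
print(shlex.join(pin_command("6.17.2-2-pve")))
```

Run the printed command as root on the Proxmox host; the matching `unpin` subcommand should undo it once a fixed kernel is available.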
 

Attachment: IMG_3825_preview.jpeg
If you are using an Intel-based system with Intel VMD (Volume Management Device), try disabling it in the BIOS of your mainboard / system. I have two Socket 1700 systems (one "production", one "playground") with a B760 chipset that were displaying the same error with kernel 6.17.4.x. Disabling VMD on my playground system eliminated the problems I had.

So I suppose kernel 6.17.4 has a problem with Intel VMD, and disabling it works around the boot issue.

But beware: I did NOT test whether all my storage devices were working as before after I turned VMD off. If you have a setup where VMD is needed (pseudo-hardware RAID?), you might not get out of this kerfuffle this way. More information about VMD here:
https://www.intel.com/content/www/u...anagement-device-intel-vmd-product-brief.html
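
To check whether VMD is active at all before rebooting into the BIOS, something like the following sketch might help. It assumes that `lspci` labels the controller with the words "Volume Management Device"; the sample lines below are illustrative, not taken from my systems, and `find_vmd_controllers` is a made-up helper name.

```python
def find_vmd_controllers(lspci_output: str) -> list[str]:
    """Return the lspci lines that look like an Intel VMD controller."""
    # Assumption: the device description contains "Volume Management Device".
    needle = "volume management device"
    return [
        line.strip()
        for line in lspci_output.splitlines()
        if needle in line.lower()
    ]


# Illustrative lspci output only; real device names and addresses will differ.
SAMPLE_LSPCI = """\
00:00.0 Host bridge: Intel Corporation Device (example)
00:0e.0 RAID bus controller: Intel Corporation Volume Management Device NVMe RAID Controller
02:00.0 Ethernet controller: Example Vendor Example NIC
"""

for hit in find_vmd_controllers(SAMPLE_LSPCI):
    print("VMD controller found:", hit)
```

On a real host you could feed it the actual output, for example `subprocess.run(["lspci"], capture_output=True, text=True).stdout`. An empty result would suggest that VMD is already off or not present.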

EDIT: Another warning: I know from firsthand experience that turning VMD off will corrupt any Windows 11 installations on that machine. My playground machine had a drive with Windows on it, and I had to reinstall it after I messed around with VMD.

EDIT2: Warning: network devices might get shuffled around! When you disable VMD, your network devices might get renamed. This happened to me on my "production" system. The network devices were enp2s0 and enp3s0 before, and enp4s0 and enp5s0 after disabling it. If that happens, log into your Proxmox server via the local console, check your new interface names via "ip a", then go to "/etc/network" and edit the file "interfaces" or your files under "interfaces.d". After another reboot everything should be fine again.
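
The renaming step can be sketched roughly like this. It is only an illustration of the idea: the function works on the text of the config and writes nothing to disk, the sample config and the address in it are invented, and `rename_interfaces` is a made-up helper name. Keep a backup of "/etc/network/interfaces" before changing the real file.

```python
import re


def rename_interfaces(config_text: str, mapping: dict[str, str]) -> str:
    """Replace whole interface names in an interfaces-style config text."""
    if not mapping:
        return config_text
    # Longest names first, so a longer name is tried before a shorter prefix.
    names = sorted(mapping, key=len, reverse=True)
    # Match only complete names: not preceded or followed by a word char or "-".
    pattern = re.compile(
        r"(?<![\w-])(" + "|".join(re.escape(n) for n in names) + r")(?![\w-])"
    )
    # Single pass, so a freshly inserted name is never renamed a second time.
    return pattern.sub(lambda m: mapping[m.group(1)], config_text)


# Invented example config; only the interface names come from the post above.
SAMPLE = """\
auto lo
iface lo inet loopback

iface enp2s0 inet manual

iface enp3s0 inet manual

auto vmbr0
iface vmbr0 inet static
    address 192.0.2.10/24
    bridge-ports enp2s0
"""

if __name__ == "__main__":
    print(rename_interfaces(SAMPLE, {"enp2s0": "enp4s0", "enp3s0": "enp5s0"}))
```

Doing the replacement in one pass matters when old and new names overlap (for example enp2s0 becoming enp3s0 while enp3s0 becomes enp4s0); two sequential search-and-replace runs would rename the same interface twice.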
 
Thank you!!! This does seem to resolve my boot issue. I was able to complete a boot and a reboot. It still takes forever, but I am willing to live with that, as this system never reboots unless I am doing updates. I'm not going to unpin the other kernel just yet; I'll let this run for a few days under load and see how it goes.

As for corruption: nope, no issues here.
 
Yeah, I didn’t personally get the network device renaming, but it would not have been an issue, as it would not have been the first time my network bonds got messed up lol.