[RESOLVED] Strange SATA fault after upgrade to Proxmox 9.0

Fabry

Aug 21, 2025
I'm having a problem with one of my Proxmox installations.

I have two installations with nearly identical hardware. On one, the upgrade went smoothly without any issues. On the other, it stopped halfway through because of a conflict with Docker (which was installed directly on Proxmox, not in an LXC or VM) that hadn't been updated to the Trixie repositories.

After some trial and error with --fix-broken and other corrections (I eventually even removed Docker since I no longer needed it), I managed to get the upgrade to complete.

However, after the reboot, my SATA devices were no longer present. This includes an HDD where some VMs and LXCs reside. The drive powers on at boot and even appears in the kernel log, but after a few seconds, it powers off, and by the time the boot process is finished, the drive is gone.

I thought it might be a drive failure, but I've noticed something strange: even the SATA CD-ROM is not present after the boot finishes (while it's also detected by the kernel during boot).

What could be the issue? Booting the previous 6.8 kernel doesn't fix it.
lsscsi doesn't detect the drive (or the CD-ROM), while lspci does detect the SATA controller (integrated into the chipset).

I'm considering reinstalling from scratch, but it would be a huge amount of work. I'd have to migrate all the accessible VMs and LXCs (since they don't use the disappeared disk) to the other node, then manually copy all the configurations for those that do use the disappeared disk (which I assume aren't migratable or suitable for backup).

Finally, this node is also the cluster master, and I think it would be inconvenient to reassign the other node (which would be full of VMs and LXCs) to a new cluster.

So, I'm first looking for recovery solutions without having to reinstall everything.
 
I've solved it!

After carefully reading the kernel log, I saw that the SATA devices disappeared right after a "load driverctl" entry.

Upon checking, I found that I had previously set this override:
Code:
driverctl set-override 0000:01:00.1 vfio-pci
Some GPU/PCI passthrough guide had made me configure it in the past (I don't even remember why it was needed).

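However, because I had moved the video card to a different slot, the GPU was no longer at 0000:01:00.1 but at 0000:03:00.1. The address 0000:01:00 now belonged to the SATA controller (and possibly the USB controller as well), so the old override was apparently handing that device to vfio-pci instead of its normal driver.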
With the override removed, everything started working as it should.
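
For anyone who lands here with the same symptom, this is roughly how I would check for a stale override now. It is only a sketch: `vfio_overrides` is a helper name I made up, and I'm assuming that `driverctl list-overrides` prints one `<address> <driver>` pair per line. The address in the commented-out line is the one from my case, so substitute your own.

```shell
#!/bin/sh
# Hypothetical helper: read "<pci-address> <driver>" lines on stdin
# (the assumed output format of `driverctl list-overrides`) and print
# only the addresses that are forced onto vfio-pci.
vfio_overrides() {
    awk '$2 == "vfio-pci" { print $1 }'
}

if command -v driverctl >/dev/null 2>&1; then
    # For every vfio-pci override, show which device sits at that address
    # today and which kernel driver is bound to it.
    driverctl list-overrides | vfio_overrides | while read -r addr; do
        lspci -nnk -s "$addr"
    done

    # If the address turns out to belong to the SATA controller rather than
    # the GPU, drop the override (run as root), then reboot:
    # driverctl unset-override 0000:01:00.1
fi
```

If `lspci` reports a SATA or USB controller at an address that is overridden to vfio-pci, that override is most likely left over from an older slot layout.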

I also removed the loading of the VFIO modules, since I'm no longer using full PCI passthrough but Nvidia vGPU instead. Specifically, I removed the file module.conf from /etc/modprobe.d/, which contained:
Code:
vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd # not working / not found on recent Proxmox
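
If you are not sure where your own VFIO entries live, here is a small sketch for finding them. The function name `vfio_refs` is made up, and the list of paths is an assumption on my part: my file was in /etc/modprobe.d/, but guides also tend to use /etc/modules or /etc/modules-load.d/.

```shell
#!/bin/sh
# Hypothetical helper: print every file under the given paths that
# mentions "vfio". Paths that do not exist are skipped silently.
vfio_refs() {
    for path in "$@"; do
        [ -e "$path" ] && grep -rl 'vfio' "$path"
    done
    return 0
}

# Assumed locations; adjust to wherever your guide put the entries.
vfio_refs /etc/modprobe.d /etc/modules-load.d /etc/modules

# After deleting or editing a file, the initramfs may need rebuilding so
# the change also applies at early boot (run as root):
# update-initramfs -u -k all

# After the next reboot, check whether any vfio module is still loaded:
# lsmod | grep vfio
```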