Kernel regression in PVE 8.4.16

mstratford

Feb 12, 2026
Hello all!

After rebooting my PVE 8.4.16 box into kernel 6.8.12-18-pve, I have discovered that PCIe passthrough of one of my PCIe storage controllers now causes QEMU to hang while trying to start the VM. This manifests in errors like `timeout waiting for systemd`, and `qemu.slice` shows a `100.scope` with just `[kvm]` listed, which is unkillable. All edits to the VM's hardware configuration fail with timeouts waiting for various qemu services. It seems to spin at 100% on the first core (25% of 4 cores).

If I remove the device from the passthrough (`0d`, lspci: `0d:00.0 SATA controller: ASMedia Technology Inc. ASM1166 Serial ATA Controller (rev 02)`), the VM starts fine.

If I reboot PVE into 6.8.12-15-pve, it also boots fine, suggesting a kernel regression.

1770853615323.png

At this point, I'd appreciate any help or guidance on where to look for logs that might explain why QEMU is hanging.

Thank you super kindly in advance!
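
For anyone hitting the same unkillable `[kvm]` process, one place to look besides the syslog is the process state under `/proc`. The following is only a sketch: the helper name `proc_state` is made up for illustration, and the PID file path in the comments is an assumption about where PVE keeps the QEMU PID for VM 100.

```shell
#!/bin/sh
# Print the scheduler state of a process, e.g. "S (sleeping)" or "D (disk sleep)".
# proc_state is a hypothetical helper name, not part of any PVE tooling.
proc_state() {
    # /proc/<pid>/status contains a line such as "State:\tS (sleeping)"
    sed -n 's/^State:[[:space:]]*//p' "/proc/$1/status"
}

# Possible usage on the host (assumption: the PID of VM 100 is stored here):
#   pid=$(cat /var/run/qemu-server/100.pid)
#   proc_state "$pid"
#   cat "/proc/$pid/stack"   # kernel stack of the task, needs root
```

A state of `D`, or `R` combined with a kernel stack that sits in vfio or PCI code, would suggest the hang is inside the kernel rather than in QEMU itself.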
 
Hi!

As far as I can remember, there was a patch applied for that specific SATA controller. Could you post the syslog from before, during, and after the VM hang, as well as the output of `lspci -s 0000:01:00 -vvnnk`?
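
If the full syslog is too long to post, the relevant window can be cut out of a saved copy. This is a minimal sketch, assuming the classic syslog line format with the time in the third field and a window that stays within one day; the function name `syslog_window` is hypothetical. `journalctl --since ... --until ...` is likely the more direct route on the host itself.

```shell
#!/bin/sh
# Print the lines of a saved syslog file whose HH:MM:SS timestamp (third
# field) lies between start and end, inclusive. Only valid within one day.
# syslog_window is a hypothetical helper name.
syslog_window() {
    # $1 = log file, $2 = start time, $3 = end time (both HH:MM:SS)
    awk -v s="$2" -v e="$3" '$3 >= s && $3 <= e' "$1"
}

# Possible usage (file name is a placeholder):
#   syslog_window syslog.txt 18:30:00 18:40:00
```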
 
Hi dakralex! Thanks for the help so far!

After some hardware shuffles, they are now `01` and `0b`. `0b` is the troublesome one. `lspci` provided for both.
Code:
root@nasty:~# lspci -s 0000:01:00 -vvnnk
01:00.0 Serial Attached SCSI controller [0107]: Broadcom / LSI SAS2308 PCI-Express Fusion-MPT SAS-2 [1000:0086] (rev 05)
        Subsystem: Broadcom / LSI SAS2308 PCI-Express Fusion-MPT SAS-2 [1000:0086]
        Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Interrupt: pin A routed to IRQ 43
        IOMMU group: 9
        Region 0: I/O ports at f000 [disabled] [size=256]
        Region 1: Memory at fce40000 (64-bit, non-prefetchable) [disabled] [size=64K]
        Region 3: Memory at fce00000 (64-bit, non-prefetchable) [disabled] [size=256K]
        Expansion ROM at fcd00000 [disabled] [size=1M]
        Capabilities: [50] Power Management version 3
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [68] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 4096 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
                        ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0W
                DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
                        RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset-
                        MaxPayload 512 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
                LnkCap: Port #0, Speed 8GT/s, Width x8, ASPM L0s, Exit Latency L0s <64ns
                        ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
                LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 8GT/s, Width x8
                        TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Range BC, TimeoutDis+ NROPrPrP- LTR-
                         10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix-
                         EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
                         FRS- TPHComp- ExtTPHComp-
                         AtomicOpsCap: 32bit- 64bit- 128bitCAS-
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR- 10BitTagReq- OBFF Disabled,
                         AtomicOpsCtl: ReqEn-
                LnkCap2: Supported Link Speeds: 2.5-8GT/s, Crosslink- Retimer- 2Retimers- DRS-
                LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance Preset/De-emphasis: -6dB de-emphasis, 0dB preshoot
                LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+ EqualizationPhase1+
                         EqualizationPhase2+ EqualizationPhase3+ LinkEqualizationRequest-
                         Retimer- 2Retimers- CrosslinkRes: unsupported
        Capabilities: [d0] Vital Product Data
pcilib: sysfs_read_vpd: read failed: No such device
                Not readable
        Capabilities: [a8] MSI: Enable- Count=1/1 Maskable- 64bit+
                Address: 0000000000000000  Data: 0000
        Capabilities: [c0] MSI-X: Enable- Count=16 Masked-
                Vector table: BAR=1 offset=0000e000
                PBA: BAR=1 offset=0000f000
        Capabilities: [100 v2] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
                AERCap: First Error Pointer: 00, ECRCGenCap- ECRCGenEn- ECRCChkCap- ECRCChkEn-
                        MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
                HeaderLog: 04000001 0000220f 01070000 07d37215
        Capabilities: [1e0 v1] Secondary PCI Express
                LnkCtl3: LnkEquIntrruptEn- PerformEqu-
                LaneErrStat: 0
        Capabilities: [1c0 v1] Power Budgeting <?>
        Capabilities: [190 v1] Dynamic Power Allocation <?>
        Capabilities: [148 v1] Alternative Routing-ID Interpretation (ARI)
                ARICap: MFVC- ACS-, Next Function: 0
                ARICtl: MFVC- ACS-, Function Group: 0
        Kernel driver in use: vfio-pci
        Kernel modules: mpt3sas


Code:
root@nasty:~# lspci -s 0000:0b:00 -vvnnk
0b:00.0 SATA controller [0106]: ASMedia Technology Inc. ASM1166 Serial ATA Controller [1b21:1166] (rev 02) (prog-if 01 [AHCI 1.0])
        Subsystem: ZyDAS Technology Corp. ASM1166 Serial ATA Controller [2116:2116]
        Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Interrupt: pin A routed to IRQ 44
        IOMMU group: 11
        Region 0: Memory at fcf82000 (32-bit, non-prefetchable) [size=8K]
        Region 5: Memory at fcf80000 (32-bit, non-prefetchable) [size=8K]
        Expansion ROM at fcf00000 [size=512K]
        Capabilities: [40] Power Management version 3
                Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
                Address: 0000000000000000  Data: 0000
        Capabilities: [80] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
                        ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 75W
                DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
                        RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 256 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
                LnkCap: Port #0, Speed 8GT/s, Width x2, ASPM L0s L1, Exit Latency L0s <4us, L1 <64us
                        ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
                LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 8GT/s, Width x2
                        TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Not Supported, TimeoutDis- NROPrPrP- LTR-
                         10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix-
                         EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
                         FRS- TPHComp- ExtTPHComp-
                         AtomicOpsCap: 32bit- 64bit- 128bitCAS-
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR- 10BitTagReq- OBFF Disabled,
                         AtomicOpsCtl: ReqEn-
                LnkCap2: Supported Link Speeds: 2.5-8GT/s, Crosslink- Retimer- 2Retimers- DRS-
                LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis+
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance Preset/De-emphasis: -6dB de-emphasis, 0dB preshoot
                LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+ EqualizationPhase1+
                         EqualizationPhase2+ EqualizationPhase3+ LinkEqualizationRequest-
                         Retimer- 2Retimers- CrosslinkRes: unsupported
        Capabilities: [100 v1] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
                AERCap: First Error Pointer: 00, ECRCGenCap- ECRCGenEn- ECRCChkCap- ECRCChkEn-
                        MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
                HeaderLog: 00000000 00000000 00000000 00000000
        Capabilities: [130 v1] Secondary PCI Express
                LnkCtl3: LnkEquIntrruptEn- PerformEqu-
                LaneErrStat: 0
        Kernel driver in use: vfio-pci
        Kernel modules: ahci


The syslog is also attached. The log starts at the reboot; the VM starts a couple of minutes later at:
Feb 12 18:31:34 nasty systemd[1]: Started 100.scope.

Lastly, I've included a screenshot of trying to use the console once it thinks it has started, and a screenshot of the screen when it hangs for a couple of minutes before shutting down.
Screenshot 2026-02-12 at 18.31.52.png
Screenshot 2026-02-12 at 18.38.52.png

I have some kernel settings to allow me to pass through some of the motherboard USB controllers to other VMs:
Code:
amd_iommu=on pcie_acs_override=downstream,multifunction
I also tried removing `pcie_acs_override` at boot, and it still misbehaves on the newer kernel.
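
Since `pcie_acs_override` changes how devices are split into IOMMU groups, it can help to list the groups with and without it. A rough sketch that reads the sysfs layout follows; the function name `list_iommu_groups` and its optional root argument are made up for illustration.

```shell
#!/bin/sh
# Print one line per device with the IOMMU group it belongs to.
# list_iommu_groups is a hypothetical helper; the optional argument is the
# sysfs root (defaults to /sys) so the function can be pointed at a copy.
list_iommu_groups() {
    root="${1:-/sys}"
    for dev in "$root"/kernel/iommu_groups/*/devices/*; do
        # Skip the unexpanded pattern when no groups exist.
        [ -e "$dev" ] || continue
        group="${dev%/devices/*}"   # strip the trailing /devices/<address>
        group="${group##*/}"        # keep only the group number
        printf 'group %s: %s\n' "$group" "${dev##*/}"
    done
}

# Possible usage on the host:
#   list_iommu_groups | sort -V
```

With the output above, the ASM1166 would be expected to appear as `group 11: 0000:0b:00.0`; other devices sharing that group would be worth noting.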


Thank you so much!
 
Hi!

I assume that /dev/sda through /dev/sdh are exposed through the SATA controller? At least these are reported after VM 100 is started:

Code:
Feb 12 18:31:31 nasty pvedaemon[3129]: start VM 100: UPID:nasty:00000C39:00001EA3:698E1C83:qmstart:100:root@pam:
Feb 12 18:31:31 nasty pvedaemon[2954]: <root@pam> starting task UPID:nasty:00000C39:00001EA3:698E1C83:qmstart:100:root@pam:
Feb 12 18:31:31 nasty kernel: sd 0:0:0:0: [sda] Synchronizing SCSI cache
Feb 12 18:31:31 nasty kernel: sd 0:0:0:0: [sda] Synchronize Cache(10) failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Feb 12 18:31:31 nasty kernel: sd 0:0:1:0: [sdb] Synchronizing SCSI cache
Feb 12 18:31:31 nasty kernel: sd 0:0:1:0: [sdb] Synchronize Cache(10) failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Feb 12 18:31:31 nasty kernel: sd 0:0:2:0: [sdc] Synchronizing SCSI cache
Feb 12 18:31:31 nasty kernel: sd 0:0:2:0: [sdc] Synchronize Cache(10) failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Feb 12 18:31:31 nasty kernel: sd 0:0:3:0: [sdd] Synchronizing SCSI cache
Feb 12 18:31:31 nasty kernel: sd 0:0:3:0: [sdd] Synchronize Cache(10) failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Feb 12 18:31:31 nasty kernel: sd 0:0:4:0: [sde] Synchronizing SCSI cache
Feb 12 18:31:31 nasty kernel: sd 0:0:4:0: [sde] Synchronize Cache(10) failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Feb 12 18:31:31 nasty kernel: sd 0:0:5:0: [sdf] Synchronizing SCSI cache
Feb 12 18:31:31 nasty kernel: sd 0:0:5:0: [sdf] Synchronize Cache(10) failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Feb 12 18:31:31 nasty kernel: sd 0:0:6:0: [sdg] Synchronizing SCSI cache
Feb 12 18:31:31 nasty kernel: sd 0:0:6:0: [sdg] Synchronize Cache(10) failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Feb 12 18:31:31 nasty kernel: sd 0:0:7:0: [sdh] Synchronizing SCSI cache
Feb 12 18:31:31 nasty kernel: sd 0:0:7:0: [sdh] Synchronize Cache(10) failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Feb 12 18:31:31 nasty kernel: mpt2sas_cm0: mpt3sas_transport_port_remove: removed: sas_addr(0x4433221107000000)
Feb 12 18:31:31 nasty kernel: mpt2sas_cm0: removing handle(0x000f), sas_addr(0x4433221107000000)
Feb 12 18:31:31 nasty kernel: mpt2sas_cm0: enclosure logical id(0x56c92bf0000ac124), slot(4)
Feb 12 18:31:31 nasty kernel: mpt2sas_cm0: mpt3sas_transport_port_remove: removed: sas_addr(0x4433221100000000)
Feb 12 18:31:31 nasty kernel: mpt2sas_cm0: removing handle(0x0009), sas_addr(0x4433221100000000)
Feb 12 18:31:31 nasty kernel: mpt2sas_cm0: enclosure logical id(0x56c92bf0000ac124), slot(3)
Feb 12 18:31:31 nasty kernel: mpt2sas_cm0: mpt3sas_transport_port_remove: removed: sas_addr(0x4433221101000000)
Feb 12 18:31:31 nasty kernel: mpt2sas_cm0: removing handle(0x000a), sas_addr(0x4433221101000000)
Feb 12 18:31:31 nasty kernel: mpt2sas_cm0: enclosure logical id(0x56c92bf0000ac124), slot(2)
Feb 12 18:31:31 nasty kernel: mpt2sas_cm0: mpt3sas_transport_port_remove: removed: sas_addr(0x4433221102000000)
Feb 12 18:31:31 nasty kernel: mpt2sas_cm0: removing handle(0x000b), sas_addr(0x4433221102000000)
Feb 12 18:31:31 nasty kernel: mpt2sas_cm0: enclosure logical id(0x56c92bf0000ac124), slot(1)
Feb 12 18:31:31 nasty kernel: mpt2sas_cm0: mpt3sas_transport_port_remove: removed: sas_addr(0x4433221103000000)
Feb 12 18:31:31 nasty kernel: mpt2sas_cm0: removing handle(0x000c), sas_addr(0x4433221103000000)
Feb 12 18:31:31 nasty kernel: mpt2sas_cm0: enclosure logical id(0x56c92bf0000ac124), slot(0)
Feb 12 18:31:31 nasty kernel: mpt2sas_cm0: mpt3sas_transport_port_remove: removed: sas_addr(0x4433221104000000)
Feb 12 18:31:31 nasty kernel: mpt2sas_cm0: removing handle(0x000d), sas_addr(0x4433221104000000)
Feb 12 18:31:31 nasty kernel: mpt2sas_cm0: enclosure logical id(0x56c92bf0000ac124), slot(7)
Feb 12 18:31:31 nasty kernel: mpt2sas_cm0: mpt3sas_transport_port_remove: removed: sas_addr(0x4433221105000000)
Feb 12 18:31:31 nasty kernel: mpt2sas_cm0: removing handle(0x000e), sas_addr(0x4433221105000000)
Feb 12 18:31:31 nasty kernel: mpt2sas_cm0: enclosure logical id(0x56c92bf0000ac124), slot(6)
Feb 12 18:31:31 nasty kernel: mpt2sas_cm0: mpt3sas_transport_port_remove: removed: sas_addr(0x4433221106000000)
Feb 12 18:31:31 nasty kernel: mpt2sas_cm0: removing handle(0x0010), sas_addr(0x4433221106000000)
Feb 12 18:31:31 nasty kernel: mpt2sas_cm0: enclosure logical id(0x56c92bf0000ac124), slot(5)
Feb 12 18:31:31 nasty kernel: mpt2sas_cm0: sending message unit reset !!
Feb 12 18:31:31 nasty kernel: mpt2sas_cm0: message unit reset: SUCCESS
Feb 12 18:31:31 nasty kernel: sd 7:0:0:0: [sdk] Synchronizing SCSI cache
Feb 12 18:31:31 nasty kernel: ata7.00: Entering standby power mode
Feb 12 18:31:32 nasty kernel: sd 8:0:0:0: [sdl] Synchronizing SCSI cache
Feb 12 18:31:32 nasty kernel: ata8.00: Entering standby power mode
Feb 12 18:31:33 nasty kernel: ata9.00: Entering standby power mode
Feb 12 18:31:33 nasty kernel: ata10.00: Entering standby power mode
Feb 12 18:31:33 nasty kernel: sd 12:0:0:0: [sdo] Synchronizing SCSI cache
Feb 12 18:31:33 nasty kernel: ata12.00: Entering standby power mode

Could you also post a syslog of booting and starting the VM on the older kernel version, where it did work, and check with `lspci` whether the enumeration of the PCI devices changes at all between the two kernels?
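
One way to check the enumeration is to save the `lspci` output once per kernel and diff the two files. This is only a sketch; the function names `save_enum` and `compare_enum` and the output file naming are assumptions for illustration.

```shell
#!/bin/sh
# Save the current PCI enumeration to a file named after the running kernel.
# save_enum and compare_enum are hypothetical helper names.
save_enum() {
    lspci -nn > "lspci-$(uname -r).txt"
}

# Compare two saved enumerations: prints a unified diff if they differ,
# or a short note if they are identical.
compare_enum() {
    if diff -u "$1" "$2"; then
        echo "enumeration unchanged"
    fi
}

# Possible usage: run save_enum after booting each kernel, then
#   compare_enum lspci-6.8.12-15-pve.txt lspci-6.8.12-18-pve.txt
```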