Why does the hard drive always disappear a few minutes after startup?

Fucter

May 30, 2023
syslog
kernel: [ 838.961593] pcieport 0000:0a:04.0: can't change power state from D3cold to D0 (config space inaccessible)
May 30 13:30:29 qysx kernel: [ 838.961627] pcieport 0000:0a:00.0: can't change power state from D3cold to D0 (config space inaccessible)
May 30 13:31:03 qysx kernel: [ 873.191794] ata1.00: exception Emask 0x52 SAct 0x8000000 SErr 0xffffffff action 0xe frozen
May 30 13:31:03 qysx kernel: [ 873.191833] ata1: SError: { RecovData RecovComm UnrecovData Persist Proto HostInt PHYRdyChg PHYInt CommWake 10B8B Dispar BadCRC Handshk LinkSeq TrStaTrns UnrecFIS DevExch }
May 30 13:31:03 qysx kernel: [ 873.191845] ata1.00: failed command: WRITE FPDMA QUEUED
May 30 13:31:03 qysx kernel: [ 873.191850] ata1.00: cmd 61/08:d8:80:06:4c/00:00:00:00:00/40 tag 27 ncq dma 4096 out
May 30 13:31:03 qysx kernel: [ 873.191850] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x56 (ATA bus error)
May 30 13:31:03 qysx kernel: [ 873.191863] ata1.00: status: { DRDY }
May 30 13:31:03 qysx kernel: [ 873.191873] ata1: hard resetting link
May 30 13:31:03 qysx kernel: [ 873.191890] ahci 0000:09:00.1: AHCI controller unavailable!
May 30 13:31:04 qysx kernel: [ 874.236497] ata1: failed to resume link (SControl FFFFFFFF)
May 30 13:31:04 qysx kernel: [ 874.236518] ata1: SATA link down (SStatus FFFFFFFF SControl FFFFFFFF)
May 30 13:31:09 qysx kernel: [ 879.257631] ata1: hard resetting link
May 30 13:31:09 qysx kernel: [ 879.257649] ahci 0000:09:00.1: AHCI controller unavailable!
May 30 13:31:10 qysx kernel: [ 880.298392] ata1: failed to resume link (SControl FFFFFFFF)
May 30 13:31:10 qysx kernel: [ 880.298488] ata1: SATA link down (SStatus FFFFFFFF SControl FFFFFFFF)
May 30 13:31:10 qysx kernel: [ 880.298496] ata1: limiting SATA link speed to <unknown>
May 30 13:31:15 qysx kernel: [ 885.363207] ata1: hard resetting link
May 30 13:31:15 qysx kernel: [ 885.363218] ahci 0000:09:00.1: AHCI controller unavailable!
May 30 13:31:16 qysx kernel: [ 886.396220] ata1: failed to resume link (SControl FFFFFFFF)
May 30 13:31:16 qysx kernel: [ 886.396279] ata1: SATA link down (SStatus FFFFFFFF SControl FFFFFFFF)
May 30 13:31:16 qysx kernel: [ 886.396290] ata1.00: disabled
May 30 13:31:16 qysx kernel: [ 886.396298] ata1.00: device reported invalid CHS sector 0
May 30 13:31:16 qysx kernel: [ 886.396304] ahci 0000:09:00.1: AHCI controller unavailable!
May 30 13:31:16 qysx kernel: [ 886.396354] sd 0:0:0:0: [sda] tag#27 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=45s
May 30 13:31:16 qysx kernel: [ 886.396364] sd 0:0:0:0: [sda] tag#27 Sense Key : Illegal Request [current]
May 30 13:31:16 qysx kernel: [ 886.396367] sd 0:0:0:0: [sda] tag#27 Add. Sense: Unaligned write command
May 30 13:31:16 qysx kernel: [ 886.396379] sd 0:0:0:0: [sda] tag#27 CDB: Write(16) 8a 00 00 00 00 00 00 4c 06 80 00 00 00 08 00 00
May 30 13:31:16 qysx kernel: [ 886.396381] blk_update_request: I/O error, dev sda, sector 4982400 op 0x1:(WRITE) flags 0x208800 phys_seg 1 prio class 0
May 30 13:31:16 qysx kernel: [ 886.396409] ata1: EH complete
May 30 13:31:16 qysx kernel: [ 886.396431] ata1.00: detaching (SCSI 0:0:0:0)
May 30 13:31:16 qysx kernel: [ 886.398366] sd 0:0:0:0: [sda] Synchronizing SCSI cache
May 30 13:31:16 qysx kernel: [ 886.398408] sd 0:0:0:0: [sda] Synchronize Cache(10) failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
May 30 13:31:16 qysx kernel: [ 886.398410] sd 0:0:0:0: [sda] Stopping disk
May 30 13:31:16 qysx kernel: [ 886.398413] sd 0:0:0:0: [sda] Start/Stop Unit failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
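The CDB line in this log can be cross-checked against the sector that blk_update_request reports. The sketch below is a hypothetical helper (decode_write16 is not an existing tool); it assumes the standard SCSI WRITE(16) layout (opcode 0x8a, LBA in bytes 2 to 9 and transfer length in bytes 10 to 13, both big-endian) and 512-byte logical sectors, which matches "4096 out" for 8 blocks.

```sh
#!/bin/sh
# decode_write16: hypothetical helper that decodes a SCSI WRITE(16) CDB as
# printed by the kernel ("CDB: Write(16) 8a 00 ..."), given as 16 hex bytes.
# Assumes 512-byte logical sectors, so 8 sectors make one 4 KiB block.
decode_write16() {
    if [ "$#" -ne 16 ] || [ "$1" != "8a" ]; then
        echo "not a WRITE(16) CDB" >&2
        return 1
    fi
    # Bytes 2-9 (arguments 3-10): logical block address, big-endian
    lba=$((0x$3$4$5$6$7$8$9${10}))
    # Bytes 10-13 (arguments 11-14): transfer length in blocks, big-endian
    blocks=$((0x${11}${12}${13}${14}))
    if [ $((lba % 8)) -eq 0 ]; then aligned=yes; else aligned=no; fi
    echo "lba=$lba blocks=$blocks 4k_aligned=$aligned"
}
```

Usage: decode_write16 8a 00 00 00 00 00 00 4c 06 80 00 00 00 08 00 00. For the WRITE(16) CDBs in this thread it reports LBA 4982400 (8 blocks) and LBA 9437192 (1 block), matching the logged sectors, and both are multiples of 8. So the "Unaligned write command" sense does not appear to reflect a genuinely misaligned write; it more likely gets reported once the link is already gone, but that is an inference from the log, not a confirmed diagnosis.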


root@qysx:~# lspci -D -nnk
0000:00:00.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Renoir Root Complex [1022:1630]
	Subsystem: ASUSTeK Computer Inc. Renoir Root Complex [1043:8809]
0000:00:00.2 IOMMU [0806]: Advanced Micro Devices, Inc. [AMD] Renoir IOMMU [1022:1631]
	Subsystem: ASUSTeK Computer Inc. Renoir IOMMU [1043:8809]
0000:00:01.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Renoir PCIe Dummy Host Bridge [1022:1632]
0000:00:01.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Renoir PCIe GPP Bridge [1022:1633]
	Kernel driver in use: pcieport
0000:00:02.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Renoir PCIe Dummy Host Bridge [1022:1632]
	DeviceName: Onboard IGD
0000:00:02.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Renoir PCIe GPP Bridge [1022:1634]
	Kernel driver in use: pcieport
0000:00:02.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Renoir PCIe GPP Bridge [1022:1634]
	Kernel driver in use: pcieport
0000:00:08.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Renoir PCIe Dummy Host Bridge [1022:1632]
0000:00:08.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Renoir Internal PCIe GPP Bridge to Bus [1022:1635]
	Kernel driver in use: pcieport
0000:00:08.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Renoir Internal PCIe GPP Bridge to Bus [1022:1635]
	Kernel driver in use: pcieport
0000:00:14.0 SMBus [0c05]: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller [1022:790b] (rev 51)
	Subsystem: ASUSTeK Computer Inc. FCH SMBus Controller [1043:87e1]
	Kernel driver in use: piix4_smbus
	Kernel modules: i2c_piix4, sp5100_tco
0000:00:14.3 ISA bridge [0601]: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge [1022:790e] (rev 51)
	Subsystem: ASUSTeK Computer Inc. FCH LPC Bridge [1043:87e1]
0000:00:18.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:166a]
0000:00:18.1 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:166b]
0000:00:18.2 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:166c]
0000:00:18.3 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:166d]
0000:00:18.4 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:166e]
0000:00:18.5 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:166f]
0000:00:18.6 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:1670]
0000:00:18.7 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:1671]
0000:01:00.0 PCI bridge [0604]: ASMedia Technology Inc. Device [1b21:1812] (rev 01)
	Kernel driver in use: pcieport
0000:02:00.0 PCI bridge [0604]: ASMedia Technology Inc. Device [1b21:1812] (rev 01)
	Kernel driver in use: pcieport
0000:02:02.0 PCI bridge [0604]: ASMedia Technology Inc. Device [1b21:1812] (rev 01)
	Kernel driver in use: pcieport
0000:02:03.0 PCI bridge [0604]: ASMedia Technology Inc. Device [1b21:1812] (rev 01)
	Kernel driver in use: pcieport
0000:02:08.0 PCI bridge [0604]: ASMedia Technology Inc. Device [1b21:1812] (rev 01)
	Kernel driver in use: pcieport
0000:02:0a.0 PCI bridge [0604]: ASMedia Technology Inc. Device [1b21:1812] (rev 01)
	Kernel driver in use: pcieport
0000:02:0b.0 PCI bridge [0604]: ASMedia Technology Inc. Device [1b21:1812] (rev 01)
	Kernel driver in use: pcieport
0000:05:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8125 2.5GbE Controller [10ec:8125] (rev 05)
	Subsystem: Realtek Semiconductor Co., Ltd. RTL8125 2.5GbE Controller [10ec:0123]
	Kernel driver in use: vfio-pci
	Kernel modules: r8169
0000:06:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8125 2.5GbE Controller [10ec:8125] (rev 05)
	Subsystem: Realtek Semiconductor Co., Ltd. RTL8125 2.5GbE Controller [10ec:0123]
	Kernel driver in use: vfio-pci
	Kernel modules: r8169
0000:07:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8125 2.5GbE Controller [10ec:8125] (rev 05)
	Subsystem: Realtek Semiconductor Co., Ltd. RTL8125 2.5GbE Controller [10ec:0123]
	Kernel driver in use: vfio-pci
	Kernel modules: r8169
0000:08:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8125 2.5GbE Controller [10ec:8125] (rev 05)
	Subsystem: Realtek Semiconductor Co., Ltd. RTL8125 2.5GbE Controller [10ec:0123]
	Kernel driver in use: vfio-pci
	Kernel modules: r8169
0000:09:00.0 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset USB 3.1 XHCI Controller [1022:43d5] (rev 01)
	Subsystem: ASMedia Technology Inc. 400 Series Chipset USB 3.1 XHCI Controller [1b21:1142]
	Kernel driver in use: xhci_hcd
	Kernel modules: xhci_pci
0000:09:00.1 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset SATA Controller [1022:43c8] (rev 01)
	Subsystem: ASMedia Technology Inc. 400 Series Chipset SATA Controller [1b21:1062]
	Kernel driver in use: ahci
	Kernel modules: ahci
0000:09:00.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset PCIe Bridge [1022:43c6] (rev 01)
	Kernel driver in use: pcieport
0000:0a:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset PCIe Port [1022:43c7] (rev ff)
	Kernel driver in use: pcieport
0000:0a:01.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset PCIe Port [1022:43c7] (rev ff)
	Kernel driver in use: pcieport
0000:0a:04.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset PCIe Port [1022:43c7] (rev ff)
	Kernel driver in use: pcieport
0000:0c:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8125 2.5GbE Controller [10ec:8125] (rev ff)
	Kernel driver in use: r8169
	Kernel modules: r8169
0000:0e:00.0 Non-Volatile memory controller [0108]: SK hynix Device [1c5c:1627]
	Subsystem: SK hynix Device [1c5c:1627]
	Kernel driver in use: nvme
	Kernel modules: nvme
0000:0f:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Cezanne [1002:1638] (rev c9)
	Subsystem: ASUSTeK Computer Inc. Device [1043:8809]
	Kernel driver in use: vfio-pci
	Kernel modules: amdgpu
0000:0f:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Device [1002:1637]
	Subsystem: ASUSTeK Computer Inc. Device [1043:8809]
	Kernel modules: snd_hda_intel
0000:0f:00.2 Encryption controller [1080]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 10h-1fh) Platform Security Processor [1022:15df]
	Subsystem: ASUSTeK Computer Inc. Family 17h (Models 10h-1fh) Platform Security Processor [1043:8809]
	Kernel driver in use: ccp
	Kernel modules: ccp
0000:0f:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Renoir USB 3.1 [1022:1639]
	Subsystem: ASUSTeK Computer Inc. Renoir USB 3.1 [1043:87e1]
	Kernel driver in use: xhci_hcd
	Kernel modules: xhci_pci
0000:0f:00.4 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Renoir USB 3.1 [1022:1639]
	Subsystem: ASUSTeK Computer Inc. Renoir USB 3.1 [1043:87e1]
	Kernel driver in use: xhci_hcd
	Kernel modules: xhci_pci
0000:0f:00.6 Audio device [0403]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 10h-1fh) HD Audio Controller [1022:15e3]
	Subsystem: ASUSTeK Computer Inc. Family 17h (Models 10h-1fh) HD Audio Controller [1043:8797]
	Kernel modules: snd_hda_intel
0000:10:00.0 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] [1022:7901] (rev 81)
	Subsystem: ASUSTeK Computer Inc. FCH SATA Controller [AHCI mode] [1043:87e1]
	Kernel driver in use: ahci
	Kernel modules: ahci
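In the lspci -D -nnk output above, 0a:00.0, 0a:01.0, 0a:04.0 and the RTL8125 at 0c:00.0 show "(rev ff)", while the IOMMU listing posted later shows the same bridges as rev 01. A revision of ff usually means the config-space read returned all ones, i.e. the device was not reachable when lspci ran, which fits the D3cold messages for 0a:00.0 and 0a:04.0. A small filter to pick such devices out of lspci output; list_rev_ff is a hypothetical name, not an existing tool:

```sh
#!/bin/sh
# list_rev_ff: hypothetical filter for `lspci -D -nn` output on stdin.
# Prints the address of every device whose revision reads as ff, which
# usually means its config space returned all ones (device unreachable).
list_rev_ff() {
    grep -E '\(rev ff\)' | awk '{ print $1 }'
}
```

Usage: lspci -D -nn | list_rev_ff. Run at the time the output above was captured, it would have listed 0000:0a:00.0, 0000:0a:01.0, 0000:0a:04.0 and 0000:0c:00.0.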

CPU: Ryzen 5 5600G, motherboard: B450M-PRO S
I have a Windows 10 VM with the APU's integrated graphics passed through. The other VM runs DSM; it does not get the SATA controller passed through, only the hard drive itself. However, a few minutes after booting, the hard drive disappears from DSM: DSM reports an I/O error and the disk is gone. I have to restart PVE to see the hard drive again. There are no other devices passed through.
 
From the part of the system log that is visible, it looks like you are passing through 0a:00.0 and 0a:04.0, which is a bit strange because they are PCI bridges (and will affect other PCI(e) devices). Can you please share the configuration files of the VM(s) with PCI(e) passthrough and of the VM with the disk passthrough?
Maybe the disk is connected to a SATA controller that you pass through, or it is in the same IOMMU group as a passed-through device (which also makes Proxmox lose its connection to it)? What is the full output of these commands:
cat /proc/cmdline; for d in /sys/kernel/iommu_groups/*/devices/*; do n=${d#*/iommu_groups/*}; n=${n%%/*}; printf 'IOMMU group %s ' "$n"; lspci -nns "${d##*/}"; done
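The one-liner above lists groups in glob (lexical) order, which is why its output jumps from group 0 to group 10. A sketch of the same listing in numeric group order; list_iommu_groups is a hypothetical name, and the optional directory argument exists only so the function can be exercised against a fake tree:

```sh
#!/bin/sh
# list_iommu_groups: hypothetical helper, same information as the one-liner
# in the post, but with groups sorted numerically (1, 2, ..., 10, 11, ...).
# $1: optional groups directory, defaults to the real sysfs location.
list_iommu_groups() {
    root="${1:-/sys/kernel/iommu_groups}"
    [ -d "$root" ] || { echo "no IOMMU groups found under $root" >&2; return 1; }
    for g in $(ls "$root" | sort -n); do
        for d in "$root/$g"/devices/*; do
            [ -e "$d" ] || continue          # empty group: glob did not match
            addr="${d##*/}"                  # e.g. 0000:09:00.1
            if [ -e "/sys/bus/pci/devices/$addr" ] && command -v lspci >/dev/null 2>&1; then
                printf 'IOMMU group %s ' "$g"
                lspci -nns "$addr"
            else
                # Fallback when lspci or the device entry is unavailable
                printf 'IOMMU group %s %s\n' "$g" "$addr"
            fi
        done
    done
}
```

Usage: list_iommu_groups on the host. The information is identical to the one-liner; only the ordering differs.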
 
BOOT_IMAGE=/boot/vmlinuz-5.13.19-2-pve root=/dev/mapper/pve-root ro quiet iommu=pt initcall_blacklist=sysfb_init amd_iommu=on drm.debug=0 kvm_amd.nested=1 kvm.ignore_msrs=1 kvm.report_ignored_msrs=0 pci=assign-busses pcie_acs_override=downstream,multifunction vfio_iommu_type1.allow_unsafe_interrupts=1
IOMMU group 0 00:01.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Renoir PCIe Dummy Host Bridge [1022:1632]
IOMMU group 10 01:00.0 PCI bridge [0604]: ASMedia Technology Inc. Device [1b21:1812] (rev 01)
IOMMU group 11 02:00.0 PCI bridge [0604]: ASMedia Technology Inc. Device [1b21:1812] (rev 01)
IOMMU group 12 02:02.0 PCI bridge [0604]: ASMedia Technology Inc. Device [1b21:1812] (rev 01)
IOMMU group 13 02:03.0 PCI bridge [0604]: ASMedia Technology Inc. Device [1b21:1812] (rev 01)
IOMMU group 14 02:08.0 PCI bridge [0604]: ASMedia Technology Inc. Device [1b21:1812] (rev 01)
IOMMU group 15 02:0a.0 PCI bridge [0604]: ASMedia Technology Inc. Device [1b21:1812] (rev 01)
IOMMU group 16 02:0b.0 PCI bridge [0604]: ASMedia Technology Inc. Device [1b21:1812] (rev 01)
IOMMU group 17 05:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8125 2.5GbE Controller [10ec:8125] (rev 05)
IOMMU group 18 06:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8125 2.5GbE Controller [10ec:8125] (rev 05)
IOMMU group 19 07:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8125 2.5GbE Controller [10ec:8125] (rev 05)
IOMMU group 1 00:01.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Renoir PCIe GPP Bridge [1022:1633]
IOMMU group 20 08:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8125 2.5GbE Controller [10ec:8125] (rev 05)
IOMMU group 21 09:00.0 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset USB 3.1 XHCI Controller [1022:43d5] (rev 01)
IOMMU group 22 09:00.1 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset SATA Controller [1022:43c8] (rev 01)
IOMMU group 23 09:00.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset PCIe Bridge [1022:43c6] (rev 01)
IOMMU group 24 0a:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset PCIe Port [1022:43c7] (rev 01)
IOMMU group 25 0a:01.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset PCIe Port [1022:43c7] (rev 01)
IOMMU group 26 0a:04.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset PCIe Port [1022:43c7] (rev 01)
IOMMU group 27 0c:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8125 2.5GbE Controller [10ec:8125] (rev 04)
IOMMU group 28 0e:00.0 Non-Volatile memory controller [0108]: SK hynix Device [1c5c:1627]
IOMMU group 29 0f:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Cezanne [1002:1638] (rev c9)
IOMMU group 2 00:02.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Renoir PCIe Dummy Host Bridge [1022:1632]
IOMMU group 30 0f:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Device [1002:1637]
IOMMU group 31 0f:00.2 Encryption controller [1080]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 10h-1fh) Platform Security Processor [1022:15df]
IOMMU group 32 0f:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Renoir USB 3.1 [1022:1639]
IOMMU group 33 0f:00.4 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Renoir USB 3.1 [1022:1639]
IOMMU group 34 0f:00.6 Audio device [0403]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 10h-1fh) HD Audio Controller [1022:15e3]
IOMMU group 35 10:00.0 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] [1022:7901] (rev 81)
IOMMU group 3 00:02.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Renoir PCIe GPP Bridge [1022:1634]
IOMMU group 4 00:02.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Renoir PCIe GPP Bridge [1022:1634]
IOMMU group 5 00:08.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Renoir PCIe Dummy Host Bridge [1022:1632]
IOMMU group 6 00:08.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Renoir Internal PCIe GPP Bridge to Bus [1022:1635]
IOMMU group 7 00:08.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Renoir Internal PCIe GPP Bridge to Bus [1022:1635]
IOMMU group 8 00:14.0 SMBus [0c05]: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller [1022:790b] (rev 51)
IOMMU group 8 00:14.3 ISA bridge [0601]: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge [1022:790e] (rev 51)
IOMMU group 9 00:18.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:166a]
IOMMU group 9 00:18.1 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:166b]
IOMMU group 9 00:18.2 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:166c]
IOMMU group 9 00:18.3 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:166d]
IOMMU group 9 00:18.4 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:166e]
IOMMU group 9 00:18.5 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:166f]
IOMMU group 9 00:18.6 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:1670]
IOMMU group 9 00:18.7 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:1671]
 

Only the VGA device is passed through to the Windows VM. QEMU config:
Code:
boot: order=sata0
cores: 8
cpu: host,hidden=1
hostpci0: 0000:0f:00.0,pcie=1,x-vga=1,romfile=vbios_1638.dat
hotplug: 0
machine: pc-q35-6.1
memory: 20480
meta: creation-qemu=6.1.0,ctime=1685214268
name: Windows
net0: virtio=62:F8:13:40:52:18,bridge=vmbr0,firewall=1
numa: 0
ostype: win10
sata0: local-lvm:vm-103-disk-0,size=80G
sata1: local-lvm:vm-103-disk-1,size=250G
scsihw: virtio-scsi-pci
smbios1: uuid=8dc6a18d-5393-4d94-b4d7-0930dd473953
sockets: 1
vmgenid: e0bee595-38e9-4c22-b2af-d56da6d2afd4
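Only the Windows VM's config is shown here; the DSM VM with the disk passthrough, which was also asked for, is not. A sketch that prints every PCI passthrough (hostpciN) line and every raw /dev disk line from all VM configs at once, assuming the default Proxmox VE location /etc/pve/qemu-server/<vmid>.conf. list_passthrough is a hypothetical name, and lines after the first [section] header (snapshots, pending changes) are ignored:

```sh
#!/bin/sh
# list_passthrough: hypothetical helper that prints PCI passthrough (hostpciN)
# and raw-disk passthrough (/dev/... on sata/scsi/virtio/ide) lines from every
# VM config. $1: optional config directory (default: Proxmox VE location).
list_passthrough() {
    dir="${1:-/etc/pve/qemu-server}"
    for f in "$dir"/*.conf; do
        [ -e "$f" ] || continue
        vmid=$(basename "$f" .conf)
        # Stop at the first "[...]" header: later sections are snapshots.
        awk -v vmid="$vmid" '
            /^\[/ { exit }
            /^hostpci[0-9]+:/ || /^(sata|scsi|virtio|ide)[0-9]+: *\/dev\// {
                print "VM " vmid ": " $0
            }' "$f"
    done
}
```

Usage: list_passthrough. For the Windows config above it would print only the hostpci0 line, since both of its disks live on local-lvm.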

My GRUB config:
GRUB_CMDLINE_LINUX_DEFAULT="quiet iommu=pt initcall_blacklist=sysfb_init amd_iommu=on drm.debug=0 kvm_amd.nested=1 kvm.ignore_msrs=1 kvm.report_ignored_msrs=0 pci=assign-busses pcie_acs_override=downstream,multifunction vfio_iommu_type1.allow_unsafe_interrupts=1"

pve-blacklist:
blacklist nvidiafb
blacklist amdgpu
blacklist i915
blacklist snd_hda_intel
options vfio_iommu_type1 allow_unsafe_interrupts=1

vfio:
options vfio-pci ids=1002:1638
options vfio-pci disable_idle_d3=1

modules:
vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd

There's nothing else besides that.
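The lspci -nnk output earlier lists the four RTL8125 NICs at 05:00.0 to 08:00.0 with "Kernel driver in use: vfio-pci", although the vfio options above only name 1002:1638. So more than the iGPU may be attached to vfio-pci. A sketch that lists whatever is currently bound to vfio-pci, assuming the usual sysfs layout where /sys/bus/pci/drivers/vfio-pci contains one entry per bound device; list_vfio_bound is a hypothetical name:

```sh
#!/bin/sh
# list_vfio_bound: hypothetical helper that prints the PCI address of every
# device currently bound to the vfio-pci driver.
# $1: optional driver directory (default: the real sysfs location).
list_vfio_bound() {
    drv="${1:-/sys/bus/pci/drivers/vfio-pci}"
    [ -d "$drv" ] || return 0            # vfio-pci not loaded: nothing bound
    for link in "$drv"/*; do
        name="${link##*/}"
        # Bound devices appear as entries named like 0000:0f:00.0; skip the
        # driver's own files (bind, unbind, new_id, module, ...).
        case "$name" in
            [0-9a-f][0-9a-f][0-9a-f][0-9a-f]:[0-9a-f][0-9a-f]:[0-9a-f][0-9a-f].[0-7])
                echo "$name" ;;
        esac
    done
}
```

Usage: list_vfio_bound. If it prints the NIC addresses in addition to 0000:0f:00.0, some VM (or other configuration) is claiming them as well.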
 
Jun 04 19:55:01 Fucter kernel: pcieport 0000:0a:04.0: can't change power state from D3cold to D0 (config space inaccessible)
Jun 04 19:55:01 Fucter kernel: pcieport 0000:0a:00.0: can't change power state from D3cold to D0 (config space inaccessible)
Jun 04 19:55:33 Fucter kernel: ata1.00: exception Emask 0x52 SAct 0x1000 SErr 0xffffffff action 0xe frozen
Jun 04 19:55:33 Fucter kernel: ata1: SError: { RecovData RecovComm UnrecovData Persist Proto HostInt PHYRdyChg PHYInt CommWake 10B8B Dispar BadCRC Handshk LinkSeq TrStaTrns UnrecFIS DevExch }
Jun 04 19:55:33 Fucter kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Jun 04 19:55:33 Fucter kernel: ata1.00: cmd 61/01:60:08:00:90/00:00:00:00:00/40 tag 12 ncq dma 512 out res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x56 (ATA bus error)
Jun 04 19:55:33 Fucter kernel: ata1.00: status: { DRDY }
Jun 04 19:55:33 Fucter kernel: ata1: hard resetting link
Jun 04 19:55:33 Fucter kernel: ahci 0000:09:00.1: AHCI controller unavailable!
Jun 04 19:55:34 Fucter kernel: ata1: failed to resume link (SControl FFFFFFFF)
Jun 04 19:55:34 Fucter kernel: ata1: SATA link down (SStatus FFFFFFFF SControl FFFFFFFF)
Jun 04 19:55:40 Fucter kernel: ata1: hard resetting link
Jun 04 19:55:40 Fucter kernel: ahci 0000:09:00.1: AHCI controller unavailable!
Jun 04 19:55:41 Fucter kernel: ata1: failed to resume link (SControl FFFFFFFF)
Jun 04 19:55:41 Fucter kernel: ata1: SATA link down (SStatus FFFFFFFF SControl FFFFFFFF)
Jun 04 19:55:41 Fucter kernel: ata1: limiting SATA link speed to <unknown>
Jun 04 19:55:46 Fucter kernel: ata1: hard resetting link
Jun 04 19:55:46 Fucter kernel: ahci 0000:09:00.1: AHCI controller unavailable!
Jun 04 19:55:47 Fucter kernel: ata1: failed to resume link (SControl FFFFFFFF)
Jun 04 19:55:47 Fucter kernel: ata1: SATA link down (SStatus FFFFFFFF SControl FFFFFFFF)
Jun 04 19:55:47 Fucter kernel: ata1.00: disabled
Jun 04 19:55:47 Fucter kernel: ata1.00: device reported invalid CHS sector 0
Jun 04 19:55:47 Fucter kernel: ahci 0000:09:00.1: AHCI controller unavailable!
Jun 04 19:55:47 Fucter kernel: sd 0:0:0:0: [sda] tag#12 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=44s
Jun 04 19:55:47 Fucter kernel: sd 0:0:0:0: [sda] tag#12 Sense Key : Illegal Request [current]
Jun 04 19:55:47 Fucter kernel: sd 0:0:0:0: [sda] tag#12 Add. Sense: Unaligned write command
Jun 04 19:55:47 Fucter kernel: sd 0:0:0:0: [sda] tag#12 CDB: Write(16) 8a 00 00 00 00 00 00 90 00 08 00 00 00 01 00 00
Jun 04 19:55:47 Fucter kernel: blk_update_request: I/O error, dev sda, sector 9437192 op 0x1:(WRITE) flags 0x208800 phys_seg 1 prio class 0
Jun 04 19:55:47 Fucter kernel: ata1: EH complete
Jun 04 19:55:47 Fucter kernel: ata1.00: detaching (SCSI 0:0:0:0)
Jun 04 19:55:47 Fucter kernel: sd 0:0:0:0: [sda] Synchronizing SCSI cache
Jun 04 19:55:47 Fucter kernel: sd 0:0:0:0: [sda] Synchronize Cache(10) failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jun 04 19:55:47 Fucter kernel: sd 0:0:0:0: [sda] Stopping disk
Jun 04 19:55:47 Fucter kernel: sd 0:0:0:0: [sda] Start/Stop Unit failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jun 04 19:57:33 Fucter pvedaemon[1188]: <root@pam> successful auth for user 'root@pam'
Jun 04 19:58:13 Fucter pvestatd[1168]: auth key pair too old, rotating..
Jun 04 19:59:04 Fucter systemd[1]: Starting Cleanup of Temporary Directories...
Jun 04 19:59:04 Fucter systemd[1]: systemd-tmpfiles-clean.service: Succeeded.
Jun 04 19:59:04 Fucter systemd[1]: Finished Cleanup of Temporary Directories.

The AHCI controller fails after only about five minutes of uptime.
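In each log posted in this thread, the power-state errors for 0a:04.0 and 0a:00.0 appear roughly 30 to 60 seconds before ata1 fails with "AHCI controller unavailable!". To check whether that ordering holds on every boot, the kernel log can be reduced to just these events. A sketch; pm_sata_events is a hypothetical name and its patterns are copied from the messages in this thread:

```sh
#!/bin/sh
# pm_sata_events: hypothetical filter (stdin to stdout) that keeps only the
# kernel messages seen in this thread: PCIe bridge power-state failures and
# the subsequent AHCI/SATA link loss.
pm_sata_events() {
    grep -E "change power state from D3(cold|hot) to D0|AHCI controller unavailable|SATA link down|detaching \(SCSI"
}
```

Usage: journalctl -k -b | pm_sata_events for the current boot, journalctl -k -b -1 | pm_sata_events for the previous one, or dmesg | pm_sata_events to get timestamps in seconds since boot, which shows directly how long after startup the failure happens.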
 
Unfortunately, the IOMMU group information is useless because of the use of pcie_acs_override. Can you show the groups without using it?
Is 0f:00.0 the only device that is passed through to a VM? Maybe lspci -t can give some information about which devices use PCI bridges 0a:04.0 and 0a:00.0.
Do you get the same SATA issues when not starting VMs with passthrough, or when you disable IOMMU?
 
root@Fucter:~# lspci -t
-[0000:00]-+-00.0
           +-00.2
           +-01.0
           +-01.1-[01-08]----00.0-[02-08]--+-00.0-[03]--
           |                               +-02.0-[04]--
           |                               +-03.0-[05]----00.0
           |                               +-08.0-[06]----00.0
           |                               +-0a.0-[07]----00.0
           |                               \-0b.0-[08]----00.0
           +-02.0
           +-02.1-[09-0d]--+-00.0
           |               +-00.1
           |               \-00.2-[0a-0d]--+-00.0-[0b]--
           |                               +-01.0-[0c]----00.0
           |                               \-04.0-[0d]--
           +-02.2-[0e]----00.0
           +-08.0
           +-08.1-[0f]--+-00.0
           |            +-00.1
           |            +-00.2
           |            +-00.3
           |            +-00.4
           |            \-00.6
           +-08.2-[10]----00.0
           +-14.0
           +-14.3
           +-18.0
           +-18.1
           +-18.2
           +-18.3
           +-18.4
           +-18.5
           +-18.6
           \-18.7
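In this tree, bridges 0a:00.0 and 0a:04.0 have nothing behind them (buses 0b and 0d are empty), but they sit on the same chipset branch below 00:02.1 as the SATA controller 09:00.1. The resolved sysfs path of a device encodes this upstream chain, so it can be printed directly. A sketch; both helper names are hypothetical, and the parsing assumes the usual /sys/devices/pci<domain>:<bus>/<addr>/... layout:

```sh
#!/bin/sh
# pci_chain_from_path: hypothetical helper that extracts the PCI addresses
# (domain:bus:device.function) from a resolved sysfs path, root first.
pci_chain_from_path() {
    printf '%s\n' "$1" | tr '/' '\n' \
        | grep -E '^[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]$' \
        | tr '\n' ' ' | sed 's/ $//'
}

# pci_upstream_chain: resolve /sys/bus/pci/devices/<addr> (a symlink into
# /sys/devices/...) and print the bridges above the device, then the device.
pci_upstream_chain() {
    path=$(readlink -f "/sys/bus/pci/devices/$1") || return 1
    pci_chain_from_path "$path"
}
```

Usage: for dev in 0000:09:00.1 0000:0a:00.0 0000:0a:04.0; do echo "$dev: $(pci_upstream_chain "$dev")"; done. If the tree above is read correctly, all three chains should start with 0000:00:02.1, and the two bridges should also pass through 0000:09:00.2.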
 


I have started testing without pcie_acs_override and will watch whether the D3cold errors come back. Only the APU's integrated graphics is passed through; there is nothing else.
 
Even with pcie_acs_override turned off, the errors are still reported:

Jun 05 00:53:30 Fucter pvedaemon[1191]: <root@pam> successful auth for user 'root@pam'
Jun 05 00:53:43 Fucter pvedaemon[1193]: <root@pam> successful auth for user 'root@pam'
Jun 05 01:01:23 Fucter pvedaemon[1193]: <root@pam> starting task UPID:Fucter:00001DBC:000245FE:647CC363:vncproxy:102:root@pam:
Jun 05 01:01:23 Fucter pvedaemon[7612]: starting vnc proxy UPID:Fucter:00001DBC:000245FE:647CC363:vncproxy:102:root@pam:
Jun 05 01:01:35 Fucter pveproxy[1200]: problem with client ::ffff:10.10.10.29; Connection reset by peer
Jun 05 01:01:35 Fucter pvedaemon[1193]: <root@pam> end task UPID:Fucter:00001DBC:000245FE:647CC363:vncproxy:102:root@pam: OK
Jun 05 01:03:25 Fucter pveproxy[1200]: problem with client ::ffff:10.10.10.24; Connection reset by peer
Jun 05 01:03:25 Fucter pveproxy[1200]: proxy detected vanished client connection
Jun 05 01:06:40 Fucter smartd[774]: Device: /dev/sda [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 69 to 68
Jun 05 01:06:40 Fucter smartd[774]: Device: /dev/sda [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 31 to 32
Jun 05 01:17:01 Fucter CRON[9969]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Jun 05 01:17:01 Fucter CRON[9970]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Jun 05 01:17:01 Fucter CRON[9969]: pam_unix(cron:session): session closed for user root
Jun 05 01:19:46 Fucter kernel: pcieport 0000:0a:04.0: can't change power state from D3cold to D0 (config space inaccessible)
Jun 05 01:19:46 Fucter kernel: pcieport 0000:0a:00.0: can't change power state from D3cold to D0 (config space inaccessible)
Jun 05 01:20:47 Fucter kernel: ata1.00: exception Emask 0x52 SAct 0x0 SErr 0xffffffff action 0xe frozen
Jun 05 01:20:47 Fucter kernel: ata1: SError: { RecovData RecovComm UnrecovData Persist Proto HostInt PHYRdyChg PHYInt CommWake 10B8B Dispar BadCRC Handshk LinkSeq TrStaTrns UnrecFIS DevExch }
Jun 05 01:20:47 Fucter kernel: ata1.00: failed command: FLUSH CACHE EXT
Jun 05 01:20:47 Fucter kernel: ata1.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 6 res 40/00:01:06:4f:c2/00:00:00:00:00/00 Emask 0x56 (ATA bus error)
Jun 05 01:20:47 Fucter kernel: ata1.00: status: { DRDY }
Jun 05 01:20:47 Fucter kernel: ata1: hard resetting link
Jun 05 01:20:47 Fucter kernel: ahci 0000:09:00.1: AHCI controller unavailable!
Jun 05 01:20:48 Fucter kernel: ata1: failed to resume link (SControl FFFFFFFF)
Jun 05 01:20:48 Fucter kernel: ata1: SATA link down (SStatus FFFFFFFF SControl FFFFFFFF)
Jun 05 01:20:53 Fucter kernel: ata1: hard resetting link
Jun 05 01:20:53 Fucter kernel: ahci 0000:09:00.1: AHCI controller unavailable!
Jun 05 01:20:54 Fucter kernel: ata1: failed to resume link (SControl FFFFFFFF)
Jun 05 01:20:54 Fucter kernel: ata1: SATA link down (SStatus FFFFFFFF SControl FFFFFFFF)
Jun 05 01:20:54 Fucter kernel: ata1: limiting SATA link speed to <unknown>
Jun 05 01:20:59 Fucter kernel: ata1: hard resetting link
Jun 05 01:20:59 Fucter kernel: ahci 0000:09:00.1: AHCI controller unavailable!
Jun 05 01:21:00 Fucter kernel: ata1: failed to resume link (SControl FFFFFFFF)
Jun 05 01:21:00 Fucter kernel: ata1: SATA link down (SStatus FFFFFFFF SControl FFFFFFFF)
Jun 05 01:21:00 Fucter kernel: ata1.00: disabled
Jun 05 01:21:00 Fucter kernel: ahci 0000:09:00.1: AHCI controller unavailable!
Jun 05 01:21:00 Fucter kernel: sd 0:0:0:0: [sda] tag#6 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=73s
Jun 05 01:21:00 Fucter kernel: sd 0:0:0:0: [sda] tag#6 Sense Key : Illegal Request [current]
Jun 05 01:21:00 Fucter kernel: sd 0:0:0:0: [sda] tag#6 Add. Sense: Unaligned write command
Jun 05 01:21:00 Fucter kernel: sd 0:0:0:0: [sda] tag#6 CDB: Synchronize Cache(10) 35 00 00 00 00 00 00 00 00 00
Jun 05 01:21:00 Fucter kernel: blk_update_request: I/O error, dev sda, sector 0 op 0x1:(WRITE) flags 0x800 phys_seg 0 prio class 0
Jun 05 01:21:00 Fucter kernel: ata1: EH complete
Jun 05 01:21:00 Fucter kernel: ata1.00: detaching (SCSI 0:0:0:0)
Jun 05 01:21:00 Fucter kernel: sd 0:0:0:0: [sda] Synchronizing SCSI cache
Jun 05 01:21:00 Fucter kernel: sd 0:0:0:0: [sda] Synchronize Cache(10) failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jun 05 01:21:00 Fucter kernel: sd 0:0:0:0: [sda] Stopping disk
Jun 05 01:21:00 Fucter kernel: sd 0:0:0:0: [sda] Start/Stop Unit failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jun 05 01:21:00 Fucter pvestatd[1162]: status update time (7.526 seconds)
Jun 05 01:36:39 Fucter smartd[774]: Device: /dev/sda [SAT], removed ATA device: No such device
Jun 05 02:07:26 Fucter pvedaemon[1193]: <root@pam> successful auth for user 'root@pam'
Jun 05 02:08:29 Fucter pveproxy[1202]: worker exit
Jun 05 02:08:29 Fucter pveproxy[1199]: worker 1202 finished
Jun 05 02:08:29 Fucter pveproxy[1199]: starting 1 worker(s)
Jun 05 02:08:29 Fucter pveproxy[1199]: worker 17158 started
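Since the errors persist, it is worth confirming that the running kernel really booted without pcie_acs_override. The cmdline posted earlier came from /proc/cmdline, and its BOOT_IMAGE= entry suggests a GRUB boot, where an edit to /etc/default/grub only takes effect after update-grub and a reboot. A sketch that prints a parameter's value from a cmdline string (defaulting to /proc/cmdline); kernel_param is a hypothetical name:

```sh
#!/bin/sh
# kernel_param: hypothetical helper. Prints the value of kernel parameter $1
# (an empty line for a bare flag) and returns 0 if present, 1 if absent.
# $2: optional cmdline string; defaults to the running kernel's /proc/cmdline.
kernel_param() {
    cmdline="${2-$(cat /proc/cmdline)}"
    set -f                      # treat words literally, no filename globbing
    for word in $cmdline; do
        case "$word" in
            "$1"=*) set +f; printf '%s\n' "${word#*=}"; return 0 ;;
            "$1")   set +f; printf '\n'; return 0 ;;
        esac
    done
    set +f
    return 1
}
```

Usage: kernel_param pcie_acs_override || echo "pcie_acs_override is not active". If it still prints downstream,multifunction, the change has not reached the running kernel yet.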
 
With CSM disabled, the log now shows:
Jun 05 17:32:16 Fucter kernel: pcieport 0000:0a:04.0: Unable to change power state from D3hot to D0, device inaccessible
Jun 05 17:32:16 Fucter kernel: pcieport 0000:0a:00.0: Unable to change power state from D3hot to D0, device inaccessible
Jun 05 17:32:49 Fucter kernel: ata1.00: exception Emask 0x52 SAct 0x80000 SErr 0xffffffff action 0xe frozen
Jun 05 17:32:49 Fucter kernel: ata1: SError: { RecovData RecovComm UnrecovData Persist Proto HostInt PHYRdyChg PHYInt CommWake 10B8B Dispar BadCRC Handshk LinkSeq TrStaTrns UnrecFIS DevExch }
Jun 05 17:32:49 Fucter kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Jun 05 17:32:49 Fucter kernel: ata1.00: cmd 61/01:98:08:00:90/00:00:00:00:00/40 tag 19 ncq dma 512 out res 40/00:00:00:4f:c2/00:00:00:00:00/40 Emask 0x56 (ATA bus error)
Jun 05 17:32:49 Fucter kernel: ata1.00: status: { DRDY }
Jun 05 17:32:49 Fucter kernel: ata1: hard resetting link
Jun 05 17:32:49 Fucter kernel: ahci 0000:09:00.1: AHCI controller unavailable!
Jun 05 17:32:50 Fucter kernel: ata1: failed to resume link (SControl FFFFFFFF)
Jun 05 17:32:50 Fucter kernel: ata1: SATA link down (SStatus FFFFFFFF SControl FFFFFFFF)
Jun 05 17:32:55 Fucter kernel: ata1: hard resetting link
Jun 05 17:32:55 Fucter kernel: ahci 0000:09:00.1: AHCI controller unavailable!
Jun 05 17:32:56 Fucter kernel: ata1: failed to resume link (SControl FFFFFFFF)
Jun 05 17:32:56 Fucter kernel: ata1: SATA link down (SStatus FFFFFFFF SControl FFFFFFFF)
Jun 05 17:32:56 Fucter kernel: ata1: limiting SATA link speed to <unknown>
Jun 05 17:33:01 Fucter kernel: ata1: hard resetting link
Jun 05 17:33:01 Fucter kernel: ahci 0000:09:00.1: AHCI controller unavailable!
Jun 05 17:33:03 Fucter kernel: ata1: failed to resume link (SControl FFFFFFFF)
Jun 05 17:33:03 Fucter kernel: ata1: SATA link down (SStatus FFFFFFFF SControl FFFFFFFF)
Jun 05 17:33:03 Fucter kernel: ata1.00: disable device
Jun 05 17:33:03 Fucter kernel: ahci 0000:09:00.1: AHCI controller unavailable!
Jun 05 17:33:03 Fucter kernel: sd 0:0:0:0: [sda] tag#19 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=44s
Jun 05 17:33:03 Fucter kernel: sd 0:0:0:0: [sda] tag#19 Sense Key : Illegal Request [current]
Jun 05 17:33:03 Fucter kernel: sd 0:0:0:0: [sda] tag#19 Add. Sense: Unaligned write command
Jun 05 17:33:03 Fucter kernel: sd 0:0:0:0: [sda] tag#19 CDB: Write(16) 8a 00 00 00 00 00 00 90 00 08 00 00 00 01 00 00
Jun 05 17:33:03 Fucter kernel: I/O error, dev sda, sector 9437192 op 0x1:(WRITE) flags 0x208800 phys_seg 1 prio class 2
Jun 05 17:33:03 Fucter kernel: ata1: EH complete
Jun 05 17:33:03 Fucter kernel: ata1.00: detaching (SCSI 0:0:0:0)
Jun 05 17:33:03 Fucter kernel: sd 0:0:0:0: [sda] Synchronizing SCSI cache
Jun 05 17:33:03 Fucter kernel: sd 0:0:0:0: [sda] Synchronize Cache(10) failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jun 05 17:33:03 Fucter kernel: sd 0:0:0:0: [sda] Stopping disk
Jun 05 17:33:03 Fucter kernel: sd 0:0:0:0: [sda] Start/Stop Unit failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jun 05 17:33:03 Fucter pvestatd[1168]: status update time (13.449 seconds)
Jun 05 17:40:16 Fucter smartd[779]: Device: /dev/sda [SAT], removed ATA device: No such device
Jun 05 17:55:29 Fucter pvedaemon[1197]: <root@pam> successful auth for user 'root@pam'

The message changed from "D3cold to D0" to "D3hot to D0".
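With CSM disabled, the same two bridges now fail to leave D3hot instead of D3cold, which suggests they are still being put into a low-power state. Each PCI device exposes its runtime power-management setting in sysfs: power/control ("auto" allows the kernel to suspend the device at runtime, "on" keeps it powered) and power/runtime_status. A read-only sketch for inspecting these; show_runtime_pm is a hypothetical name, and the SYSFS_PCI variable exists only so it can be tested against a fake directory:

```sh
#!/bin/sh
# show_runtime_pm: hypothetical, read-only helper that prints the runtime PM
# settings of the given PCI devices. SYSFS_PCI may point at a fake tree for
# testing; it defaults to the real sysfs directory.
show_runtime_pm() {
    base="${SYSFS_PCI:-/sys/bus/pci/devices}"
    for dev in "$@"; do
        # "?" means the attribute could not be read (missing device or file)
        ctl=$(cat "$base/$dev/power/control" 2>/dev/null) || ctl='?'
        st=$(cat "$base/$dev/power/runtime_status" 2>/dev/null) || st='?'
        printf '%s control=%s runtime_status=%s\n' "$dev" "$ctl" "$st"
    done
}
```

Usage: show_runtime_pm 0000:09:00.1 0000:09:00.2 0000:0a:00.0 0000:0a:04.0. It only reads the settings and changes nothing.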
 
