Host freezes on VM start when passing through SATA disk

moxs20

Apr 10, 2020

Hi everyone,

On my home server I'm trying to pass through a SATA HDD to a VM (Debian/Openmediavault) via a cheap 2-port PCIe SATA controller (ASM1062).

I followed the PCI passthrough instructions on the wiki (https://pve.proxmox.com/wiki/Pci_passthrough). Now, every time I start the VM with the SATA controller passed through and an HDD attached, my Proxmox host freezes and becomes unresponsive, so I have to power it off and restart it. With no HDD attached, the VM boots fine.

I have already tried a different PCIe slot, updated the BIOS and tried another guest OS (Ubuntu Server), all to no avail. The syslog messages (see below) always look the same before the machine freezes.

Here is my current setup and config:

Hardware
Code:
FUJITSU D3400-B1/D3400-B1, BIOS V5.0.0.11 R1.29.0 for D3400-B1x 01/27/2020
CPU0: Intel(R) Celeron(R) CPU G3900 @ 2.80GHz (family: 0x6, model: 0x5e, stepping: 0x3)
SATA controller: ASMedia Technology Inc. ASM1062 Serial ATA Controller (rev 02)

pveversion
Code:
pve-manager/6.1-8/806edfe1 (running kernel: 5.3.18-3-pve)

cat /etc/default/grub
Code:
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on"

cat /etc/modules
Code:
vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd

dmesg | grep ecap
Code:
[    0.062365] DMAR: dmar0: reg_base_addr fed90000 ver 1:0 cap 1c0000c40660462 ecap 7e3ff0505e
[    0.062368] DMAR: dmar1: reg_base_addr fed91000 ver 1:0 cap d2008c40660462 ecap f050da
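Grepping for ecap is usually done to check interrupt remapping: in the Intel VT-d specification, bit 3 (IR) of the extended capability register indicates interrupt-remapping support, so a last hex digit of 8 to f means the bit is set. A small Python sketch (regex and function name are my own) that decodes the two lines above:

```python
import re

# Matches "dmarN: ... ecap <hex>" as printed by the kernel's DMAR code.
ECAP_RE = re.compile(r"(dmar\d+):.*\becap ([0-9a-fA-F]+)")


def interrupt_remapping(dmesg_text: str) -> dict:
    """Map each DMAR unit to whether ecap bit 3 (IR, interrupt remapping) is set."""
    result = {}
    for line in dmesg_text.splitlines():
        m = ECAP_RE.search(line)
        if m:
            ecap = int(m.group(2), 16)
            result[m.group(1)] = bool((ecap >> 3) & 1)
    return result


log = """\
[    0.062365] DMAR: dmar0: reg_base_addr fed90000 ver 1:0 cap 1c0000c40660462 ecap 7e3ff0505e
[    0.062368] DMAR: dmar1: reg_base_addr fed91000 ver 1:0 cap d2008c40660462 ecap f050da
"""
print(interrupt_remapping(log))  # {'dmar0': True, 'dmar1': True}
```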

find /sys/kernel/iommu_groups/ -type l
Code:
/sys/kernel/iommu_groups/7/devices/0000:00:1f.2
/sys/kernel/iommu_groups/7/devices/0000:00:1f.0
/sys/kernel/iommu_groups/7/devices/0000:00:1f.3
/sys/kernel/iommu_groups/7/devices/0000:00:1f.4
/sys/kernel/iommu_groups/5/devices/0000:00:1c.0
/sys/kernel/iommu_groups/3/devices/0000:00:16.0
/sys/kernel/iommu_groups/1/devices/0000:00:02.0
/sys/kernel/iommu_groups/8/devices/0000:01:00.0
/sys/kernel/iommu_groups/6/devices/0000:00:1c.7
/sys/kernel/iommu_groups/4/devices/0000:00:17.0
/sys/kernel/iommu_groups/2/devices/0000:00:14.2
/sys/kernel/iommu_groups/2/devices/0000:00:14.0
/sys/kernel/iommu_groups/0/devices/0000:00:00.0
/sys/kernel/iommu_groups/9/devices/0000:02:00.0
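Devices that share an IOMMU group cannot be isolated from each other, so the useful check on this listing is whether the passed-through card's group contains anything else. A Python sketch (function names are hypothetical) that groups the paths above and tests the address used for hostpci0 in the VM config. It assumes that an address without a PCI domain, like 02:00.0, refers to domain 0000, which is the only domain in this listing:

```python
from collections import defaultdict

FIND_OUTPUT = """\
/sys/kernel/iommu_groups/7/devices/0000:00:1f.2
/sys/kernel/iommu_groups/7/devices/0000:00:1f.0
/sys/kernel/iommu_groups/7/devices/0000:00:1f.3
/sys/kernel/iommu_groups/7/devices/0000:00:1f.4
/sys/kernel/iommu_groups/5/devices/0000:00:1c.0
/sys/kernel/iommu_groups/3/devices/0000:00:16.0
/sys/kernel/iommu_groups/1/devices/0000:00:02.0
/sys/kernel/iommu_groups/8/devices/0000:01:00.0
/sys/kernel/iommu_groups/6/devices/0000:00:1c.7
/sys/kernel/iommu_groups/4/devices/0000:00:17.0
/sys/kernel/iommu_groups/2/devices/0000:00:14.2
/sys/kernel/iommu_groups/2/devices/0000:00:14.0
/sys/kernel/iommu_groups/0/devices/0000:00:00.0
/sys/kernel/iommu_groups/9/devices/0000:02:00.0
"""


def iommu_groups(find_output: str) -> dict[int, list[str]]:
    """Group PCI addresses by IOMMU group number."""
    groups = defaultdict(list)
    for path in find_output.split():
        parts = path.split("/")
        # Expected layout: /sys/kernel/iommu_groups/<group>/devices/<address>
        if len(parts) == 7 and parts[3] == "iommu_groups" and parts[5] == "devices":
            groups[int(parts[4])].append(parts[6])
    return {g: sorted(devs) for g, devs in sorted(groups.items())}


def full_address(address: str) -> str:
    """Assumption: an address without a domain (e.g. '02:00.0') means domain 0000."""
    return address if address.count(":") == 2 else "0000:" + address


def is_isolated(groups: dict[int, list[str]], address: str) -> bool:
    """True if the device is the only member of its IOMMU group."""
    address = full_address(address)
    for devices in groups.values():
        if address in devices:
            return devices == [address]
    raise KeyError(f"{address} not found in any IOMMU group")


groups = iommu_groups(FIND_OUTPUT)
print(groups[9])                       # ['0000:02:00.0']
print(is_isolated(groups, "02:00.0"))  # True
print(len(groups[7]))                  # 4
```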

cat /etc/pve/nodes/pve/qemu-server/107.conf
Code:
bootdisk: scsi0
cores: 1
hostpci0: 02:00.0
ide2: local:iso/ubuntu-18.04.4-live-server-amd64.iso,media=cdrom
memory: 1024
name: vm-ubuntu
net0: virtio=06:27:82:86:D5:33,bridge=vmbr0,firewall=1
numa: 0
ostype: l26
scsi0: local-lvm:vm-107-disk-0,size=32G
scsihw: virtio-scsi-pci
smbios1: uuid=59c160d6-...
sockets: 1
vmgenid: 4632c869-...

tail -f /var/log/syslog (until the system freezes)
Code:
Apr 10 15:25:36 pve pvedaemon[1070]: <root@pam> starting task UPID:pve:00002065:00026913:5E9073D0:qmstart:107:root@pam:
Apr 10 15:25:36 pve pvedaemon[8293]: start VM 107: UPID:pve:00002065:00026913:5E9073D0:qmstart:107:root@pam:
Apr 10 15:25:36 pve kernel: [ 1579.583491] ata6.00: disabled
Apr 10 15:25:36 pve kernel: [ 1579.583733] sd 5:0:0:0: [sdb] Synchronizing SCSI cache
Apr 10 15:25:36 pve kernel: [ 1579.583754] sd 5:0:0:0: [sdb] Synchronize Cache(10) failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Apr 10 15:25:36 pve kernel: [ 1579.583755] sd 5:0:0:0: [sdb] Stopping disk
Apr 10 15:25:36 pve kernel: [ 1579.583759] sd 5:0:0:0: [sdb] Start/Stop Unit failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Apr 10 15:25:36 pve systemd[1]: Created slice qemu.slice.
Apr 10 15:25:36 pve systemd[1]: Started 107.scope.
Apr 10 15:25:36 pve systemd-udevd[8298]: Using default interface naming scheme 'v240'.
Apr 10 15:25:36 pve systemd-udevd[8298]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
Apr 10 15:25:36 pve systemd-udevd[8298]: Could not generate persistent MAC address for tap107i0: No such file or directory
Apr 10 15:25:36 pve kernel: [ 1580.347080] device tap107i0 entered promiscuous mode
Apr 10 15:25:36 pve systemd-udevd[8298]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
Apr 10 15:25:36 pve systemd-udevd[8298]: Could not generate persistent MAC address for fwbr107i0: No such file or directory
Apr 10 15:25:36 pve systemd-udevd[8300]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
Apr 10 15:25:36 pve systemd-udevd[8299]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
Apr 10 15:25:36 pve systemd-udevd[8299]: Using default interface naming scheme 'v240'.
Apr 10 15:25:36 pve systemd-udevd[8299]: Could not generate persistent MAC address for fwln107i0: No such file or directory
Apr 10 15:25:36 pve systemd-udevd[8300]: Using default interface naming scheme 'v240'.
Apr 10 15:25:36 pve systemd-udevd[8300]: Could not generate persistent MAC address for fwpr107p0: No such file or directory
Apr 10 15:25:36 pve kernel: [ 1580.379381] fwbr107i0: port 1(fwln107i0) entered blocking state
Apr 10 15:25:36 pve kernel: [ 1580.379382] fwbr107i0: port 1(fwln107i0) entered disabled state
Apr 10 15:25:36 pve kernel: [ 1580.379445] device fwln107i0 entered promiscuous mode
Apr 10 15:25:36 pve kernel: [ 1580.379476] fwbr107i0: port 1(fwln107i0) entered blocking state
Apr 10 15:25:36 pve kernel: [ 1580.379478] fwbr107i0: port 1(fwln107i0) entered forwarding state
Apr 10 15:25:36 pve kernel: [ 1580.383594] vmbr0: port 3(fwpr107p0) entered blocking state
Apr 10 15:25:36 pve kernel: [ 1580.383596] vmbr0: port 3(fwpr107p0) entered disabled state
Apr 10 15:25:36 pve kernel: [ 1580.383664] device fwpr107p0 entered promiscuous mode
Apr 10 15:25:36 pve kernel: [ 1580.383682] vmbr0: port 3(fwpr107p0) entered blocking state
Apr 10 15:25:36 pve kernel: [ 1580.383683] vmbr0: port 3(fwpr107p0) entered forwarding state
Apr 10 15:25:36 pve kernel: [ 1580.390983] fwbr107i0: port 2(tap107i0) entered blocking state
Apr 10 15:25:36 pve kernel: [ 1580.390985] fwbr107i0: port 2(tap107i0) entered disabled state
Apr 10 15:25:36 pve kernel: [ 1580.391092] fwbr107i0: port 2(tap107i0) entered blocking state
Apr 10 15:25:36 pve kernel: [ 1580.391093] fwbr107i0: port 2(tap107i0) entered forwarding state
Apr 10 15:25:38 pve pvedaemon[1070]: <root@pam> end task UPID:pve:00002065:00026913:5E9073D0:qmstart:107:root@pam: OK
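The first kernel lines after the start task are the interesting part of this log: libata disables ata6.00, and the sd driver then fails to flush and stop sdb (DID_BAD_TARGET). That looks like the host's own driver letting go of the controller, and the disk on it, as the card is handed to the VM. It shows the host had the disk attached up to that point, although the log alone doesn't show what finally hangs the machine. A minimal Python sketch (helper names are hypothetical) that pulls only the libata and sd kernel messages out of a syslog excerpt, so such host-side disk events stand out from the network setup noise:

```python
import re

# Kernel messages from libata ("ata6.00: ...") or the SCSI disk driver ("sd 5:0:0:0: ...").
DISK_MSG = re.compile(r"^(ata\d+(\.\d+)?|sd \d+:\d+:\d+:\d+):")
# The "[ 1579.583491] " uptime stamp that precedes each kernel message.
TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")

SYSLOG = """\
Apr 10 15:25:36 pve pvedaemon[8293]: start VM 107: UPID:pve:00002065:00026913:5E9073D0:qmstart:107:root@pam:
Apr 10 15:25:36 pve kernel: [ 1579.583491] ata6.00: disabled
Apr 10 15:25:36 pve kernel: [ 1579.583733] sd 5:0:0:0: [sdb] Synchronizing SCSI cache
Apr 10 15:25:36 pve kernel: [ 1579.583754] sd 5:0:0:0: [sdb] Synchronize Cache(10) failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Apr 10 15:25:36 pve kernel: [ 1579.583755] sd 5:0:0:0: [sdb] Stopping disk
Apr 10 15:25:36 pve kernel: [ 1579.583759] sd 5:0:0:0: [sdb] Start/Stop Unit failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Apr 10 15:25:36 pve systemd[1]: Started 107.scope.
Apr 10 15:25:36 pve kernel: [ 1580.347080] device tap107i0 entered promiscuous mode
"""


def disk_events(syslog_text: str) -> list[str]:
    """Return libata/sd kernel messages from a syslog excerpt, without timestamps."""
    events = []
    for line in syslog_text.splitlines():
        _, sep, msg = line.partition(" kernel: ")
        if not sep:
            continue  # not a kernel message (pvedaemon, systemd, ...)
        msg = TIMESTAMP.sub("", msg)
        if DISK_MSG.match(msg):
            events.append(msg)
    return events


def affected_disks(events: list[str]) -> set[str]:
    """Block device names such as 'sdb' mentioned in the events."""
    return set(re.findall(r"\[(sd[a-z]+)\]", " ".join(events)))


for event in disk_events(SYSLOG):
    print(event)
print(affected_disks(disk_events(SYSLOG)))  # {'sdb'}
```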

Since I'm new to Proxmox and have only basic Linux knowledge, I haven't been able to debug this any further on my own.

Any help appreciated.
 
What does 'lspci -k' show before you start the VM?
Maybe prevent the driver for that card from loading?
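'lspci -k' prints each device on an unindented line, followed by indented 'Kernel driver in use:' and 'Kernel modules:' lines. For an AHCI card like the ASM1062 the host driver would normally be ahci before the VM starts, and vfio-pci while the VM is running. A Python sketch (hypothetical helper; the sample text is illustrative, not output captured on this host) that maps each PCI slot to the driver in use:

```python
def drivers_in_use(lspci_k: str) -> dict[str, str]:
    """Map each PCI slot in 'lspci -k' output to its 'Kernel driver in use' entry."""
    drivers: dict[str, str] = {}
    slot = None
    for line in lspci_k.splitlines():
        if line and not line[0].isspace():
            # Unindented header line, e.g. "02:00.0 SATA controller: ..."
            slot = line.split(" ", 1)[0]
        elif slot and line.strip().startswith("Kernel driver in use:"):
            drivers[slot] = line.split(":", 1)[1].strip()
    return drivers


# Illustrative sample in the usual 'lspci -k' layout.
SAMPLE = (
    "02:00.0 SATA controller: ASMedia Technology Inc. ASM1062 Serial ATA Controller (rev 02)\n"
    "\tKernel driver in use: ahci\n"
    "\tKernel modules: ahci\n"
)
print(drivers_in_use(SAMPLE))  # {'02:00.0': 'ahci'}
```

If it does report ahci, keep in mind that the same module most likely also drives the mainboard's SATA controller with the boot SSD, so blacklisting ahci outright would take the host's system disk with it.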
 
Thanks for your reply, @dcsapak. In the meantime I did some further testing, and I think I've narrowed it down to a hardware issue with my particular combination of PCIe SATA controller, HDD/SSD and, probably, SATA cables. To summarize what my initial setup looked like and what I tried:

- Samsung SSD (boot/system drive) connected to mainboard's SATA controller
- WD HDD connected to PCIe SATA controller

This combination always caused Proxmox to freeze when starting the VM.

Then I started from scratch and tried another (old) SSD as my new system drive:
- Transcend SSD connected to the PCIe SATA controller, with a fresh Proxmox install
- WD HDD connected to the mainboard's SATA controller

After setting up Proxmox and the Openmediavault VM, I was able to pass through the mainboard's SATA controller with the WD HDD attached. The VM booted fine and I could mount the HDD in Openmediavault without issues. SMART and spindown (hdparm) worked as well.

Then I switched back to the Samsung SSD as the system drive (it has more capacity than the Transcend), now connected to the PCIe controller in place of the Transcend, and left the WD HDD on the mainboard's controller. I was able to install Proxmox again, but then, depending on which SATA cable I used, booting got stuck either with "Decoding failed - System halted" or with kernel panics.

After experimenting with different SATA cables, I found one that lets the system boot, and everything, including passthrough, seems to work. However, after every boot I find ATA bus errors in the syslog:

Code:
Apr 18 20:44:42 pve systemd[1]: Starting The Proxmox VE cluster filesystem...
Apr 18 20:44:42 pve kernel: [    7.398508] ata5.00: exception Emask 0x10 SAct 0x100000 SErr 0x400000 action 0x6 frozen
Apr 18 20:44:42 pve kernel: [    7.398538] ata5.00: irq_stat 0x08000000, interface fatal error
Apr 18 20:44:42 pve kernel: [    7.398557] ata5: SError: { Handshk }
Apr 18 20:44:42 pve kernel: [    7.398572] ata5.00: failed command: WRITE FPDMA QUEUED
Apr 18 20:44:42 pve kernel: [    7.398592] ata5.00: cmd 61/88:a0:00:48:94/01:00:00:00:00/40 tag 20 ncq dma 200704 out
Apr 18 20:44:42 pve kernel: [    7.398592]          res 40/00:a0:00:48:94/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
Apr 18 20:44:42 pve kernel: [    7.398635] ata5.00: status: { DRDY }
Apr 18 20:44:42 pve kernel: [    7.398650] ata5: hard resetting link
Apr 18 20:44:42 pve kernel: [    7.422626] vmbr0: port 1(enp1s0) entered disabled state
Apr 18 20:44:43 pve kernel: [    7.870629] ata5: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Apr 18 20:44:43 pve kernel: [    7.871051] ata5.00: supports DRM functions and may not be fully accessible
Apr 18 20:44:43 pve kernel: [    7.874873] ata5.00: supports DRM functions and may not be fully accessible
Apr 18 20:44:43 pve kernel: [    7.877620] ata5.00: configured for UDMA/133
Apr 18 20:44:43 pve kernel: [    7.877644] ata5: EH complete
Apr 18 20:44:43 pve kernel: [    7.877691] ata5.00: Enabling discard_zeroes_data
Apr 18 20:44:43 pve kernel: [    7.906551] ata5: limiting SATA link speed to 3.0 Gbps
Apr 18 20:44:43 pve kernel: [    7.906553] ata5.00: exception Emask 0x10 SAct 0x3a00000 SErr 0x400000 action 0x6 frozen
Apr 18 20:44:43 pve kernel: [    7.906600] ata5.00: irq_stat 0x08000000, interface fatal error
Apr 18 20:44:43 pve kernel: [    7.906620] ata5: SError: { Handshk }
Apr 18 20:44:43 pve kernel: [    7.906635] ata5.00: failed command: READ FPDMA QUEUED
Apr 18 20:44:43 pve kernel: [    7.906655] ata5.00: cmd 60/08:a8:b0:11:51/00:00:04:00:00/40 tag 21 ncq dma 4096 in
Apr 18 20:44:43 pve kernel: [    7.906655]          res 40/00:c8:00:48:94/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
Apr 18 20:44:43 pve kernel: [    7.906697] ata5.00: status: { DRDY }
Apr 18 20:44:43 pve kernel: [    7.906711] ata5.00: failed command: READ FPDMA QUEUED
Apr 18 20:44:43 pve kernel: [    7.906730] ata5.00: cmd 60/20:b8:00:14:94/00:00:02:00:00/40 tag 23 ncq dma 16384 in
Apr 18 20:44:43 pve kernel: [    7.906730]          res 40/00:c8:00:48:94/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
Apr 18 20:44:43 pve kernel: [    7.906788] ata5.00: status: { DRDY }
Apr 18 20:44:43 pve kernel: [    7.906802] ata5.00: failed command: READ FPDMA QUEUED
Apr 18 20:44:43 pve kernel: [    7.906821] ata5.00: cmd 60/20:c0:00:40:94/00:00:02:00:00/40 tag 24 ncq dma 16384 in
Apr 18 20:44:43 pve kernel: [    7.906821]          res 40/00:c8:00:48:94/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
Apr 18 20:44:43 pve kernel: [    7.906863] ata5.00: status: { DRDY }
Apr 18 20:44:43 pve kernel: [    7.906877] ata5.00: failed command: WRITE FPDMA QUEUED
Apr 18 20:44:43 pve kernel: [    7.906897] ata5.00: cmd 61/88:c8:00:48:94/01:00:00:00:00/40 tag 25 ncq dma 200704 out
Apr 18 20:44:43 pve kernel: [    7.906897]          res 40/00:c8:00:48:94/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
Apr 18 20:44:43 pve kernel: [    7.906939] ata5.00: status: { DRDY }
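The hex fields in these messages can be decoded by hand, which helps confirm what the kernel is complaining about. The sketch below (function names are my own; the SError bit names are transcribed from the Linux libata code and are worth double-checking against your kernel's source) shows that SErr 0x400000 is only bit 22, which libata prints as Handshk; that each SAct value lists exactly the NCQ tags reported as failed commands; and that SStatus 133 corresponds to a 6.0 Gbps link before the kernel steps it down to 3.0 Gbps. In other words, the errors hit whole batches of queued reads and writes at the link level rather than one bad command or sector, which fits the cable/controller/drive compatibility theory without proving it.

```python
# SError bit positions and the short names libata prints for them.
SERROR_NAMES = {
    0: "RecovData", 1: "RecovComm", 8: "UnrecovData", 9: "Persist",
    10: "Proto", 11: "HostInt", 16: "PHYRdyChg", 17: "PHYInt",
    18: "CommWake", 19: "10B8B", 20: "Dispar", 21: "BadCRC",
    22: "Handshk", 23: "LinkSeq", 24: "TrStaTrns", 25: "UnrecFIS",
    26: "DevExch",
}
SPEEDS = {1: "1.5 Gbps", 2: "3.0 Gbps", 3: "6.0 Gbps"}
SECTOR = 512  # bytes per sector in the transfer-size calculation


def decode_serror(serr: int) -> list[str]:
    """Names of the SError bits that are set, in bit order."""
    return [name for bit, name in sorted(SERROR_NAMES.items()) if (serr >> bit) & 1]


def sact_tags(sact: int) -> list[int]:
    """NCQ tags still outstanding: SAct has one bit per tag (0-31)."""
    return [tag for tag in range(32) if (sact >> tag) & 1]


def sstatus_speed(sstatus: int) -> str:
    """Negotiated link speed from the SPD field (bits 7:4) of SStatus."""
    return SPEEDS.get((sstatus >> 4) & 0xF, "no link / unknown")


def fpdma_cmd(cmd: str) -> tuple[int, int, int]:
    """Decode libata's 'cmd' field for READ/WRITE FPDMA QUEUED (0x60/0x61).

    Layout: command/feature:nsect:lbal:lbam:lbah/hob_feature:...:hob_lbah/device.
    For NCQ commands the sector count lives in the feature registers and the
    tag in bits 7:3 of nsect. Returns (opcode, tag, transfer size in bytes).
    """
    command, low, high, _device = cmd.split("/")
    feature, nsect = (int(x, 16) for x in low.split(":")[:2])
    hob_feature = int(high.split(":")[0], 16)
    sectors = (hob_feature << 8) | feature
    return int(command, 16), nsect >> 3, sectors * SECTOR


print(decode_serror(0x400000))                           # ['Handshk']
print(sact_tags(0x3a00000))                              # [21, 23, 24, 25]
print(sstatus_speed(0x133))                              # 6.0 Gbps
print(fpdma_cmd("61/88:a0:00:48:94/01:00:00:00:00/40"))  # (97, 20, 200704)
```

Running the other cmd fields through fpdma_cmd gives tags 21, 23, 24 and 25 with sizes of 4096, 16384, 16384 and 200704 bytes, the same values the kernel prints after "dma".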

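It's not clear to me what causes these errors, but it is likely not a Proxmox issue. The kernel hard-resets the link after the first error and then limits it to 3.0 Gbps. Some googling suggested trying better cables, avoiding sharp bends in them, or getting a new SSD or mainboard.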

My conclusion so far is that my WD HDD and Samsung SSD have compatibility issues with the PCIe SATA controller. Upgrading the Samsung SSD firmware might help, but I'm not too optimistic about that. Trying another PCIe controller might also be an option. Any recommendations?
 
