PCIe Passthrough of USB chipset is not stable

gouthamravee

May 16, 2019
Hello!
Motherboard: GA-Z77X-UD3H
CPU: E3-1275 V2


I've successfully passed through other PCIe devices on this host, including to this VM, and they are completely stable.
I also want to pass through the onboard VIA USB controller, so that a couple of the rear USB ports, which I use for external drives, can be attached directly to the only VM that uses them. The devices are external USB drive enclosures; I use them to test new drives and copy data onto them for my storage server.
The chipset shows up as VL80x xHCI USB 3.0 Controller.

The problem is that when I do manage to pass the device through, it's not stable. It stays attached to the VM for a few hours, maybe a day or two. Then, out of nowhere, the whole VM stops responding or behaves erratically. Restarting the VM leads to a black screen instead of the Proxmox logo, and this persists until I remove the PCIe passthrough.

I thought something on the host was trying to take over the chipset, so I did what I believe is the correct way to prevent the host from accessing the controller.
I created a file called usb-ports.conf in the directory /etc/modprobe.d.
Its contents are:

Code:
options vfio-pci ids=1106:3432,1458:5007

The IDs are what I found when looking up the pci devices using lspci -nnk
Code:
04:00.0 USB controller [0c03]: VIA Technologies, Inc. VL80x xHCI USB 3.0 Controller [1106:3432] (rev 03)
        Subsystem: Gigabyte Technology Co., Ltd VL80x xHCI USB 3.0 Controller [1458:5007]
        Kernel driver in use: vfio-pci
        Kernel modules: xhci_pci
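In the lspci -nnk output above there are two bracketed ID pairs that look alike: [1106:3432] on the device line is the vendor:device ID, while [1458:5007] on the indented Subsystem: line is the board maker's subsystem ID. vfio-pci's ids= option reads each comma-separated entry as vendor:device first, so only the first pair describes this controller. The helper below is a hypothetical sketch (the function name is made up) that keeps only device lines and extracts their vendor:device pair.

```shell
# Hypothetical helper: print the vendor:device ID of every device line in
# `lspci -nn` / `lspci -nnk` output read from stdin.
# Device lines start with a bus:slot.function address such as "04:00.0";
# indented lines (Subsystem:, Kernel driver in use:, ...) are skipped.
# The class code "[0c03]" has no colon, so the sed pattern never matches it.
device_id_from_lspci() {
  grep -E '^[0-9a-f]{2}:[0-9a-f]{2}\.[0-7] ' |
    sed -E 's/.*\[([0-9a-f]{4}:[0-9a-f]{4})\].*/\1/'
}

# Usage on the host (assumes the controller is still at 04:00.0):
#   lspci -nnk -s 04:00.0 | device_id_from_lspci
```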

The chipset is in its own IOMMU group and, from what I can see, doesn't share it with any other device.
Bash:
IOMMU Group 1 00:00.0 Host bridge [0600]: Intel Corporation Xeon E3-1200 v2/Ivy Bridge DRAM Controller [8086:0158] (rev 09)
IOMMU Group 2 00:01.0 PCI bridge [0604]: Intel Corporation Xeon E3-1200 v2/3rd Gen Core processor PCI Express Root Port [8086:0151] (rev 09)
IOMMU Group 2 00:01.1 PCI bridge [0604]: Intel Corporation Xeon E3-1200 v2/3rd Gen Core processor PCI Express Root Port [8086:0155] (rev 09)
IOMMU Group 2 01:00.0 Serial Attached SCSI controller [0107]: Broadcom / LSI SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon] [1000:0072] (rev 03)
IOMMU Group 2 02:00.0 Serial Attached SCSI controller [0107]: Broadcom / LSI SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon] [1000:0072] (rev 03)
IOMMU Group 3 00:14.0 USB controller [0c03]: Intel Corporation 7 Series/C210 Series Chipset Family USB xHCI Host Controller [8086:1e31] (rev 04)
IOMMU Group 4 00:16.0 Communication controller [0780]: Intel Corporation 7 Series/C216 Chipset Family MEI Controller #1 [8086:1e3a] (rev 04)
IOMMU Group 5 00:1a.0 USB controller [0c03]: Intel Corporation 7 Series/C216 Chipset Family USB Enhanced Host Controller #2 [8086:1e2d] (rev 04)
IOMMU Group 6 00:1c.0 PCI bridge [0604]: Intel Corporation 7 Series/C216 Chipset Family PCI Express Root Port 1 [8086:1e10] (rev c4)
IOMMU Group 7 00:1c.4 PCI bridge [0604]: Intel Corporation 7 Series/C210 Series Chipset Family PCI Express Root Port 5 [8086:1e18] (rev c4)
IOMMU Group 8 00:1c.5 PCI bridge [0604]: Intel Corporation 82801 PCI Bridge [8086:244e] (rev c4)
IOMMU Group 8 05:00.0 PCI bridge [0604]: Intel Corporation 82801 PCI Bridge [8086:244e] (rev 30)
IOMMU Group 9 00:1c.6 PCI bridge [0604]: Intel Corporation 7 Series/C210 Series Chipset Family PCI Express Root Port 7 [8086:1e1c] (rev c4)
IOMMU Group 10 00:1c.7 PCI bridge [0604]: Intel Corporation 7 Series/C210 Series Chipset Family PCI Express Root Port 8 [8086:1e1e] (rev c4)
IOMMU Group 11 00:1d.0 USB controller [0c03]: Intel Corporation 7 Series/C216 Chipset Family USB Enhanced Host Controller #1 [8086:1e26] (rev 04)
IOMMU Group 12 00:1f.0 ISA bridge [0601]: Intel Corporation Z77 Express Chipset LPC Controller [8086:1e44] (rev 04)
IOMMU Group 12 00:1f.2 SATA controller [0106]: Intel Corporation 7 Series/C210 Series Chipset Family 6-port SATA Controller [AHCI mode] [8086:1e02] (rev 04)
IOMMU Group 12 00:1f.3 SMBus [0c05]: Intel Corporation 7 Series/C216 Chipset Family SMBus Controller [8086:1e22] (rev 04)
IOMMU Group 13 03:00.0 Ethernet controller [0200]: Intel Corporation 82580 Gigabit Network Connection [8086:150e] (rev 01)
IOMMU Group 14 03:00.1 Ethernet controller [0200]: Intel Corporation 82580 Gigabit Network Connection [8086:150e] (rev 01)
IOMMU Group 15 03:00.2 Ethernet controller [0200]: Intel Corporation 82580 Gigabit Network Connection [8086:150e] (rev 01)
IOMMU Group 16 03:00.3 Ethernet controller [0200]: Intel Corporation 82580 Gigabit Network Connection [8086:150e] (rev 01)
IOMMU Group 17 04:00.0 USB controller [0c03]: VIA Technologies, Inc. VL80x xHCI USB 3.0 Controller [1106:3432] (rev 03)
IOMMU Group 18 07:00.0 Ethernet controller [0200]: Qualcomm Atheros AR8151 v2.0 Gigabit Ethernet [1969:1083] (rev c0)
IOMMU Group 19 08:00.0 SATA controller [0106]: Marvell Technology Group Ltd. 88SE9172 SATA 6Gb/s Controller [1b4b:9172] (rev 11)
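The post doesn't say which script produced the listing above. A commonly used approach walks /sys/kernel/iommu_groups and asks lspci about each device; the sketch below assumes bash, sysfs mounted at /sys, pciutils installed and GNU sort, and both function names are hypothetical.

```shell
# Print the IOMMU group number from a path such as
# /sys/kernel/iommu_groups/17/devices/0000:04:00.0
iommu_group_of_path() {
  local rest=${1#/sys/kernel/iommu_groups/}   # "17/devices/0000:04:00.0"
  printf '%s\n' "${rest%%/*}"                 # "17"
}

# List every PCI device with its IOMMU group, ordered by group number.
# The body runs in a subshell so the nullglob setting does not leak out.
list_iommu_groups() (
  shopt -s nullglob              # IOMMU disabled -> no paths -> no output
  for dev in /sys/kernel/iommu_groups/*/devices/*; do
    printf 'IOMMU Group %s ' "$(iommu_group_of_path "$dev")"
    lspci -nns "${dev##*/}"      # "${dev##*/}" is the address, e.g. 0000:04:00.0
  done | sort -V                 # version sort puts group 2 before group 10
)

# Usage on the host:
#   list_iommu_groups
```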



What am I missing here?
Thank you!
 
Make sure that vfio-pci is loaded before xhci_pci by adding softdep xhci_pci pre: vfio-pci to /etc/modprobe.d/usb-ports.conf; otherwise vfio-pci might not be able to bind the device before xhci_pci touches it. Binding devices to vfio-pci early typically makes them more stable inside a VM.
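Combined with the original options line, the file could look like the sketch below, written out by a hypothetical helper so it can be tried against a scratch path first. Two assumptions: the subsystem ID 1458:5007 is left out, because vfio-pci's ids= reads each entry as vendor:device and that pair is not the controller's device ID; and update-initramfs -u -k all is run afterwards, which the Proxmox passthrough documentation recommends after changing module options.

```shell
# Hypothetical helper: write the modprobe options discussed in this thread.
# DEST defaults to the file used in the thread; pass another path to test it.
write_usb_ports_conf() {
  local dest=${1:-/etc/modprobe.d/usb-ports.conf}
  cat > "$dest" <<'EOF'
# Bind the VIA VL80x xHCI controller (vendor:device 1106:3432) to vfio-pci.
options vfio-pci ids=1106:3432
# Load vfio-pci before xhci_pci so xhci_pci cannot claim the controller first.
softdep xhci_pci pre: vfio-pci
EOF
}

# On the Proxmox host as root, then reboot:
#   write_usb_ports_conf
#   update-initramfs -u -k all
```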

PCI(e) devices don't always reset properly, even when they advertise Function Level Reset (FLR). This can prevent them from working a second time after a VM is shut down, until the whole Proxmox host is restarted. Sometimes VMs won't even boot because the device takes forever to reset. This might be what happens when you restart the VM. There are probably messages about it in journalctl (scroll to the right time using the arrow keys).
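To see which reset methods the host kernel has for the controller, and whether FLR is advertised at all, a sketch like the following may help. It assumes a kernel new enough to expose the reset_method sysfs attribute (5.15 or later, as far as I know) and the address 04:00.0 from this thread; the function name is hypothetical.

```shell
# Hypothetical helper: show how the kernel would reset a PCI device.
# The reset_method attribute does not exist on older kernels, hence the
# fallback message.
show_reset_method() {
  local dev=/sys/bus/pci/devices/${1:-0000:04:00.0}
  if [ -r "$dev/reset_method" ]; then
    cat "$dev/reset_method"      # e.g. a list such as "flr bus"
  else
    echo "no reset_method attribute for $dev"
  fi
}

# Usage on the host as root; lspci shows FLReset+ or FLReset- under DevCap:
#   show_reset_method 0000:04:00.0
#   lspci -vv -s 04:00.0 | grep -o 'FLReset[+-]'
```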

I don't know why the device stops working after a while. It might be a firmware, hardware or temperature issue. Are there any messages about it in the VM logs or in the Proxmox host's journalctl?
I suspect your device is not known for working well with passthrough (a niche that manufacturers neither design nor test for), or is even known not to work well. I know it's hard to be sure that the USB controller you buy has exactly the same chip and revision as one that worked well for someone else's passthrough...
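Rather than scrolling through the whole journal, the kernel log can be narrowed to the pieces involved here. This is a sketch with a hypothetical filter: vfio and xhci are the drivers in play, DMAR is the prefix Intel IOMMU messages use, and 04:00.0 is the controller's address from this thread.

```shell
# Hypothetical helper: keep only log lines that mention the passthrough stack.
# grep exits 1 when nothing matches; "|| true" keeps that from looking like
# an error.
filter_passthrough_msgs() {
  grep -iE 'vfio|xhci|dmar|04:00\.0' || true
}

# Usage on the Proxmox host, e.g. the current boot or a chosen time window:
#   journalctl -b -k -o short-iso | filter_passthrough_msgs
#   journalctl -k --since "<start time>" --until "<end time>" | filter_passthrough_msgs
```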
 


I saw softdep xhci_pci pre: vfio-pci mentioned in another post but didn't understand where it went. Thank you for clarifying that. I've updated the conf file and restarted the host. It seems to be picking up the chipset as expected now; we'll see how long it stays connected.

I agree, and I realize there could be other problems here unrelated to Proxmox: it could be the chipset, or it could be the external drive enclosures.

They gave me issues even when I used USB passthrough. I have two connected, and sometimes the host won't boot unless both are turned off.
I will definitely check journalctl; the last few times this happened I was in a rush and don't remember whether I checked the system journal.
 
Welp, no luck. This time it didn't cause the VM to stop functioning, but I was running a badblocks scan on one of the USB-attached drives, and after 17 hours the process failed.

The only warning or error I see in the logs of the VM is
Code:
reset SuperSpeed USB device number 3 using xhci_hcd

I don't see any other errors or warnings related to USB or the passed-through chipset in either the host or the VM journalctl logs.
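To check whether the xHCI resets line up in time with the badblocks failure, the reset messages can be pulled out of the VM's kernel log with timestamps. A minimal sketch, assuming the message keeps the form quoted above; the function name is hypothetical.

```shell
# Hypothetical helper: extract xHCI device-reset messages from a kernel log
# read on stdin, e.g.
#   "reset SuperSpeed USB device number 3 using xhci_hcd"
list_usb_resets() {
  grep -E 'reset .*USB device number [0-9]+ using xhci_hcd' || true
}

# Usage inside the VM:
#   journalctl -k -o short-iso | list_usb_resets
```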

Any idea what else I could check?
 
Wouldn't it be easier to just use the native USB passthrough instead of PCIe passthrough?
That does not work well for high-bandwidth or low-latency workloads. It might be worth trying to see how it works; maybe it's a bit slower but more stable? However, in my experience it uses a lot of CPU and is not suitable for most devices except input devices. Maybe usbip would work, which comes with Debian (and therefore Proxmox)?
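For comparison, Proxmox VE's native USB passthrough attaches a single USB device by its vendor:product ID (as shown by lsusb) instead of the whole controller, via qm set --usb0 host=... . The helper name, VM ID and USB ID below are placeholders; according to the qm documentation, usb3=1 only matters for older machine types, since machine version 7.1 and later with modern guest OS types plug all devices into an xHCI controller.

```shell
# Hypothetical helper: attach one host USB device to a VM by vendor:product ID.
attach_usb_device() {
  local vmid=$1 id=$2
  # Refuse anything that is not a vendor:product pair, so a typo never
  # reaches qm.
  if ! printf '%s\n' "$id" | grep -qE '^[0-9a-fA-F]{4}:[0-9a-fA-F]{4}$'; then
    echo "not a vendor:product USB ID: $id" >&2
    return 1
  fi
  qm set "$vmid" --usb0 "host=$id,usb3=1"
}

# Usage on the Proxmox host (placeholder VM ID and USB ID from lsusb):
#   attach_usb_device 100 1234:abcd
```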
 

Ah but easy is no fun!
So, from what I understand, USB passthrough works but adds significant overhead, and passing the chipset through would reduce that overhead. I need this because I've started putting refurbished 8TB drives into my NAS, and I do a full check of each one with badblocks. I then usually need to copy over all the content from the drive being replaced. Both tasks are significantly faster when the chipset is passed through instead of the individual USB devices.
 
In one of my personal labs I use it to pass a USB drive through to a PBS VM and back up an old NAS. It just works and maxes out the NAS's performance, so I never looked into the overhead. Will do!

I have yet to have "fun" with PCIe passthrough; my limited experience with it is trying to pass through the embedded graphics of a Hades Canyon NUC, with no success so far. I have yet to find a need for passthrough in any of my production clusters.
 
I think my problem might be with the external enclosure hardware; even USB passthrough wasn't stable over a long period of time for me.
 
Sorry for waking up an old post, but... are you sure you're not passing through a PCIe root port/switch or something?
I'm also passing a USB controller directly to a VM with no issues whatsoever, but I did a lot of checking before passing it through.
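One way to check is to look at the PCI class code of the address the VM actually uses: an xHCI controller reports 0x0c0330 in sysfs, while PCI bridges and root ports report 0x0604xx. The classifier below is a hypothetical sketch; the usage lines assume the controller at 04:00.0 from this thread, and <vmid> is a placeholder.

```shell
# Hypothetical helper: classify a PCI class code as read from
# /sys/bus/pci/devices/<address>/class (e.g. "0x0c0330").
classify_pci_class() {
  case $1 in
    0x0c0330) echo "USB xHCI controller (endpoint)" ;;
    0x0604??) echo "PCI bridge or root port" ;;
    *)        echo "other device class: $1" ;;
  esac
}

# Usage on the Proxmox host:
#   qm config <vmid> | grep hostpci     # which address is passed through?
#   classify_pci_class "$(cat /sys/bus/pci/devices/0000:04:00.0/class)"
```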
 
