PCIe passthrough issue with PCH (W680) connected device

hans66

New Member
Oct 14, 2023
Hi All,
  • Motherboard: Supermicro X13SAE-F
  • Proxmox 8.1.3
I want to pass through a PCIe device to a Debian VM.
The kernel options intel_iommu=on iommu=pt are set.
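
One way to confirm that these options actually reached the running kernel is to look at /proc/cmdline. The following is only a minimal Python sketch: the helper name `missing_kernel_options` is made up for illustration, and it checks the command line only, not whether the IOMMU was really enabled at boot.

```python
from pathlib import Path

# Options named in this post; adjust if your setup differs.
REQUIRED = ("intel_iommu=on", "iommu=pt")


def missing_kernel_options(cmdline, required=REQUIRED):
    """Return the required options that are absent from a kernel command line."""
    present = set(cmdline.split())
    return [opt for opt in required if opt not in present]


if __name__ == "__main__":
    proc_cmdline = Path("/proc/cmdline")
    if proc_cmdline.exists():  # only present on Linux
        missing = missing_kernel_options(proc_cmdline.read_text())
        print("missing options:", ", ".join(missing) if missing else "none")
```

If the list comes back empty, the boot loader configuration is at least not the reason the passthrough fails.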

Everything works fine if the PCIe device (Google Coral TPU) is inserted in a PCIe slot connected to the CPU.
If I insert the TPU in a PCIe slot connected to the PCH (southbridge, Intel W680), the Debian VM does
not start and I cannot connect to the console; the error is "Failed to run vncproxy".
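
To see which root port a card sits behind in a given slot, the sysfs symlink for the device can be resolved and split into its PCI addresses. This is a rough sketch, assuming the usual Linux sysfs layout; the function name `upstream_chain` is hypothetical, and 0000:0a:00.0 is simply the address the TPU gets in the PCH slot on this system, so it will differ elsewhere.

```python
import os
import re

# Matches a full PCI address such as 0000:0a:00.0 (domain:bus:device.function).
BDF_RE = re.compile(r"^[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]$")


def upstream_chain(sysfs_realpath):
    """Return the PCI addresses in a resolved sysfs device path, root port first."""
    return [part for part in sysfs_realpath.split("/") if BDF_RE.match(part)]


if __name__ == "__main__":
    link = "/sys/bus/pci/devices/0000:0a:00.0"  # address is an assumption, see above
    if os.path.exists(link):
        print(" -> ".join(upstream_chain(os.path.realpath(link))))
```

The first entry of the chain is the port to look for in the kernel log when the link fails to come up.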

I want to use a PCH slot because those are x4 slots, and I need the x8/x16 slots for other PCIe devices.
When I use a PCH slot, I get dmesg/kernel errors (see end of post).
Any clues?

Code:
Jan 23 21:48:06 pve kernel: [ 2769.817595] pcieport 0000:00:1d.0: broken device, retraining non-functional downstream link at 2.5GT/s
Jan 23 21:48:07 pve kernel: [ 2770.825603] pcieport 0000:00:1d.0: retraining failed
Jan 23 21:48:08 pve kernel: [ 2772.081612] pcieport 0000:00:1d.0: broken device, retraining non-functional downstream link at 2.5GT/s
Jan 23 21:48:09 pve kernel: [ 2773.081493] pcieport 0000:00:1d.0: retraining failed
Jan 23 21:48:09 pve kernel: [ 2773.081501] vfio-pci 0000:0a:00.0: not ready 1023ms after resume; waiting
Jan 23 21:48:10 pve kernel: [ 2774.129515] vfio-pci 0000:0a:00.0: not ready 2047ms after resume; waiting
Jan 23 21:48:12 pve kernel: [ 2776.245497] vfio-pci 0000:0a:00.0: not ready 4095ms after resume; waiting
Jan 23 21:48:17 pve kernel: [ 2780.593461] vfio-pci 0000:0a:00.0: not ready 8191ms after resume; waiting
Jan 23 21:48:25 pve kernel: [ 2789.041441] vfio-pci 0000:0a:00.0: not ready 16383ms after resume; waiting
Jan 23 21:48:43 pve kernel: [ 2806.961472] vfio-pci 0000:0a:00.0: not ready 32767ms after resume; waiting
Jan 23 21:49:18 pve kernel: [ 2841.777209] vfio-pci 0000:0a:00.0: not ready 65535ms after resume; giving up
Jan 23 21:49:18 pve kernel: [ 2841.777225] vfio-pci 0000:0a:00.0: Unable to change power state from D3cold to D0, device inaccessible
Jan 23 21:49:18 pve kernel: [ 2841.777230] vfio-pci 0000:0a:00.0: Unable to change power state from D3cold to D0, device inaccessible
Jan 23 21:49:18 pve kernel: [ 2841.777318] vfio-pci 0000:0a:00.0: Unable to change power state from D3cold to D0, device inaccessible
Jan 23 21:49:19 pve kernel: [ 2842.825207] pcieport 0000:00:1d.0: broken device, retraining non-functional downstream link at 2.5GT/s
Jan 23 21:49:20 pve kernel: [ 2843.833200] pcieport 0000:00:1d.0: retraining failed
Jan 23 21:49:21 pve kernel: [ 2845.073237] pcieport 0000:00:1d.0: broken device, retraining non-functional downstream link at 2.5GT/s
Jan 23 21:49:22 pve kernel: [ 2846.073281] pcieport 0000:00:1d.0: retraining failed
Jan 23 21:49:22 pve kernel: [ 2846.073302] vfio-pci 0000:0a:00.0: not ready 1023ms after bus reset; waiting
Jan 23 21:49:23 pve kernel: [ 2847.121216] vfio-pci 0000:0a:00.0: not ready 2047ms after bus reset; waiting
Jan 23 21:49:25 pve kernel: [ 2849.201268] vfio-pci 0000:0a:00.0: not ready 4095ms after bus reset; waiting
Jan 23 21:49:30 pve kernel: [ 2853.553177] vfio-pci 0000:0a:00.0: not ready 8191ms after bus reset; waiting
Jan 23 21:49:38 pve kernel: [ 2862.001219] vfio-pci 0000:0a:00.0: not ready 16383ms after bus reset; waiting
Jan 23 21:49:55 pve kernel: [ 2878.641056] vfio-pci 0000:0a:00.0: not ready 32767ms after bus reset; waiting
Jan 23 21:50:29 pve kernel: [ 2913.456988] vfio-pci 0000:0a:00.0: not ready 65535ms after bus reset; giving up
Jan 23 21:50:29 pve kernel: [ 2913.457141] vfio-pci 0000:0a:00.0: Unable to change power state from D3cold to D0, device inaccessible
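
The log is long but repetitive, so it can help to collapse it into counts per device and message. Below is a small Python sketch that does this for lines in the same format as the excerpt above; the function name `summarize` and the `<N>ms` placeholder are my own choices, and other log formats would need a different pattern.

```python
import re
from collections import Counter

# Matches e.g. "... kernel: [ 2769.817595] pcieport 0000:00:1d.0: retraining failed"
LINE_RE = re.compile(
    r"kernel: \[\s*(?P<ts>\d+\.\d+)\] "
    r"(?P<driver>\S+) (?P<bdf>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"(?P<msg>.*)$"
)


def summarize(lines):
    """Count messages per (driver, PCI address, message).

    The varying millisecond values are replaced by a placeholder so that
    the repeated "not ready ...ms" waits are grouped together.
    """
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line.rstrip("\n"))
        if not match:
            continue  # skip lines that are not in the expected format
        msg = re.sub(r"\d+ms", "<N>ms", match["msg"])
        counts[(match["driver"], match["bdf"], msg)] += 1
    return counts
```

Passing an open file or a list of lines to `summarize` and printing `most_common()` on the result shows at a glance that only two devices are involved: the root port 0000:00:1d.0 and the passed-through device 0000:0a:00.0.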
 
Sounds like a motherboard PCIe layout/chip or BIOS/firmware issue. Maybe ask Supermicro for support or a newer BIOS? I expect that you'll have to live with it, and swap with another PCIe slot or use a different motherboard.
 
I've been looking at this board as a potential upgrade for a while now, so my question is:
Has the problem been solved in the meantime?
I would be very grateful for an answer.
 
I know that there is a new BIOS. I am going to update the BIOS in the coming weeks and will give it a try. It may take a few weeks, though.
 
I have upgraded to the latest BIOS (3.1) and the latest Proxmox, and it still doesn't work. I'm not sure whether this is specific to the Coral TPU or also affects other PCIe devices.
 
Any news here? I just upgraded my hardware to the same motherboard. I'm trying to pass through a Broadcom 9500-8i HBA to a TrueNAS Core VM, but I see the same console output as you describe.
 