PCIe passthrough issue with PCH (W680) connected device

hans66

New Member
Oct 14, 2023
Hi All,
  • Motherboard: Supermicro X13SAE-F
  • Proxmox 8.1.3
I want to pass through a PCIe device to a Debian VM.
The kernel options intel_iommu=on and iommu=pt are set.
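
To confirm that both options are really active on the running kernel (and not only present in the bootloader config), the command line can be checked directly. This is a minimal sketch, not part of the original setup: the function name `missing_iommu_options` is made up for illustration, and it assumes the options appear as separate space-delimited tokens in /proc/cmdline.

```python
from pathlib import Path

# The two options named in this post; adjust if your setup differs.
REQUIRED_OPTIONS = ("intel_iommu=on", "iommu=pt")


def missing_iommu_options(cmdline, required=REQUIRED_OPTIONS):
    """Return the required options that are absent from a kernel command line string."""
    present = set(cmdline.split())
    return [opt for opt in required if opt not in present]


if __name__ == "__main__":
    try:
        # /proc/cmdline holds the command line of the currently running kernel.
        cmdline = Path("/proc/cmdline").read_text()
    except OSError as exc:
        print(f"could not read /proc/cmdline: {exc}")
    else:
        missing = missing_iommu_options(cmdline)
        print("missing: " + ", ".join(missing) if missing else "all required options are active")
```

If it reports a missing option, the bootloader config was probably not applied or the host was not rebooted afterwards.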

Everything works fine if the PCIe device (a Google Coral TPU) is inserted in a PCIe slot connected to the CPU.
If I insert the TPU in a PCIe slot connected to the PCH/Southbridge (Intel W680), the Debian VM does
not start and I cannot connect to the console; the error is "Failed to run vncproxy".

I want to use a PCH slot because those are x4 slots, and I need the x8 or x16 slots for other PCIe devices.
When I use a PCH slot, I get the dmesg/kernel errors shown at the end of this post.
Any clues?

Code:
Jan 23 21:48:06 pve kernel: [ 2769.817595] pcieport 0000:00:1d.0: broken device, retraining non-functional downstream link at 2.5GT/s
Jan 23 21:48:07 pve kernel: [ 2770.825603] pcieport 0000:00:1d.0: retraining failed
Jan 23 21:48:08 pve kernel: [ 2772.081612] pcieport 0000:00:1d.0: broken device, retraining non-functional downstream link at 2.5GT/s
Jan 23 21:48:09 pve kernel: [ 2773.081493] pcieport 0000:00:1d.0: retraining failed
Jan 23 21:48:09 pve kernel: [ 2773.081501] vfio-pci 0000:0a:00.0: not ready 1023ms after resume; waiting
Jan 23 21:48:10 pve kernel: [ 2774.129515] vfio-pci 0000:0a:00.0: not ready 2047ms after resume; waiting
Jan 23 21:48:12 pve kernel: [ 2776.245497] vfio-pci 0000:0a:00.0: not ready 4095ms after resume; waiting
Jan 23 21:48:17 pve kernel: [ 2780.593461] vfio-pci 0000:0a:00.0: not ready 8191ms after resume; waiting
Jan 23 21:48:25 pve kernel: [ 2789.041441] vfio-pci 0000:0a:00.0: not ready 16383ms after resume; waiting
Jan 23 21:48:43 pve kernel: [ 2806.961472] vfio-pci 0000:0a:00.0: not ready 32767ms after resume; waiting
Jan 23 21:49:18 pve kernel: [ 2841.777209] vfio-pci 0000:0a:00.0: not ready 65535ms after resume; giving up
Jan 23 21:49:18 pve kernel: [ 2841.777225] vfio-pci 0000:0a:00.0: Unable to change power state from D3cold to D0, device inaccessible
Jan 23 21:49:18 pve kernel: [ 2841.777230] vfio-pci 0000:0a:00.0: Unable to change power state from D3cold to D0, device inaccessible
Jan 23 21:49:18 pve kernel: [ 2841.777318] vfio-pci 0000:0a:00.0: Unable to change power state from D3cold to D0, device inaccessible
Jan 23 21:49:19 pve kernel: [ 2842.825207] pcieport 0000:00:1d.0: broken device, retraining non-functional downstream link at 2.5GT/s
Jan 23 21:49:20 pve kernel: [ 2843.833200] pcieport 0000:00:1d.0: retraining failed
Jan 23 21:49:21 pve kernel: [ 2845.073237] pcieport 0000:00:1d.0: broken device, retraining non-functional downstream link at 2.5GT/s
Jan 23 21:49:22 pve kernel: [ 2846.073281] pcieport 0000:00:1d.0: retraining failed
Jan 23 21:49:22 pve kernel: [ 2846.073302] vfio-pci 0000:0a:00.0: not ready 1023ms after bus reset; waiting
Jan 23 21:49:23 pve kernel: [ 2847.121216] vfio-pci 0000:0a:00.0: not ready 2047ms after bus reset; waiting
Jan 23 21:49:25 pve kernel: [ 2849.201268] vfio-pci 0000:0a:00.0: not ready 4095ms after bus reset; waiting
Jan 23 21:49:30 pve kernel: [ 2853.553177] vfio-pci 0000:0a:00.0: not ready 8191ms after bus reset; waiting
Jan 23 21:49:38 pve kernel: [ 2862.001219] vfio-pci 0000:0a:00.0: not ready 16383ms after bus reset; waiting
Jan 23 21:49:55 pve kernel: [ 2878.641056] vfio-pci 0000:0a:00.0: not ready 32767ms after bus reset; waiting
Jan 23 21:50:29 pve kernel: [ 2913.456988] vfio-pci 0000:0a:00.0: not ready 65535ms after bus reset; giving up
Jan 23 21:50:29 pve kernel: [ 2913.457141] vfio-pci 0000:0a:00.0: Unable to change power state from D3cold to D0, device inaccessible
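
For anyone comparing logs between slots or BIOS versions, output like the above can be condensed per PCI address. This is only a sketch: the function name `summarize` and both regular expressions are assumptions derived from the message formats visible in this log, so they may not match messages from other kernel versions.

```python
import re
from collections import defaultdict

# Matches the kernel part of a line, e.g.
#   "[ 2773.081501] vfio-pci 0000:0a:00.0: not ready 1023ms after resume; waiting"
# Works with or without the leading syslog prefix ("Jan 23 ... pve kernel:").
LINE_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\] (?P<driver>[\w-]+) "
    r"(?P<bdf>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): (?P<msg>.*)$"
)
WAIT_RE = re.compile(
    r"not ready (?P<ms>\d+)ms after (?P<event>[\w ]+); (?P<outcome>waiting|giving up)"
)


def summarize(lines):
    """Group kernel messages by PCI address and record the longest readiness wait."""
    summary = defaultdict(
        lambda: {"driver": None, "messages": 0, "max_wait_ms": 0, "gave_up_after": []}
    )
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue  # skip lines that are not per-device kernel messages
        entry = summary[m["bdf"]]
        entry["driver"] = m["driver"]
        entry["messages"] += 1
        w = WAIT_RE.search(m["msg"])
        if w:
            entry["max_wait_ms"] = max(entry["max_wait_ms"], int(w["ms"]))
            if w["outcome"] == "giving up":
                # Remember which event (resume, bus reset, ...) timed out.
                entry["gave_up_after"].append(w["event"])
    return dict(summary)
```

Passing it the lines of a saved dmesg or journal dump returns one entry per device, which makes it easier to see that the root port (00:1d.0) and the passed-through device (0a:00.0) fail together.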
 
Sounds like a motherboard PCIe layout/chipset or BIOS/firmware issue. Maybe ask Supermicro for support or a newer BIOS? I expect that you'll have to live with it and either use another PCIe slot or a different motherboard.
 
I've been looking at this board as a potential upgrade for a while now, so my question is:
Has the problem been solved in the meantime?
I would be very grateful for an answer.
 
I know that there is a new BIOS. I am going to update the BIOS and give it a try, but that may take a few weeks.
 
I have upgraded to the latest BIOS (3.1) and the latest Proxmox, and it still doesn't work. I'm not sure whether this is specific to the Coral TPU or also affects other PCIe devices.
 
