Sorry for the long title, but the behaviour is a bit confusing, so I spelled it out in full.
My spec is:
Threadripper 1950X
ASRock X399 Taichi
AMD Radeon VII
I did a clean install of Proxmox 6.2 and then a clean install of Ubuntu Server 20.04.1 as a guest. I was trying to pass the GPU through from Proxmox to the Ubuntu guest. I followed the wiki, and the remapping step produced output as described.
After assigning the GPU from the host to the guest, I was able to boot the guest, and from within the guest I could see the information for the AMD GPU I had passed through. I thought I had succeeded at that point. I then went ahead and installed AMD ROCm to use the GPU for compute, which required rebooting the guest a couple of times.
I then realized that when I reboot the guest, it shuts down but never starts again. If I manually stop it from Proxmox, it does stop after some time, but starting it afterwards does not work. Back in the Proxmox shell, I ran lspci -vvv and found that Proxmox can no longer detect the GPU:
Code:
44:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Vega 20 [Radeon VII] (rev ff) (prog-if ff)
!!! Unknown header type 7f
Kernel driver in use: vfio-pci
Kernel modules: amdgpu
44:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Vega 20 HDMI Audio [Radeon VII] (rev ff) (prog-if ff)
!!! Unknown header type 7f
Kernel driver in use: vfio-pci
Kernel modules: snd_hda_intel
Before booting the guest, Proxmox could see the GPU. Even while the guest was still running, Proxmox displayed the GPU information correctly:
Code:
44:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Vega 20 [Radeon VII] (rev c1) (prog-if 00 [VGA controller])
Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Vega 20 [Radeon VII]
Flags: bus master, fast devsel, latency 0, IRQ 255
Memory at 7fce0000000 (64-bit, prefetchable) [size=256M]
Memory at 7fcf0000000 (64-bit, prefetchable) [size=2M]
I/O ports at 4000 [disabled] [size=256]
Memory at 82300000 (32-bit, non-prefetchable) [size=512K]
Expansion ROM at 82380000 [disabled] [size=128K]
Capabilities: [48] Vendor Specific Information: Len=08 <?>
Capabilities: [50] Power Management version 3
Capabilities: [64] Express Legacy Endpoint, MSI 00
Capabilities: [a0] MSI: Enable- Count=1/1 Maskable- 64bit+
Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
Capabilities: [150] Advanced Error Reporting
Capabilities: [200] #15
Capabilities: [270] #19
Capabilities: [2a0] Access Control Services
Capabilities: [2b0] Address Translation Service (ATS)
Capabilities: [2c0] Page Request Interface (PRI)
Capabilities: [2d0] Process Address Space ID (PASID)
Capabilities: [320] Latency Tolerance Reporting
Kernel driver in use: vfio-pci
Kernel modules: amdgpu
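To make the difference between the two outputs easier to check after each guest reboot, here is a minimal sketch in Python. It assumes the text of lspci -vvv has already been captured into a string, and it only looks for the two markers visible in the failing output above ("(rev ff)" and "Unknown header type 7f"). The function name find_unresponsive_devices is hypothetical, made up for this illustration.

```python
import re

# Start of a device block in lspci output, e.g. "44:00.0 VGA compatible ...".
# An optional PCI domain prefix such as "0000:" is tolerated.
DEVICE_LINE = re.compile(
    r"^(?:[0-9a-f]{4}:)?([0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\s", re.IGNORECASE
)


def find_unresponsive_devices(lspci_text):
    """Return PCI addresses whose block shows '(rev ff)' or
    'Unknown header type 7f' (hypothetical helper, not part of any tool)."""
    bad = []
    current = None
    for line in lspci_text.splitlines():
        match = DEVICE_LINE.match(line)
        if match:
            # A new device block begins on this line.
            current = match.group(1)
            if "(rev ff)" in line and current not in bad:
                bad.append(current)
        elif current and "Unknown header type 7f" in line:
            if current not in bad:
                bad.append(current)
    return bad
```

On the host, the input string could come from something like subprocess.run(["lspci", "-vvv"], capture_output=True, text=True).stdout. With the failing output above, the sketch should report both 44:00.0 and 44:00.1.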
Any ideas would be appreciated.