GPU Passthrough: added intel_iommu=on to an AMD CPU node; after I fixed it and finished the steps, I'm getting a black screen, no output from the VM

dengydongn

New Member
Oct 7, 2022
TL;DR: I added "intel_iommu=on" to an AMD server, ran update-grub, and rebooted the host. Then I realized that was wrong, edited GRUB again with "amd_iommu=on", ran update-grub, and rebooted again. After completing the rest of the steps, I get a black screen; I can tell the monitor is on (backlit), but nothing is displayed.
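For anyone following along, the fix amounts to swapping a single flag in the kernel command line. A sketch of the edit, using a local stand-in file for illustration (on the real host this is /etc/default/grub, followed by update-grub and a reboot):

```shell
# Stand-in for /etc/default/grub with the mistaken Intel flag:
printf 'GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on"\n' > grub.example

# Swap the Intel flag for the AMD one; iommu=pt is commonly added
# for passthrough setups:
sed -i 's/intel_iommu=on/amd_iommu=on iommu=pt/' grub.example

cat grub.example
# GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on iommu=pt"
```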

Output from dmesg | grep -e DMAR -e IOMMU
(I don't see anything like "DMAR: IOMMU enabled" as mentioned in the Wiki, and I'm worried the wrong intel_iommu option caused some damage here, although I fixed it later.)
[screenshot]

PCI information
[screenshot]

vfio and blocked drivers
[screenshot]

VM configuration
[screenshot]

Is there anything obviously wrong? how can I troubleshoot further? thanks!!
 
TL;DR: I added "intel_iommu=on" to an AMD server, ran update-grub, and rebooted the host. Then I realized that was wrong, edited GRUB again with "amd_iommu=on", ran update-grub, and rebooted again. After completing the rest of the steps, I get a black screen; I can tell the monitor is on (backlit), but nothing is displayed.
amd_iommu is on by default. amd_iommu=on and intel_iommu=on are ignored on your system.
Output from dmesg | grep -e DMAR -e IOMMU
(I don't see anything like "DMAR: IOMMU enabled" as mentioned in the Wiki, and I'm worried the wrong intel_iommu option caused some damage here, although I fixed it later.)
[screenshot]
This is a test for Intel systems and you don't need it, because on AMD the IOMMU is on by default. What is the output of cat /proc/cmdline?
Please use (inline) code-tags and actual text instead of screenshots.
PCI information
[screenshot]

vfio and blocked drivers
[screenshot]
You don't need the blacklist if you make sure vfio-pci is loaded first, but it should not hurt. Did you run update-initramfs -u to make those changes active? After a reboot of the system and before starting the VM, what is the output of lspci -nnks 0b:00?
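A sketch of the "vfio-pci loaded first" approach (the PCI device IDs below are the RTX 2060's, taken from the lspci output later in this thread; written to a local stand-in file here, whereas on the host it would go in /etc/modprobe.d/ and be followed by update-initramfs -u):

```shell
# Stand-in for /etc/modprobe.d/vfio.conf:
cat > vfio.conf.example <<'EOF'
# Bind the GPU's VGA and audio functions to vfio-pci by vendor:device ID
options vfio-pci ids=10de:1f08,10de:10f9
# Ensure vfio-pci loads before the host drivers can claim the card
softdep nouveau pre: vfio-pci
softdep snd_hda_intel pre: vfio-pci
EOF

cat vfio.conf.example
```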
VM configuration
[screenshot]

Is there anything obviously wrong? how can I troubleshoot further? thanks!!
You don't need to pass through all four functions of the GPU separately. Remove hostpci1, 2 and 3 and enable All Functions on the first.
Is this the only GPU in the system? Is this the GPU that shows the BIOS POST on the screen when starting your system? You might need a work-around.
Any errors in the Task log, Syslog or journalctl when starting the VM?
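With All Functions enabled, the four hostpci entries collapse into a single line in the VM config. Roughly (a sketch; the exact flags, such as pcie=1, depend on the machine type and setup):

```
hostpci0: 0000:0b:00,pcie=1
```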
 

Here's the output from /proc/cmdline

Code:
root@3950x:~# cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-5.15.60-1-pve root=/dev/mapper/pve-root ro quiet amd_iommu=on iommu=pt initcall_blacklist=sysfb_init

Here's the output from lspci

Code:
root@3950x:~# lspci -nnks 0b:00
0b:00.0 VGA compatible controller [0300]: NVIDIA Corporation TU106 [GeForce RTX 2060 Rev. A] [10de:1f08] (rev a1)
        Subsystem: Micro-Star International Co., Ltd. [MSI] TU106 [GeForce RTX 2060 Rev. A] [1462:3752]
        Kernel driver in use: vfio-pci
        Kernel modules: nvidiafb, nouveau
0b:00.1 Audio device [0403]: NVIDIA Corporation TU106 High Definition Audio Controller [10de:10f9] (rev a1)
        Subsystem: Micro-Star International Co., Ltd. [MSI] TU106 High Definition Audio Controller [1462:3752]
        Kernel driver in use: vfio-pci
        Kernel modules: snd_hda_intel
0b:00.2 USB controller [0c03]: NVIDIA Corporation TU106 USB 3.1 Host Controller [10de:1ada] (rev a1)
        Subsystem: Micro-Star International Co., Ltd. [MSI] TU106 USB 3.1 Host Controller [1462:3752]
        Kernel driver in use: vfio-pci
        Kernel modules: xhci_pci
0b:00.3 Serial bus controller [0c80]: NVIDIA Corporation TU106 USB Type-C UCSI Controller [10de:1adb] (rev a1)
        Subsystem: Micro-Star International Co., Ltd. [MSI] TU106 USB Type-C UCSI Controller [1462:3752]
        Kernel driver in use: vfio-pci
        Kernel modules: i2c_nvidia_gpu

I applied the work-around, and this time I can see something on the monitor, e.g. the Proxmox logo, the Ubuntu menu (Try / Install Ubuntu) and some boot logs; then it becomes a black screen again.
 
That means that PCIe passthrough of the GPU in principle works now.
Maybe the Ubuntu open-source drivers (nouveau) don't work well with your GPU? Maybe you need to first install NVidia drivers on the VM? Sorry, but I have no experience with NVidia.
 
I unselected the passed-through GPU as primary GPU and restored the default one, so I have dual output on the monitor and the console, and I was able to capture the error before the external monitor went out. Any idea?

[screenshot]

Here's a gif of the console output; the external monitor stopped at the above screen. I guess you're right, the Ubuntu installer might have trouble with the passed-through GPU?

[animated gif]
 
I think consumer NVidia GPUs require Primary GPU, but as I said, I have no experience with NVidia (because they used to frustrate passthrough).



The Ubuntu installer probably uses the open-source nouveau driver and it does not fully support all NVidia GPUs (because of missing documentation and help from NVidia). Did you try the latest version of Ubuntu?
Maybe first install Ubuntu (as a test) and install the NVidia drivers, and only afterwards add the GPU passthrough?
 
My GPU is a 2060 and the Ubuntu installer is the latest one, but I think what you said makes sense; I'll install using the console, then add the passthrough and install the drivers.
 

It's working! After I installed Ubuntu Desktop on the VM with the virtual display, added PCIe passthrough, installed the NVidia drivers, marked the GPU as primary, and rebooted, I now have the VM on GPU/keyboard passthrough and it works perfectly!!

The reason I needed this is to run the checkmk dashboard showing the real-time status of my servers on a spare monitor. I wanted to use an RPi4, but it needs a micro HDMI cable that I don't have, so I figured I'd use the GPU on my server, and it finally worked!!!

See for yourself!!

[screenshot]
 