MOBO: Aorus B550I AX WiFi
CPU: Ryzen 7 3700X
GPU: 2x RTX 2080 (HP OEM), one with a screen attached (HDMI) and one with a dummy HDMI plug
A friend inspired me to set up all my server and homelab needs within a Proxmox environment, and I thought I'd follow in his footsteps. Most of the time when I'm not home I'm running an Ubuntu VM with CoCalc (and possibly other data science tools next), so I can access them over the network. However, sometimes I need to edit videos or photos, or (mostly during breaks) game a bit, and for that I sadly need Windows.
I have two 2080s, NVLinked for the pooled memory. This works perfectly with CoCalc.
Ideally, the CoCalc Ubuntu VM and the Windows VM would switch back and forth so they never collide over the cards, and the few extra services I'd run (NAS, Nginx, mail, my own website, etc.) would live in a third and fourth VM.
I've installed Proxmox 7.4 and followed a guide from my friend, which consists of the following:
edit /etc/default/grub
# to enable PCI passthrough
intel_iommu=on iommu=pt pcie_acs_override=downstream,multifunction
nofb nomodeset video=vesafb:off video=efifb:off

edit /etc/modules
vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd

edit /etc/modprobe.d/pve-blacklist.conf
blacklist nvidiafb
blacklist nvidia
blacklist radeon
blacklist nouveau

run update-grub
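After rebooting, I've been checking the IOMMU groups with this generic sysfs one-liner (not from the guide) to confirm the two cards and their audio functions sit in their own groups:

Code:
#!/bin/bash
# Print every device per IOMMU group; 08:00.x and 09:00.x should
# ideally end up in separate groups (or at least in groups that
# contain nothing else the host needs).
for g in /sys/kernel/iommu_groups/*; do
    echo "IOMMU group ${g##*/}:"
    for d in "$g"/devices/*; do
        lspci -nns "${d##*/}"
    done
done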
After this I created a VM with the q35 machine type and OVMF enabled (EFI disk and TPM disabled), pulled in my old Windows install and my old Ubuntu install, and the following is happening:
My two cards are 08:00.0 and 09:00.0, and by default they are added with the PCI-Express, All Functions, and ROM-Bar options enabled.
If I add only the 08:00.0 2080, both VMs boot up fine but no GPU is detected.
When only the 09:00.0 is added, that GPU is detected by nvidia-smi and the NVIDIA Control Panel respectively.
When both of them are enabled on a given VM, I get a black screen.
If I enable the Primary GPU option for either one of them, I can no longer connect to the VNC console.
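To rule out a binding problem, the host driver in use for each card can be checked like this (it should report vfio-pci once the blacklisting works):

Code:
# Show all functions of each card and the kernel driver bound to
# them; "Kernel driver in use: vfio-pci" is the desired state.
lspci -nnk -s 08:00
lspci -nnk -s 09:00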
The last time I tried to boot the Windows machine with both GPUs given to the VM, I got the following dmesg output:
Code:
[ 1025.585814] vfio-pci 0000:08:00.0: vfio_ecap_init: hiding ecap 0x1e@0x258
[ 1025.585842] vfio-pci 0000:08:00.0: vfio_ecap_init: hiding ecap 0x19@0x900
[ 1025.587176] vfio-pci 0000:08:00.0: BAR 3: can't reserve [mem 0xe0000000-0xe1ffffff 64bit pref]
[ 1025.587378] vfio-pci 0000:08:00.0: No more image in the PCI ROM
[ 1025.753804] vfio-pci 0000:09:00.0: vfio_ecap_init: hiding ecap 0x1e@0x258
[ 1025.753831] vfio-pci 0000:09:00.0: vfio_ecap_init: hiding ecap 0x19@0x900
[ 1030.676765] vfio-pci 0000:08:00.0: No more image in the PCI ROM
[ 1030.676789] vfio-pci 0000:08:00.0: No more image in the PCI ROM
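My reading of the "BAR 3: can't reserve" line is that something on the host still claims that memory window; if it's the boot framebuffer, it should show up in /proc/iomem (run as root, otherwise the addresses are zeroed out):

Code:
# Check what currently owns the region vfio-pci failed to reserve.
# If "BOOTFB" or "efifb" appears around e0000000, the host console
# is still sitting on that card.
grep -i -e efifb -e bootfb -e 'e0000000' /proc/iomem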
I've also found a tutorial (here) from which I've adopted a few extra lines, with no luck.
I added extra parameters to GRUB_CMDLINE_LINUX_DEFAULT:
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt pcie_acs_override=downstream,multifunction nofb nomodeset video=vesafb:off,efifb:off"
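To confirm these actually made it into the running kernel after update-grub and a reboot, the live command line can be checked:

Code:
# Should echo back the iommu/video parameters set above.
cat /proc/cmdline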
Edit the modules file (VFIO = Virtual Function I/O):
nano /etc/modules
Add these lines:
vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd
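After a reboot, the four modules should show up as loaded:

Code:
# Expect vfio, vfio_iommu_type1, vfio_pci and vfio_virqfd here.
lsmod | grep vfio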
IOMMU interrupt remapping (some systems don't handle interrupt remapping well; this option works around it):
nano /etc/modprobe.d/iommu_unsafe_interrupts.conf
options vfio_iommu_type1 allow_unsafe_interrupts=1
nano /etc/modprobe.d/kvm.conf
options kvm ignore_msrs=1
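Whether the option took effect can be verified at runtime:

Code:
# Prints Y once the kvm module picked up ignore_msrs=1.
cat /sys/module/kvm/parameters/ignore_msrs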
Adding the GPU to VFIO:
lspci -v
Look for your GPU and take note of the first set of numbers; this is your PCI card address.
Then run this command:
lspci -n -s (PCI card address)
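For one of my 2080s the output shape is roughly the following (class code, then vendor:device IDs; the IDs match the ones in my vfio.conf below, and a 2080's USB-C controller may add .2/.3 functions):

Code:
lspci -n -s 08:00
# 08:00.0 0300: 10de:1e82   <- VGA controller
# 08:00.1 0403: 10de:10f8   <- HDMI audio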
This gives us the GPU's vendor and device IDs; those numbers go into the next file:
nano /etc/modprobe.d/vfio.conf
options vfio-pci ids=10de:1e82,10de:10f8 disable_vga=1 # HERE I couldn't follow it exactly, as both my GPUs share the same IDs (see the note below)
Run this command to update everything:
update-initramfs -u
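A side note on the identical IDs: since ids= matches by vendor:device, that single pair already binds both cards to vfio-pci, which is what I want here anyway. If someone needed to grab only one of two identical cards, the usual alternative I've seen is binding by PCI address with driver_override instead of by ID, from a script that runs at early boot (a sketch, not something from the tutorial):

Code:
#!/bin/sh
# Bind one specific card (by address) to vfio-pci instead of
# matching every 10de:1e82 in the system. Must run before the
# nvidia/nouveau modules load.
DEV=0000:08:00.0
echo vfio-pci > /sys/bus/pci/devices/$DEV/driver_override
# Unbind from whatever driver grabbed it first, if any
[ -e /sys/bus/pci/devices/$DEV/driver ] && \
    echo $DEV > /sys/bus/pci/devices/$DEV/driver/unbind
# Re-probe so vfio-pci picks it up
echo $DEV > /sys/bus/pci/drivers_probe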
It is very likely that I've thrown a few too many extra commands at it, and at the same time I might be missing something. I do not need a physical display for Proxmox as long as the web interface works and I can still SSH into it. From what I understand, especially given the BAR message, the host is not releasing the GPU properly, hence the addressing collision(?). My friend worked on his setup for quite some time, although he was using a single 3060 on an Intel platform, where the host could fall back on integrated graphics.
Still, it's already a very fun thing to work with, but I don't know what else I should check. I've found a few clues on this forum, and now I'm trying to narrow down whether it's a platform-specific issue (Ryzen + dual NVIDIA cards) or a settings issue.