Issues with multiple-GPU passthrough when adding GPUs to VMs

alcatail

New Member
Apr 8, 2021
I followed the guide at https://pve.proxmox.com/wiki/Pci_passthrough with success and can pass a GTX 1060 through to a Windows 10 machine. The issue is that I have two identical GTX 1060s, and whichever card is installed in the highest slot dictates which card will work. By swapping them up and down the PCI slots I was able to see both cards work one at a time (one works, the other fails), but I can only ever get one card working, period. Proxmox does see both cards and the slot references are correct.

The error that presents for the card with the lower slot number is:
Stopped: Start failed: QEMU exited with code 1
It lists the PID of the process.

Could someone shed some light on this with any ideas?
I have been trying to work it out for two days but can't find it in the threads, and Google is turning up nothing helpful.

It is not the dreaded Code 43; when the card does show up it works perfectly. It is as if the process sees both cards as one across two slots, and the one with the lower slot number is the only one it allows to boot in the VM.

I have built an awesome gaming rig in a VM: the chip is an Intel i7 10700K on an MSI Z490 motherboard running 16 GB of RAM.
The idea is to split the cards up across different VMs. Load-wise the 10700K eats it up and plays Forza 4 flawlessly, and I mine on another VM and run a FreeNAS VM as well. So far I have the miner and the gaming rig VMs built, and I am aware that the video cards can only be used by one VM at a time.

The miner uses very little resource when running.

One thing to note with the gaming rig: when the sound was crackling I threw a heap of CPU at it and that fixed all issues; you wouldn't know the game was running inside a VM.

Edit: using Proxmox 6.3

Any help appreciated,

Kind regards

Mick


Edit in reply to Ramalama:

1. for a in /sys/kernel/iommu_groups/*; do find $a -type l; done | sort --version-sort

/sys/kernel/iommu_groups/0/devices/0000:00:00.0
/sys/kernel/iommu_groups/1/devices/0000:00:02.0
/sys/kernel/iommu_groups/2/devices/0000:00:12.0
/sys/kernel/iommu_groups/3/devices/0000:00:14.0
/sys/kernel/iommu_groups/3/devices/0000:00:14.2
/sys/kernel/iommu_groups/4/devices/0000:00:16.0
/sys/kernel/iommu_groups/5/devices/0000:00:17.0
/sys/kernel/iommu_groups/6/devices/0000:00:1c.0
/sys/kernel/iommu_groups/6/devices/0000:00:1c.3
/sys/kernel/iommu_groups/6/devices/0000:00:1c.4
/sys/kernel/iommu_groups/6/devices/0000:01:00.0
/sys/kernel/iommu_groups/6/devices/0000:01:00.1
/sys/kernel/iommu_groups/6/devices/0000:03:00.0
/sys/kernel/iommu_groups/6/devices/0000:03:00.1
/sys/kernel/iommu_groups/7/devices/0000:00:1f.0
/sys/kernel/iommu_groups/7/devices/0000:00:1f.3
/sys/kernel/iommu_groups/7/devices/0000:00:1f.4
/sys/kernel/iommu_groups/7/devices/0000:00:1f.5
/sys/kernel/iommu_groups/7/devices/0000:00:1f.6
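
Reading that listing: group 6 holds the 00:1c.x devices (on this platform most likely the PCH PCIe root ports) plus 01:00.0/01:00.1 and 03:00.0/03:00.1, i.e. both GTX 1060s with their HDMI audio functions, so the two cards share a single IOMMU group. To double-check which device is which, something like the following should print the vendor/device names and bound drivers; the addresses are simply the ones taken from the listing above:

for d in 00:1c.0 00:1c.3 00:1c.4 01:00.0 01:00.1 03:00.0 03:00.1; do lspci -nnks "$d"; done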


2. Edits so far: I have followed the guides stock-standard and have a single card working flawlessly: https://pve.proxmox.com/wiki/Pci_passthrough and https://pve.proxmox.com/wiki/Pci_passthrough#GPU_Passthrough

The issue is related to setups with more than one card.

3. journalctl -b

-- Logs begin at Fri 2021-04-09 13:20:45 AEST, end at Fri 2021-04-09 13:54:00 AE
Apr 09 13:20:45 mumselectrical kernel: Linux version 5.4.73-1-pve (build@pve) (g
Apr 09 13:20:45 mumselectrical kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-5.
Apr 09 13:20:45 mumselectrical kernel: KERNEL supported cpus:
Apr 09 13:20:45 mumselectrical kernel: Intel GenuineIntel
Apr 09 13:20:45 mumselectrical kernel: AMD AuthenticAMD
Apr 09 13:20:45 mumselectrical kernel: Hygon HygonGenuine
Apr 09 13:20:45 mumselectrical kernel: Centaur CentaurHauls
Apr 09 13:20:45 mumselectrical kernel: zhaoxin Shanghai
Apr 09 13:20:45 mumselectrical kernel: x86/fpu: Supporting XSAVE feature 0x001:
Apr 09 13:20:45 mumselectrical kernel: x86/fpu: Supporting XSAVE feature 0x002:
Apr 09 13:20:45 mumselectrical kernel: x86/fpu: Supporting XSAVE feature 0x004:
Apr 09 13:20:45 mumselectrical kernel: x86/fpu: Supporting XSAVE feature 0x008:
Apr 09 13:20:45 mumselectrical kernel: x86/fpu: Supporting XSAVE feature 0x010:
Apr 09 13:20:45 mumselectrical kernel: x86/fpu: Supporting XSAVE feature 0x200:
Apr 09 13:20:45 mumselectrical kernel: x86/fpu: xstate_offset[2]: 576, xstate_s
Apr 09 13:20:45 mumselectrical kernel: x86/fpu: xstate_offset[3]: 832, xstate_s
Apr 09 13:20:45 mumselectrical kernel: x86/fpu: xstate_offset[4]: 896, xstate_s
Apr 09 13:20:45 mumselectrical kernel: x86/fpu: xstate_offset[9]: 960, xstate_s
Apr 09 13:20:45 mumselectrical kernel: x86/fpu: Enabled xstate features 0x21f, c
Apr 09 13:20:45 mumselectrical kernel: BIOS-provided physical RAM map:
Apr 09 13:20:45 mumselectrical kernel: BIOS-e820: [mem 0x0000000000000000-0x0000
Apr 09 13:20:45 mumselectrical kernel: BIOS-e820: [mem 0x000000000005d000-0x0000

4. Proxmox version: 6.3


The error thrown is:
kvm: -device vfio-pci,host=0000:03:00.0,id=hostpci0.0,bus=pci.0,addr=0x10.0,multifunction=on: vfio 0000:03:00.0: failed to open /dev/vfio/6: Device or resource busy

But it is actually the other card that is in use, not this one.
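
For context, /dev/vfio/6 maps to IOMMU group 6 in the listing above, and "Device or resource busy" generally means something else already has that group open. That fits both 1060s (01:00.x and 03:00.x) living in the same group: whichever VM grabs group 6 first holds it, and a second VM trying to use any device from the same group then fails to open it. To see what currently holds it (fuser comes from the psmisc package, so this assumes it is installed):

fuser -v /dev/vfio/6
ls -l /sys/kernel/iommu_groups/6/devices/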
 

Ramalama

Active Member
Dec 26, 2020
You probably need to blacklist your card with a GRUB cmdline edit.
It simply sounds like one card gets briefly accessed during boot, or before boot.

If that happens, it's about 99% certain that the passthrough itself will work, but the guest VM will show an error in Device Manager, boot to a black screen, etc.
In your case QEMU fails, so you could have a different issue.

In any case, you need to provide more info, basically four things:
1. for a in /sys/kernel/iommu_groups/*; do find $a -type l; done | sort --version-sort
2. What exactly you have edited so far.
3. journalctl -b
4. Proxmox version.

Cheers

Edit: you can redirect those commands with > anyfile.txt at the end of the command, to make it easier for you to copy the output here. Or upload those files.
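
For example, assuming the output should land in files named iommu.txt and journal.txt (the filenames are only placeholders):

for a in /sys/kernel/iommu_groups/*; do find $a -type l; done | sort --version-sort > iommu.txt
journalctl -b > journal.txt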
 

alcatail

New Member
Apr 8, 2021
Thanks for the reply. I will be back at home in the morning and will update it.
I have followed the general guide to a tee, and with just one card in the machine it works perfectly as well.
I will get the info tomorrow and repost what you asked for.
Also, it takes the card with the lower slot number; I found a post where the same thing was happening with Ethernet NICs and it was the same behaviour.

Is there somewhere I can spell out the PCI passthrough entries in the VM config file? I currently do it as per the guide using the q35 machine type and OVMF BIOS.
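
For reference, a minimal sketch of what those entries can look like written out by hand, assuming the two cards really are at 01:00.x and 03:00.x as in the IOMMU listing and that q35/OVMF are already selected: the VM's config lives in /etc/pve/qemu-server/<vmid>.conf and the passthrough lines go in directly, e.g.

bios: ovmf
machine: q35
hostpci0: 01:00.0,pcie=1,x-vga=1
hostpci1: 03:00.0,pcie=1

Listing both functions explicitly (e.g. 01:00.0;01:00.1) is equivalent to passing the card together with its audio function; the exact option set (x-vga, pcie, rombar) depends on the individual setup.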
 

alcatail

New Member
Apr 8, 2021
1. for a in /sys/kernel/iommu_groups/*; do find $a -type l; done | sort --version-sort

/sys/kernel/iommu_groups/0/devices/0000:00:00.0
/sys/kernel/iommu_groups/1/devices/0000:00:02.0
/sys/kernel/iommu_groups/2/devices/0000:00:12.0
/sys/kernel/iommu_groups/3/devices/0000:00:14.0
/sys/kernel/iommu_groups/3/devices/0000:00:14.2
/sys/kernel/iommu_groups/4/devices/0000:00:16.0
/sys/kernel/iommu_groups/5/devices/0000:00:17.0
/sys/kernel/iommu_groups/6/devices/0000:00:1c.0
/sys/kernel/iommu_groups/6/devices/0000:00:1c.3
/sys/kernel/iommu_groups/6/devices/0000:00:1c.4
/sys/kernel/iommu_groups/6/devices/0000:01:00.0
/sys/kernel/iommu_groups/6/devices/0000:01:00.1
/sys/kernel/iommu_groups/6/devices/0000:03:00.0
/sys/kernel/iommu_groups/6/devices/0000:03:00.1
/sys/kernel/iommu_groups/7/devices/0000:00:1f.0
/sys/kernel/iommu_groups/7/devices/0000:00:1f.3
/sys/kernel/iommu_groups/7/devices/0000:00:1f.4
/sys/kernel/iommu_groups/7/devices/0000:00:1f.5
/sys/kernel/iommu_groups/7/devices/0000:00:1f.6


2. Edits so far: I have followed the guides stock-standard and have a single card working flawlessly: https://pve.proxmox.com/wiki/Pci_passthrough and https://pve.proxmox.com/wiki/Pci_passthrough#GPU_Passthrough

The issue is related to setups with more than one card.

3. journalctl -b

The VMs are Windows 10 20H2 builds.

-- Logs begin at Fri 2021-04-09 13:20:45 AEST, end at Fri 2021-04-09 13:54:00 AE
Apr 09 13:20:45 mumselectrical kernel: Linux version 5.4.73-1-pve (build@pve) (g
Apr 09 13:20:45 mumselectrical kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-5.
Apr 09 13:20:45 mumselectrical kernel: KERNEL supported cpus:
Apr 09 13:20:45 mumselectrical kernel: Intel GenuineIntel
Apr 09 13:20:45 mumselectrical kernel: AMD AuthenticAMD
Apr 09 13:20:45 mumselectrical kernel: Hygon HygonGenuine
Apr 09 13:20:45 mumselectrical kernel: Centaur CentaurHauls
Apr 09 13:20:45 mumselectrical kernel: zhaoxin Shanghai
Apr 09 13:20:45 mumselectrical kernel: x86/fpu: Supporting XSAVE feature 0x001:
Apr 09 13:20:45 mumselectrical kernel: x86/fpu: Supporting XSAVE feature 0x002:
Apr 09 13:20:45 mumselectrical kernel: x86/fpu: Supporting XSAVE feature 0x004:
Apr 09 13:20:45 mumselectrical kernel: x86/fpu: Supporting XSAVE feature 0x008:
Apr 09 13:20:45 mumselectrical kernel: x86/fpu: Supporting XSAVE feature 0x010:
Apr 09 13:20:45 mumselectrical kernel: x86/fpu: Supporting XSAVE feature 0x200:
Apr 09 13:20:45 mumselectrical kernel: x86/fpu: xstate_offset[2]: 576, xstate_s
Apr 09 13:20:45 mumselectrical kernel: x86/fpu: xstate_offset[3]: 832, xstate_s
Apr 09 13:20:45 mumselectrical kernel: x86/fpu: xstate_offset[4]: 896, xstate_s
Apr 09 13:20:45 mumselectrical kernel: x86/fpu: xstate_offset[9]: 960, xstate_s
Apr 09 13:20:45 mumselectrical kernel: x86/fpu: Enabled xstate features 0x21f, c
Apr 09 13:20:45 mumselectrical kernel: BIOS-provided physical RAM map:
Apr 09 13:20:45 mumselectrical kernel: BIOS-e820: [mem 0x0000000000000000-0x0000
Apr 09 13:20:45 mumselectrical kernel: BIOS-e820: [mem 0x000000000005d000-0x0000

4. Proxmox version: 6.3
 

alcatail

New Member
Apr 8, 2021
I have managed to get it working tonight. I revisited my motherboard options and enabled some options around PCI Express. I also redid the VM's <vmid>.conf, and this seemed to be the winning combination:
hostpci0: 01:00.0;01:00.1
hostpci1: 03:00.0;03:00.1

Making sure these correlate 100% in the VM file is a must, otherwise it throws Code 18 or Code 43 driver conflicts. Adding them one at a time, with a Windows 10 reboot in between, seems to help.
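
If you would rather do that from the shell than edit the file directly, qm set writes the same entries; the VMID 100 here is just a placeholder:

qm set 100 --hostpci0 '01:00.0;01:00.1'
# reboot the Windows guest and check the first card before adding the second
qm set 100 --hostpci1 '03:00.0;03:00.1'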

One thing I noted, and am OK with, is that the VFIO passthrough here can only be used by one VM at a time: you can't split the graphics cards across different VMs, such as a GTX 1060 to VM 1 and a 1070 to a separate VM. It appears to be all or none to a single VM, and the other VMs don't get access. Again, no complaints, as this is an amazing setup for what it is doing for me. I have it configured with two GTX 1060 cards mining on one VM, and when my kids want to game they shut the mining VM down and fire up the gaming rig, which uses just one 1060 card. I am happy with that and will work on getting my third card added to the mining VM over the next week.
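
A closing note: the all-or-none behaviour is consistent with the IOMMU listing earlier in the thread, where both 1060s (01:00.x and 03:00.x) sit in group 6 alongside the 00:1c.x root ports. Every device in an IOMMU group has to go to the same VM, so splitting the cards across VMs would need the grouping itself to change (different physical slots, firmware settings, or the ACS override, each with its own trade-offs). Which group a card is in can be confirmed with:

readlink /sys/bus/pci/devices/0000:01:00.0/iommu_group
readlink /sys/bus/pci/devices/0000:03:00.0/iommu_group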
 
