[TUTORIAL] VGPU Step by step guide needed for noobs Proxmox VE 7.X

LordAoshi

Member
Oct 13, 2021
Hi Community,
@dcsapak

Are there any instructions available other than this one: "https://pve.proxmox.com/wiki/MxGPU_with_AMD_S7150_under_Proxmox_VE_5.x#cite_note-gim-1"?
The linked page is clear for experts, but noobs struggle quite a lot with it. Is there someone here who would be kind enough to write a step by step guide or help guide me through it (even paid)?

Thanks for your support.
My hardware:
MB: supermicro x11dpi-nt
CPU: Intel Xeon Gold 6230R
RAM: 256 GB DDR4
SSD: 10 TB Intel Server and Kingston SSD (Kingston for proxmox/local and Intel SSD Storage for VMs)
GPU: AMD Firepro S7150 x2
BIOS: UEFI

VM BIOS: OVMF (UEFI)
VM OSes: Win10, Windows Server 2022
 
i think here in the forum is a post somewhere with a more detailed guide, but i cannot seem to find it atm.

what exactly do you struggle with?

the basic steps are:
* compile kernel module (possibly as dkms module, so it gets auto-compiled on kernel upgrade)
* load the module with the correct parameters (number of gpus, etc)
* select the gpus in the gui :)
 
Hi @dcsapak,

thank you very much for the reply. It really gives me hope as a noob.
What I am actually struggling with is understanding how to compile. I am going through all kinds of tutorials to understand the basics of compiling and the differences between the various approaches.
In other words, I do not actually know how to compile (I come from the Windows world with GUI only :( and am now interested in the Linux CLI and like Proxmox very much).

A step by step guide with all the complete commands would be highly appreciated to get this done.

Thank you very much.
 
ok, so imho the easiest way is to let the DKMS[0] system build the module automatically

NOTE: please do not simply execute the code snippets below, but try to read and understand the documentation and manpages.
I am not responsible for anything that goes wrong!


first you have to install it:
Code:
apt install dkms

in my configuration here i have checked out the source code of the gim project into /usr/src/gim-3.0

then i added a '/usr/src/gim-3.0/dkms.conf' file there with the following content:
(the manpage of dkms should explain what the options and fields do)

Code:
PACKAGE_NAME=gim                                                                                 
PACKAGE_VERSION=3.0                                                                              
MAKE[0]="make -C ${kernel_source_dir} M=${dkms_tree}/${PACKAGE_NAME}/${PACKAGE_VERSION}/build"   
CLEAN="make -C ${kernel_source_dir} M=${dkms_tree}/${PACKAGE_NAME}/${PACKAGE_VERSION}/build clean"
BUILT_MODULE_NAME=gim                                                                            
DEST_MODULE_LOCATION=/extra                                                                      
REMAKE_INITRD=yes                                                                                
AUTOINSTALL=yes
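
The ${...} placeholders in the file above are not meant to be replaced by hand; dkms fills them in itself when it builds the module. As a rough sketch of what the MAKE[0] line turns into, assuming the usual defaults (a dkms tree under /var/lib/dkms and kernel headers under /lib/modules/<kernel version>/build, both of which a real system may override):

```shell
# Sketch only: mimic how dkms would fill in the variables from dkms.conf.
# The values below are the usual defaults, not read from a real dkms setup.
PACKAGE_NAME=gim
PACKAGE_VERSION=3.0
kernelver="$(uname -r)"                               # running kernel version
kernel_source_dir="/lib/modules/${kernelver}/build"   # assumed headers location
dkms_tree="/var/lib/dkms"                             # assumed dkms tree

expanded_make="make -C ${kernel_source_dir} M=${dkms_tree}/${PACKAGE_NAME}/${PACKAGE_VERSION}/build"
echo "${expanded_make}"
```

So the file can be copied as it is; only PACKAGE_NAME and PACKAGE_VERSION have to match the directory name under /usr/src.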

then i enabled dkms with
Code:
dkms add -m gim -v 3.0
dkms build -m gim -v 3.0
dkms install -m gim -v 3.0

dkms should build the module automatically on kernel upgrade, but double check the docs/manpage for that
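
One way to double check this, as a hedged sketch: "dkms status" lists the modules dkms knows about and the kernels they are built for. The helper name check_gim_dkms is made up for this example.

```shell
# Hypothetical helper: report whether dkms knows about gim 3.0.
check_gim_dkms() {
    if ! command -v dkms >/dev/null 2>&1; then
        echo "dkms is not installed" >&2
        return 1
    fi
    # Prints one line per kernel the module is added/built/installed for.
    dkms status -m gim -v 3.0
}
```

Calling check_gim_dkms after a kernel upgrade and a reboot should show an entry for the new kernel if the automatic rebuild worked.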

0: https://wiki.debian.org/KernelDKMS
 
Installing dkms with "apt install dkms": done, this is clear.
You wrote that you checked out the source code of the gim project into /usr/src/gim-3.0. What do you mean by that? Did you copy it from somewhere, download it, or how should this be understood?
Regarding the "/usr/src/gim-3.0/dkms.conf" file: I think you ran "nano /usr/src/gim-3.0/dkms.conf", put in the lines from your post, saved and closed nano, right? I went through the manpages and understand what this config does. However, I am not sure about "${kernel_source_dir}" and the other variables: do they have to be replaced by some content? Maybe an example?
The "dkms add", "dkms build" and "dkms install" commands: this is clear.

After that, "modprobe gim" has to be run, right?
I read in another post that gim 3.0 is for kernel 4.x and not for 5.1?
I have found other repositories like
https://github.com/kasperlewau/MxGPU-Virtualization
https://github.com/flumm/MxGPU-Virtualization/tree/kernel5.11

Is the procedure the same for these? Thank you very much for your help. I will definitely be the next subscriber ;)
 
As I did not fully understand the dkms process yet, I cloned the git repository into the folder /usr/src/GIM-3.0 and entered the "MxGPU-Virtualization" folder. Then I ran "./gim.sh" in the console and got this error.

What am I doing wrong?


EDIT: I did it the DKMS-Way (see post before)

Steps:
1. apt install sudo
2. sudo apt update
3. sudo apt upgrade, then follow https://pve.proxmox.com/wiki/Pci_passthrough#Intel_CPU up to "IOMMU interrupt remapping"
4. sudo apt install dkms (dkms rebuilds the driver automatically when the kernel is updated)
5. sudo apt install git-all
6. Look in the browser for a GIM driver source that supports Linux kernel 5.x
Source link 1: https://github.com/kasperlewau/MxGPU-Virtualization
Source link 2: https://github.com/flumm/MxGPU-Virtualization/tree/kernel5.11
7. cd /usr/src/
8. mkdir gim-3.0
9. git clone https://github.com/flumm/MxGPU-Virtualization.git --> I used source link 2
10. cd MxGPU-Virtualization, then git checkout kernel5.11 or git switch kernel5.11 --> for source link 2 we have to switch to the branch "kernel5.11" (I do not fully understand the difference yet)
10b. cp -r /usr/src/MxGPU-Virtualization/. /usr/src/gim-3.0/ (copy all files into the gim-3.0 folder, without the MxGPU-Virtualization folder itself)
11. cd /usr/src/gim-3.0, then nano dkms.conf (create the dkms.conf file in /usr/src/gim-3.0 with the nano editor or vi)
12. Write these lines into dkms.conf:
PACKAGE_NAME=gim
PACKAGE_VERSION=3.0
MAKE[0]="make -C ${kernel_source_dir} M=${dkms_tree}/${PACKAGE_NAME}/${PACKAGE_VERSION}/build"
CLEAN="make -C ${kernel_source_dir} M=${dkms_tree}/${PACKAGE_NAME}/${PACKAGE_VERSION}/build clean"
BUILT_MODULE_NAME=gim
DEST_MODULE_LOCATION=/extra
REMAKE_INITRD=yes
AUTOINSTALL=yes
13. dkms add -m gim -v 3.0
14. dkms build -m gim -v 3.0 ---> an error might appear; read carefully what exactly is missing. In my case the package pve-headers-5.11.22-4-pve was missing
14b. sudo apt install pve-headers-5.11.22-4-pve (install the missing package)
15. dkms build -m gim -v 3.0
16. dkms install -m gim -v 3.0
17. modprobe gim
18. lspci --> all vGPUs should now be visible (see attachment 2)
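
Steps 7 to 10 and the missing headers from step 14 could also be sketched as below. This is an untested outline, not the exact commands used above: it assumes that cloning the kernel5.11 branch directly into /usr/src/gim-3.0 gives the same result as clone, checkout and copy, and that the headers package is named after the running kernel, as it was for pve-headers-5.11.22-4-pve. The function names are made up.

```shell
# Hypothetical helper: clone the kernel5.11 branch straight into the
# directory that dkms expects (/usr/src/<name>-<version>).
fetch_gim_source() {
    dest="${1:-/usr/src/gim-3.0}"
    git clone --branch kernel5.11 \
        https://github.com/flumm/MxGPU-Virtualization.git "$dest"
}

# Hypothetical helper: name of the headers package for the running kernel
# (assumes the pve-headers-<kernel version> naming seen in step 14).
headers_package() {
    echo "pve-headers-$(uname -r)"
}

# Only prints the command; run it yourself once the name looks right.
echo "apt install $(headers_package)"
```

fetch_gim_source is defined but deliberately not called here, since it writes to /usr/src and needs network access.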

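After a reboot the module is gone. Why?

EDIT:
The module "gim" is not loaded automatically, so you need to make it load on startup -->
19. cd /etc/modules-load.d
20. nano gim.conf (the file name must end in .conf, otherwise it is not read at boot)
21. Write the single line "gim" into it --> Ctrl+X (save and exit)
22. Restart the server and check with lspci whether gim/amdgpu is loaded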
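
Steps 19 to 22 can be sketched as a small script. It is only an outline under two assumptions: that the file is called gim.conf (systemd reads only files ending in .conf from /etc/modules-load.d), and that checking /proc/modules is an acceptable way to see whether the module is loaded. Both function names are invented for this example.

```shell
# Hypothetical helper: write a modules-load.d file that loads gim at boot.
# $1 is the target directory, defaulting to the real one.
write_gim_autoload() {
    dir="${1:-/etc/modules-load.d}"
    printf 'gim\n' > "${dir}/gim.conf"
}

# Hypothetical helper: succeed if the named module is currently loaded.
module_loaded() {
    grep -q "^$1 " /proc/modules 2>/dev/null
}

# Try it against a scratch directory first, so nothing in /etc is touched.
scratch="$(mktemp -d)"
write_gim_autoload "$scratch"
cat "${scratch}/gim.conf"

if module_loaded gim; then
    echo "gim is loaded"
else
    echo "gim is not loaded"
fi
```

On the host itself, write_gim_autoload would be called without an argument (as root) so that the file ends up in /etc/modules-load.d.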

EDIT2:
23. You will encounter ERROR 182 when you try to install any guest software other than the one mentioned below:
https://www.amd.com/de/support/professional-graphics/firepro/firepro-s-series/firepro-s7150-x2
(Radeon™ Pro Software Adrenalin Edition for Windows® 10 (64-bit), Guest Driver: Technical preview driver only for use with Open Source GIM Driver for KVM-based VDI deployments for Microsoft Windows® 10 (64-bit) platforms.)

SOLUTION TO 23:
Get the latest guest driver appropriate for your Windows system (Win10, Server 2016, 2019 etc.). Install it like you usually would and let it fail with Error 182. What we actually want are the folders that the installer creates in C:\AMD.
24. Open devmgmt.msc (Windows Device Manager) and look for a display adapter called "Windows Basic Display Driver" marked with a yellow "!"
25. Right-click it and choose "Update Drivers". Choose "Look for Drivers on my Computer" and point it to "C:\AMD\<Radeon Software Version>\Packages\Drivers\Display\WT6A_INF"
26. Reboot the Windows VM
27. Once you are back in Windows, go to
"C:\AMD\<Radeon Software Version>\Packages\Drivers\Display\WT6A_INF\B355483"
(This folder's name is probably going to change with different software versions, so keep that in mind)
In there, you should find a "ccc2_install.exe". Run it, and you should now be able to install the Adrenalin software. Congratulations, you fixed Error 182!

*****Big thanks to "Flixilux" (https://www.reddit.com/r/radeon/comments/hmus0y/how_to_fix_radeon_software_error_182/)*****

Encountered PROBLEMS:
------------------------------------------------------------------------------------------------------
I want to assign the GPU to a VM, but then the next error comes up, this time about IOMMU.
It says "No IOMMU detected, please activate it.See Documentation for further information."
I have done all the IOMMU steps according to the Proxmox wiki.
What am I missing?
It also says there should be an "IOMMU group" or something like that.
So I ran "find /sys/kernel/iommu_groups/ -type l" and it shows nothing.
What is wrong here?

SOLUTION:
I had written the parameters "quiet intel_iommu=on" on a second line in the file /etc/kernel/cmdline. They have to be WRITTEN ON THE FIRST LINE, AFTER THE PARAMETERS ALREADY THERE (if any).
Now the IOMMU error is gone and everything has been working fine so far.
Will report if Windows Server 2022 went well :)
EDIT: Windows Server 2022 has shown no problems so far and everything is working fine. Will test more with 16 VMs :)
------------------------------------------------------------------------------------------------------
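
The check and the fix described in this problem/solution block can be sketched as two small helpers. This is an illustration under assumptions, not a tested procedure: the function names are made up, the sample file content is invented, and the sketch only adds a single parameter to the first line.

```shell
# Hypothetical helper: count IOMMU groups; 0 means the IOMMU is not active.
count_iommu_groups() {
    dir="${1:-/sys/kernel/iommu_groups}"
    if [ ! -d "$dir" ]; then
        echo 0
        return
    fi
    find "$dir" -mindepth 1 -maxdepth 1 -type d | wc -l | tr -d ' '
}

# Hypothetical helper: print the first line of a cmdline file with the
# parameter appended, unless it is already there. Further lines are dropped,
# because the whole kernel command line has to sit on one line.
# $1 = file, $2 = parameter.
cmdline_with_param() {
    first_line="$(head -n 1 "$1")"
    case " ${first_line} " in
        *" $2 "*) printf '%s\n' "${first_line}" ;;
        *)        printf '%s\n' "${first_line:+${first_line} }$2" ;;
    esac
}

# Demo on a scratch file with made-up content, not on /etc/kernel/cmdline.
demo="$(mktemp)"
printf 'root=ZFS=rpool/ROOT/pve-1 boot=zfs\nquiet intel_iommu=on\n' > "$demo"
fixed_line="$(cmdline_with_param "$demo" intel_iommu=on)"
echo "$fixed_line"
echo "IOMMU groups found: $(count_iommu_groups)"
```

On a real host the resulting line would be written back to /etc/kernel/cmdline, followed by "proxmox-boot-tool refresh" and a reboot. As far as I know, /etc/kernel/cmdline only applies to hosts booted via systemd-boot; on GRUB-booted hosts the parameter goes into /etc/default/grub instead, as the Proxmox wiki describes.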

@dcsapak
How did you install or pass through the vGPU?
I installed the drivers recommended for gim and get no acceleration, even though the installer finishes without errors.
I have configured the VM as shown in the screenshot.
Is there an exact driver link or something like that?

SOLUTION:
MxGPU is finally working with Windows Server 2022 :) I will post some screenshots and a small video soon if required.
Follow all steps above and download the following driver:
https://www.amd.com/de/support/professional-graphics/firepro/firepro-s-series/firepro-s7150-x2
(Radeon™ Pro Software Adrenalin Edition for Windows® 10 (64-bit), Guest Driver: Technical preview driver only for use with Open Source GIM Driver for KVM-based VDI deployments for Microsoft Windows® 10 (64-bit) platforms.)
or follow step 23 above :)

proxmox-ve: 7.0-2 (running kernel: 5.11.22-5-pve)
pve-manager: 7.0-11 (running version: 7.0-11/63d82f4e)
pve-kernel-helper: 7.1-2
pve-kernel-5.11: 7.0-8
pve-kernel-5.11.22-5-pve: 5.11.22-10
pve-kernel-5.11.22-4-pve: 5.11.22-9
ceph-fuse: 15.2.14-pve1
corosync: 3.1.5-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.22-pve1
libproxmox-acme-perl: 1.3.0
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.0-4
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.0-9
libpve-guest-common-perl: 4.0-2
libpve-http-server-perl: 4.0-2
libpve-storage-perl: 7.0-11
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.9-4
lxcfs: 4.0.8-pve2
novnc-pve: 1.2.0-3
proxmox-backup-client: 2.0.9-2
proxmox-backup-file-restore: 2.0.9-2
proxmox-mini-journalreader: 1.2-1
proxmox-widget-toolkit: 3.3-6
pve-cluster: 7.0-3
pve-container: 4.0-10
pve-docs: 7.0-5
pve-edk2-firmware: 3.20200531-1
pve-firewall: 4.2-4
pve-firmware: 3.3-2
pve-ha-manager: 3.3-1
pve-i18n: 2.5-1
pve-qemu-kvm: 6.0.0-4
pve-xtermjs: 4.12.0-1
qemu-server: 7.0-14
smartmontools: 7.2-1
spiceterm: 3.2-2
vncterm: 1.7-1
zfsutils-linux: 2.0.5-pve1

Suggestions and thanks are welcome; it makes me happy when someone values this information ;)
Have a nice day :)
 

Attachments: error with gim.sh.PNG, vgpus.PNG, vgpu pcie.PNG, vm configuration.PNG, Benchmark Unigene.jpg
well it seems you figured it out on your own :)

btw. https://github.com/flumm/MxGPU-Virtualization/tree/kernel5.11 is my git and it's what i use here. i'll be updating my git with new branches as they are necessary
(i normally open pull requests on the upstream git, but it seems they ignore them completely, not sure if that's because it is a 'dead' project or some other reason)
 
Ohhh ok, then I did it the right way by using your git ;) Thank you very much, btw.
What about the AMD GIM driver for Windows? Will this driver stay at the 2017 release (Adrenalin 17.12.2) forever?
Or is there development ongoing somewhere else?

@dcsapak did you also try to get the newer AMD cards ("MI25" etc.) to work as vGPU?
 
regarding the windows driver: i'd use the official windows driver from the amd site: https://www.amd.com/en/support/prof...firepro-s-series/firepro-s7150-active-cooling
(the one under "KVM Open Source" "Guest Driver for KVM Open Source", currently 20.Q2.2)

regarding the MI25: sadly no, it does not seem to be an 'off-the-shelf' part available anywhere...
but afaics, it does not use the gim driver, but the in-kernel one?
 
Did anyone get this running on Proxmox 7 with an HP ProLiant Gen9? I get errors and more errors... and when I manage to load the Windows driver, either the VM or the host crashes...
 
which guest driver did you use exactly?

but i am already in legacy boot mode. where can i change the rom to "legacy" on hp gen9?
that depends on the bios. on a supermicro board here i can change that setting for every pci device separately (the setting is called oprom or something like this, can't remember right now)
 
its an hp dl360 gen9.

guest driver is:
https://www.amd.com/en/support/prof...firepro-s-series/firepro-s7150-active-cooling
"KVM Open Source" "Guest Driver for KVM Open Source", currently 20.Q2.2

But I also get this error when I run "modprobe gim": gim error:(init_register_init_state:3624) Failed to INIT PF for initial register 'init-state'
The virtual cards show up as expected, though. Can I ignore this error? I do not think so.
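
To see everything gim logged while loading, rather than only this one line, the kernel log can be filtered. A minimal sketch; the function name is made up, and it simply matches the "gim error" prefix seen in the message above:

```shell
# Hypothetical helper: keep only gim error lines from a log read on stdin.
gim_errors() {
    grep -i 'gim error' || true
}

# Example with the message from this thread; on the host one would pipe in
# the kernel log instead:  dmesg | gim_errors
sample="gim error:(init_register_init_state:3624) Failed to INIT PF for initial register 'init-state'"
matches="$(printf '%s\nsome unrelated line\n' "$sample" | gim_errors)"
echo "$matches"
```

Posting the full filtered output usually makes it easier for others to tell whether an error is fatal or only a warning.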

The only difference is that I do not use UEFI boot for the VM, as it does not boot after the install together with q35. At the moment no virtio drivers are in use, just for testing...
 
Meanwhile I managed to get a q35 VM with UEFI working, but it is the same. The moment the driver is installed in the guest VM, the host crashes...
 
Reading this far, I think it is an issue with SR-IOV or IOMMU not being passed through correctly. Maybe you should go into the BIOS again and set everything to legacy, especially the PCIe port for the GPU. Then configure it again step by step and compile the drivers. Also, which VM guest driver you use is actually not really relevant, as all of them seem to work somehow. On my first attempt the GIM driver from the AMD website worked flawlessly.
 
The only thing I can set to legacy is the boot mode; there is nothing else to set here. I found some more posts from people who could not get the FirePro S7150 to work with a Gen9 DL360. Maybe it is not possible...
 
I just gave ESXi a quick try with the same hardware and BIOS config, and it works without problems. Maybe I will try another day with another mainboard and KVM/Proxmox. I would love to get it running because all the infrastructure is Proxmox...
 
I have another problem now :( I can only get 6 VMs running; the 7th and any further ones will not start.
What do I still have to configure? Maybe gim? If yes, what exactly must be configured? @dcsapak
Thanks for the help ;)
TASK ERROR: start failed: command '/usr/bin/kvm -id 106
-name test106
-no-shutdown
-chardev 'socket,id=qmp,path=/var/run/qemu-server/106.qmp,server=on,wait=off'
-mon 'chardev=qmp,mode=control'
-chardev 'socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect=5'
-mon 'chardev=qmp-event,mode=control'
-pidfile /var/run/qemu-server/106.pid
-daemonize -smbios 'type=1,uuid=1cf934dc-cfcf-474b-a652-a8b9db23c25c'
-drive 'if=pflash,unit=0,format=raw,readonly=on,file=/usr/share/pve-edk2-firmware//OVMF_CODE.fd'
-drive 'if=pflash,unit=1,format=raw,id=drive-efidisk0,size=131072,file=/dev/zvol/vm-pool/vm-106-disk-0'
-smp '4,sockets=1,cores=4,maxcpus=4'
-nodefaults -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg'
-vnc 'unix:/var/run/qemu-server/106.vnc,password=on' -no-hpet
-cpu kvm64,enforce,hv_ipi,hv_relaxed,hv_reset,hv_runtime,hv_spinlocks=0x1fff,hv_stimer,hv_synic,hv_time,hv_vapic,hv_vpindex,+kvm_pv_eoi,+kvm_pv_unhalt,+lahf_lm,+sep'
-m 16384 -readconfig /usr/share/qemu-server/pve-q35-4.0.cfg -device 'vmgenid,guid=9fbe175a-bb78-4d8e-8c97-cbfc86f64aa3'
-device 'usb-tablet,id=tablet,bus=ehci.0,port=1'
-device 'vfio-pci,host=0000:dc:03.3,id=hostpci0,bus=ich9-pcie-port-1,addr=0x0'
-device 'VGA,id=vga,bus=pcie.0,addr=0x1'
-chardev 'socket,path=/var/run/qemu-server/106.qga,server=on,wait=off,id=qga0'
-device 'virtio-serial,id=qga0,bus=pci.0,addr=0x8'
-device 'virtserialport,chardev=qga0,name=org.qemu.guest_agent.0'
-device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3'
-iscsi 'initiator-name=iqn.1993-08.org.debian:01:b38780127a42'
-device 'virtio-scsi-pci,id=scsihw0,bus=pci.0,addr=0x5' -drive 'file=/dev/zvol/vm-pool/vm-106-disk-1,if=none,id=drive-scsi0,cache=writeback,discard=on,format=raw,aio=io_uring,detect-zeroes=unmap'
-device 'scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=100'
-netdev 'type=tap,id=net0,ifname=tap106i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' -device 'virtio-net-pci,mac=AA:D8:53:A2:FF:1E,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=101'
-rtc 'driftfix=slew,base=localtime'
-machine 'type=pc-q35-6.0+pve0'
-global 'kvm-pit.lost_tick_policy=discard''
failed: got timeout

UPDATE: I managed to start more than 6 VMs after changing the GIM configuration, in particular "vf_num = 16" (to be found in /etc/gim_config), followed by a restart. It then worked for up to 13 VMs. After I started cloning them, I shut down the VMs and restarted, and then got the same error as above.
After another restart and setting fb_clear=1 I could start even 19 VMs.
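
The two settings mentioned here could be changed with a small helper like the one below. It is a sketch under assumptions: that /etc/gim_config holds one "key=value" (or "key = value") entry per line, and that the gim module has to be reloaded or the host restarted afterwards, as described above. The function name set_gim_option is invented.

```shell
# Hypothetical helper: print a config file with one key set to a new value.
# $1 = file, $2 = key, $3 = value. The file itself is not modified.
set_gim_option() {
    sed -E "s/^($2[[:space:]]*=[[:space:]]*).*/\1$3/" "$1"
}

# Demo on a scratch file with made-up starting values, not on /etc/gim_config.
cfg="$(mktemp)"
printf 'vf_num=0\nfb_clear=0\n' > "$cfg"
set_gim_option "$cfg" vf_num 16 > "${cfg}.new" && mv "${cfg}.new" "$cfg"
set_gim_option "$cfg" fb_clear 1 > "${cfg}.new" && mv "${cfg}.new" "$cfg"
cat "$cfg"
```

Keeping a backup copy of the original /etc/gim_config before changing it makes it easy to go back if the VMs stop starting.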
 