[TUTORIAL] AMD S7150 MxGPU with Proxmox VE 5.x

So that's finally sorted. Thanks for the help with that.
Great, just out of curiosity: how much memory and which PCI(e) devices do you have in your machine? Maybe that makes a difference.

On another, unrelated note: have you done any experimentation with assigning multiple vGPUs? The current tutorial in the wiki doesn't discuss it, other than to say something is passed through. I know that in ESXi and XCP-ng you're supposed to "split" the card into multiple vGPUs. See: https://drivers.amd.com/relnotes/amd_mxgpu_deploymentguide_vmware.pdf
What do you mean exactly? You have x virtual functions, all of which you can pass through (e.g. I have set the number to 4 here but am only passing through 2 at a time).
 
Memory is 256GB, as shown by TOM. Or at least that's what should be shown; sometimes the registers seem to persist across a hot reboot.

Not sure why it lists mpt3sas as the kernel module for the 2308, but otherwise these are all the additional PCIe devices as seen in the system report (with the vGPUs cut out):

Code:
06:00.0 Non-Volatile memory controller [0108]: Intel Corporation Device [8086:f1a8] (rev 03) Subsystem: Intel Corporation Device [8086:390d] Kernel driver in use: nvme
07:00.0 Non-Volatile memory controller [0108]: Intel Corporation Device [8086:f1a8] (rev 03) Subsystem: Intel Corporation Device [8086:390d] Kernel driver in use: nvme
08:00.0 Non-Volatile memory controller [0108]: Intel Corporation Device [8086:f1a8] (rev 03) Subsystem: Intel Corporation Device [8086:390d] Kernel driver in use: nvme
09:00.0 Non-Volatile memory controller [0108]: Intel Corporation Device [8086:f1a8] (rev 03) Subsystem: Intel Corporation Device [8086:390d] Kernel driver in use: nvme
21:00.0 Serial Attached SCSI controller [0107]: LSI Logic / Symbios Logic SAS3224 PCI-Express Fusion-MPT SAS-3 [1000:00c4] (rev 01) Subsystem: LSI Logic / Symbios Logic SAS3224 PCI-Express Fusion-MPT SAS-3 [1000:31a0] Kernel driver in use: mpt3sas Kernel modules: mpt3sas
24:00.0 Non-Volatile memory controller [0108]: Intel Corporation Device [8086:f1a8] (rev 03) Subsystem: Intel Corporation Device [8086:390d] Kernel driver in use: nvme
27:00.0 Serial Attached SCSI controller [0107]: LSI Logic / Symbios Logic SAS2308 PCI-Express Fusion-MPT SAS-2 [1000:0087] (rev 05) Subsystem: LSI Logic / Symbios Logic SAS2308 PCI-Express Fusion-MPT SAS-2 [1000:3070] Kernel driver in use: mpt3sas Kernel modules: mpt3sas
29:00.0 Serial Attached SCSI controller [0107]: LSI Logic / Symbios Logic SAS2308 PCI-Express Fusion-MPT SAS-2 [1000:0087] (rev 05) Subsystem: LSI Logic / Symbios Logic SAS2308 PCI-Express Fusion-MPT SAS-2 [1000:3070] Kernel driver in use: mpt3sas Kernel modules: mpt3sas
41:00.0 Serial Attached SCSI controller [0107]: LSI Logic / Symbios Logic SAS3008 PCI-Express Fusion-MPT SAS-3 [1000:0097] (rev 02) Subsystem: Super Micro Computer Inc SAS3008 PCI-Express Fusion-MPT SAS-3 (AOC-S3008L-L8e) [15d9:0808] Kernel driver in use: mpt3sas Kernel modules: mpt3sas
44:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Tonga XT GL [FirePro S7150] [1002:6929] Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Tonga XT GL [FirePro S7150] [1002:0334] Kernel modules: amdgpu
46:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Tonga XT GL [FirePro S7150] [1002:6929] Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Tonga XT GL [FirePro S7150] [1002:0334] Kernel modules: amdgpu
63:00.0 Ethernet controller [0200]: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection [8086:10fb] (rev 01) Subsystem: Intel Corporation Ethernet Server Adapter X520-2 [8086:000c] Kernel driver in use: ixgbe Kernel modules: ixgbe
63:00.1 Ethernet controller [0200]: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection [8086:10fb] (rev 01) Subsystem: Intel Corporation Ethernet Server Adapter X520-2 [8086:000c] Kernel driver in use: ixgbe Kernel modules: ixgbe
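For comparing setups like this, a small Python sketch can pull the slot, the [vendor:device] ID and the bound driver out of each flattened lspci line and group devices by driver. It assumes exactly the one-line-per-device format pasted above (lspci -nnk output with its fields joined); the helper names are made up for illustration. (As far as I know, the mpt3sas driver has also handled SAS2 controllers such as the SAS2308 since mpt2sas was merged into it in Linux 4.4, which would explain that entry.)

```python
import re
from collections import defaultdict

# One flattened `lspci -nnk` record as pasted above: slot, class name,
# [class code], then the device's own [vendor:device] pair and, optionally,
# the "Kernel driver in use" field. Lines without a bound driver still match.
RECORD = re.compile(
    r"^(?P<slot>[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\s+"
    r"(?P<cls>[^\[]+?)\s+\[[0-9a-f]{4}\]:\s+"
    r".*?\[(?P<vendor>[0-9a-f]{4}):(?P<device>[0-9a-f]{4})\]"
    r"(?:.*?Kernel driver in use:\s+(?P<driver>\S+))?"
)


def devices_by_slot(lines):
    """Return {slot: ("vendor:device", driver or None)} for every parsable line."""
    result = {}
    for line in lines:
        m = RECORD.match(line.strip())
        if m:
            result[m["slot"]] = (f"{m['vendor']}:{m['device']}", m["driver"])
    return result


def slots_by_driver(lines):
    """Group slots by the driver bound to them ("(none)" if nothing is bound)."""
    groups = defaultdict(list)
    for slot, (_ids, driver) in devices_by_slot(lines).items():
        groups[driver or "(none)"].append(slot)
    return dict(groups)


sample = [
    "27:00.0 Serial Attached SCSI controller [0107]: LSI Logic / Symbios Logic SAS2308 "
    "PCI-Express Fusion-MPT SAS-2 [1000:0087] (rev 05) Subsystem: LSI Logic / Symbios Logic "
    "SAS2308 PCI-Express Fusion-MPT SAS-2 [1000:3070] Kernel driver in use: mpt3sas "
    "Kernel modules: mpt3sas",
    "44:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] "
    "Tonga XT GL [FirePro S7150] [1002:6929] Subsystem: Advanced Micro Devices, Inc. "
    "[AMD/ATI] Tonga XT GL [FirePro S7150] [1002:0334] Kernel modules: amdgpu",
]
print(slots_by_driver(sample))
```

The S7150 lines only list "Kernel modules: amdgpu" with no driver in use, so they end up under "(none)".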

Right, how do you configure the virtual functions? I know that in XCP-ng and ESXi it's just "attach this many"; is it the same in PVE? The tutorial just says "pass through a vGPU", and it's unclear to me how you do that with more than one unit at a time, since you're supposed to be able to give machines more juice if they need it.
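As far as I know, PVE has no "attach this many" switch for MxGPU: each VF shows up as its own PCI function, and a VM gets one through a hostpciN entry in its config (/etc/pve/qemu-server/<vmid>.conf, or qm set <vmid> -hostpci0 <address>). The hypothetical helper below just builds one such entry per VF address. The addresses are placeholders, and whether a guest actually benefits from more than one S7150 VF is not something this thread establishes.

```python
import re

# PCI address with an explicit function: optional domain, bus:device.function.
# (PVE also accepts bus:device without a function; this sketch does not.)
BDF = re.compile(r"^(?:[0-9a-fA-F]{4}:)?[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]$")


def hostpci_lines(vf_addresses, first_index=0):
    """Build one `hostpciN: <address>` config line per VF (hypothetical helper)."""
    lines = []
    for offset, addr in enumerate(vf_addresses):
        if not BDF.match(addr):
            raise ValueError(f"not a PCI bus address: {addr!r}")
        lines.append(f"hostpci{first_index + offset}: {addr}")
    return lines


# Placeholder VF addresses; use the ones lspci shows for your VFs.
for line in hostpci_lines(["83:02.0", "83:02.1"]):
    print(line)
```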
 
Hi,

I've just compiled gim on:

- HPE DL380 Gen8 / FirePro S7150 x2
- Proxmox 5.4 (4.15.18-24-pve)

But when I load the module, I get this output:

Code:
[    4.622412] gim: loading out-of-tree module taints kernel.
[    4.623521] gim info:(gim_init:149) Start AMD open source GIM initialization
[    4.623522] gim info:(gim_init:152) GPU IOV MODULE - version 0.0
[    4.623522] gim info:(gim_init:154) Copyright (c) 2014-2017 Advanced Micro Devices, Inc. All rights reserved.
[    4.639927] gim info:(parse_config_file:219) AMD GIM fb_option = 0
[    4.639928] gim info:(parse_config_file:219) AMD GIM sched_option = 0
[    4.639929] gim info:(parse_config_file:219) AMD GIM vf_num = 0
[    4.639929] gim info:(parse_config_file:219) AMD GIM pf_fb = 0
[    4.639930] gim info:(parse_config_file:219) AMD GIM vf_fb = 0
[    4.639931] gim info:(parse_config_file:219) AMD GIM sched_interval = 0
[    4.639932] gim info:(parse_config_file:219) AMD GIM sched_interval_us = 0
[    4.639932] gim info:(parse_config_file:219) AMD GIM fb_clear = 0
[    4.639933] gim info:(init_config:341) INIT CONFIG
[    4.640382] gim info:(set_new_adapter:572) curr allocated at         (ptrval)
[    4.640383] gim info:(set_new_adapter:579) SRIOV is supported
[    4.640386] gim info:(set_new_adapter:587) found PCI bridge device
[    4.640387] gim info:(set_new_adapter:591) found: 05:8.0
[    4.640421] gim info:(set_new_adapter:608) mmio_base =         (ptrval)
[    4.640425] gim info:(set_new_adapter:610) doorbell =         (ptrval)
[    4.640426] gim error:(map_fb:369) can't iomap for BAR 0
[    4.640484] gim info:(set_new_adapter:612) pf.fb_va =           (null)
[    4.640495] gim info:(sriov_is_ari_enabled:164) PCI_SRIOV_CAP = 0x00000002
[    4.640496] gim info:(sriov_is_ari_enabled:174) PCI_SRIOV_CTRL = 0x00000010
[    4.640497] gim info:(sriov_is_ari_enabled:177) PCI_SRIOV_CTRL_ARI is set --> ARI is supported
[    4.640499] gim info:(program_ari_mode:441) Read bif_strap8 = 0x00200004
[    4.640500] gim info:(program_ari_mode:446) program_ari_mode - Set ARI_Mode = PF_BUS
[    4.640500] gim info:(program_ari_mode:456) Write bif_strap8 = 0x00000004
[    4.640501] gim info:(gim_read_rom_from_reg:181) Reading VBios from ROM
[    4.640621] gim info:(gim_read_vbios:243) VBIOS starts:  0x55, 0xaa
[    4.640621] gim info:(gim_read_vbios:246) VBios size is 0x10000
[    4.640627] gim info:(gim_read_vbios:249) vbios allocated at         (ptrval)
[    4.640628] gim info:(gim_read_rom_from_reg:181) Reading VBios from ROM
[    4.809113] gim info:(gim_read_vbios:257) BIOS Version Major 0xF Minor 0x31
[    4.809157] gim info:(gim_read_vbios:270) Valid video BIOS image,
[    4.809158] gim info:(gim_read_vbios:271) size = 0x10000, check sum is 0x541000
[    4.809163] gim info:(gim_post_vbios:302) Init Parser passed!, continue
[    4.809168] gim info:(atom_chk_asic_status:333) ATOM_CheckAsicStatus - BIOS_SCRATCH_7 = 0x00000000
[    4.809168] gim info:(atom_chk_asic_status:336) Isolate ATOM_S7_ASIC_INIT_COMPLETE_MASK bit(s) = 0x00000000
[    4.809172] gim info:(atom_chk_asic_status:339) RLC_CNTL = 0x00000000
[    4.809172] gim info:(atom_chk_asic_status:341) Isolate RLC_CNTL__RLC_ENABLE_F32_MASK = 0x00000000
[    4.809173] gim info:(atom_chk_asic_status:348) ATOM_ASIC_NEED_POST
[    4.809173] gim info:(gim_post_vbios:305) Asic needs a VBios post
[    4.809174] gim info:(atom_post_vbios:200) ATOM_PostVBIOS: firmware_info passed
[    4.809175] gim info:(atom_post_vbios:253) asic_init before, engine clock = 7530; memory clock =1e848
[    5.143765] gim info:(atom_post_vbios:256) asic_init after
[    5.143766] gim info:(atom_post_vbios:263) atom_init_fan_cntl before
[    5.143772] gim info:(atom_post_vbios:265) atom_init_fan_cntl after
[    5.143773] gim info:(gim_post_vbios:311) Post INIT_ASIC successfully!
[    5.143787] gim info:(firmware_requires_update:510) SMU option ROM version 0x111700
[    5.143788] gim info:(firmware_requires_update:511) versus patch version 0x111a00
[    5.143800] gim info:(firmware_requires_update:521) RLCV option ROM version 113 versus patch version 129
[    5.143800] gim info:(firmware_requires_update:526) TOC found, update it
[    5.143809] gim info:(patch_firmware:586) Update smc_init table
[    5.599479] Modules linked in: ttm intel_cstate drm_kms_helper drm i2c_algo_bit fb_sys_fops syscopyarea sysfillrect sysimgblt hpilo shpchp joydev cdc_ether intel_rapl_perf input_leds ioatdma usbnet ipmi_si(+) r8152 ipmi_devintf serio_raw mii lpc_ich dca pcspkr ipmi_msghandler acpi_power_meter mac_hid gim(O+) sunrpc vfio_pci vfio_virqfd irqbypass vfio_iommu_type1 vfio ip_tables x_tables autofs4 btrfs xor zstd_compress raid6_pq hid_generic psmouse usbkbd usbmouse pata_acpi usbhid hid hpsa tg3 scsi_transport_sas ptp pps_core
[    5.600517]  ? copy_firmware_from_rom_to_reserved.isra.0.part.1+0x1b/0x20 [gim]
[    5.600601]  patch_firmware+0x481/0x620 [gim]
[    5.600672]  gim_post_vbios+0xa0/0x280 [gim]
[    5.600742]  set_new_adapter+0x278/0xaf0 [gim]
[    5.600812]  gim_probe+0xe/0x40 [gim]
[    5.600882]  gim_init+0x7c/0x150 [gim]
[    5.601016]  ? gim_probe+0x40/0x40 [gim]
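When comparing logs like this one, it can help to pull out only the config values gim reports and its error lines. A rough Python sketch, assuming the "gim info:" / "gim error:" message format shown above (the function names are illustrative). In this log the only explicit gim error is map_fb's "can't iomap for BAR 0", after which pf.fb_va is (null), and every parsed config value is 0.

```python
import re

# gim messages look like "gim info:(parse_config_file:219) AMD GIM vf_num = 0".
GIM_LINE = re.compile(r"gim (?P<level>info|error):\((?P<func>\w+):\d+\)\s*(?P<msg>.*)")
CONFIG_VALUE = re.compile(r"AMD GIM (?P<key>\w+) = (?P<value>-?\d+)")


def summarize_gim_log(text):
    """Return (config values gim parsed, list of gim error messages)."""
    config, errors = {}, []
    for raw in text.splitlines():
        m = GIM_LINE.search(raw)
        if not m:
            continue  # stack-trace and other non-gim lines are skipped
        msg = m["msg"].strip()
        if m["level"] == "error":
            errors.append(f"{m['func']}: {msg}")
        elif m["func"] == "parse_config_file":
            c = CONFIG_VALUE.search(msg)
            if c:
                config[c["key"]] = int(c["value"])
    return config, errors


log = """\
[    4.639929] gim info:(parse_config_file:219) AMD GIM vf_num = 0
[    4.640426] gim error:(map_fb:369) can't iomap for BAR 0
[    4.640484] gim info:(set_new_adapter:612) pf.fb_va =           (null)
"""
print(summarize_gim_log(log))
```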

Do you have any advice?

Thank you.
 
Hi @aracno, I have the same issue on an ASUS KGPE-D16 with the latest BIOS. Did you solve your problem?

The server is booted with quiet reboot=cold mem=256G rcu_nocbs=0-31 amd_iommu=on iommu=pt pci=realloc enable_mtrr_cleanup=1 video=efifb:off
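To confirm that parameters like these actually reached the running kernel (rather than only being edited into GRUB), one can read /proc/cmdline. A small Python sketch, assuming shell-style quoting is close enough to the kernel's own parsing; the IOMMU pair it checks is simply the one used in this thread, not an official list of requirements.

```python
import shlex


def parse_cmdline(cmdline):
    """Split a kernel command line into {name: value}; bare flags map to None.

    If a parameter appears twice, the last occurrence wins here, which is a
    simplification: the kernel's handling of repeats varies by parameter.
    """
    params = {}
    for token in shlex.split(cmdline):
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def missing_params(params, wanted):
    """Return the "name=value" pairs from `wanted` that are not set exactly."""
    return [f"{k}={v}" for k, v in wanted.items() if params.get(k) != v]


# In practice you would read the running kernel's line from /proc/cmdline.
booted = parse_cmdline(
    "quiet reboot=cold mem=256G rcu_nocbs=0-31 amd_iommu=on iommu=pt "
    "pci=realloc enable_mtrr_cleanup=1 video=efifb:off"
)
print(missing_params(booted, {"amd_iommu": "on", "iommu": "pt"}))
```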
 
Hi,
in my case, the HP Gen8 was not compatible with the card (its SR-IOV implementation is too old). So I switched to a Gen9 and it worked on the first try, but... with terrible performance.
With one full GPU allocated to one VM, I can't get more than 25-30 fps in GpuTest.
Using AutoCAD/3ds Max with mid-size projects was nearly impossible because of constant freezes when zooming in and out (full redraw each time)...
Conclusion: a fucking big waste of money and time.
I wish you better luck...
 
I recently purchased an S7150 x2 off eBay. At least I think it is one; it looks exactly like one. But when I run lspci | grep VGA, this is what I get. This is a fresh install of Proxmox with no drivers installed (GIM, for example). I have only set up the IOMMU-related settings in GRUB and modules; I haven't touched any SR-IOV settings yet.

Have I been had? Or am I just jumping the gun, and I need to install drivers for it to be recognized as an S7150 x2? I don't want to proceed if I need to return this thing. Thank you!

[Image: Capture.JPG, screenshot of the lspci | grep VGA output]

Here are some pictures of the card, if that helps at all:
[Images: top; airflow sticker facing the wrong direction; bottom]
 
Well, hard to say. The name text comes from the PCI ID database and can of course be wrong; it is not something that comes from the card itself.

You can post the detailed output:

lspci -vvvv -s ID (where ID is the bus address shown at the start of the card's lspci line)

Or simply try the gim driver and see if it works for you.
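Unlike the name text, the numeric [vendor:device] IDs that lspci -nn prints are read from the card's PCI configuration space, so they are a better thing to check. A minimal Python sketch, assuming the S7150 ID 1002:6929 reported earlier in this thread; I have not verified which ID an S7150 x2 reports, so treat the set below as something to extend, not a reference list.

```python
import re

# Vendor:device pairs to accept. 1002:6929 is what the S7150 earlier in this
# thread reported; other IDs (e.g. for an S7150 x2) are NOT verified here.
KNOWN_S7150_IDS = {"1002:6929"}

# The first [xxxx:xxxx] on an `lspci -nn` line is the device's own ID; the
# class code ([0300]) has no colon, so it is not matched.
ID_PAIR = re.compile(r"\[([0-9a-f]{4}:[0-9a-f]{4})\]")


def device_id(lspci_nn_line):
    """Return the device's "vendor:device" string, or None if none is found."""
    m = ID_PAIR.search(lspci_nn_line)
    return m.group(1) if m else None


def looks_like_s7150(lspci_nn_line):
    return device_id(lspci_nn_line) in KNOWN_S7150_IDS


line = ("44:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. "
        "[AMD/ATI] Tonga XT GL [FirePro S7150] [1002:6929]")
print(device_id(line), looks_like_s7150(line))
```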
 
I returned the GPU; I basically gave up. :( But I can't get SR-IOV working on my i350-T4v2 either, so something is obviously wrong somewhere. :(
I posted about the NIC here: https://www.reddit.com/r/Proxmox/comments/khpgwh/sriov_i350t4v2_cant_get_virtual_funcitons/
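For a NIC like the i350, the generic Linux way to create VFs is the PF's sriov_numvfs file in sysfs (next to sriov_totalvfs, which reports the maximum). A hedged Python sketch: the root parameter only exists so it can be tried against a fake directory, a real host needs root, and it can only work once SR-IOV is enabled in the BIOS and the PF driver supports it. The PCI address in the usage line is a placeholder.

```python
from pathlib import Path

SYSFS_PCI = Path("/sys/bus/pci/devices")


def set_num_vfs(bdf, count, root=SYSFS_PCI):
    """Ask the PF at `bdf` for `count` VFs via sysfs; returns the count set."""
    dev = Path(root) / bdf
    total = int((dev / "sriov_totalvfs").read_text())
    if not 0 <= count <= total:
        raise ValueError(f"{bdf} supports 0..{total} VFs, asked for {count}")
    numvfs = dev / "sriov_numvfs"
    current = int(numvfs.read_text())
    if current == count:
        return count
    # The kernel rejects switching directly between two non-zero counts
    # (EBUSY), so drop to 0 first.
    if current != 0:
        numvfs.write_text("0")
    numvfs.write_text(str(count))
    return count


# Usage on a real host (as root), with a placeholder PF address:
# set_num_vfs("0000:01:00.0", 4)
```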
 
Sorry to bring up this post again.

I can't get this to work. I now have an SR-IOV-capable mainboard and a new CPU, so I don't think the mainboard/CPU is the problem.

I have installed and compiled the latest MxGPU drivers from kasperlewau/MxGPU-Virtualization. I can set the virtual functions and get the vGPUs listed.

But when I assign a vGPU to a QEMU VM, I get the following message in dmesg -w:

vfio_pci: Cannot bind to PF with SR-IOV enabled

This is the full output when I start the VM:

Code:
vfio-pci 0000:83:02.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=none
[  183.542554] vfio-pci 0000:83:02.1: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=none
[  183.559680] vfio-pci 0000:85:02.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=none
[  183.576768] vfio-pci 0000:85:02.1: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=none
[  183.577421] vfio-pci 0000:83:02.1: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=none
[  183.610481] vfio-pci 0000:83:02.1: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=none
[  183.627708] vfio-pci 0000:83:00.0: vfio_pci: Cannot bind to PF with SR-IOV enabled
[  183.628282] vfio-pci: probe of 0000:83:00.0 failed with error -16
[  183.661602] vfio-pci 0000:85:00.0: vfio_pci: Cannot bind to PF with SR-IOV enabled
[  183.662174] vfio-pci: probe of 0000:85:00.0 failed with error -16
[  183.695453] vfio-pci 0000:83:00.0: vfio_pci: Cannot bind to PF with SR-IOV enabled
[  183.696054] vfio-pci: probe of 0000:83:00.0 failed with error -16
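The -16 in those probe failures is a negated errno value; on Linux 16 is EBUSY ("Device or resource busy"), which matches the message that vfio-pci will not bind a physical function (83:00.0, 85:00.0) while SR-IOV is enabled on it. So the log shows something asking vfio-pci to claim the PFs themselves, not only the VFs; what triggers that (a hostpci entry, a vfio-pci ids= match on the PF's device ID, or similar) is a guess I haven't verified. A small Python sketch for decoding such lines:

```python
import errno
import os
import re

PROBE_FAIL = re.compile(r"probe of (?P<bdf>\S+) failed with error (?P<code>-\d+)")


def decode_probe_failures(log_text):
    """Return [(address, errno name, message)], once per failing device."""
    seen = {}
    for m in PROBE_FAIL.finditer(log_text):
        code = -int(m["code"])  # the kernel reports negative errno values
        seen.setdefault(m["bdf"], (errno.errorcode.get(code, "?"), os.strerror(code)))
    return [(bdf, name, text) for bdf, (name, text) in seen.items()]


log = """\
[  183.628282] vfio-pci: probe of 0000:83:00.0 failed with error -16
[  183.662174] vfio-pci: probe of 0000:85:00.0 failed with error -16
[  183.696054] vfio-pci: probe of 0000:83:00.0 failed with error -16
"""
for bdf, name, text in decode_probe_failures(log):
    print(bdf, name, text)
```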

Every GPU and vGPU is bound to vfio-pci, and they are all in the same IOMMU group. Maybe that's the problem? But I can't solve it, even though I use the following GRUB parameter: pcie_acs_override=downstream,multifunction
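To see which devices actually share an IOMMU group (and whether pcie_acs_override changed anything after a reboot), one can walk /sys/kernel/iommu_groups. A Python sketch; the root parameter is only there so it can be tried against a fake tree, and the grouping used in the test tree is invented.

```python
from pathlib import Path


def iommu_groups(root="/sys/kernel/iommu_groups"):
    """Map IOMMU group number -> sorted PCI addresses in that group."""
    groups = {}
    base = Path(root)
    if not base.is_dir():
        return groups  # IOMMU disabled or not exposed
    for group_dir in base.iterdir():
        devices = group_dir / "devices"
        if group_dir.name.isdigit() and devices.is_dir():
            groups[int(group_dir.name)] = sorted(d.name for d in devices.iterdir())
    return groups


def shared_groups(groups):
    """Keep only groups holding more than one device (assigned to VMs together)."""
    return {g: devs for g, devs in groups.items() if len(devs) > 1}


for group, devices in sorted(shared_groups(iommu_groups()).items()):
    print(group, " ".join(devices))
```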

This is my GRUB_CMDLINE_LINUX_DEFAULT:

Code:
GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on iommu=pt pcie_acs_override=downstream,multifunction video=efifb:off"

Can anyone who got this working help? I'm on the latest PVE: pve-manager/6.3-6/2184247e (running kernel: 5.4.106-1-pve).