[TUTORIAL] HPE ML/DL server series PCI/GPU passthrough - PVE8

emunt6

Oct 3, 2022
Hi Everyone!

I decided to write a tutorial on PCI/GPU passthrough for HPE ML/DL servers, because much of the information you can find is old or misleading and does not work.
Here you can find a working method.


1., Requirements:
Code:
-HPE ML/DL series server
-PCI-E card / GPU card
-ILO access


2., BIOS Settings:
Code:
F9 BIOS (RBSU), then press CTRL+A (a hidden menu, "Service Options", will appear at the bottom):

- "System Options" > "Intel(R) VT-d" > "Enabled"
- "Advanced Options" > "Video Options" > "Embedded video primary, optional video secondary"
- "Advanced Options" > "Remote Graphics Mode" > "Enabled"
- "Service Options" > "Processor Power and Utilization Monitoring" > "Enabled"
- "Service Options" > "Shared Memory Communication" > "Enabled"
- "Service Options" > "PCI Express 64bit BAR Support" > "Enabled"

The video settings are needed so that the BIOS does not initialize/grab the GPU card; it leaves the card as is and uses the integrated onboard GPU.
Otherwise the GPU gets initialized, and you may need a "vbios" dump from your GPU card to correctly start/restart the card in the VM (some settings can only be adjusted during this initialization and not later, which is why the "vbios" would be needed). We don't want this; with the settings above no vbios is needed.


3., Proxmox host kernel settings:

Code:
/etc/default/grub

# INTEL processor:
GRUB_CMDLINE_LINUX=" intel_iommu=on iommu=pt initcall_blacklist=sysfb_init"
# AMD processor:
GRUB_CMDLINE_LINUX=" amd_iommu=on iommu=pt initcall_blacklist=sysfb_init"
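
These parameters only take effect after "update-grub" and a reboot (both further down in this section). Once the host is back up, it can be worth confirming that the running kernel actually received them. The following is only a sketch: `cmdline_has_all` is a made-up helper name, not a Proxmox tool, and it simply looks for whole-word parameters in a string.

```shell
# Hypothetical helper: succeed only if every listed parameter appears
# as a whole word in the given kernel command line string.
cmdline_has_all() {
    cmdline=$1
    shift
    for param in "$@"; do
        case " $cmdline " in
            *" $param "*) ;;                          # parameter present
            *) echo "missing: $param"; return 1 ;;    # report the first gap
        esac
    done
    return 0
}

# Example call on an Intel host (reads the live command line):
#   cmdline_has_all "$(cat /proc/cmdline)" intel_iommu=on iommu=pt initcall_blacklist=sysfb_init
```

On an AMD host the first argument would be amd_iommu=on instead.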

Code:
/etc/modprobe.d/kvm.conf

options kvm ignore_msrs=1 report_ignored_msrs=0

Code:
/etc/modprobe.d/nvidia.conf

blacklist nvidiafb
blacklist nouveau
blacklist nvidia
blacklist nvidia_drm

Code:
/etc/modprobe.d/radeon.conf

blacklist radeon
blacklist amdgpu

Code:
/etc/modules

vfio
vfio_iommu_type1
vfio_pci
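
After the reboot later in this section, one way to see whether these modules were picked up is to compare the list against `lsmod`. This is a sketch under assumptions: `missing_modules` is a hypothetical helper, and a module that is built into the kernel rather than loaded will not show up in `lsmod`, so a "missing" entry is a hint, not proof of a problem.

```shell
# Hypothetical helper: read `lsmod` output on stdin and print every
# module from the argument list that does not appear in it.
missing_modules() {
    loaded=$(awk 'NR > 1 { print $1 }')   # first column, header line skipped
    for mod in "$@"; do
        printf '%s\n' "$loaded" | grep -qx "$mod" || echo "$mod"
    done
}

# Example call on the host:
#   lsmod | missing_modules vfio vfio_iommu_type1 vfio_pci
```

No output means all three names were found.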

Code:
$> update-initramfs -c -d -u
$> update-grub

Reboot the Machine
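
Once the host is back up, a quick sanity check is to look at which kernel driver (if any) has claimed the card. The sketch below assumes the usual "Kernel driver in use:" line printed by `lspci -k`; `driver_in_use` is a hypothetical helper name.

```shell
# Hypothetical helper: read `lspci -k` output for one device on stdin
# and print the driver currently bound to it (prints nothing if the
# device has no driver bound).
driver_in_use() {
    sed -n 's/^[[:space:]]*Kernel driver in use:[[:space:]]*//p' | head -n 1
}

# Example call (0e:00.0 is the GPU address used later in this tutorial):
#   lspci -nnk -s 0e:00.0 | driver_in_use
```

With the blacklists above in place, the expectation is that the host GPU drivers (radeon, amdgpu, nouveau, nvidia) do not appear here. The log excerpt in section 4 shows vfio-pci handling the card.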


4., HPE IOMMU configuration:
After the restart we need to adjust the IOMMU config. For this we need the "hp-scripting-tools" package from the "http://downloads.linux.hpe.com/SDR/repo/stk/" website.
We don't need to add the repo; just download the latest available version. The XML file is needed too.
Code:
$> wget "http://downloads.linux.hpe.com/SDR/repo/stk/Debian/pool/non-free/hp-scripting-tools_11.60-20_amd64.deb"
$> wget -O conrep_rmrds.xml "https://downloads.hpe.com/pub/softlib2/software1/pubsw-linux/p1472592088/v95853/conrep_rmrds.xml"
$> dpkg -i hp-scripting-tools_11.60-20_amd64.deb

We need the physical slot number where the PCI-E card is located:
Code:
(Example: AMD GPU card)

$> lspci -nnk | grep 'AMD'
0c:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD/ATI] Navi  ...
0d:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD/ATI] Navi ...
0e:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Navi ...
0e:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI]  ...

$> lspci -s 0c:00.0 -vvv | grep 'Physical Slot'
Physical Slot: 3
$> lspci -s 0d:00.0 -vvv | grep 'Physical Slot'
$> lspci -s 0e:00.0 -vvv | grep 'Physical Slot'
$> lspci -s 0e:00.1 -vvv | grep 'Physical Slot'
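
The per-device lookups above can be wrapped in a small loop. This is a sketch only: `physical_slot` is a hypothetical helper that extracts the number from the "Physical Slot:" line of `lspci -vvv` output.

```shell
# Hypothetical helper: read `lspci -vvv` output for one device on stdin
# and print the value of its "Physical Slot:" line, if there is one.
physical_slot() {
    sed -n 's/^[[:space:]]*Physical Slot:[[:space:]]*\([^[:space:]]*\).*$/\1/p' | head -n 1
}

# Example loop over the addresses from the listing above:
#   for dev in 0c:00.0 0d:00.0 0e:00.0 0e:00.1; do
#       echo "$dev -> $(lspci -s "$dev" -vvv | physical_slot)"
#   done
```

In the example output above only the first bridge (0c:00.0) reports a slot, so the other addresses would print an empty value.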

Create the file "exclude.dat" with the following content (RMRDS_SlotX -> "RMRDS_Slot3", where X is the card's 'Physical Slot' number):
Code:
<Conrep> <Section name="RMRDS_Slot3" helptext=".">Endpoints_Excluded</Section> </Conrep>

Apply the config
Code:
$> conrep -l -x conrep_rmrds.xml -f exclude.dat

Query the status
Code:
$> conrep -s -x conrep_rmrds.xml -f verify.dat

Check the config is okay:
Code:
$> cat verify.dat | grep -i excluded

<Section name="RMRDS_Slot3" helptext=".">Endpoints_Excluded</Section>

( If you see the line above, you are okay )
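
The same check can be scripted for a specific slot. A minimal sketch, assuming the dump uses the Section format shown above; `slot_is_excluded` is a made-up helper name.

```shell
# Hypothetical helper: succeed if the given conrep dump file marks the
# given slot number as Endpoints_Excluded.
#   $1 = slot number, $2 = path to the dump (e.g. verify.dat)
slot_is_excluded() {
    slot=$1
    file=$2
    grep -q "name=\"RMRDS_Slot${slot}\"[^>]*>Endpoints_Excluded<" "$file"
}

# Example call:
#   slot_is_excluded 3 verify.dat && echo "slot 3 is excluded"
```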

If you have more than one PCI/GPU card that you want to pass through, you need to repeat the "conrep" process for each card.
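
For several cards, the repetition could be prepared with a small loop. This is a cautious sketch, not a tested procedure: `plan_conrep` and the "exclude_slotN.dat" file names are assumptions made for illustration. The helper only writes the files and prints the conrep commands, so they can be reviewed before running them by hand.

```shell
# Hypothetical helper: for each slot number given, write a one-section
# exclude file in the current directory and print the matching conrep
# command. Nothing is applied to the system by this function.
plan_conrep() {
    for slot in "$@"; do
        case "$slot" in
            ''|*[!0-9]*) echo "not a slot number: $slot" >&2; return 1 ;;
        esac
        file="exclude_slot${slot}.dat"   # file name is an assumption
        printf '<Conrep> <Section name="RMRDS_Slot%s" helptext=".">Endpoints_Excluded</Section> </Conrep>\n' "$slot" > "$file"
        echo "conrep -l -x conrep_rmrds.xml -f $file"
    done
}

# Example call for cards in slots 3 and 5:
#   plan_conrep 3 5
```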

Reboot the Machine

You can check the IOMMU is working:
Code:
$> journalctl -xb 0 | grep -ie DMAR -ie IOMMU -ie VFIO

If you see something like this, then it is working:
Code:
kernel: DMAR: IOMMU enabled
kernel: DMAR: Host address width 46
kernel: DMAR: DRHD base: 0x000000fabfe000 flags: 0x0
kernel: DMAR: dmar0: reg_base_addr fabfe000 ver 1:0 cap d2078c106f0466 ecap f020de
kernel: DMAR: DRHD base: 0x000000f4ffe000 flags: 0x1
kernel: DMAR: dmar1: reg_base_addr f4ffe000 ver 1:0 cap d2078c106f0466 ecap f020de
kernel: DMAR: RMRR base: 0x000000bdffd000 end: 0x000000bdffffff
kernel: DMAR: RMRR base: 0x000000bdff6000 end: 0x000000bdffcfff
kernel: DMAR: RMRR base: 0x000000bdf83000 end: 0x000000bdf84fff
kernel: DMAR: RMRR base: 0x000000bdf7f000 end: 0x000000bdf82fff
kernel: DMAR: RMRR base: 0x000000bdf6f000 end: 0x000000bdf7efff
kernel: DMAR: RMRR base: 0x000000bdf6e000 end: 0x000000bdf6efff
kernel: DMAR: RMRR base: 0x000000000f4000 end: 0x000000000f4fff
kernel: DMAR: RMRR base: 0x000000000e8000 end: 0x000000000e8fff
kernel: DMAR: [Firmware Bug]: No firmware reserved region can cover this RMRR [0x00000000000e8000-0x00000000000e8fff], contact BIOS vendor for fixes
kernel: DMAR: [Firmware Bug]: Your BIOS is broken; bad RMRR [0x00000000000e8000-0x00000000000e8fff]
kernel: DMAR: RMRR base: 0x000000bddde000 end: 0x000000bdddefff
kernel: DMAR: ATSR flags: 0x0
kernel: DMAR-IR: IOAPIC id 10 under DRHD base  0xfabfe000 IOMMU 0
kernel: DMAR-IR: IOAPIC id 8 under DRHD base  0xf4ffe000 IOMMU 1
kernel: DMAR-IR: IOAPIC id 0 under DRHD base  0xf4ffe000 IOMMU 1
kernel: DMAR-IR: HPET id 0 under DRHD base 0xf4ffe000
kernel: DMAR-IR: x2apic is disabled because BIOS sets x2apic opt out bit.
kernel: DMAR-IR: Use 'intremap=no_x2apic_optout' to override the BIOS setting.
kernel: DMAR-IR: Enabled IRQ remapping in xapic mode
kernel: iommu: Default domain type: Passthrough (set via kernel command line)
kernel: DMAR: No SATC found
kernel: DMAR: dmar0: Using Queued invalidation
kernel: DMAR: dmar1: Using Queued invalidation
kernel: pci 0000:40:00.0: Adding to iommu group 0
kernel: pci 0000:40:01.0: Adding to iommu group 1
kernel: pci 0000:40:01.1: Adding to iommu group 2
kernel: pci 0000:40:02.0: Adding to iommu group 3
kernel: pci 0000:40:02.1: Adding to iommu group 4
kernel: pci 0000:40:02.2: Adding to iommu group 5
kernel: pci 0000:40:02.3: Adding to iommu group 6
kernel: pci 0000:40:03.0: Adding to iommu group 7
kernel: pci 0000:40:03.1: Adding to iommu group 8
kernel: pci 0000:40:03.2: Adding to iommu group 9
kernel: pci 0000:40:03.3: Adding to iommu group 10
kernel: pci 0000:40:04.0: Adding to iommu group 11
kernel: pci 0000:40:04.1: Adding to iommu group 12
kernel: pci 0000:40:04.2: Adding to iommu group 13
kernel: pci 0000:40:04.3: Adding to iommu group 14
kernel: pci 0000:40:04.4: Adding to iommu group 15
kernel: pci 0000:40:04.5: Adding to iommu group 16
kernel: pci 0000:40:04.6: Adding to iommu group 17
kernel: pci 0000:40:04.7: Adding to iommu group 18
kernel: pci 0000:41:00.0: Adding to iommu group 19
kernel: pci 0000:47:00.0: Adding to iommu group 20
kernel: pci 0000:47:00.1: Adding to iommu group 20
kernel: pci 0000:00:00.0: Adding to iommu group 21
kernel: pci 0000:00:01.0: Adding to iommu group 22
kernel: pci 0000:00:01.1: Adding to iommu group 23
kernel: pci 0000:00:02.0: Adding to iommu group 24
kernel: pci 0000:00:02.1: Adding to iommu group 25
kernel: pci 0000:00:02.2: Adding to iommu group 26
kernel: pci 0000:00:02.3: Adding to iommu group 27
kernel: pci 0000:00:03.0: Adding to iommu group 28
kernel: pci 0000:00:03.1: Adding to iommu group 29
kernel: pci 0000:00:03.2: Adding to iommu group 30
kernel: pci 0000:00:03.3: Adding to iommu group 31
kernel: pci 0000:00:04.0: Adding to iommu group 32
kernel: pci 0000:00:04.1: Adding to iommu group 33
kernel: pci 0000:00:04.2: Adding to iommu group 34
kernel: pci 0000:00:04.3: Adding to iommu group 35
kernel: pci 0000:00:04.4: Adding to iommu group 36
kernel: pci 0000:00:04.5: Adding to iommu group 37
kernel: pci 0000:00:04.6: Adding to iommu group 38
kernel: pci 0000:00:04.7: Adding to iommu group 39
kernel: pci 0000:00:05.0: Adding to iommu group 40
kernel: pci 0000:00:05.2: Adding to iommu group 41
kernel: pci 0000:00:05.4: Adding to iommu group 42
kernel: pci 0000:00:11.0: Adding to iommu group 43
kernel: pci 0000:00:1a.0: Adding to iommu group 44
kernel: pci 0000:00:1c.0: Adding to iommu group 45
kernel: pci 0000:00:1c.4: Adding to iommu group 46
kernel: pci 0000:00:1c.7: Adding to iommu group 47
kernel: pci 0000:00:1d.0: Adding to iommu group 48
kernel: pci 0000:00:1e.0: Adding to iommu group 49
kernel: pci 0000:00:1f.0: Adding to iommu group 50
kernel: pci 0000:00:1f.2: Adding to iommu group 50
kernel: pci 0000:0f:00.0: Adding to iommu group 51
kernel: pci 0000:0f:00.1: Adding to iommu group 51
kernel: pci 0000:0f:00.2: Adding to iommu group 51
kernel: pci 0000:0f:00.3: Adding to iommu group 51
kernel: pci 0000:04:00.0: Adding to iommu group 52
kernel: pci 0000:05:04.0: Adding to iommu group 53
kernel: pci 0000:05:05.0: Adding to iommu group 54
kernel: pci 0000:05:08.0: Adding to iommu group 55
kernel: pci 0000:07:00.0: Adding to iommu group 56
kernel: pci 0000:08:00.0: Adding to iommu group 57
kernel: pci 0000:03:00.0: Adding to iommu group 58
kernel: pci 0000:0c:00.0: Adding to iommu group 59
kernel: pci 0000:0d:00.0: Adding to iommu group 60
kernel: pci 0000:0e:00.0: Adding to iommu group 61
kernel: pci 0000:0e:00.1: Adding to iommu group 62
kernel: pci 0000:02:00.0: Adding to iommu group 63
kernel: pci 0000:02:00.1: Adding to iommu group 63
kernel: pci 0000:02:00.2: Adding to iommu group 63
kernel: pci 0000:02:00.3: Adding to iommu group 63
kernel: pci 0000:01:00.0: Adding to iommu group 64
kernel: pci 0000:01:00.1: Adding to iommu group 64
kernel: pci 0000:01:00.2: Adding to iommu group 64
kernel: pci 0000:01:00.4: Adding to iommu group 64
kernel: pci 0000:40:05.0: Adding to iommu group 65
kernel: pci 0000:40:05.2: Adding to iommu group 66
kernel: pci 0000:40:05.4: Adding to iommu group 67
kernel: pci 0000:3f:08.0: Adding to iommu group 68
kernel: pci 0000:3f:08.2: Adding to iommu group 68
kernel: pci 0000:3f:08.6: Adding to iommu group 69
kernel: pci 0000:3f:09.0: Adding to iommu group 70
kernel: pci 0000:3f:09.2: Adding to iommu group 70
kernel: pci 0000:3f:09.6: Adding to iommu group 71
kernel: pci 0000:3f:0a.0: Adding to iommu group 72
kernel: pci 0000:3f:0a.1: Adding to iommu group 72
kernel: pci 0000:3f:0a.2: Adding to iommu group 72
kernel: pci 0000:3f:0a.3: Adding to iommu group 72
kernel: pci 0000:3f:0b.0: Adding to iommu group 73
kernel: pci 0000:3f:0b.3: Adding to iommu group 73
kernel: pci 0000:3f:0c.0: Adding to iommu group 74
kernel: pci 0000:3f:0c.1: Adding to iommu group 74
kernel: pci 0000:3f:0c.2: Adding to iommu group 74
kernel: pci 0000:3f:0c.3: Adding to iommu group 74
kernel: pci 0000:3f:0c.4: Adding to iommu group 74
kernel: pci 0000:3f:0d.0: Adding to iommu group 75
kernel: pci 0000:3f:0d.1: Adding to iommu group 75
kernel: pci 0000:3f:0d.2: Adding to iommu group 75
kernel: pci 0000:3f:0d.3: Adding to iommu group 75
kernel: pci 0000:3f:0d.4: Adding to iommu group 75
kernel: pci 0000:3f:0e.0: Adding to iommu group 76
kernel: pci 0000:3f:0e.1: Adding to iommu group 76
kernel: pci 0000:3f:0f.0: Adding to iommu group 77
kernel: pci 0000:3f:0f.1: Adding to iommu group 78
kernel: pci 0000:3f:0f.2: Adding to iommu group 79
kernel: pci 0000:3f:0f.3: Adding to iommu group 80
kernel: pci 0000:3f:0f.4: Adding to iommu group 81
kernel: pci 0000:3f:0f.5: Adding to iommu group 82
kernel: pci 0000:3f:10.0: Adding to iommu group 83
kernel: pci 0000:3f:10.1: Adding to iommu group 84
kernel: pci 0000:3f:10.2: Adding to iommu group 85
kernel: pci 0000:3f:10.3: Adding to iommu group 86
kernel: pci 0000:3f:10.4: Adding to iommu group 87
kernel: pci 0000:3f:10.5: Adding to iommu group 88
kernel: pci 0000:3f:10.6: Adding to iommu group 89
kernel: pci 0000:3f:10.7: Adding to iommu group 90
kernel: pci 0000:3f:13.0: Adding to iommu group 91
kernel: pci 0000:3f:13.1: Adding to iommu group 91
kernel: pci 0000:3f:13.4: Adding to iommu group 91
kernel: pci 0000:3f:13.5: Adding to iommu group 91
kernel: pci 0000:3f:16.0: Adding to iommu group 92
kernel: pci 0000:3f:16.1: Adding to iommu group 92
kernel: pci 0000:3f:16.2: Adding to iommu group 92
kernel: pci 0000:5f:08.0: Adding to iommu group 93
kernel: pci 0000:5f:08.2: Adding to iommu group 93
kernel: pci 0000:5f:08.6: Adding to iommu group 94
kernel: pci 0000:5f:09.0: Adding to iommu group 95
kernel: pci 0000:5f:09.2: Adding to iommu group 95
kernel: pci 0000:5f:09.6: Adding to iommu group 96
kernel: pci 0000:5f:0a.0: Adding to iommu group 97
kernel: pci 0000:5f:0a.1: Adding to iommu group 97
kernel: pci 0000:5f:0a.2: Adding to iommu group 97
kernel: pci 0000:5f:0a.3: Adding to iommu group 97
kernel: pci 0000:5f:0b.0: Adding to iommu group 98
kernel: pci 0000:5f:0b.3: Adding to iommu group 98
kernel: pci 0000:5f:0c.0: Adding to iommu group 99
kernel: pci 0000:5f:0c.1: Adding to iommu group 99
kernel: pci 0000:5f:0c.2: Adding to iommu group 99
kernel: pci 0000:5f:0c.3: Adding to iommu group 99
kernel: pci 0000:5f:0c.4: Adding to iommu group 99
kernel: pci 0000:5f:0d.0: Adding to iommu group 100
kernel: pci 0000:5f:0d.1: Adding to iommu group 100
kernel: pci 0000:5f:0d.2: Adding to iommu group 100
kernel: pci 0000:5f:0d.3: Adding to iommu group 100
kernel: pci 0000:5f:0d.4: Adding to iommu group 100
kernel: pci 0000:5f:0e.0: Adding to iommu group 101
kernel: pci 0000:5f:0e.1: Adding to iommu group 101
kernel: pci 0000:5f:0f.0: Adding to iommu group 102
kernel: pci 0000:5f:0f.1: Adding to iommu group 103
kernel: pci 0000:5f:0f.2: Adding to iommu group 104
kernel: pci 0000:5f:0f.3: Adding to iommu group 105
kernel: pci 0000:5f:0f.4: Adding to iommu group 106
kernel: pci 0000:5f:0f.5: Adding to iommu group 107
kernel: pci 0000:5f:10.0: Adding to iommu group 108
kernel: pci 0000:5f:10.1: Adding to iommu group 109
kernel: pci 0000:5f:10.2: Adding to iommu group 110
kernel: pci 0000:5f:10.3: Adding to iommu group 111
kernel: pci 0000:5f:10.4: Adding to iommu group 112
kernel: pci 0000:5f:10.5: Adding to iommu group 113
kernel: pci 0000:5f:10.6: Adding to iommu group 114
kernel: pci 0000:5f:10.7: Adding to iommu group 115
kernel: pci 0000:5f:13.0: Adding to iommu group 116
kernel: pci 0000:5f:13.1: Adding to iommu group 116
kernel: pci 0000:5f:13.4: Adding to iommu group 116
kernel: pci 0000:5f:13.5: Adding to iommu group 116
kernel: pci 0000:5f:16.0: Adding to iommu group 117
kernel: pci 0000:5f:16.1: Adding to iommu group 117
kernel: pci 0000:5f:16.2: Adding to iommu group 117
kernel: DMAR: Intel(R) Virtualization Technology for Directed I/O
kernel: DMAR: DRHD: handling fault status reg 2
kernel: DMAR: [INTR-REMAP] Request device [01:00.0] fault index 0x67 [fault reason 0x26] Blocked an interrupt request due to source-id verification failure
kernel: VFIO - User Level meta-driver version: 0.3
kernel: vfio-pci 0000:0e:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=none
kernel: vfio-pci 0000:0e:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=none
kernel: vfio-pci 0000:0e:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=none
kernel: vfio-pci 0000:0e:00.0: enabling device (0040 -> 0043)
kernel: vfio-pci 0000:0e:00.0: vfio_ecap_init: hiding ecap 0x19@0x270
kernel: vfio-pci 0000:0e:00.0: vfio_ecap_init: hiding ecap 0x1b@0x2d0
kernel: vfio-pci 0000:0e:00.0: vfio_ecap_init: hiding ecap 0x26@0x410
kernel: vfio-pci 0000:0e:00.0: vfio_ecap_init: hiding ecap 0x27@0x450
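
The log above also shows which IOMMU group each device landed in. A small sketch for pulling that out of the journal, assuming the "Adding to iommu group" line format shown above; both helper names are made up for illustration.

```shell
# Hypothetical helper: read kernel log lines on stdin and print the
# IOMMU group number of the given PCI address (e.g. 0000:0e:00.0).
iommu_group_of() {
    awk -v dev="$1" 'index($0, "pci " dev ": Adding to iommu group ") { print $NF; exit }'
}

# Hypothetical helper: read kernel log lines on stdin and print every
# PCI address that was added to the given group number.
iommu_group_members() {
    awk -v grp="$1" '/Adding to iommu group/ && $NF == grp {
        addr = $(NF - 5)        # field holding the address, e.g. "0000:0e:00.0:"
        sub(/:$/, "", addr)     # drop the trailing colon
        print addr
    }'
}

# Example calls on the host:
#   journalctl -b 0 | iommu_group_of 0000:0e:00.0
#   journalctl -b 0 | iommu_group_members 61
```

In the log above, the GPU 0000:0e:00.0 is alone in group 61 and its audio function 0000:0e:00.1 is alone in group 62.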

There are some commonly misleading lines:
Code:
kernel: DMAR: [Firmware Bug]: No firmware reserved region can cover this RMRR [0x00000000000e8000-0x00000000000e8fff], contact BIOS vendor for fixes
kernel: DMAR: [Firmware Bug]: Your BIOS is broken; bad RMRR [0x00000000000e8000-0x00000000000e8fff]
....
kernel: DMAR-IR: x2apic is disabled because BIOS sets x2apic opt out bit.
kernel: DMAR-IR: Use 'intremap=no_x2apic_optout' to override the BIOS setting.

This is not a bug or a broken IOMMU; it appears because we enabled only the "Slot3 PCI-E" for IOMMU with the "hp-scripting-tools" conrep command. Just ignore these fake errors.


5., Configure the VM:
The last step is to add the PCI/GPU card to the VM config; you can use the web GUI.
Code:
Add PCI-Device
(*) Raw Device []
 [X] Primary GPU
 [X] All Functions
 [X] PCI-Express
 [X] ROM-BAR
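
The same settings can also be applied from the shell with `qm set`. The sketch below only builds the option string; `hostpci_value` is a hypothetical helper, the VM ID 100 is a placeholder, and the mapping of checkboxes to options (Primary GPU = x-vga, PCI-Express = pcie, ROM-BAR = rombar, All Functions = address without function suffix) is how the options are normally described. Note that pcie=1 needs the VM to use the q35 machine type.

```shell
# Hypothetical helper: build a hostpci value matching the checkboxes
# above. $1 is the PCI address without the function suffix
# (e.g. 0000:0e:00); leaving the function off passes all functions.
hostpci_value() {
    printf '%s,pcie=1,x-vga=1,rombar=1\n' "$1"
}

# Example call (VM ID 100 is a placeholder):
#   qm set 100 --hostpci0 "$(hostpci_value 0000:0e:00)"
```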
 
On a DL360 G6 (latest BIOS version), CTRL+A does not work in RBSU. I followed the instructions to the end, but it didn't work. I'll try again in the next few days or at the weekend. Thank you!
 
Worked flawlessly with NVIDIA Tesla P40 and NVIDIA Tesla P100 (with nvidia-driver-535 on Ubuntu Server 22.04) on a ProLiant ML350p Gen8 running Proxmox 8.2-1.

I lost 4 days (mostly nights, really) before I found your tutorial.

Thank you so much!
 
Thank you, I will try this on my ML350 Gen11 and report back. I have a few questions in the meantime:

1.
$> update-initramfs -c -d -u
This command (-c -d -u) attempts to create, delete, and update the initramfs image for the current kernel, which is in my opinion not a practical or valid combination of options. I suggest using `update-initramfs -u -k all`, as indicated in other tutorials.

2.
I assume a host driver (in my case nvidia) is still needed, correct?

3.
I assume that I need to add a .conf file with the ID of my GPU, e.g.
`echo "options vfio-pci ids=10de:1f82" > /etc/modprobe.d/vfio.conf`
correct?

Many thanks!
 
Hello,

Has anyone got GPU passthrough working on a Gen9 server (ML350p)?
I had a working GPU passthrough on a Gen8 server, but I want to move to a Gen9 server. Over the last few days I tried so many things that I can't remember what I already tested, but nothing works. I put my NVIDIA GTX 1070 back into the Gen8 server and everything works fine, but not on the Gen9 server.
In the meantime I believe that I have to change some BIOS settings, which I have already tried, but I can't find the correct combination.

Many thanks in advance.

Manfred
 