[TUTORIAL] Compile Proxmox VE with patched intel-iommu driver to remove RMRR check

I tested the packages thoroughly and prepared a complete rundown of the issue, with all the possible fixes and the technical reasons behind them.

https://github.com/kiler129/relax-intel-rmrr

Anyone interested can either download the precompiled debs or build them from source. After installation, flipping a kernel boot option activates the patch.
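
For anyone taking the quick route, a rough sketch of what the install might look like on a stock Proxmox host, assuming GRUB is the bootloader and using an illustrative package name (the exact .deb names and the kernel option that enables the patch are listed in the repo's README):
Code:
# Install the precompiled kernel packages downloaded from the GitHub releases page
# (the filename below is illustrative -- use the ones matching your kernel line).
apt install ./pve-kernel-5.4.78-1-pve-relaxablermrr_*.deb

# Enable the patch at boot: append the option documented in the repo's README to
# GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, then regenerate the config.
nano /etc/default/grub
update-grub
reboot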

Enjoy. Open source FTW :D

One thing I would say is that it would be better if you gave instructions to build the actual patch rather than cloning the repo to download it. It's good for the user to know exactly what changes they are making, as per the original instructions in this thread.
 
Of course, downloading packages is really just a shortcut for people who don't want to go through the process of building the kernel.
The repo contains two sets of install instructions for Proxmox: one using the precompiled packages and one for building from sources.
Is there anything you're missing here that you think could be added?

I'm personally a fan of putting all documentation in the repo, as it's versioned and can be updated in one place when new versions of e.g. Proxmox are released. The repo contains not only the patches, which by themselves are cryptic, but also an extensive rundown of the technical details. If you just want an explanation of what the patch does, there's a section for that :)
 
What I meant is that in the Building from sources section you include the following line:
Code:
git clone --depth=1 https://github.com/kiler129/relax-intel-rmrr.git
So the patch is already prepared, rather than the user creating and editing the files to build it themselves (as per the original post in this thread).

You really have done a fantastic job with this, seemingly out of nowhere... so impressive. I had not seen that explanation section before. I can’t wait till the weekend, when I will be able to have a play with this. Thank you! :)
 
Oh, I see what you mean. I was thinking about it but decided not to include an additional guide for creating the patch yourself, because I believe there are mutually exclusive groups of users:

  1. Most users will simply install the debs and forget (I actually have three friends also running the patch as of today and they all picked that route)
  2. If you're more advanced you will go further and compile your own kernel applying the patches
  3. If you want to go beyond compilation and actually understand the changes being made, you obviously want to see how that modification differs from the vanilla kernel
It seems to me that you're talking about the third group. However, IMO if you're that advanced you will simply read the patch itself, as it's just a plain text file of diff operations (which, TIL, GitHub even color-codes ;)).
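
For that third group, a rough sketch of what inspecting the diff and feeding it into a Proxmox kernel build might look like (paths and patch filenames are illustrative; the authoritative steps live in the repo's building-from-sources guide):
Code:
# Grab the patch repo and the Proxmox kernel packaging repo.
git clone --depth=1 https://github.com/kiler129/relax-intel-rmrr.git
git clone --depth=1 https://git.proxmox.com/git/pve-kernel.git
cd pve-kernel

# Read the diff first -- it's a plain unified diff against the intel-iommu code.
less ../relax-intel-rmrr/patches/*.patch

# Drop it into the kernel patch queue and build as usual
# (see the repo's guide for build dependencies and submodule setup).
cp ../relax-intel-rmrr/patches/*.patch patches/kernel/
make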

Am I missing something here? :)

Thanks! It's always nice to hear that. I just had a couple of days with nothing to do and a problem on hand. I googled quite a bit and everybody was either removing that error check or complaining that it doesn't work on a new kernel. That prompted the "OK, so WHY and HOW does this actually work like that? And what even is an RMRR?" question. That's how the deep-dive document was created, and this is why it contains a ton of links to parts of the kernel and different specs (I was going hint by hint and just leaving scratch notes on the side).

The patch itself, once I understood what the problem was and how the kernel works in that area, took me literally 10 minutes to write. Let me know if it works properly for you.
 
This is awesome! You are a MS Gen8 expert lol
Haha thanks ;) The Gen8 is an amazing piece of hardware for a homelab, especially since you can drop a Xeon E3-1240v2 & 16GB of RAM in there. Mine is actually running with 9 hard drives in it.
[Attachment: IMG_5696.jpg]


How do you know which USB port corresponds to which number?
The easiest way is just plugging something into the port you want to forward and assigning it using "Use USB Port". This way you know which port is which, and the port itself is made available to the VM. I'm not sure how it works under the hood (vs. forwarding just the device), but I suspect the host still processes all the hot-plug events and low-level stuff and, as soon as a device is plugged in, dynamically forwards it to the guest.
This shouldn't cause problems with ordinary USB devices, but you may have issues with some specialized, purpose-built devices which don't fully implement the USB spec (e.g. some hardware license dongles). In that case you will have to pass through the whole controller.
[Attachment: Screen Shot 2020-11-01 at 6.35.31 PM.png]
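
For reference, a sketch of the CLI equivalent using qm (the VM ID 100 and the bus-port value 1-4 are made up; read the real values off the "Use USB Port" dialog):
Code:
# Pass a physical port (bus-port); whatever gets plugged into that port goes to the VM.
qm set 100 -usb0 host=1-4

# ...versus passing one specific device by its vendor:product ID (example ID is made up).
qm set 100 -usb0 host=0951:1666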
 
9!! Wow, including a big one outside the bays lol! Do you keep the cover off like that?
I have 6 in mine and could also manage 9 but don’t have any real need for it at this time.

Why Xeon E3-1240v2 and not E3-1265v2 ??

I have passed through the whole controller to add extra external drives but was wanting to have a couple of different VMs have access to a USB port each. When the whole controller is passed through only one VM can access it directly unfortunately. Trial and error I guess...

Still not sure what the Spice port is (despite reading the documentation manuals).
 
No, it closes snugly. You can't really keep it open, as that screws up the airflow. With the case open and 100% load on all cores I hit ~97°C, while with the case closed the max I saw was 80°C (running the stock "35W" cooler + a 40mm Noctua).

Two-and-a-half reasons:
  1. The 1240v2 is substantially faster than the 1265Lv2 (Intel and their confusing naming scheme... the latter is actually a "low power" version)
  2. The non-low-power versions are much easier to come by and thus much cheaper
  3. The 1230 or 1240 is the sweet spot for price vs. performance. Anything above the 1240 is usually pricey.
There's a lot of discussion about whether running a 69W TDP CPU is fine, but HPE officially offers 65W CPUs with a better cooler. The VRMs are fine, and the power supply (at least the 200W EMEA one) is sufficient too (in the US the server ships with a 150W one, which with 9 HDDs in my system was seeing surges of ~145W... I didn't like that, so I swapped it for a 250W Seasonic one for $25 :D). The stock "35W" cooler is perfectly fine as long as the ambient is not much higher than ~25°C while running at 100%.

This is why you want to pass through just the port. For example, I just passed my UPS to a separate VM with only NUT running, to inform other devices of the state of the battery (a container would be fine too, but an Alpine Linux VM eats ~30MB of RAM, so...).
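
As a rough illustration of that setup (a sketch with an arbitrary UPS name, assuming a USB UPS that NUT's usbhid-ups driver supports), the NUT side boils down to:
Code:
# /etc/nut/ups.conf -- the driver talks to the UPS on the forwarded USB port
[ups]
    driver = usbhid-ups
    port = auto

# /etc/nut/nut.conf -- run as a network server so other devices can query the battery state
MODE=netserver

# /etc/nut/upsd.conf -- add a LISTEN line for the LAN interface so clients can connect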

Have you ever used Windows RDP/Remote Desktop? You can pass a USB device connected to the client (e.g. a printer on your desk) to the remote system running in a VM. The SPICE port is essentially that, but in the Proxmox/Linux world.
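
If you want to try it, a minimal sketch (the VM ID is made up): give the VM a SPICE display and a SPICE USB redirection channel, then connect with a SPICE client such as virt-viewer and redirect the local device from the client's menu:
Code:
qm set 100 -vga qxl       # SPICE display
qm set 100 -usb0 spice    # USB redirection over the SPICE channel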
 
I did not know that! I always thought open air would give the best/lowest temperatures.


Oh yes, the TDP, that would be why I opted against anything higher. I had wondered about swapping out the PSU but it seemed complicated and I was unsure if the temperature sensor and acoustics would be affected.


I believe this SPICE redirection is exactly what I have been looking for, but I was not sure if it was possible. I will be testing this when I get a chance, thanks! :)
 
Does this patch work with the latest version?
I tested it a few minutes ago with VE 6.3.2

This is the error while starting a VM:

Code:
kvm: -device vfio-pci,host=0000:04:00.0,id=hostpci0,bus=pci.0,addr=0x10,rombar=0: VFIO_MAP_DMA failed: Invalid argument
kvm: -device vfio-pci,host=0000:04:00.0,id=hostpci0,bus=pci.0,addr=0x10,rombar=0: vfio 0000:04:00.0: failed to setup container for group 36: memory listener initialization failed: Region pc.ram: vfio_dma_map(0x7f1486c67580, 0x0, 0x80000000, 0x7f1203400000) = -22 (Invalid argument)
TASK ERROR: start failed: QEMU exited with code 1

Not sure if I made a mistake.
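
A quick sanity check that the patched kernel is actually the one booted and that the option enabling the patch made it onto the command line (standard commands; the matching dmesg line is shown further down in this thread):
Code:
uname -r             # should end in -pve-relaxablermrr
cat /proc/cmdline    # the relax option from the repo's README must be present
dmesg | grep -i -e DMAR -e IOMMU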
 
Hmm that sucks, hopefully you just did something wrong lol
 
It worked before but doesn’t work with the new release, right?
I installed Proxmox today.

Output of pveversion:
Code:
proxmox-ve: 6.3-1 (running kernel: 5.4.73-1-pve-relaxablermrr)
pve-manager: 6.3-2 (running version: 6.3-2/22f57405)
pve-kernel-5.4: 6.3-1
pve-kernel-helper: 6.3-1
pve-kernel-libc-dev: 5.4.73-1
pve-kernel-5.4.73-1-pve: 5.4.73-1
pve-kernel-5.4.73-1-pve-relaxablermrr: 5.4.73-1
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.4-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: not correctly installed
ifupdown2: 3.0.0-1+pve3
libjs-extjs: 6.0.1-10
libknet1: 1.16-pve1
libproxmox-acme-perl: 1.0.5
libproxmox-backup-qemu0: 1.0.2-1
libpve-access-control: 6.1-3
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.2-6
libpve-guest-common-perl: 3.1-3
libpve-http-server-perl: 3.0-6
libpve-storage-perl: 6.3-1
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.3-1
lxcfs: 4.0.3-pve3
novnc-pve: 1.1.0-1
proxmox-backup-client: 1.0.5-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.4-3
pve-cluster: 6.2-1
pve-container: 3.3-1
pve-docs: 6.3-1
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-3
pve-firmware: 3.1-3
pve-ha-manager: 3.1-1
pve-i18n: 2.2-2
pve-qemu-kvm: 5.1.0-7
pve-xtermjs: 4.7.0-3
qemu-server: 6.3-1
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 0.8.5-pve1
 
Does this patch work with the latest version?
While reading this thread I found the link to this GitHub repo, tested it, and it worked!

I was testing it before releasing new packages to the public ;) With any new release you can keep the kernel (especially with small updates like this one, we're still on 5.4).

New packages are released on my GH (https://github.com/kiler129/relax-intel-rmrr/releases) and indeed it runs perfectly on 6.3:
Code:
# pveversion
pve-manager/6.3-2/22f57405 (running kernel: 5.4.78-1-pve-relaxablermrr)

# dmesg | grep 'Intel-IOMMU'
[    0.048833] DMAR: Intel-IOMMU: assuming all RMRRs are relaxable. This can lead to instability or data loss


Also, if you're planning to mess with the built-in RAID/SAS controller you should probably look at another piece of info too: https://gist.github.com/kiler129/4f765e8fdc41e1709f1f34f7f8f41706
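
Independent of that gist, a generic check worth doing before passing through any controller is listing the IOMMU groups to confirm where the device sits (standard sysfs layout, nothing specific to this patch):
Code:
# Print every PCI device together with its IOMMU group number.
for d in /sys/kernel/iommu_groups/*/devices/*; do
    g="$(basename "$(dirname "$(dirname "$d")")")"
    printf 'group %s: ' "$g"
    lspci -nns "$(basename "$d")"
done | sort -V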
 