PCIe Passthrough freezes Proxmox

Kuakski

New Member
May 4, 2016
Hi everyone!

I have a small homelab running Proxmox. On one of my VMs I'm running pfSense and I'm pretty satisfied with the results. The only thing that doesn't work the way I want is the WLAN device, a Ralink RT3090. pfSense gets access to the card via PCI passthrough, and that part works very well with the RT3090, but the card is poorly supported in FreeBSD: even though it is a B/G/N card, I can only use B/G mode in pfSense, with real-world throughput of around 18 Mbps.

At this point I know that many would suggest dropping the idea of handling wireless access in pfSense and going with an access point instead (maybe a Ubiquiti?). I just think it would be nice to have everything I need in a single box, and I also feel I have a lot to learn from this experience, so I don't want to give up so easily.

I tried to find another Mini PCI Express wireless card that works well with pfSense, and I bought 2 different cards for the purpose. The results so far are not so encouraging, so I'm asking for help to see if I'm doing something wrong.

Qualcomm Atheros AR928X Wireless Network Adapter (PCI-Express) (rev 01)
Replacing the Ralink with this Atheros gives me the following error when I try to start the VM:

Code:
Running as unit 100.scope.
kvm: -device vfio-pci,host=04:00.0,id=hostpci0,bus=ich9-pcie-port-1,addr=0x0: vfio: msix_init failed
kvm: -device vfio-pci,host=04:00.0,id=hostpci0,bus=ich9-pcie-port-1,addr=0x0: vfio: 0000:04:00.0 Error adding PCI capability 0x11[0x6f]@0x90: -22
kvm: -device vfio-pci,host=04:00.0,id=hostpci0,bus=ich9-pcie-port-1,addr=0x0: Device initialization failed

Qualcomm Atheros AR9485 Wireless Network Adapter (rev 01)
With this one it gets even worse: as soon as I start the VM, the whole Proxmox host freezes and I have to force a power-off. No error is shown on the console, and I've found nothing about the freeze in the syslog.


Is passthrough something that only the CPU has to support? Is it possible that these Atheros cards simply don't allow passthrough?
 
Dealing with this myself right now on a PCEngines APU2C4 board running Proxmox. From my findings, the msix_init error on the AR928X is an issue with the card itself, and sadly there isn't much that can be done about it. As for the AR9485, it probably has the same issue my QCA9882 has: it needs a fixup in the PCI driver to prevent it from hanging. The fixup can be found at https://github.com/torvalds/linux/blob/master/drivers/pci/quirks.c#L3192, and I am working on a patch for my QCA9882 card for the PVE kernel to confirm whether this is the case.
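
Before patching anything, it can help to see what the host kernel reports for the card. The following is a minimal sketch, assuming the card sits at 0000:04:00.0 as in the error output above. The helper name pci_info and the SYSFS_ROOT override are made up for illustration. The presence of a reset node only shows that the kernel exposes some reset method for the device; it does not tell you whether that reset hangs.

```shell
#!/bin/sh
# Print vendor/device IDs for a PCI function and whether the kernel
# exposes a reset node for it in sysfs.
# SYSFS_ROOT is an illustrative override so the function can be tried
# against a fake directory tree; it defaults to the real /sys.
pci_info() {
    addr="$1"
    dev_dir="${SYSFS_ROOT:-/sys}/bus/pci/devices/$addr"
    if [ ! -d "$dev_dir" ]; then
        echo "no such device: $addr" >&2
        return 1
    fi
    vendor=$(cat "$dev_dir/vendor")
    device=$(cat "$dev_dir/device")
    if [ -e "$dev_dir/reset" ]; then
        reset="yes"
    else
        reset="no"
    fi
    echo "$addr vendor=$vendor device=$device reset_node=$reset"
}

# Example on the Proxmox host: pci_info 0000:04:00.0
```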
 
riptide_wave, thanks for your reply. Your answer generated a few more questions:

1) How do I make a patch for my card?
2) Will I need to do it again every time I update the kernel?
3) Is there any chance this fixup will be included in the PVE kernel?
4) Wouldn't it be better to find another Wi-Fi card that does the job? If so, is there a list of wireless controllers that support passthrough?
 

1. Get the PCI ID for your specific card with the "lspci -n -s 05:00" command, where 05:00 is the host address of the card. This will return something like:
05:00.0 0280: 168c:002a (rev 01)

Here 168c is the vendor ID (Atheros) and 002a is the device ID, which is the part you need. With this, download the source for the pve-kernel and edit drivers/pci/quirks.c to add a new fixup for this card, which would look like:
DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATHEROS, 0x002a, quirk_no_bus_reset);

It would be added around line 3137 or so, next to the other Atheros fixups.

2. Yep. This patch requires you to edit and recompile the kernel source and manually generate a .deb install package for the PVE kernel, so it has to be redone for each kernel update.

3. If you can confirm it works, you can try submitting it. It seems they already have other patches for quirks.c merged in their source at https://git.proxmox.com/?p=pve-kernel.git;a=tree;hb=HEAD

4. I have heard good things about newer Intel wireless cards playing nice with passthrough (due to working MSI-X with passthrough) but I have personally not confirmed this. May be worth googling/looking around for confirmation.
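
For step 1 above, the device ID can be pulled out of the lspci line and turned into the fixup entry mechanically. This is only a sketch: fixup_line is a hypothetical helper, and it deliberately refuses anything that is not vendor 168c, since the PCI_VENDOR_ID_ATHEROS constant only makes sense for Atheros devices.

```shell
#!/bin/sh
# Turn one line of `lspci -n` output into a quirks.c fixup entry.
# fixup_line is a hypothetical helper; it only handles vendor 168c (Atheros).
fixup_line() {
    # Field 3 of "05:00.0 0280: 168c:002a (rev 01)" is vendor:device.
    ids=$(printf '%s\n' "$1" | awk '{print $3}')
    vendor=${ids%%:*}
    device=${ids##*:}
    if [ "$vendor" != "168c" ]; then
        echo "vendor $vendor is not Atheros (168c)" >&2
        return 1
    fi
    printf 'DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATHEROS, 0x%s, quirk_no_bus_reset);\n' "$device"
}

# Sample line taken from the post above.
fixup_line "05:00.0 0280: 168c:002a (rev 01)"
```

On a real host the input would come from something like: fixup_line "$(lspci -n -s 05:00)"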

EDIT: I made a patch for my QCA988X board, and it is working as intended with OpenWRT! :) If you share the PCI ID of your card with me, I can add it to my patch and try building a kernel for you to test. Also be sure to blacklist all ath9k/ath10k modules on your Proxmox node, as you only want the VM to load these.
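
The blacklisting mentioned above is usually done with a file under /etc/modprobe.d/. The sketch below writes such a file. The file name and the exact module list (ath9k and ath10k_pci) are assumptions, so check with lsmod which driver actually binds your card on the host.

```shell
#!/bin/sh
# Write a modprobe blacklist file for the host. The module list and the
# default file name are assumptions; adjust them to the drivers your card uses.
write_blacklist() {
    target="${1:-/etc/modprobe.d/blacklist-ath.conf}"
    for mod in ath9k ath10k_pci; do
        echo "blacklist $mod"
    done > "$target"
}

# Example (as root on the Proxmox node):
#   write_blacklist && update-initramfs -u
```

A reboot is the simplest way to make sure the host never loads the drivers before the VM claims the card.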
 
It sounds very interesting, and kind of doable. I've never recompiled a Linux kernel, but now I have a good reason to do it!
I'll get back as soon as I can dedicate some time to it.

Thanks, you've been so inspiring! :)
 
Ok, I've installed Proxmox on a VM on my laptop and tried to understand how to edit and recompile the pve-kernel, but I feel quite stuck.
Is there any guide out there? I haven't found anything on this forum that I could use...

So far I've tried this:
Code:
git clone git://git.proxmox.com/git/pve-kernel.git
cd pve-kernel
git pull
make

I haven't even edited the code yet, and it already throws the following error:
Code:
gcc --version|grep "4\.9" || false
gcc (Debian 4.9.2-10) 4.9.2
cp config-4.4.8.org ubuntu-xenial/.config
cd ubuntu-xenial; ./scripts/config -d CONFIG_SND_PCM_OSS -d CONFIG_TRANSPARENT_HUGEPAGE_MADVISE -d CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS -e CONFIG_TRANSPARENT_HUGEPAGE_NEVER -m CONFIG_CEPH_FS -m CONFIG_BLK_DEV_NBD -m CONFIG_BLK_DEV_RBD -m CONFIG_BCACHE -m CONFIG_JFS_FS -m CONFIG_HFS_FS -m CONFIG_HFSPLUS_FS -e CONFIG_BRIDGE -e CONFIG_BRIDGE_NETFILTER -e CONFIG_BLK_DEV_SD -e CONFIG_BLK_DEV_SR -e CONFIG_BLK_DEV_DM -e CONFIG_BLK_DEV_NVME -d CONFIG_INPUT_EVBUG -d CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND -e CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE -d CONFIG_MODULE_SIG -d CONFIG_MEMCG_DISABLED -e CONFIG_MEMCG_SWAP_ENABLED -e CONFIG_MEMCG_KMEM -d CONFIG_DEFAULT_CFQ --set-str CONFIG_DEFAULT_IOSCHED deadline -d CONFIG_DEFAULT_SECURITY_DAC -e CONFIG_DEFAULT_SECURITY_APPARMOR --set-str CONFIG_DEFAULT_SECURITY apparmor
cd ubuntu-xenial; make oldconfig
make[1]: Entering directory '/usr/src/pve-kernel/ubuntu-xenial'
scripts/kconfig/conf  --oldconfig Kconfig
.config:4244:warning: override: M686 changes choice state
#
# configuration written to .config
#
make[1]: Leaving directory '/usr/src/pve-kernel/ubuntu-xenial'
cd ubuntu-xenial; make -j 8
make[1]: Entering directory '/usr/src/pve-kernel/ubuntu-xenial'
scripts/kconfig/conf  --silentoldconfig Kconfig
  CHK     include/config/kernel.release
  CHK     include/generated/uapi/linux/version.h
  HOSTCC  scripts/extract-cert
scripts/extract-cert.c:21:25: fatal error: openssl/bio.h: No such file or directory
#include <openssl/bio.h>
                         ^
compilation terminated.
scripts/Makefile.host:91: recipe for target 'scripts/extract-cert' failed
make[2]: *** [scripts/extract-cert] Error 1
Makefile:555: recipe for target 'scripts' failed
make[1]: *** [scripts] Error 2
make[1]: *** Waiting for unfinished jobs....
make[1]: Leaving directory '/usr/src/pve-kernel/ubuntu-xenial'
Makefile:221: recipe for target '.compile_mark' failed
make: *** [.compile_mark] Error 2


What am I missing?
 
From the following error:
scripts/extract-cert.c:21:25: fatal error: openssl/bio.h: No such file or directory

Looks like you are missing the OpenSSL development headers (the libssl-dev package on Debian), so the make process is erroring out. Expect lots of errors like this on the first compile; just look up which package each missing dependency comes from and install it on your system.
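
Looking up the package for a missing header can be scripted. A rough sketch, assuming a Debian-based build host: missing_header is a hypothetical helper that extracts the header path from a gcc error line, and the commented commands show how apt-file could then be used to find the owning package.

```shell
#!/bin/sh
# Pull the missing header path out of a gcc "fatal error" line.
# missing_header is an illustrative helper, not part of any build tooling.
missing_header() {
    printf '%s\n' "$1" | sed -n 's/.*fatal error: \(.*\): No such file or directory.*/\1/p'
}

hdr=$(missing_header "scripts/extract-cert.c:21:25: fatal error: openssl/bio.h: No such file or directory")
echo "$hdr"

# Then, on Debian (needs the apt-file package and one `apt-file update` first):
#   apt-file search "$hdr"
#   apt-get install libssl-dev
```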
 
Ok, after a few tries I finally got a compiled kernel.
I got some warnings along the way, but I guess it'll be fine.

Sometimes it just wrote "please install package X" but the process continued anyway, like here:
Code:
Auto-detecting system features:
...                         dwarf: [ OFF ]
...                         glibc: [ on  ]
...                          gtk2: [ on  ]
...                      libaudit: [ on  ]
...                        libbfd: [ OFF ]
...                        libelf: [ on  ]
...                       libnuma: [ on  ]
...        numa_num_possible_cpus: [ on  ]
...                       libperl: [ on  ]
...                     libpython: [ on  ]
...                      libslang: [ on  ]
...                     libunwind: [ OFF ]
...            libdw-dwarf-unwind: [ OFF ]
...                          zlib: [ on  ]
...                          lzma: [ on  ]
...                     get_cpuid: [ on  ]
...                           bpf: [ on  ]

config/Makefile:270: No libdw DWARF unwind found, Please install elfutils-devel/libdw-dev >= 0.158 and/or set LIBDW_DIR
config/Makefile:274: No libdw.h found or old libdw.h found or elfutils is older than 0.138, disables dwarf support. Please install new elfutils-devel/libdw-dev
config/Makefile:332: No libunwind found. Please install libunwind-dev[el] >= 1.1 and/or set LIBUNWIND_DIR
config/Makefile:350: Disabling post unwind, no support found.
config/Makefile:552: No bfd.h/libbfd found, please install binutils-dev[el]/zlib-static/libiberty-dev to gain symbol demangling


Then it created the file pve-kernel-4.4.8-1-pve_4.4.8-51_amd64.deb and continued with building pve-firmware. At that point I got another error:

Code:
Checking connectivity... done.
./find-firmware.pl data/lib/modules/4.4.8-1-pve >fwlist.tmp
mv fwlist.tmp fwlist-4.4.8-1-pve
rm -rf fwdata
mkdir -p fwdata/lib/firmware
./assemble-firmware.pl fwlist-4.4.8-1-pve fwdata/lib/firmware
find: `firmware-misc': No such file or directory
find: `firmware-misc': No such file or directory
find: `firmware-misc': No such file or directory
found dvb-demod-mn88473-01.fw in dvb-firmware.git/firmware/dvb-demod-mn88473-01.fw
find: `firmware-misc': No such file or directory
find: `firmware-misc': No such file or directory
find: `firmware-misc': No such file or directory
find: `firmware-misc': No such file or directory
find: `firmware-misc': No such file or directory
find: `firmware-misc': No such file or directory
find: `firmware-misc': No such file or directory
find: `firmware-misc': No such file or directory
find: `firmware-misc': No such file or directory
unable to find firmware: aic94xx-seq.fw kernel/drivers/scsi/aic94xx/aic94xx.ko
Makefile:389: recipe for target 'pve-firmware_1.1-8_all.deb' failed
make: *** [pve-firmware_1.1-8_all.deb] Error 1

Now, I installed the new kernel on the VM using dpkg -i and it seems to work without problems: do I really need to build and install pve-firmware to install the kernel on the homelab?

It would be nice to understand what went wrong with the missing firmware, but I'm much more curious to see if this solves my original problem!

EDIT: The web interface is not working anymore on the VM where I installed the new kernel. The service pveproxy is not running and if I try to start it I get:
Code:
root@pve:~# service pveproxy start
Failed to restart pveproxy.service: Unit pveproxy.service is masked.

A quick cat /var/log/syslog | grep pveproxy didn't show anything strange; it simply stopped logging when I installed the kernel. I restarted the VM, but it didn't help.

Did I forget to do something that must be done after a kernel installation?
 
So to start, as long as your server was on the 4.4.8-51 kernel beforehand, you should not need the pve-firmware .deb, as it would just build the same version you already have. As for the build errors, this normally means your git clone was missing some files on the original sync, or there is something wrong with your build environment. I built my packages on a Debian 8 host and had no issues after installing the packages that were asked for.

As for the pveproxy error, this is a userspace error and has nothing to do with the kernel. You may want to try the following to see if it helps you get it fixed: https://fedoramagazine.org/systemd-masking-units/.
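
As a sketch of what unmasking looks like in practice: systemctl has an unmask subcommand that removes the mask, after which the unit can be started again. The wrapper below is hypothetical and defaults to a dry run that only prints the commands. Whether unmasking is the right fix depends on why the unit was masked in the first place.

```shell
#!/bin/sh
# Unmask and start a systemd unit. With DRY_RUN=1 (the default here) the
# commands are only printed, so nothing on the system changes.
unmask_and_start() {
    unit="$1"
    for action in unmask start; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "systemctl $action $unit"
        else
            systemctl "$action" "$unit" || return 1
        fi
    done
    return 0
}

unmask_and_start pveproxy.service
```

Run it with DRY_RUN=0 as root to actually execute the two systemctl calls.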

Also, as for the patch for your ath9k card, did you make a .patch file and add it to the Makefile script? If not, it most likely was not applied, unless you edited the file inside the .tgz in the repo. You may want to look at my patch as an example: http://servernetworktech.com/uploads/files/fix_no_bus_reset_qca988X.patch
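
One way to produce a .patch file without git is to diff a pristine copy of the tree against an edited copy. The sketch below does this in a temporary directory with a stand-in quirks.c. The directory names, the placeholder file contents and the output file name are illustrative only, and wiring the result into the pve-kernel Makefile is not shown.

```shell
#!/bin/sh
# Build a unified diff from a pristine (a/) and an edited (b/) copy of a
# source tree. Everything here is a stand-in for the real kernel sources.
work=$(mktemp -d)
mkdir -p "$work/a/drivers/pci" "$work/b/drivers/pci"
echo "/* existing fixups */" > "$work/a/drivers/pci/quirks.c"
cp "$work/a/drivers/pci/quirks.c" "$work/b/drivers/pci/quirks.c"
echo "DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATHEROS, 0x002a, quirk_no_bus_reset);" >> "$work/b/drivers/pci/quirks.c"

# diff exits with status 1 when the trees differ, so that is not a failure.
(cd "$work" && diff -Naur a b > fix_no_bus_reset.patch) || [ $? -eq 1 ]
cat "$work/fix_no_bus_reset.patch"
```

A patch made this way would be applied from the top of the kernel tree with patch -p1.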

As mentioned earlier, I am also willing to build you a pve-kernel .deb with the patch applied if you would like. Just share the PCI ID of your card with me, and I can add it to my patch/kernel for you to test. If all works well, I may also submit it upstream to linux-pci to get it officially added.
 
It worked! :D
I installed my kernel and rebooted, and everything was working fine. After swapping in the new wireless card, I could start the pfSense VM without trouble, and now I'm using the AR9485 card for my wireless network. So sweet!

When the VM starts and the passthrough is activated, Proxmox writes this on the console:
Code:
vfio-pci 0000:04:00.0: Invalid ROM contents
I don't really know what that means, but it doesn't seem to cause any issue...

I saw that the Makefile script deletes the ubuntu-xenial folder and extracts the .tgz file, so I created a new .tgz with the updated quirks.c file. I know that a patch is the right way to do it, but this time I was just curious to see if it worked. It seems I have to use git to create the patch file, so I'll look into that soon.

The PCI device ID for the Atheros AR9485 is 0032. I'd be even more thankful if you could add it to your patch when you submit it.
Thank you so much for all the precious help!
 
Awesome! :)

As for the ROM error, this can be ignored as it won't affect functionality. As for the patches, I went ahead and submitted them upstream at http://comments.gmane.org/gmane.linux.kernel.pci/52039

EDIT: Patch was merged, targeted for Linux 4.8 release.

EDIT2: Patch was accepted, and was also back-ported to 4.4 so once Ubuntu updates their repo it should be available within a future Proxmox kernel update. :)

EDIT3: Patch is now in the Proxmox kernel in the No-Subscription repo!
 