Passing an RTX A2000 back and forth between VM and LXC: Failed to initialize NVML: Unknown Error

alpha754293

Jan 8, 2023
I am trying to set up my Proxmox system so that I will be able to use my one Nvidia RTX A2000 across different VMs or LXC containers, but with only one of them running at a time.

I recently watched a Jim's Garage YouTube video on how to split your GPU between LXC containers.

My setup is a little bit different, as I am testing out different combinations to see what works and what doesn't.

What I have found is that if I boot up my Proxmox server, my CentOS 7.9.2009 LXC container will be able to use my A2000 for GPU-accelerated CFD simulations.

But if I shut down said LXC container and then spin up my CentOS VM, the VM will be able to use the A2000 for other GPU-accelerated tasks. No problem, right?

But if I shut the VM down, and then start the LXC container back up -- when I check to make sure that it is able to "see" the A2000 via "nvidia-smi", I get the error message:

"Failed to initialize NVML: Unknown Error"

I know that /etc/modprobe.d/blacklist.conf is configured properly, as are /etc/modules-load.d/modules.conf, /etc/udev/rules.d/70-nvidia.rules, and the changes that are needed in the <<CTID>>.conf files, because everything works after a fresh boot.

If I reboot the Proxmox server, then I can get the CT to be able to "see" and use the A2000 again.

But the moment that I shut down the CT and spin up the VM, I won't be able to spin up the CT again and have it "see" and use said A2000.

So if I want to be able to pass the GPU back and forth between the CT and the VM, what would be the best way for me to do this?

I imagine that there's got to be a way to do this, given that cloud providers need to be able to provision hardware "at will" for their different customers, depending on their needs and use cases.

Your help and advice is greatly appreciated.

Thank you.
 
The vfio-pci driver is still bound to the device after the VM is shut down.
You can unbind it manually:
Code:
echo "0000:01:00.0" > "/sys/bus/pci/devices/0000:01:00.0/driver/unbind"
Replace 0000:01:00.0 with your PCI device path.

Now you can bind the device to the host driver, which in the case of an Nvidia card is probably nouveau:
Code:
echo "0000:01:00.0" > "/sys/bus/pci/drivers/nouveau/bind"
 
Will I have to do this after each startup and shutdown (whether it's a VM or an LXC container), or will doing this once suffice?

Your help is greatly appreciated.

Thank you.
 
This will have to be done after each shutdown of the VM. But it can be automated with a hookscript that runs on VM shutdown.
https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_hookscripts
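For example, a minimal sketch of wiring one up, assuming the script is saved as gpu-hookscript.pl in the snippets folder of the local storage and the VM ID is 100:
Code:
# make the script executable, then attach it to the VM
chmod +x /var/lib/vz/snippets/gpu-hookscript.pl
qm set 100 --hookscript local:snippets/gpu-hookscript.pl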

Thank you.

I shouldn't have to do this with the LXC container, correct?

And if I have a LXC container that's using the GPU and then spin up the VM -- will I still be able to "share" said GPU between the CT and the VM, or will the VM "take over"/assume all control over said GPU?

Your help is greatly appreciated.

Thank you.
 
I shouldn't have to do this with the LXC container, correct?
Correct

And if I have a LXC container that's using the GPU and then spin up the VM -- will I still be able to "share" said GPU between the CT and the VM, or will the VM "take over"/assume all control over said GPU?
The VM will take over, since the host driver gets unbound.
 

So here is the output of lspci -k:

Code:
05:00.0 VGA compatible controller: NVIDIA Corporation Device 2531 (rev a1)
        Subsystem: NVIDIA Corporation Device 151d
        Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia
05:00.1 Audio device: NVIDIA Corporation Device 228e (rev a1)
        Subsystem: NVIDIA Corporation Device 151d
        Kernel driver in use: vfio-pci
        Kernel modules: snd_hda_intel

Here is the contents of /var/lib/vz/snippets/gpu-hookscript.pl:

Code:
#!/usr/bin/perl

# Example hook script for PVE guests (hookscript config option)
# You can set this via pct/qm with
#   pct set <vmid> -hookscript <volume-id>
#   qm set <vmid> -hookscript <volume-id>
# where <volume-id> has to be an executable file in the snippets folder
# of any storage with directories, e.g.:
#   qm set 100 -hookscript local:snippets/hookscript.pl

use strict;
use warnings;

print "GUEST HOOK: " . join(' ', @ARGV) . "\n";

# First argument is the vmid
my $vmid = shift;
# Second argument is the phase
my $phase = shift;

if ($phase eq 'pre-start') {
    # First phase 'pre-start' will be executed before the guest
    # is started. Exiting with a code != 0 will abort the start.
    print "$vmid is starting, doing preparations.\n";
    # Remove the GPU from the PCI bus and rescan so the host
    # re-enumerates it cleanly before passthrough.
    system('echo 1 > /sys/bus/pci/devices/0000:05:00.0/remove');
    system('echo 1 > /sys/bus/pci/rescan');
    # print "preparations failed, aborting.";
    # exit(1);
} elsif ($phase eq 'post-start') {
    # Second phase 'post-start' will be executed after the guest
    # successfully started.
    print "$vmid started successfully.\n";
} elsif ($phase eq 'pre-stop') {
    # Third phase 'pre-stop' will be executed before stopping the guest
    # via the API. Will not be executed if the guest is stopped from
    # within, e.g. with a 'poweroff'.
    print "$vmid will be stopped.\n";
} elsif ($phase eq 'post-stop') {
    # Last phase 'post-stop' will be executed after the guest stopped.
    # This should even be executed in case the guest crashes or stopped
    # unexpectedly.
    print "$vmid stopped. Doing cleanup.\n";
    # Release the GPU from vfio-pci and hand it back to the host driver.
    system('echo "0000:05:00.0" > /sys/bus/pci/devices/0000:05:00.0/driver/unbind');
    system('echo "0000:05:00.0" > /sys/bus/pci/drivers/nouveau/bind');
} else {
    die "got unknown phase '$phase'\n";
}

exit(0);

I set up a Windows 10 VM, installed Halo Infinite in it, ran it, and then shut down said VM. On the Proxmox host, I then typed nvidia-smi and it returned Failed to initialize NVML: Unknown Error, which suggests to me that the GPU still isn't being released properly.

Any suggestions or ideas as to what else I could try?

Your help is greatly appreciated.
 
Maybe it's because you only unbind and rebind the actual driver for 05:00.0, and not also for 05:00.1?
Maybe it's because you load the open-source nouveau driver, while the Nvidia software you're using may require the proprietary nvidia driver?
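For example (a sketch only, assuming the addresses from your lspci output above and that the proprietary nvidia driver is installed on the host), the post-stop cleanup could handle both functions:
Code:
# load the proprietary driver first if it is not already loaded
# (an explicit modprobe works even though autoloading is blacklisted)
modprobe nvidia
# release both functions of the card from vfio-pci
echo "0000:05:00.0" > /sys/bus/pci/devices/0000:05:00.0/driver/unbind
echo "0000:05:00.1" > /sys/bus/pci/devices/0000:05:00.1/driver/unbind
# bind the GPU function to nvidia and the audio function to snd_hda_intel
echo "0000:05:00.0" > /sys/bus/pci/drivers/nvidia/bind
echo "0000:05:00.1" > /sys/bus/pci/drivers/snd_hda_intel/bind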
 
The issue you are having is due to the GPU being available on the host. You have to make sure to disable the GPU on the host hypervisor, either by blacklisting the drivers or by blacklisting the GPU's hardware IDs. Once this is done, it will most likely correct the problem you are having.

https://pve.proxmox.com/wiki/PCI(e)_Passthrough
It has already been blacklisted.

Code:
root@pve:~# cat /etc/default/grub
...
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt pcie_acs_override=downstream nofb nomodeset initcall_blacklist=sysfb_init video=vesafb:off,efifb:off vfio-pci.ids=10de:2531,10de:228e disable_vga=1 systemd.unified_cgroup_hierarchy=0"
...
root@pve:~# cat /etc/modprobe.d/vfio.conf
options vfio-pci ids=10de:2531,10de:228e disable_vga=1
root@pve:~# cat /etc/modprobe.d/kvm.conf
options kvm ignore_msrs=1
root@pve:~# cat /etc/modprobe.d/iommu_unsafe_interrupts.conf
options vfio_iommu_type1 allow_unsafe_interrupts=1
root@pve:~# cat /etc/modprobe.d/pve-blacklist.conf
# This file contains a list of modules which are not supported by Proxmox VE 

# nvidiafb see bugreport https://bugzilla.proxmox.com/show_bug.cgi?id=701
blacklist nvidiafb
blacklist nvidia
blacklist nouveau
blacklist radeon
root@pve:~# cat /etc/modprobe.d/blacklist.conf
blacklist nvidiafb
blacklist nvidia
blacklist nouveau
blacklist radeon
root@pve:~# cat /etc/modules
vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd
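A quick way to confirm that vfio-pci actually claimed both functions after a fresh boot:
Code:
readlink /sys/bus/pci/devices/0000:05:00.0/driver
readlink /sys/bus/pci/devices/0000:05:00.1/driver
# both symlinks should point at .../drivers/vfio-pci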

Thank you.
 
Is there any method yet to move the GPU from a VM back to the host/LXC? I ran into the same situation: after I bind the GPU to the nvidia driver again following VM shutdown, everything seems alright until I restart the gdm3 service on the host. The host graphics then black out, and journalctl shows an "ACPI BIOS Error AE_ALREADY_EXISTS" error. The whole system keeps running, but everything associated with this GPU just ends in a timeout unless I reboot the whole machine.

Thank you for any reply.
 
Hi,

I want to do the same: switch my Nvidia GPU passthrough between one VM and one LXC container.
My server is already configured to do the passthrough to the VM, and it's working perfectly.

I need help to get it working with the LXC.
For now, I am drawing inspiration from this tutorial: https://theorangeone.net/posts/lxc-nvidia-gpu-passthrough/
But since my GPU is bound to the vfio-pci driver at boot, the host doesn't see the GPU and I can't pass it to the LXC container.

Should I remove the vfio binding from the bootloader and the modprobe conf files? Then how can I easily re-add it to pass the GPU back to the VM? I'm a bit lost here.
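One possible approach (a sketch only, not verified on this setup): instead of binding vfio-pci statically at boot via vfio-pci.ids, use the kernel's sysfs driver_override mechanism to choose the driver each time you switch. Replace 0000:01:00.0 with your GPU's PCI address:
Code:
# before starting the VM: hand the GPU to vfio-pci
echo vfio-pci > /sys/bus/pci/devices/0000:01:00.0/driver_override
echo "0000:01:00.0" > /sys/bus/pci/devices/0000:01:00.0/driver/unbind
echo "0000:01:00.0" > /sys/bus/pci/drivers_probe

# before starting the LXC container: hand it back to the host driver
echo > /sys/bus/pci/devices/0000:01:00.0/driver_override
echo "0000:01:00.0" > /sys/bus/pci/devices/0000:01:00.0/driver/unbind
echo "0000:01:00.0" > /sys/bus/pci/drivers_probe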
 
