Pardon my stupid question about unbinding the Nvidia driver/GPU from a VM

alpha754293

Member
Jan 8, 2023
I ran the search here for "unbind" and found some examples for how to unbind it for AMD GPUs.

I am neither a programmer nor a developer, so I am not sure how I would "convert"/"translate"/"adapt" that for Nvidia GPUs.

Here is the hookscript that I have (which someone else wrote using Perl), that I am using to bind the GPU to a VM:

Code:
#!/usr/bin/perl

# Example hook script for PVE guests (hookscript config option)
# You can set this via pct/qm with
# pct set <vmid> -hookscript <volume-id>
# qm set <vmid> -hookscript <volume-id>
# where <volume-id> has to be an executable file in the snippets folder
# of any storage with directories, e.g.:
# qm set 100 -hookscript local:snippets/hookscript.pl

use strict;
use warnings;

print "GUEST HOOK: " . join(' ', @ARGV) . "\n";

# First argument is the vmid
my $vmid = shift;

# Second argument is the phase
my $phase = shift;

if ($phase eq 'pre-start') {
    # First phase 'pre-start' will be executed before the guest
    # is started. Exiting with a code != 0 will abort the start.
    print "$vmid is starting, doing preparations.\n";
    system('echo 1 > /sys/bus/pci/devices/0000\:81\:00.0/remove');
    system('echo 1 > /sys/bus/pci/rescan');
    # print "preparations failed, aborting.";
    # exit(1);
} elsif ($phase eq 'post-start') {
    # Second phase 'post-start' will be executed after the guest
    # successfully started.
    print "$vmid started successfully.\n";
} elsif ($phase eq 'pre-stop') {
    # Third phase 'pre-stop' will be executed before stopping the guest
    # via the API. It will not be executed if the guest is stopped from
    # within, e.g. with a 'poweroff'.
    print "$vmid will be stopped.\n";
} elsif ($phase eq 'post-stop') {
    # Last phase 'post-stop' will be executed after the guest stopped.
    # This should even be executed in case the guest crashes or stopped
    # unexpectedly.
    print "$vmid stopped. Doing cleanup.\n";
} else {
    die "got unknown phase '$phase'\n";
}

exit(0);

The two questions that I have are:

1) How do I adapt this so that it will unbind the Nvidia GPU from said VM?

and 2) the script's comments say that if I shut down the VM from inside the VM, the 'pre-stop' phase will not run, so it will not unbind the GPU from said VM.

So given that, is there another way that I can rewrite this Perl script such that it will unbind the GPU from the VM, even if the VM is shut down from inside the guest?

Your help is greatly appreciated.

(P.S. I am still using Proxmox 7.4-17. I assume the procedure will still be valid for Proxmox 8: even with a mapped device (or a raw device), the GPU still won't be unbound automatically. The Proxmox 8 GUI helps with adding a PCI device to a VM, but doesn't really help with unbinding that PCI device from the VM afterwards, correct? Please educate me if my understanding of Proxmox 8 PCIe device unbinding is incorrect. Thank you.)
 
Also, is it possible for me to share a GPU between LXCs and a VM?

(i.e. such that when the VM isn't running, the LXCs are able to use the GPU, but if the VM is running, then the LXCs will relinquish said GPU until the VM stops?)
 
hi,

1) How do I adapt this so that it will unbind the Nvidia GPU from said VM?
what exactly do you mean with this? if a vm is shut off, the vm is not running, so the card is not 'bound' to the vm anymore
do you mean you want to rebind it to the 'nvidia' driver after a vm shutdown?

and 2), it says in the script there that if I shutdown the VM from inside the VM, that it will not unbind the GPU from said VM.
what do you mean here/where does it say this?

Also, is it possible for me to share a GPU between LXCs and a VM?

(i.e. such that when the VM isn't running, the LXCs are able to use the GPU, but if the VM is running, then the LXCs will relinquish said GPU until the VM stops?)
it could work, but the containers would have to be stopped before the vm is started, since they probably access parts via the driver/kernel
also they would have to be restarted after the vm is shut down again, so the device path can be bound to the container again

what exactly do you want to solve? while most of this is technically possible in some way, it seems very brittle to me and will most probably cause some issues (e.g. if the containers were not shut down before the vm starts)
it's probably way easier to add a second gpu, so you have one for the host+containers and one for the vm
 
what exactly do you mean with this? if a vm is shut off, the vm is not running, so the card is not 'bound' to the vm anymore
do you mean you want to rebind it to the 'nvidia' driver after a vm shutdown?
That has not been the observed behaviour.

When the VM is shut off/isn't running anymore, the GPU still appears to be "bound"/"allocated" to the VM, regardless of whether the VM is running or not.

i.e. LXCs can't use it.
what do you mean here/where does it say this?

Code:
# Third phase 'pre-stop' will be executed before stopping the guest
    # via the API. It will not be executed if the guest is stopped from
    # within, e.g. with a 'poweroff'.

it could work, but the containers would have to be stopped before the vm is started, since they probably access parts via the driver/kernel
also they would have to be restarted after the vm is shut down again, so the device path can be bound to the container again

what exactly do you want to solve? while most of this is technically possible in some way, it seems very brittle to me and will most probably cause some issues (e.g. if the containers were not shut down before the vm starts)
it's probably way easier to add a second gpu, so you have one for the host+containers and one for the vm
One of the LLMs that the LXC runs takes about 40 GB of VRAM, split over two 3090s already.

Don't have any free/open slots for this. (More specifically/technically speaking, the 3rd slot, which is I think a PCIe 3.0 x8 in a x16 slot physically, is blocked by the 2nd GPU and there isn't enough clearance in there for me to snake a PCIe riser/extension in there.)

Ollama/WebUI is set such that it will unload the LLM model after about 5 minutes or so.

This means that if I am not using my locally hosted LLM, then I can allocate that GPU to a gaming VM. (Whether that's Windows or Nobara Linux.)

But when I want to run said LLM, I would run the LLM.

The other alternative, I suppose, would be to just run everything through Nobara at which point, I guess that I can go with a bare metal install.
 

Attachments

  • gpu-hookscript.pl.txt
That has not been the observed behaviour.

When the VM is shut off/isn't running anymore, the GPU still appears to be "bound"/"allocated" to the VM, regardless of whether the VM is running or not.

i.e. LXCs can't use it.
The vfio-pci driver is indeed still loaded/bound to the GPU after the VM shuts down. You can unbind the functions of the GPU from vfio-pci and bind the actual drivers (like amdgpu and snd_hda_intel) yourself.
I posted a hookscript for something like that here: https://forum.proxmox.com/threads/amd-rx-6650-xt-high-temperatures-when-idle.143056/post-642341
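For an Nvidia card the same idea applies. Under the hood, the work such a hookscript does in its 'post-stop' phase boils down to a couple of sysfs writes. A minimal sketch (an assumption, not a tested recipe: the PCI address 0000:81:00.0 is taken from the script above and must match your own hardware per `lspci -nn`; the SYSFS variable exists only so the sketch can be dry-run outside /sys):

```shell
#!/bin/sh
# Sketch: hand the GPU back to the host after the VM stops by unbinding
# it from vfio-pci and letting the kernel probe the regular driver
# (nvidia) again.
SYSFS="${SYSFS:-/sys}"   # sysfs root, overridable for a dry run

rebind_host_driver() {
    dev="$1"   # one PCI function, e.g. 0000:81:00.0
    # unbind from vfio-pci (ignore the error if it was not bound)
    echo "$dev" > "$SYSFS/bus/pci/drivers/vfio-pci/unbind" 2>/dev/null || true
    # ask the kernel to (re)probe the normal host driver for this function
    echo "$dev" > "$SYSFS/bus/pci/drivers_probe"
}

# run once per function of the card, e.g.:
#   rebind_host_driver 0000:81:00.0   # the GPU itself
#   rebind_host_driver 0000:81:00.1   # its HDA audio function
```

In the Perl hookscript above, these would simply be system() calls in the 'post-stop' branch.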
 
The vfio-pci driver is indeed still loaded/bound to the GPU after the VM shuts down. You can unbind the functions of the GPU from vfio-pci and bind the actual drivers (like amdgpu and snd_hda_intel) yourself.
I posted a hookscript for something like that here: https://forum.proxmox.com/threads/amd-rx-6650-xt-high-temperatures-when-idle.143056/post-642341
Thank you for your reply.

Can I do this without needing to stop/start the LXC?

Or will I need to stop/start the LXC whenever the VM shuts down, so that the GPU can be used by the LXC again?

Your help is greatly appreciated.

Thank you.
 
Can I do this without needing to stop/start the LXC?
I don't know.
Or will I need to stop/start the LXC whenever the VM shuts down, so that the GPU can be used by the LXC again?
I don't understand your question. If you want to load the drivers for the GPU on the Proxmox host (so that it can be used by the host and containers), then unbind vfio-pci and bind the actual drivers when the VM shuts down (like the example script does, if you add it as a hookscript to the VM).
 
I don't understand your question. If you want to load the drivers for the GPU on the Proxmox host (so that it can be used by the host and containers), then unbind vfio-pci and bind the actual drivers when the VM shuts down (like the example script does, if you add it as a hookscript to the VM).
My understanding is that when the resource is "taken" by the VM, the VM will "hold on to the resource" until the unbind happens.

And the hookscript that you have provided above, helps with the unbind process.

I understand that.

But my question is "how would the LXC know that the resource is available to the LXC again, once the hookscript unbinds the resources from the VM?"

Suppose that LXC is a person, for this example, "Person A". And the GPU is a chocolate bar.

Before the VM ("Person B") starts/runs, Person A starts eating the chocolate bar. And things are working well.

But then when you start the VM ("Person B"), Person B STEALS the chocolate bar from Person A, and starts eating it.

But when you shut down the VM, Person B puts the chocolate bar back down on the table. (unbind)

My question is does "Person A" KNOW that the chocolate bar has been put back on the table (and they can take the chocolate bar and resume eating it) or will they AUTOMATICALLY take the chocolate bar back from Person B, once Person B puts it down, because the chocolate bar was originally given to Person A to begin with?

That's the question that I am trying to figure out the answer to.

i.e. do I have to "reboot" Person A so that it then knows to take the chocolate bar from the table again, or will it automatically reclaim the chocolate bar, once Person B has put it back down on to the table?

Hopefully this analogy makes my question clearer.

Thanks.
 
Hopefully this analogy makes my question clearer.
I have no fix for the problem of (accidentally or maliciously) trying to use the same resource more than once at a time. You'll have to think about what will work for your situation and implement any technological support yourself (or find something made by others).
On my systems, I do this mostly manually, as I'm the only user. I do have two VMs that want to use the same GPU (one Windows and one Linux), and each VM shuts down the other before starting, using a hookscript (like a dual-boot solution, except with two VMs).
 
I have no fix for the problem of (accidentally or maliciously) trying to use the same resource more than once at a time. You'll have to think about what will work for your situation and implement any technological support yourself (or find something made by others).
On my systems, I do this mostly manually, as I'm the only user. I do have two VMs that want to use the same GPU (one Windows and one Linux), and each VM shuts down the other before starting, using a hookscript (like a dual-boot solution, except with two VMs).
Got it.

So you aren't trying to share a GPU between a VM and a LXC, like I am.

(Sharing a GPU between LXCs has been working out quite well for me: my RTX A2000 6 GB SFF GPU can be used either for LLM or for AI image generation, but due to the limited amount of VRAM, I can't run both tasks simultaneously.

LXCs have been great for managing that, because I can log into one LXC for the LLM and a separate LXC for the AI image generation, and both can be running and enabled simultaneously.)

What I want to do now is to "level that up": on my system with two 3090s, I want to use one of the 3090s for gaming (under Nobara Linux) from time to time, and then give that 3090 "back to the system" so that my LLM LXC will be able to use both 3090s again, because the Llama 3.1 70B parameter model requires around 40 GB of VRAM to run (which Ollama will easily split between the two 3090s).

If unbinding the GPU from the VM works, but the LXCs don't know about it and therefore require a restart to pick up the 2nd GPU again -- it's not the end of the world, but I just need to know that an LXC restart may be required for my LLM to work again.

But if I don't need to restart the LXC, and it automatically knows that the 2nd GPU is "back online" and available to said LXC again, then that would, of course, be better.

Thus my question (and wanting to see if other people have tried this before already, and what their experiences were with this kind of a set up).

Thank you.
 
you have to restart the containers currently, for the following reason:

the device node you pass through to the container only exists while the 'real' driver (e.g. nvidia) is loaded, but one needs to remove that driver from the device to pass it through to a vm
also the device passthrough for containers is currently not hot-pluggable, and removing a host device while the container is running disconnects the passed-through file

when the host driver is then added again, the kernel creates a new device file in /dev that has nothing to do with the original, so it does not get reconnected to the container

hope that makes it clearer why you'd need to restart the container when you rebind the device to the driver after shutting down the vm
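You can watch this driver hand-over from the host: every PCI device exposes a `driver` symlink in sysfs. A small helper sketch (the sysfs layout is the standard one; the function name and the optional sysfs-root parameter are made up here for illustration):

```shell
#!/bin/sh
# Sketch: report which kernel driver currently owns a PCI function by
# reading its sysfs 'driver' symlink. Prints 'none' if no driver is
# bound (e.g. right after unbinding, before drivers_probe runs).
bound_driver() {
    dev="$1"                 # e.g. 0000:81:00.0
    sysfs="${2:-/sys}"       # sysfs root, overridable for testing
    link="$sysfs/bus/pci/devices/$dev/driver"
    if [ -e "$link" ]; then
        basename "$(readlink "$link")"
    else
        echo none
    fi
}

# usage on a real host:
#   bound_driver 0000:81:00.0   # vfio-pci while passed through,
#                               # nvidia once the host reclaimed it
```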
 
you have to restart the containers currently, for the following reason:

the device node you pass through to the container only exists while the 'real' driver (e.g. nvidia) is loaded, but one needs to remove that driver from the device to pass it through to a vm
also the device passthrough for containers is currently not hot-pluggable, and removing a host device while the container is running disconnects the passed-through file

when the host driver is then added again, the kernel creates a new device file in /dev that has nothing to do with the original, so it does not get reconnected to the container

hope that makes it clearer why you'd need to restart the container when you rebind the device to the driver after shutting down the vm
Gotcha.

Do I need to restart ONLY the container, or do I need to restart the entire node/Proxmox server/system?

Your help is greatly appreciated.

Thank you.
 
if you bind the card to the original driver again after the vm is shut down, you should only need to restart the container.
it may happen (depending on the hardware; this is nothing we can really influence) that the card does not want to rebind to the driver properly; then your only option is to restart the whole server.
this should not be the case for most hardware though
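Putting both steps together, a 'post-stop' hookscript phase could rebind the host driver and then bounce the container. A sketch under assumptions: CT id 200 and PCI address 0000:81:00.0 are hypothetical examples, and the PCT/SYSFS variables exist only so the sketch can be dry-run (pct stop/start are the real Proxmox container commands):

```shell
#!/bin/sh
# Sketch: after the VM stops, hand the GPU back to the host driver and
# restart the container so it picks up the newly created /dev node.
PCT="${PCT:-pct}"        # Proxmox container CLI, overridable for dry runs
SYSFS="${SYSFS:-/sys}"   # sysfs root, overridable for dry runs

reclaim_gpu_for_ct() {
    dev="$1"; ctid="$2"
    # unbind from vfio-pci, then let the kernel probe the host driver
    echo "$dev" > "$SYSFS/bus/pci/drivers/vfio-pci/unbind" 2>/dev/null || true
    echo "$dev" > "$SYSFS/bus/pci/drivers_probe"
    # restart the container so the recreated device node is passed in again
    "$PCT" stop "$ctid"
    "$PCT" start "$ctid"
}

# e.g.: reclaim_gpu_for_ct 0000:81:00.0 200
```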
 
if you bind the card to the original driver again after the vm is shut down, you should only need to restart the container.
it may happen (depending on the hardware; this is nothing we can really influence) that the card does not want to rebind to the driver properly; then your only option is to restart the whole server.
this should not be the case for most hardware though
I appreciate your insights in regards to this topic.

I'll have to play around and experiment with it.

Thank you.
 
