I am trying to set up my Proxmox system so that I can use my single Nvidia RTX A2000 across different VMs and LXC containers, but with only one of them running at a time.
I recently watched a Jim's Garage YouTube video on how to split your GPU between LXC containers.
My setup is a little different, as I am testing out different combinations of this to see what works and what doesn't.
What I have found is that if I boot up my Proxmox server, my CentOS 7.9.2009 LXC container is able to use the A2000 for GPU-accelerated CFD simulations.
If I then shut down that LXC container and spin up my CentOS VM, the VM is able to use the A2000 for other GPU-accelerated tasks. No problem, right?
But if I shut the VM down and then start the LXC container back up, when I check whether it can "see" the A2000 via "nvidia-smi", I get the error message:
"Failed to initialize NVML: Unknown Error"
I know that /etc/modprobe.d/blacklist.conf is configured properly, as are /etc/modules-load.d/modules.conf, /etc/udev/rules.d/70-nvidia.rules, and the required changes to the container's <<CTID>>.conf file, because everything works after a fresh boot.
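For reference, here is roughly what I mean by that configuration (a sketch of the usual LXC GPU passthrough setup; the nvidia-uvm major number is from my own system and will likely differ elsewhere):

# /etc/modprobe.d/blacklist.conf (host) -- keep nouveau off the card
blacklist nouveau

# /etc/modules-load.d/modules.conf (host) -- load the NVIDIA modules at boot
nvidia
nvidia_uvm

# /etc/udev/rules.d/70-nvidia.rules (host) -- create the /dev nodes at boot
KERNEL=="nvidia", RUN+="/bin/bash -c '/usr/bin/nvidia-smi -L'"
KERNEL=="nvidia_uvm", RUN+="/bin/bash -c '/usr/bin/nvidia-modprobe -c0 -u'"

# /etc/pve/lxc/<<CTID>>.conf -- pass the device nodes into the container
# (195 is the nvidia character device major; check the nvidia-uvm major with ls -l /dev/nvidia-uvm)
lxc.cgroup2.devices.allow: c 195:* rwm
lxc.cgroup2.devices.allow: c 508:* rwm
lxc.mount.entry: /dev/nvidia0 dev/nvidia0 none bind,optional,create=file
lxc.mount.entry: /dev/nvidiactl dev/nvidiactl none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm dev/nvidia-uvm none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm-tools dev/nvidia-uvm-tools none bind,optional,create=file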
If I reboot the Proxmox server, the CT can "see" and use the A2000 again.
But the moment I shut down the CT and spin up the VM, I can no longer start the CT and have it "see" and use the A2000.
So if I want to pass the GPU back and forth between the CT and the VM, what would be the best way to do this?
I imagine there has to be a way to do this, given that cloud providers need to be able to provision hardware "at will" for their different customers, depending on their varying needs and use cases.
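For example, is it as simple as manually rebinding the card to the NVIDIA driver on the host between uses? Something along these lines is my guess (the PCI address below is only a placeholder for whatever lspci reports on a given system, and I have not confirmed that this actually works):

# on the Proxmox host, after shutting the VM down and before starting the CT
echo "0000:01:00.0" > /sys/bus/pci/drivers/vfio-pci/unbind   # release the GPU from vfio-pci
echo "0000:01:00.0" > /sys/bus/pci/drivers/nvidia/bind       # hand it back to the NVIDIA driver
nvidia-smi -L                                                # confirm the host can see the A2000 again

Or is there a cleaner, supported way to hand the device back and forth?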
Your help and advice is greatly appreciated.
Thank you.