Nvidia GPU drivers installed on Proxmox host, but not working in LXC

aitch

New Member
Jun 18, 2023
The end goal is to pass a single GPU through to multiple LXC containers. I originally had it passed through to a VM, but have since removed that configuration and want to use LXCs instead. The primary objective is for my Jellyfin LXC to be able to utilise the GPU for hardware encoding.

Stats:
Proxmox VE 8.2.4 (6.8.8-3-pve)
i5-10400 CPU (not using the integrated graphics)
Nvidia Quadro P400
(Jellyfin LXC Unprivileged from Helper-Scripts)

Context:
I have followed quite a few guides, and each one seemed to fail somewhere for me - I suspect this may be the issue. To get the GPU working I've installed packages such as:
  • nvidia-driver
  • libnvcuvid1
  • libnvidia-encode1
  • nvidia-cuda-dev
  • nvidia-cuda-toolkit
Many guides also suggest downloading the drivers directly from Nvidia and running the installer manually - however I cannot do this, as the installer complains that I have already installed drivers using apt and that I should remove those before proceeding.
I have removed and purged the above packages, along with any associated packages, but the installer still presents this error.
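For anyone stuck at the same point, the clean-out can be sketched roughly as below (a sketch only, assuming a Debian-based Proxmox host with apt-managed drivers; the glob patterns are my guesses at what else apt pulled in, and `-s` makes apt simulate so you can review the plan before running it for real):

```shell
# List everything Nvidia-related that apt currently has installed:
dpkg -l | grep -i nvidia

# Simulate the purge first (-s); re-run without -s once the plan looks right:
apt-get -s purge 'nvidia-*' 'libnvidia-*' libnvcuvid1 nvidia-cuda-dev nvidia-cuda-toolkit
apt-get -s autoremove --purge

# Afterwards, rebuild the initramfs so no stale nvidia modules linger:
# update-initramfs -u
```

Only once `dpkg -l | grep -i nvidia` comes back empty would I expect the Nvidia `.run` installer to stop complaining about apt-installed drivers.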

Current Situation:
I thought I had made some good progress re-installing the drivers via apt after many purges and reboots - the GPU seemed to be mounted, and lspci showed "Kernel driver in use: nvidia" for my GPU. Thinking I was sorted, I passed the GPU through to the Jellyfin LXC (following this video guide) and set Jellyfin to NVENC hardware acceleration, only to get a playback error essentially telling me that "no CUDA-capable device is detected".
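For reference, unprivileged-LXC GPU passthrough of this sort usually comes down to a handful of lines in the container config. A sketch only - the CTID is a placeholder, and the device major numbers (195, and here 509 for nvidia-uvm) vary between hosts and driver versions, so they must be read from `ls -l /dev/nvidia*` rather than copied:

```
# /etc/pve/lxc/<CTID>.conf -- hypothetical example; verify your own major numbers
lxc.cgroup2.devices.allow: c 195:* rwm
lxc.cgroup2.devices.allow: c 509:* rwm
lxc.mount.entry: /dev/nvidia0 dev/nvidia0 none bind,optional,create=file
lxc.mount.entry: /dev/nvidiactl dev/nvidiactl none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm dev/nvidia-uvm none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm-tools dev/nvidia-uvm-tools none bind,optional,create=file
```

If any of those device nodes is missing on the host when the container starts, the bind mount is skipped and CUDA inside the container reports no device.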
I went back to the host and ran nvidia-smi, only to see that, whilst the GPU is identified, no processes are running - unlike in every screenshot I see in guides:
Code:
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.183.01             Driver Version: 535.183.01   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Quadro P400                    On  | 00000000:01:00.0 Off |                  N/A |
| 34%   47C    P8              N/A /  N/A |      1MiB /  2048MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                       
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
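As an aside: "No running processes found" on the host just means nothing was using the GPU at that moment, so it isn't necessarily the fault. What matters for the container is that the Nvidia device nodes exist on the host before the LXC starts. A quick check along these lines (the node list is my assumption of the usual set - compare it against your own /dev):

```shell
# Check that the Nvidia device nodes a container would bind-mount actually exist.
# nvidia-uvm in particular is what CUDA/NVENC needs, and it is often absent until
# something (e.g. nvidia-smi or nvidia-modprobe) creates it after boot.
for n in /dev/nvidia0 /dev/nvidiactl /dev/nvidia-uvm /dev/nvidia-uvm-tools; do
  if [ -e "$n" ]; then echo "present: $n"; else echo "MISSING: $n"; fi
done
```

If nvidia-uvm shows as MISSING, that would line up with nvidia-smi working on the host while CUDA inside the container reports no device.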

My Question:
Where do I go from here? Is there a way to clean out all the Nvidia drivers from Proxmox, so I can run the installer that many other users seem to have no issues with?
Have I potentially made too much of a mess of my Proxmox install with all these packages, and do I need a clean install? Or can I wait for a new Proxmox update and let the new kernel clear out any stale packages?


Apologies for the long post and the fairly vague questions - I have spent many evenings on this. I feel I have made this far more complicated for myself; I'm at the edge of my knowledge, and I fear that any further messing about could put my Proxmox install in a bad state.
 
Hi

Did you ever get this sorted? I've been trying to do the same and am getting the exact same issues as you.
 
Hello,
I too am suffering from a similar issue. `nvidia-smi -q` shows all the right information in an Ubuntu 22.04 guest VM on PVE 8.x, but running the samples from the CUDA toolkit yields `no CUDA-capable device found`.
 
Here's my solution:

Go to the Proxmox host shell as root, and in its home folder /root, run the following:

  1. Delay LXC/VM autostart by 30 seconds after boot:
    Bash:
    pvenode config set --startall-onboot-delay 30
    then double-check:
    Bash:
    pvenode config get
  2. Download the Nvidia minimum-power script from SpaceinvaderOne's GitHub repo (he is very active on the Unraid forums, helping people build NAS/homelab setups):
    Bash:
    wget https://raw.githubusercontent.com/SpaceinvaderOne/nvidia_powersave/refs/heads/main/nvidia_power
  3. Add a cron job that runs after each reboot:
    Bash:
    crontab -e
    and add the following line:
    Code:
     @reboot bash /root/nvidia_power >> /var/log/nvidia_power.log 2>&1
    Here we just run the script, which calls `nvidia-smi` to set the minimum power mode once after rebooting.
  4. By the way, root's crontab lives here:
    Bash:
    cat /var/spool/cron/crontabs/root
    Double-check the last line added above.
  5. Reboot and test:
    Bash:
    reboot
    After rebooting, you will find the power script's logs:
Bash:
tail /var/log/nvidia_power.log

Code:
Power Usage -- 10.71 W
Power State -- P8
--------------------------------------------
Note: Power states range from P0 (maximum performance) to P8 (lowest power state).
--------------------------------------------
Calculating Savings...
--------------------------------------------
Power saved from before for GPU 0 is 14.62 watts.
This could save you up to £35.86 per year.
Potential max savings over year for all GPUs combined: £35.86 per year

It even tells me how much $,£,€ I would save by putting the GPU into the P8 power-saving state :) His original YouTube video is here https://www.youtube.com/watch?v=KD6G-tpsyKw but I guess that, unlike on Unraid, we don't want to run the power script aggressively for non-NAS use cases. Here we only run it once via the @reboot cron entry, which also solved the issue of the LXCs not seeing the GPU driver. All LXCs are configured to restart without any delay settings.
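The savings figure is simple arithmetic on the measured wattage drop: watts saved × 24 × 365 / 1000 gives kWh per year, multiplied by the electricity tariff. A quick reconstruction (the £0.28/kWh tariff is my assumption, not something read from the script, but it reproduces the £35.86 in the log above):

```shell
# Reconstruction of the savings arithmetic (assumed tariff: £0.28/kWh).
# 14.62 W saved, 24*365 hours per year, /1000 to convert Wh to kWh.
awk -v watts=14.62 -v gbp_per_kwh=0.28 \
    'BEGIN { printf "Annual saving: £%.2f\n", watts * 24 * 365 / 1000 * gbp_per_kwh }'
# prints: Annual saving: £35.86
```

So the number is only as meaningful as the tariff the script is configured with; the P-state change itself is what matters here.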

Hope this helps. Let me know if it works for you guys.
 