I've been having a GPU passthrough issue on a Dell R720, passing the GPU through to an Ubuntu 22.04 container. The Proxmox host looks fine and I can see the /dev/nvidia device files inside the Ubuntu container, but no CUDA-capable device is detected in the container. I tried this on Proxmox VE 7.2-7.
On the Proxmox host:
---------------------------------
# cat /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on"
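(For completeness: after editing this file the change has to be applied and the host rebooted; whether IOMMU actually came up can then be checked. Commands assume the stock Proxmox GRUB setup.)
# update-grub
# dmesg | grep -e DMAR -e IOMMU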
# cat /etc/pve/lxc/101.conf
lxc.cgroup.devices.allow: c 195:* rwm
lxc.cgroup.devices.allow: c 509:* rwm
lxc.cgroup.devices.allow: c 226:* rwm
lxc.mount.entry: /dev/nvidia0 dev/nvidia0 none bind,optional,create=file
lxc.mount.entry: /dev/nvidiactl dev/nvidiactl none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-modeset dev/nvidia-modeset none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm dev/nvidia-uvm none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm-tools dev/nvidia-uvm-tools none bind,optional,create=file
lxc.mount.entry: /dev/dri dev/dri none bind,optional,create=dir
lxc.mount.entry: /dev/fb0 dev/fb0 none bind,optional,create=file
# nvidia-smi   (output omitted; the GPU shows up as expected on the host)
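(For reference, the majors in the cgroup allow lines above (195, 509, 226) are supposed to match the host's device nodes for /dev/nvidia*, nvidia-uvm and /dev/dri respectively; the nvidia-uvm major is assigned dynamically and can change between reboots. A quick cross-check on the host, assuming the same device names as in the container listing below:)
# lsmod | grep nvidia
# ls -l /dev/nvidia* /dev/dri/*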
On Ubuntu 22.04 LXC Container:
--------------------------------------------------
1. Verified the GPU is available through its device files:
# ll /dev/nvidia*
crw-rw-rw- 1 root root 195, 254 Aug 24 15:22 /dev/nvidia-modeset
crw-rw-rw- 1 root root 509, 0 Aug 24 15:22 /dev/nvidia-uvm
crw-rw-rw- 1 root root 509, 1 Aug 24 15:22 /dev/nvidia-uvm-tools
crw-rw-rw- 1 root root 195, 0 Aug 24 15:22 /dev/nvidia0
crw-rw-rw- 1 root root 195, 255 Aug 24 15:22 /dev/nvidiactl
# ll /dev/dri/*
crw-rw---- 1 root video 226, 0 Aug 24 15:22 /dev/dri/card0
crw-rw---- 1 root video 226, 1 Aug 24 15:22 /dev/dri/card1
crw-rw---- 1 root syslog 226, 128 Aug 24 15:22 /dev/dri/renderD128
# ll /dev/fb*
crw-rw---- 1 root video 29, 0 Aug 24 15:22 /dev/fb0
2. Installed CUDA (same version as on the host) following: https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#ubuntu-installation
3. Installed the CUDA Samples from: https://github.com/nvidia/cuda-samples
4. Tested GPU availability from the container (extra sanity checks are listed after the output below):
# /cudaSamples/cuda-samples/Samples/1_Utilities/deviceQuery/deviceQuery
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
cudaGetDeviceCount returned 100
-> no CUDA-capable device is detected
Result = FAIL
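Some extra sanity checks that can be run inside the container (this assumes the NVIDIA user-space tools and libraries are installed in the container as well; the kernel driver itself comes from the host, since LXC containers share the host kernel):
# nvidia-smi
# cat /proc/driver/nvidia/version
The second command should print the host driver version if the driver interface is reachable from the container.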
Why is the GPU not available inside the Ubuntu Container?
Any help appreciated.