Setting up an NVIDIA GPU for Stable Diffusion in an LXC container?

shodan

Member
Sep 1, 2022
Hi,

I'm trying to get stable-diffusion to work in an LXC container, but I'm not succeeding yet!

Here's what I've tried:


Added this to /etc/apt/sources.list on the host:

Code:
deb http://deb.debian.org/debian bookworm main contrib non-free non-free-firmware
deb http://security.debian.org/debian-security bookworm-security main contrib non-free non-free-firmware

Ran apt update and then

Code:
apt install nvidia-driver nvidia-kernel-dkms

Rebooted, and the NVIDIA driver is loaded:

Code:
root@proxmox:~# lsmod | grep nv
nvidia_uvm           1609728  0
nvidia_drm             81920  0
nvidia_modeset       1314816  2 nvidia_drm
nvidia              56807424  19 nvidia_uvm,nvidia_modeset
video                  69632  2 asus_wmi,nvidia_modeset
nvme                   53248  5
nvme_core             196608  6 nvme
nvme_auth              24576  1 nvme_core

Code:
root@proxmox:~# nvidia-smi
Wed Sep 11 02:09:32 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.183.01             Driver Version: 535.183.01   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3060        On  | 00000000:0F:00.0 Off |                  N/A |
| 31%   32C    P8              17W / 170W |      1MiB / 12288MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+

Found my device numbers:

Code:
root@proxmox:~# ls -lsh `find /dev/dri`
0 lrwxrwxrwx 1 root root          8 Sep 11 01:52 /dev/dri/by-path/pci-0000:0f:00.0-card -> ../card1
0 lrwxrwxrwx 1 root root         13 Sep 11 01:52 /dev/dri/by-path/pci-0000:0f:00.0-render -> ../renderD128
0 lrwxrwxrwx 1 root root          8 Sep 11 01:52 /dev/dri/by-path/platform-simple-framebuffer.0-card -> ../card0
0 crw-rw---- 1 root video  226,   0 Sep 11 01:52 /dev/dri/card0
0 crw-rw---- 1 root video  226,   1 Sep 11 01:52 /dev/dri/card1
0 crw-rw---- 1 root render 226, 128 Sep 11 01:52 /dev/dri/renderD128

/dev/dri:
total 0
0 drwxr-xr-x 2 root root        100 Sep 11 01:52 by-path
0 crw-rw---- 1 root video  226,   0 Sep 11 01:52 card0
0 crw-rw---- 1 root video  226,   1 Sep 11 01:52 card1
0 crw-rw---- 1 root render 226, 128 Sep 11 01:52 renderD128

/dev/dri/by-path:
total 0
0 lrwxrwxrwx 1 root root  8 Sep 11 01:52 pci-0000:0f:00.0-card -> ../card1
0 lrwxrwxrwx 1 root root 13 Sep 11 01:52 pci-0000:0f:00.0-render -> ../renderD128
0 lrwxrwxrwx 1 root root  8 Sep 11 01:52 platform-simple-framebuffer.0-card -> ../card0
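The 226:0 and 226:128 pairs used in the cgroup rules further down come straight from these nodes' major:minor device numbers, which `stat` reports in hex. A small sketch of the conversion (demonstrated on `/dev/null`, which exists everywhere; on the Proxmox host you would point it at `/dev/dri/renderD128` and expect `226:128`):

```shell
# The cgroup2 allow rules need each device's major:minor pair.
# stat reports them in hex via %t (major) and %T (minor); printf
# converts the 0x-prefixed values back to decimal.
dev=/dev/null
maj_hex=$(stat -c '%t' "$dev")
min_hex=$(stat -c '%T' "$dev")
printf '%s -> %d:%d\n' "$dev" "0x$maj_hex" "0x$min_hex"
# -> /dev/null -> 1:3
```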


Got the group numbers:

Code:
root@proxmox:~# cat /etc/group | grep video
video:x:44:root
root@proxmox:~# cat /etc/group | grep render
render:x:104:root


Added the subgid entries:

Code:
root@proxmox:~# cat /etc/subgid
root:100000:65536
root:44:1
root:104:1
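For reference: the first line is the standard unprivileged range, and the two extra lines delegate the host's video (44) and render (104) GIDs to root so they can be mapped into a container. A quick self-contained check (a sketch, using a copy of the entries above rather than the live file):

```shell
# Recreate the subgid entries from this post in a temp file and confirm
# that host GIDs 44 (video) and 104 (render) are delegated to root.
cat > /tmp/subgid.sample <<'EOF'
root:100000:65536
root:44:1
root:104:1
EOF
for gid in 44 104; do
  grep -q "^root:${gid}:1$" /tmp/subgid.sample && echo "GID ${gid} delegated"
done
# -> GID 44 delegated
# -> GID 104 delegated
```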

Added the groups to root

Code:
usermod -aG render,video root
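You can confirm the membership with `id` (a sketch, shown for the current user so it runs anywhere; note that `usermod -aG` only takes effect for new logins):

```shell
# Print the current user's group memberships, one per line.
# After the usermod above and a fresh login, 'video' and 'render'
# should show up in root's list.
id -nG | tr ' ' '\n'
```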

Lastly, created my lxc.conf:

Code:
root@proxmox:~# cat /etc/pve/nodes/proxmox/lxc/106.conf
arch: amd64
cmode: shell
cores: 16
features: nesting=1
hostname: gputest
memory: 12000
net0: name=eth0,bridge=vmbr0,firewall=1,hwaddr=BC:24:11:0F:20:7F,ip=dhcp,ip6=dhcp,type=veth
ostype: debian
rootfs: local-lvm:vm-106-disk-0,size=64G
swap: 512
unprivileged: 1

Then I added this magic part to the lxc.conf file:

Code:
lxc.cgroup2.devices.allow: c 226:0 rwm
lxc.cgroup2.devices.allow: c 226:128 rwm
lxc.mount.entry: /dev/dri/renderD128 dev/dri/renderD128 none bind,optional,create=file
lxc.idmap: u 0 100000 65536
lxc.idmap: g 0 100000 44
lxc.idmap: g 44 44 1
lxc.idmap: g 45 100045 62
lxc.idmap: g 107 104 1
lxc.idmap: g 108 100108 65428
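One thing worth sanity-checking: the `g` idmap lines must cover container GIDs 0..65535 contiguously, with no gaps or overlaps, or the container will refuse to start. A hypothetical check, run against a copy of the map above:

```shell
# Recreate the 'g' idmap lines from this post and verify that each range
# starts exactly where the previous one ended, and that coverage reaches
# GID 65536 (i.e. 0..65535 inclusive).
cat > /tmp/idmap.sample <<'EOF'
g 0 100000 44
g 44 44 1
g 45 100045 62
g 107 104 1
g 108 100108 65428
EOF
awk 'BEGIN { expect = 0; bad = 0 }
     $1 == "g" {
       if ($2 + 0 != expect) { print "gap/overlap before container GID " $2; bad = 1 }
       expect = $2 + $4            # the next range must start here
     }
     END {
       if (expect != 65536) { print "coverage ends at " expect; bad = 1 }
       exit bad
     }' /tmp/idmap.sample && echo "g idmap is contiguous"
# -> g idmap is contiguous
```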


Then, inside the container, I added contrib non-free non-free-firmware to the sources list and
ran apt install nvidia-driver nvidia-kernel-dkms

and rebooted it all

Now inside the LXC I get the following error


Code:
root@gputest:~/stable-diffusion-webui# nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

When I try to run Stable Diffusion, I get:



Code:
root@gputest:~/stable-diffusion-webui# ./webui.sh

################################################################
Install script for stable-diffusion + Web UI
Tested on Debian 11 (Bullseye), Fedora 34+ and openSUSE Leap 15.4 or newer.
################################################################

################################################################
Running on root user
################################################################

################################################################
Repo already cloned, using it as install directory
################################################################

################################################################
Create and activate python venv
################################################################

################################################################
Launching launch.py...
################################################################
glibc version is 2.36
Check TCMalloc: libtcmalloc_minimal.so.4
libtcmalloc_minimal.so.4 is linked with libc.so,execute LD_PRELOAD=/lib/x86_64-linux-gnu/libtcmalloc_minimal.so.4
Python 3.11.2 (main, Aug 26 2024, 07:20:54) [GCC 12.2.0]
Version: v1.10.1
Commit hash: 82a973c04367123ae98bd9abdf80d9eda9b910e2
Traceback (most recent call last):
  File "/root/stable-diffusion-webui/launch.py", line 48, in <module>
    main()
  File "/root/stable-diffusion-webui/launch.py", line 39, in main
    prepare_environment()
  File "/root/stable-diffusion-webui/modules/launch_utils.py", line 387, in prepare_environment
    raise RuntimeError(
RuntimeError: Torch is not able to use GPU; add --skip-torch-cuda-test to COMMANDLINE_ARGS variable to disable this check
root@gputest:~/stable-diffusion-webui# nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.


And I'm kind of stuck at this point. Everything should work, but it doesn't!?

What am I missing here?
 
I have not tried it using the distro NVIDIA drivers, and I'm no expert, but I have gotten LXCs to use my GPU. In general, the difference is that the NVIDIA driver is manually installed on the Proxmox host and also in each LXC, but with --no-kernel-headers. The LXC conf also has a lot more to pass through, and CUDA has to be manually installed.
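For what it's worth, the extra passthrough typically looks something like the lines below. This is a sketch only: 195 is the usual major number for `/dev/nvidia0` and `/dev/nvidiactl`, but the `nvidia-uvm` major is assigned dynamically (509 here is just an example), so check `ls -l /dev/nvidia*` on the host and adjust:

```
lxc.cgroup2.devices.allow: c 195:* rwm
lxc.cgroup2.devices.allow: c 509:* rwm
lxc.mount.entry: /dev/nvidia0 dev/nvidia0 none bind,optional,create=file
lxc.mount.entry: /dev/nvidiactl dev/nvidiactl none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-modeset dev/nvidia-modeset none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm dev/nvidia-uvm none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm-tools dev/nvidia-uvm-tools none bind,optional,create=file
```

Without those device nodes visible in the container, `nvidia-smi` inside the LXC fails with exactly the "couldn't communicate with the NVIDIA driver" message shown above.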

I've used these two guides to get it working. I used the first one later, but it gets CUDA working correctly for me.

https://gist.github.com/egg82/90164a31db6b71d36fa4f4056bbee2eb

https://sluijsjes.nl/2024/05/18/cor...to-install-frigate-video-surveillance-server/
 
