Hi,
I'm trying to get Stable Diffusion working in an unprivileged LXC container on Proxmox, but I haven't succeeded yet.
Here's what I've tried so far.
On the host, I added this to /etc/apt/sources.list:
Code:
deb http://deb.debian.org/debian bookworm main contrib non-free non-free-firmware
deb http://security.debian.org/debian-security bookworm-security main contrib non-free non-free-firmware
Then I ran apt update and installed the driver:
Code:
apt install nvidia-driver nvidia-kernel-dkms
After a reboot, the NVIDIA driver is loaded on the host:
Code:
root@proxmox:~# lsmod | grep nv
nvidia_uvm 1609728 0
nvidia_drm 81920 0
nvidia_modeset 1314816 2 nvidia_drm
nvidia 56807424 19 nvidia_uvm,nvidia_modeset
video 69632 2 asus_wmi,nvidia_modeset
nvme 53248 5
nvme_core 196608 6 nvme
nvme_auth 24576 1 nvme_core
Code:
root@proxmox:~# nvidia-smi
Wed Sep 11 02:09:32 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.183.01             Driver Version: 535.183.01   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3060        On  | 00000000:0F:00.0 Off |                  N/A |
| 31%   32C    P8              17W / 170W |      1MiB / 12288MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
I found the device numbers for the GPU:
Code:
root@proxmox:~# ls -lsh `find /dev/dri`
0 lrwxrwxrwx 1 root root 8 Sep 11 01:52 /dev/dri/by-path/pci-0000:0f:00.0-card -> ../card1
0 lrwxrwxrwx 1 root root 13 Sep 11 01:52 /dev/dri/by-path/pci-0000:0f:00.0-render -> ../renderD128
0 lrwxrwxrwx 1 root root 8 Sep 11 01:52 /dev/dri/by-path/platform-simple-framebuffer.0-card -> ../card0
0 crw-rw---- 1 root video 226, 0 Sep 11 01:52 /dev/dri/card0
0 crw-rw---- 1 root video 226, 1 Sep 11 01:52 /dev/dri/card1
0 crw-rw---- 1 root render 226, 128 Sep 11 01:52 /dev/dri/renderD128
/dev/dri:
total 0
0 drwxr-xr-x 2 root root 100 Sep 11 01:52 by-path
0 crw-rw---- 1 root video 226, 0 Sep 11 01:52 card0
0 crw-rw---- 1 root video 226, 1 Sep 11 01:52 card1
0 crw-rw---- 1 root render 226, 128 Sep 11 01:52 renderD128
/dev/dri/by-path:
total 0
0 lrwxrwxrwx 1 root root 8 Sep 11 01:52 pci-0000:0f:00.0-card -> ../card1
0 lrwxrwxrwx 1 root root 13 Sep 11 01:52 pci-0000:0f:00.0-render -> ../renderD128
0 lrwxrwxrwx 1 root root 8 Sep 11 01:52 platform-simple-framebuffer.0-card -> ../card0
Then I looked up the group IDs of video and render on the host:
Code:
root@proxmox:~# cat /etc/group | grep video
video:x:44:root
root@proxmox:~# cat /etc/group | grep render
render:x:104:root
I added subgid entries for both groups:
Code:
root@proxmox:~# cat /etc/subgid
root:100000:65536
root:44:1
root:104:1
I added root to both groups:
Code:
usermod -aG render,video root
Lastly, I created the container config:
Code:
root@proxmox:~# cat /etc/pve/nodes/proxmox/lxc/106.conf
arch: amd64
cmode: shell
cores: 16
features: nesting=1
hostname: gputest
memory: 12000
net0: name=eth0,bridge=vmbr0,firewall=1,hwaddr=BC:24:11:0F:20:7F,ip=dhcp,ip6=dhcp,type=veth
ostype: debian
rootfs: local-lvm:vm-106-disk-0,size=64G
swap: 512
unprivileged: 1
Then I added this device passthrough and ID mapping part to the same config file:
Code:
lxc.cgroup2.devices.allow: c 226:0 rwm
lxc.cgroup2.devices.allow: c 226:128 rwm
lxc.mount.entry: /dev/dri/renderD128 dev/dri/renderD128 none bind,optional,create=file
lxc.idmap: u 0 100000 65536
lxc.idmap: g 0 100000 44
lxc.idmap: g 44 44 1
lxc.idmap: g 45 100045 62
lxc.idmap: g 107 104 1
lxc.idmap: g 108 100108 65428
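To double-check the arithmetic in the group mappings above, here is a minimal sketch. The helper name check_gid_idmap is hypothetical (my own), and it assumes the "g" entries are listed in ascending container-gid order, as they are above. It only verifies that the container-side ranges tile gids 0 to 65535 without gaps or overlaps; it does not check /etc/subgid or the host-side targets.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read LXC config lines on stdin and check that the
# group ("g") lxc.idmap entries cover container gids 0..65535 contiguously.
# Assumes entries appear in ascending container-gid order.
check_gid_idmap() {
  awk '
    BEGIN { next_gid = 0; bad = 0 }
    $1 == "lxc.idmap:" && $2 == "g" {
      start = $3 + 0   # first container gid of this range
      count = $5 + 0   # number of gids in this range
      if (start != next_gid) {
        printf "gap or overlap at container gid %d (expected %d)\n", start, next_gid
        bad = 1
      }
      next_gid = start + count
    }
    END {
      if (bad || next_gid != 65536) {
        printf "not ok: coverage ends at %d\n", next_gid
        exit 1
      }
      print "ok: container gids 0-65535 covered"
    }
  '
}

# The group mappings from my 106.conf
check_gid_idmap <<'EOF'
lxc.idmap: g 0 100000 44
lxc.idmap: g 44 44 1
lxc.idmap: g 45 100045 62
lxc.idmap: g 107 104 1
lxc.idmap: g 108 100108 65428
EOF
```

With the entries above this reports that gids 0-65535 are covered (44 + 1 + 62 + 1 + 65428 = 65536), so the ranges themselves look consistent.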
Then, in the container, I added contrib non-free non-free-firmware to the sources list,
ran apt install nvidia-driver nvidia-kernel-dkms,
and rebooted everything.
Now, inside the container, nvidia-smi fails with the following error:
Code:
root@gputest:~/stable-diffusion-webui# nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
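My config only exposes the /dev/dri nodes, and as far as I understand nvidia-smi talks to the driver through /dev/nvidiactl and /dev/nvidia0, so one thing that could be checked here is whether those nodes are visible inside the container at all. A minimal sketch: check_nvidia_nodes is a hypothetical helper name, and the list of node names is an assumption for a single-GPU setup, not something I have confirmed for this system.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which NVIDIA device nodes exist under a
# /dev-like directory. The node list is an assumption for a single GPU.
# Usage: check_nvidia_nodes [DIR]   (DIR defaults to /dev)
check_nvidia_nodes() {
  local dev_dir="${1:-/dev}"
  local node
  local missing=0
  for node in nvidiactl nvidia0 nvidia-uvm; do
    if [ -e "$dev_dir/$node" ]; then
      echo "present: $dev_dir/$node"
    else
      echo "missing: $dev_dir/$node"
      missing=1
    fi
  done
  # Non-zero exit status when at least one node is absent
  return "$missing"
}

check_nvidia_nodes /dev || echo "some NVIDIA device nodes are not visible here"
```

Running it on the host and again inside the container would show whether the nodes differ between the two.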
When I try to run Stable Diffusion, I get:
Code:
root@gputest:~/stable-diffusion-webui# ./webui.sh
################################################################
Install script for stable-diffusion + Web UI
Tested on Debian 11 (Bullseye), Fedora 34+ and openSUSE Leap 15.4 or newer.
################################################################
################################################################
Running on root user
################################################################
################################################################
Repo already cloned, using it as install directory
################################################################
################################################################
Create and activate python venv
################################################################
################################################################
Launching launch.py...
################################################################
glibc version is 2.36
Check TCMalloc: libtcmalloc_minimal.so.4
libtcmalloc_minimal.so.4 is linked with libc.so,execute LD_PRELOAD=/lib/x86_64-linux-gnu/libtcmalloc_minimal.so.4
Python 3.11.2 (main, Aug 26 2024, 07:20:54) [GCC 12.2.0]
Version: v1.10.1
Commit hash: 82a973c04367123ae98bd9abdf80d9eda9b910e2
Traceback (most recent call last):
File "/root/stable-diffusion-webui/launch.py", line 48, in <module>
main()
File "/root/stable-diffusion-webui/launch.py", line 39, in main
prepare_environment()
File "/root/stable-diffusion-webui/modules/launch_utils.py", line 387, in prepare_environment
raise RuntimeError(
RuntimeError: Torch is not able to use GPU; add --skip-torch-cuda-test to COMMANDLINE_ARGS variable to disable this check
root@gputest:~/stable-diffusion-webui# nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
I'm kind of stuck at this point. As far as I can tell everything should work, but it doesn't.
What am I missing here?