Setting up an NVIDIA GPU for Stable Diffusion in an LXC container?

shodan
Hi,

I'm trying to get stable-diffusion to work in an LXC container, but I'm not succeeding yet!

Here's what I've tried:

Added this to /etc/apt/sources.list:

Code:
deb http://deb.debian.org/debian bookworm main contrib non-free non-free-firmware
deb http://security.debian.org/debian-security bookworm-security main contrib non-free non-free-firmware

Ran apt update and then:

Code:
apt install nvidia-driver nvidia-kernel-dkms

Rebooted, and the NVIDIA driver is loaded:

Code:
root@proxmox:~# lsmod | grep nv
nvidia_uvm           1609728  0
nvidia_drm             81920  0
nvidia_modeset       1314816  2 nvidia_drm
nvidia              56807424  19 nvidia_uvm,nvidia_modeset
video                  69632  2 asus_wmi,nvidia_modeset
nvme                   53248  5
nvme_core             196608  6 nvme
nvme_auth              24576  1 nvme_core

Code:
root@proxmox:~# nvidia-smi
Wed Sep 11 02:09:32 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.183.01             Driver Version: 535.183.01   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3060        On  | 00000000:0F:00.0 Off |                  N/A |
| 31%   32C    P8              17W / 170W |      1MiB / 12288MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+

Found my device numbers:

Code:
root@proxmox:~# ls -lsh `find /dev/dri`
0 lrwxrwxrwx 1 root root          8 Sep 11 01:52 /dev/dri/by-path/pci-0000:0f:00.0-card -> ../card1
0 lrwxrwxrwx 1 root root         13 Sep 11 01:52 /dev/dri/by-path/pci-0000:0f:00.0-render -> ../renderD128
0 lrwxrwxrwx 1 root root          8 Sep 11 01:52 /dev/dri/by-path/platform-simple-framebuffer.0-card -> ../card0
0 crw-rw---- 1 root video  226,   0 Sep 11 01:52 /dev/dri/card0
0 crw-rw---- 1 root video  226,   1 Sep 11 01:52 /dev/dri/card1
0 crw-rw---- 1 root render 226, 128 Sep 11 01:52 /dev/dri/renderD128

/dev/dri:
total 0
0 drwxr-xr-x 2 root root        100 Sep 11 01:52 by-path
0 crw-rw---- 1 root video  226,   0 Sep 11 01:52 card0
0 crw-rw---- 1 root video  226,   1 Sep 11 01:52 card1
0 crw-rw---- 1 root render 226, 128 Sep 11 01:52 renderD128

/dev/dri/by-path:
total 0
0 lrwxrwxrwx 1 root root  8 Sep 11 01:52 pci-0000:0f:00.0-card -> ../card1
0 lrwxrwxrwx 1 root root 13 Sep 11 01:52 pci-0000:0f:00.0-render -> ../renderD128
0 lrwxrwxrwx 1 root root  8 Sep 11 01:52 platform-simple-framebuffer.0-card -> ../card0


Got the group numbers:

Code:
root@proxmox:~# cat /etc/group | grep video
video:x:44:root
root@proxmox:~# cat /etc/group | grep render
render:x:104:root


Added the subgid entries, which allow root to map host GIDs 44 (video) and 104 (render) into the unprivileged container:

Code:
root@proxmox:~# cat /etc/subgid
root:100000:65536
root:44:1
root:104:1

Added root to those groups:

Code:
usermod -aG render,video root
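
To confirm the membership took effect (after a fresh login), id root should now list both groups, something like:

Code:
root@proxmox:~# id root
uid=0(root) gid=0(root) groups=0(root),44(video),104(render)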

Lastly, created my container config:

Code:
root@proxmox:~# cat /etc/pve/nodes/proxmox/lxc/106.conf
arch: amd64
cmode: shell
cores: 16
features: nesting=1
hostname: gputest
memory: 12000
net0: name=eth0,bridge=vmbr0,firewall=1,hwaddr=BC:24:11:0F:20:7F,ip=dhcp,ip6=dhcp,type=veth
ostype: debian
rootfs: local-lvm:vm-106-disk-0,size=64G
swap: 512
unprivileged: 1

Then I added this magic part to the lxc.conf file

Code:
lxc.cgroup2.devices.allow: c 226:0 rwm
lxc.cgroup2.devices.allow: c 226:128 rwm
lxc.mount.entry: /dev/dri/renderD128 dev/dri/renderD128 none bind,optional,create=file
lxc.idmap: u 0 100000 65536
lxc.idmap: g 0 100000 44
lxc.idmap: g 44 44 1
lxc.idmap: g 45 100045 62
lxc.idmap: g 107 104 1
lxc.idmap: g 108 100108 65428
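
For anyone puzzling over those idmap lines, the format is <u|g> <container ID start> <host ID start> <count>, and the container-side ranges must be contiguous and cover all 65536 IDs. My annotated reading of the lines above:

Code:
# g 0   100000 44    -> container GIDs 0-43      => host GIDs 100000-100043
# g 44  44     1     -> container GID  44 (video)   => host GID 44
# g 45  100045 62    -> container GIDs 45-106    => host GIDs 100045-100106
# g 107 104    1     -> container GID  107 (render) => host GID 104
# g 108 100108 65428 -> container GIDs 108-65535 => host GIDs 100108-165535
# counts: 44 + 1 + 62 + 1 + 65428 = 65536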


Then, inside the container, I added contrib non-free non-free-firmware to the sources list
and ran apt install nvidia-driver nvidia-kernel-dkms

and rebooted it all

Now, inside the LXC, I get the following error:


Code:
root@gputest:~/stable-diffusion-webui# nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
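
That error usually means the container can't see the NVIDIA device nodes; the config above only bind-mounts /dev/dri/renderD128, so a check like this (approximate output) shows what's missing:

Code:
root@gputest:~# ls -l /dev/nvidia*
ls: cannot access '/dev/nvidia*': No such file or directory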

When I try to run Stable Diffusion, I get:



Code:
root@gputest:~/stable-diffusion-webui# ./webui.sh

################################################################
Install script for stable-diffusion + Web UI
Tested on Debian 11 (Bullseye), Fedora 34+ and openSUSE Leap 15.4 or newer.
################################################################

################################################################
Running on root user
################################################################

################################################################
Repo already cloned, using it as install directory
################################################################

################################################################
Create and activate python venv
################################################################

################################################################
Launching launch.py...
################################################################
glibc version is 2.36
Check TCMalloc: libtcmalloc_minimal.so.4
libtcmalloc_minimal.so.4 is linked with libc.so,execute LD_PRELOAD=/lib/x86_64-linux-gnu/libtcmalloc_minimal.so.4
Python 3.11.2 (main, Aug 26 2024, 07:20:54) [GCC 12.2.0]
Version: v1.10.1
Commit hash: 82a973c04367123ae98bd9abdf80d9eda9b910e2
Traceback (most recent call last):
  File "/root/stable-diffusion-webui/launch.py", line 48, in <module>
    main()
  File "/root/stable-diffusion-webui/launch.py", line 39, in main
    prepare_environment()
  File "/root/stable-diffusion-webui/modules/launch_utils.py", line 387, in prepare_environment
    raise RuntimeError(
RuntimeError: Torch is not able to use GPU; add --skip-torch-cuda-test to COMMANDLINE_ARGS variable to disable this check
root@gputest:~/stable-diffusion-webui# nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.


And I'm kind of stuck at this point. Everything should work, but it doesn't!?

What am I missing here?
 
I have not tried it using the distro NVIDIA drivers, and I'm no expert, but I have gotten LXCs to use my GPU. In general, the difference is that the NVIDIA driver is manually installed on the Proxmox host and also in each LXC, but with --no-kernel-headers. The LXC conf also has a lot more stuff to pass through. And CUDA has to be manually installed.

I've used these two guides to get it working. I found the first one later, but it's the one that got CUDA working correctly for me.

https://gist.github.com/egg82/90164a31db6b71d36fa4f4056bbee2eb

https://sluijsjes.nl/2024/05/18/cor...to-install-frigate-video-surveillance-server/
 
Hi,

I got it working!

I made these scripts to streamline the process:


Code:
#-----------------------
# Creating the shared space

#create logical volumes
lvcreate -V 200G -T pve/data -n stable-diffusion-models
lvcreate -V 200G -T pve/data -n stable-diffusion-extensions
lvcreate -V 200G -T pve/data -n stable-diffusion-webui

#format each volume with ext4
mkfs.ext4 /dev/pve/stable-diffusion-webui
mkfs.ext4 /dev/pve/stable-diffusion-extensions
mkfs.ext4 /dev/pve/stable-diffusion-models

#create the mount point folder
mkdir /mnt/stable-diffusion-webui
mkdir /mnt/stable-diffusion-extensions
mkdir /mnt/stable-diffusion-models

#create fstab entries
echo "/dev/pve/stable-diffusion-webui /mnt/stable-diffusion-webui  ext4    defaults    0  2" >> /etc/fstab
echo "/dev/pve/stable-diffusion-extensions /mnt/stable-diffusion-extensions  ext4    defaults    0  2" >> /etc/fstab
echo "/dev/pve/stable-diffusion-models /mnt/stable-diffusion-models  ext4    defaults    0  2" >> /etc/fstab

#mount the logical volumes
mount /mnt/stable-diffusion-webui
mount /mnt/stable-diffusion-extensions 
mount /mnt/stable-diffusion-models
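
# optional sanity check: all three volumes should now be mounted
lvs pve
df -h /mnt/stable-diffusion-webui /mnt/stable-diffusion-extensions /mnt/stable-diffusion-models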
#-----------------------



#-----------------------
#Create the LXC container

vmid=115
rsa_pub_key="ssh-rsa AAAAXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXUjcfgDejp rsa-key-20220909"
lxc_template="local:vztmpl/debian-12-standard_12.2-1_amd64.tar.zst"
mac_address="DE:AD:BE:EF:00:02"
lxc_hostname="sdweb"

echo "$rsa_pub_key" > /ssh_key.pub

pct create $vmid  $lxc_template  --arch amd64 --cores 16 --memory 16000 --swap 3000 --hostname $lxc_hostname --net0 name=eth0,bridge=vmbr0,firewall=1,hwaddr=$mac_address,ip=dhcp,type=veth --rootfs local-lvm:128 --mp0 /mnt/stable-diffusion-models,mp=/opt/stable-diffusion-models --features nesting=1 --unprivileged 1 --ostype debian --ssh-public-keys /ssh_key.pub

# to add permissions mapping for nvidia video devices
LXC_CONF_FILE="/etc/pve/nodes/proxmox/lxc/$vmid.conf"
echo "lxc.idmap: u 0 100000 65536" >> "$LXC_CONF_FILE"
echo "lxc.idmap: g 0 100000 44" >> "$LXC_CONF_FILE"
echo "lxc.idmap: g 44 44 1" >> "$LXC_CONF_FILE"
echo "lxc.idmap: g 45 100045 65491" >> "$LXC_CONF_FILE"
echo "lxc.mount.entry: /dev/nvidia0 dev/nvidia0 none bind,optional,create=file" >> "$LXC_CONF_FILE"
echo "lxc.mount.entry: /dev/nvidiactl dev/nvidiactl none bind,optional,create=file" >> "$LXC_CONF_FILE"
echo "lxc.mount.entry: /dev/nvidia-uvm dev/nvidia-uvm none bind,optional,create=file" >> "$LXC_CONF_FILE"
echo "lxc.mount.entry: /dev/nvidia-uvm-tools dev/nvidia-uvm-tools none bind,optional,create=file" >> "$LXC_CONF_FILE"

# to set console to shell mode
echo "cmode: shell" >> "$LXC_CONF_FILE"

# (the stable diffusion models mount point is already set via --mp0 in pct create above)

rm /ssh_key.pub

pct start $vmid

#-----------------------

#-----------------------
#Install stable diffusion inside the LXC container

# add non-free non-free-firmware to apt sources.list
sed -i 's|deb http://deb.debian.org/debian bookworm main contrib|deb http://deb.debian.org/debian bookworm main contrib non-free non-free-firmware|' /etc/apt/sources.list
sed -i 's|deb http://security.debian.org bookworm-security main contrib|deb http://security.debian.org/debian-security bookworm-security main contrib non-free non-free-firmware|' /etc/apt/sources.list

#install all dependencies
apt update ; apt install -y nvidia-driver nvidia-kernel-dkms wget git python3 python3-venv libgl1 libglib2.0-0 screen  google-perftools bc authbind

#create user user
useradd -s /usr/sbin/nologin user
usermod -L user
usermod -p '!' user
mkdir /home/user
chown user:video /home/user

#allow port binding for port 80 by user user
touch /etc/authbind/byport/80
chmod 500 /etc/authbind/byport/80
chown user /etc/authbind/byport/80
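
# authbind allows a user to bind a low port if they can execute
# /etc/authbind/byport/<port>, hence the chown and the x bit in chmod 500 above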

#download sdwebui
cd /opt
git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui
chown -R user:video /opt/stable-diffusion-webui/
cd /opt/stable-diffusion-webui
mv /opt/stable-diffusion-webui/models /opt/stable-diffusion-webui/models.old
ln -s /opt/stable-diffusion-models/ /opt/stable-diffusion-webui/models

#setup sdweb command line
sed -i 's|#export COMMANDLINE_ARGS=""|export COMMANDLINE_ARGS="--listen --port 80 --enable-insecure-extension-access --gradio-auth MyUsername:MyPassword"|' /opt/stable-diffusion-webui/webui-user.sh

#allow running webui as root
#sed -i 's|can_run_as_root=0|can_run_as_root=1|' webui.sh

#create the sdweb systemd service file
sdweb_service_file="/etc/systemd/system/sdweb.service"
cat > "$sdweb_service_file" <<'EOF'
[Unit]
Description=Stable Diffusion Web UI Service
After=network.target

[Service]
Restart=always
RestartSec=3
User=user
Group=video
ExecStart=/usr/bin/screen -DmS sdweb /usr/bin/authbind --deep /opt/stable-diffusion-webui/webui.sh

[Install]
WantedBy=multi-user.target
EOF

#start sdweb service
systemctl start sdweb

#start sdweb service at boot
systemctl enable sdweb

#This will take a while to start the first time
#-----------------------
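
Once the container is up, something like this should work as a sanity check from the host (standard pct exec; output omitted):

Code:
pct exec 115 -- ls -l /dev/nvidia0 /dev/nvidiactl /dev/nvidia-uvm
pct exec 115 -- nvidia-smi
pct exec 115 -- systemctl status sdweb --no-pager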
 
I'd use nvidia-container-toolkit so only the host requires drivers; the guest doesn't need them at all. Much easier to manage.
 
Hello,

I gave nvidia-container-toolkit a shot this evening.

Here is the new version of the script that runs inside the LXC container.

It installs successfully, but fails at the end.

Are you sure that nvidia-container-toolkit works with LXC containers, and not just Docker containers?

Code:
apt update ; apt install -y gpg

wget -O /usr/share/keyrings/nvidia-container-toolkit-keyring.txt https://nvidia.github.io/libnvidia-container/gpgkey
gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg /usr/share/keyrings/nvidia-container-toolkit-keyring.txt
rm /usr/share/keyrings/nvidia-container-toolkit-keyring.txt
wget -O /etc/apt/sources.list.d/nvidia-container-toolkit.list  https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list
sed -i 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' /etc/apt/sources.list.d/nvidia-container-toolkit.list

#install all dependencies
apt update ; apt install -y wget git python3 python3-venv libgl1 libglib2.0-0 screen  google-perftools bc authbind nvidia-container-toolkit sudo

#create user user
useradd -s /usr/sbin/nologin user
usermod -L user
usermod -p '!' user
mkdir /home/user
chown user:video /home/user

#allow port binding for port 80 by user user
touch /etc/authbind/byport/80
chmod 500 /etc/authbind/byport/80
chown user /etc/authbind/byport/80

#download sdwebui
git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui /opt/stable-diffusion-webui
chown -R user:video /opt/stable-diffusion-webui/
mv /opt/stable-diffusion-webui/models /opt/stable-diffusion-webui/models.old
ln -s /opt/stable-diffusion-models/ /opt/stable-diffusion-webui/models

#setup sdweb command line
sed -i 's|#export COMMANDLINE_ARGS=""|export COMMANDLINE_ARGS="--listen --port 80 --enable-insecure-extension-access --gradio-auth MyUsername:MyPassword"|' /opt/stable-diffusion-webui/webui-user.sh

#allow running webui as root
#sed -i 's|can_run_as_root=0|can_run_as_root=1|' webui.sh

#create the sdweb systemd service file
sdweb_service_file="/etc/systemd/system/sdweb.service"
cat > "$sdweb_service_file" <<'EOF'
[Unit]
Description=Stable Diffusion Web UI Service
After=network.target

[Service]
Restart=always
RestartSec=3
User=user
Group=video
ExecStart=/usr/bin/screen -DmS sdweb /usr/bin/authbind --deep /opt/stable-diffusion-webui/webui.sh

[Install]
WantedBy=multi-user.target
EOF

#start sdweb service
systemctl start sdweb

#start sdweb service at boot
systemctl enable sdweb

Here is the error message when I run sudo -u user /usr/bin/authbind --deep /opt/stable-diffusion-webui/webui.sh manually:

Code:
################################################################
Install script for stable-diffusion + Web UI
Tested on Debian 11 (Bullseye), Fedora 34+ and openSUSE Leap 15.4 or newer.
################################################################

################################################################
Running on user user
################################################################

################################################################
Repo already cloned, using it as install directory
################################################################

################################################################
Create and activate python venv
################################################################

################################################################
Launching launch.py...
################################################################
Python 3.11.2 (main, Aug 26 2024, 07:20:54) [GCC 12.2.0]
Version: v1.10.1
Commit hash: 82a973c04367123ae98bd9abdf80d9eda9b910e2
Traceback (most recent call last):
  File "/opt/stable-diffusion-webui/launch.py", line 48, in <module>
    main()
  File "/opt/stable-diffusion-webui/launch.py", line 39, in main
    prepare_environment()
  File "/opt/stable-diffusion-webui/modules/launch_utils.py", line 387, in prepare_environment
    raise RuntimeError(
RuntimeError: Torch is not able to use GPU; add --skip-torch-cuda-test to COMMANDLINE_ARGS variable to disable this check
 
Yes, I'm using it with my Plex container. Can you run nvidia-smi in the container?
Here's my ctid.conf:

Code:
lxc.hook.pre-start: sh -c '[ ! -f /dev/nvidia0 ] && /usr/bin/nvidia-modprobe -c0 -u'
lxc.environment: NVIDIA_VISIBLE_DEVICES=all
lxc.environment: NVIDIA_DRIVER_CAPABILITIES=compute,utility,video
lxc.hook.mount: /usr/share/lxc/hooks/nvidia
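
For context, my reading of what each of those lines does (the nvidia mount hook ships with the lxc package and wraps nvidia-container-cli, as far as I can tell):

Code:
# lxc.hook.pre-start: runs on the host before start; nvidia-modprobe -c0 -u
#                     creates /dev/nvidia0 and the nvidia-uvm nodes if missing
# lxc.environment:    read by the nvidia hook to select GPUs and driver capabilities
# lxc.hook.mount:     bind-mounts the host's driver userspace (libs, nvidia-smi)
#                     into the container, so no driver install is needed inside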
 
