LXC Specific Commands For University Machine Learning Work Manager Cluster

dosmage

Good afternoon! I'm in need of running a specific LXC command and I'm not certain how to do this.

I need to attach a GPU to a container using gputype=mig, although I cannot find any examples of gputype=physical for Proxmox either. The Proxmox articles I've read seem to want to bind device majors and minors, but the LXC documentation wants me to set gputype=mig and pass the mig.uuid variable into LXC. I think the major and minor approach would work fine if I were only trying to pass a raw device into a container. In my case I need to use a feature of LXC that was added in April 2021. Here is the command I need to run in order to make this work.

lxc config device add slurmgpu0 gpu0 gpu gputype=mig mig.uuid=MIG-124c8a43-daa3-575a-8196-99e3e57e8828 pci=01:00.0

This command should pass one of my Nvidia A100's seven MIG (Multi-Instance GPU) instances into the container, where the userland can be isolated from the other instances. My goal is to create a container running one slurmd workload manager daemon per Nvidia MIG instance, so that the processors, memory and Nvidia compute engine are isolated from the other environments.

I could easily be missing or overlooking something, so any assistance would be very welcome. In the meantime I have half the cluster configured for PCI passthrough of the Nvidia A100s and the other half configured for LXC. Until then I should be able to take one of the PCI passthrough VMs, enable MIG inside it and configure LXD there, so we can move forward with testing our Slurm configuration with MIG.

Thank you in advance for helping me figure out how to properly configure Proxmox!
 
Good evening! Here is how I made this work.

Forenote: the container must be unprivileged! The Nvidia hook script shipped with the LXC distribution will exit 1 if the container is privileged.

Beginning: you must already have the Nvidia drivers set up and your MIG devices working.

Fix the libraries missing from /usr/share/lxc/hooks/nvidia. I did this by installing Ubuntu 22.04, copying two files from `/snap/lxd/<your_version>/lib/`, libnvidia-container-go.so.1.11.0 and libnvidia-container.so.1.11.0, and placing them in /lib/x86_64-linux-gnu/. Then you need to reproduce the symlinks: in my case libnvidia-container-go.so and libnvidia-container-go.so.1 are symlinked to libnvidia-container-go.so.1.11.0, and libnvidia-container.so and libnvidia-container.so.1 are symlinked to libnvidia-container.so.1.11.0.
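Roughly, the copy and symlink step looks something like this (a sketch; adjust <your_version> and the library version to whatever your snap ships):

Code:
# copy the two libraries out of the LXD snap
cp /snap/lxd/<your_version>/lib/libnvidia-container.so.1.11.0 /lib/x86_64-linux-gnu/
cp /snap/lxd/<your_version>/lib/libnvidia-container-go.so.1.11.0 /lib/x86_64-linux-gnu/
# recreate the symlinks the loader expects
cd /lib/x86_64-linux-gnu
ln -sf libnvidia-container.so.1.11.0 libnvidia-container.so.1
ln -sf libnvidia-container.so.1.11.0 libnvidia-container.so
ln -sf libnvidia-container-go.so.1.11.0 libnvidia-container-go.so.1
ln -sf libnvidia-container-go.so.1.11.0 libnvidia-container-go.so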

You will also need to copy /snap/lxd/<your_version>/wrappers/nvidia-container-cli.real to /usr/bin/nvidia-container-cli. At this point you want to run ldd on nvidia-container-cli and make sure it is not missing any libraries. The libnvidia-container-go.so library was needed by a binary I hadn't isolated.
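In shell terms, something like:

Code:
# copy the real CLI binary out of the snap and make sure its dependencies resolve
cp /snap/lxd/<your_version>/wrappers/nvidia-container-cli.real /usr/bin/nvidia-container-cli
chmod +x /usr/bin/nvidia-container-cli
# nothing should be reported as "not found"
ldd /usr/bin/nvidia-container-cli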

This is all probably resolvable by installing an Nvidia-related package; however, I haven't tracked down where the nvidia container binaries come from. I tried to locate it with apt-file search /usr/bin/nvidia-container-cli. It would be a lot cleaner to solve it that way!

Once the above is settled it's just a matter of setting some environment variables and connecting the /usr/share/lxc/hooks/nvidia hook to the container. The following is copied from our system, so you must replace the UUID with your own MIG device! Edit /etc/pve/nodes/<your_node>/lxc/<your_VMID>.conf and add these lines.

lxc.environment: NVIDIA_VISIBLE_DEVICES=none
lxc.environment: NVIDIA_DRIVER_CAPABILITIES=compute,utility
lxc.environment: NVIDIA_REQUIRE_CUDA=
lxc.environment: NVIDIA_REQUIRE_DRIVER=
lxc.environment: NVIDIA_VISIBLE_DEVICES=MIG-124c8a43-daa3-575a-8196-99e3e57e8828
lxc.hook.mount: /usr/share/lxc/hooks/nvidia

My Proxmox host has the Nvidia 525 drivers installed and my container instance is Ubuntu 22.04. Inside the container you need nvidia-utils-525-server installed; I chose the server package as this is a headless install.
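Installing that inside the container is the usual apt step (package name assumes the 525 server branch, matching the host driver):

Code:
# inside the Ubuntu 22.04 container
apt update
apt install -y nvidia-utils-525-server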

Et voila, a Proxmox container running a MIG instance passed thru.

root@universitygpu18:~# nvidia-smi
Tue Feb 7 04:05:39 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.60.13 Driver Version: 525.60.13 CUDA Version: 12.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA A100 80G... On | 00000000:01:00.0 Off | On |
| N/A 28C P0 39W / 300W | N/A | N/A Default |
| | | Enabled |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| MIG devices: |
+------------------+----------------------+-----------+-----------------------+
| GPU GI CI MIG | Memory-Usage | Vol| Shared |
| ID ID Dev | BAR1-Usage | SM Unc| CE ENC DEC OFA JPG|
| | | ECC| |
|==================+======================+===========+=======================|
| 0 7 0 0 | 6MiB / 9728MiB | 14 0 | 1 0 0 0 0 |
| | 0MiB / 16383MiB | | |
+------------------+----------------------+-----------+-----------------------+

+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
 
In summary, it does appear that the nvidia hook script shipped with LXC automatically discovers and mounts the devices in from the host simply by setting the MIG's UUID in the environment.
 
Reproducing my steps today, there is one more step that is necessary. If you run /usr/bin/nvidia-container-cli and get a dlsym error (null), copying the libraries in also requires running ldconfig to refresh the ld.so.cache file. This problem manifests as the container being unable to start; if you run pct start <your_vmid> --debug you may see the dlsym error. For some reason ldconfig didn't update the cache on one node, and I ended up replacing the cache file from a working cluster pair, which didn't immediately fix the issue, but running ldconfig again did.
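If you hit this, something along these lines should sort it out and confirm the loader can see the libraries (the grep is just a convenient check, not required):

Code:
# refresh the dynamic linker cache after copying the libraries in
ldconfig
# the copied libraries should now show up in the cache
ldconfig -p | grep nvidia-container
# and this should run without a dlsym error
/usr/bin/nvidia-container-cli info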

Also, while working to bring up the rest of our A100 machine learning lab systems and all the LXCs, and troubleshooting the problems I mentioned above, I believe https://github.com/NVIDIA/libnvidia-container is the project nvidia-container-cli comes from. Again, it would probably be best to install the proper packages and let them do the job, rather than lifting binaries from an Ubuntu snap install.

I do hope this helps someone in the future who wants to get MIG working with Proxmox.
 
Hi @dosmage

I have been trying this as well for some time, but have not yet been able to get it working. My problem right now is that nvidia-container-cli does not seem to have access to the MIG devices; see my other post in the LXC forum. Did you encounter this issue as well?

To add to this post, installing nvidia-container-cli is as simple as adding the apt repo and installing nvidia-container-toolkit (see the official documentation):

Bash:
distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
      && curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
      && curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
            sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
            sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
apt-get update
apt-get install -y nvidia-container-toolkit
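Once installed, a quick sanity check on the host would be something like (standard libnvidia-container subcommands):

Code:
nvidia-container-cli info
nvidia-container-cli list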
 
@sargreal, hello! Thank you very much for the information. I have yet to get back to locating the source of the files I copied out of the LXD snap!

Since I don't know where to begin with helping troubleshoot your issue, I've sent you my notes and configuration. Hopefully this helps! If so please don't forget to update your post with the solution!

I put this system into production in late March of this year. We have 42 LXCs, one for each MIG instance. I definitely had issues getting here, but I don't think I hit the specific issue from your post; or if I did, the symptom and the way I noticed it were different.

For me, nvidia-persistenced would not reconfigure my A100s on boot, so I ended up throwing in an rc.local to set the GPU and compute instances back up on each boot. I noticed this issue because /dev/nvidia* wasn't being properly hooked into my containers. Fortunately the MIG UUIDs appear to be deterministic across reboots!

/etc/rc.local
#!/bin/sh -e

#Configure 7 GPU instances
/usr/bin/nvidia-smi mig -cgi 19,19,19,19,19,19,19

#Configure all available GPU instances for compute
/usr/bin/nvidia-smi mig -cci 0

exit 0
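To verify the instances came back after a reboot, these queries should show the seven GPU instances, their compute instances and the MIG UUIDs (standard nvidia-smi mig subcommands, nothing Proxmox specific):

Code:
# list GPU instances and compute instances created by rc.local
nvidia-smi mig -lgi
nvidia-smi mig -lci
# list the MIG UUIDs to put into each container's config
nvidia-smi -L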
 
lxc.environment: NVIDIA_VISIBLE_DEVICES=none
lxc.environment: NVIDIA_DRIVER_CAPABILITIES=compute,utility
lxc.environment: NVIDIA_REQUIRE_CUDA=
lxc.environment: NVIDIA_REQUIRE_DRIVER=
lxc.environment: NVIDIA_VISIBLE_DEVICES=MIG-124c8a43-daa3-575a-8196-99e3e57e8828
lxc.hook.mount: /usr/share/lxc/hooks/nvidia

I see my lxc information is out of date. Here is an example lxc.conf, 70114.conf.

arch: amd64
cores: 4
features: nesting=1
hostname: universitygpu-00.mll
memory: 32768
net0: name=eth0,bridge=vmbr0,firewall=1,hwaddr=D2:B3:41:7C:B5:FB,ip=dhcp,tag=216,type=veth
onboot: 1
ostype: ubuntu
rootfs: local-zfs:subvol-70114-disk-0,size=32G
searchdomain: mll.university.edu
swap: 8192
unprivileged: 1

#the Nvidia mig variables
lxc.environment: NVIDIA_DRIVER_CAPABILITIES=compute,utility
lxc.environment: NVIDIA_VISIBLE_DEVICES=MIG-124c8a43-daa3-575a-8196-99e3e57e8828
lxc.hook.mount: /usr/share/lxc/hooks/nvidia

#to get NFSv4 mounted inside the unprivileged LXCs
#mount nfs mount on hypervisor into each container
mp0: /nfs/home,mp=/nfs/home
#don't remap uids/gids above 1024 so NFSv4 won't show nobody for owner/group (and NFSv3 would just have the wrong uids/gids)
lxc.idmap: u 0 100000 1025
lxc.idmap: g 0 100000 1025
lxc.idmap: u 1025 1025 64511
lxc.idmap: g 1025 1025 64511
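One thing not shown in the conf above: for a custom idmap like this, the host's /etc/subuid and /etc/subgid also have to allow root to map that 1025+ range, roughly like the following (an assumption based on the standard Proxmox unprivileged container setup, so check your own files):

Code:
# /etc/subuid and /etc/subgid on the Proxmox host
root:100000:65536
root:1025:64511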
 
In case anyone needs this information too, here is what I meant by /dev/nvidia* not getting hooked in.

user@university-00:~# ls -al /dev/nvidia*
crw-rw-rw- 1 nobody nogroup 507, 0 Apr 6 20:00 /dev/nvidia-uvm
crw-rw-rw- 1 nobody nogroup 507, 1 Apr 6 20:00 /dev/nvidia-uvm-tools
crw-rw-rw- 1 nobody nogroup 195, 0 Apr 6 20:00 /dev/nvidia0
crw-rw-rw- 1 nobody nogroup 195, 255 Apr 6 20:00 /dev/nvidiactl

/dev/nvidia-caps:
total 0
drwxr-xr-x 2 root root 80 May 1 18:43 .
drwxr-xr-x 7 root root 580 May 1 18:43 ..
cr--r--r-- 1 nobody nogroup 510, 66 Apr 6 20:00 nvidia-cap66
cr--r--r-- 1 nobody nogroup 510, 67 Apr 6 20:00 nvidia-cap67

You want those uvm and cap entries. When I had my problem, only /dev/nvidia0, 1 or 2 (whichever GPU the MIG instance is carved from) and /dev/nvidiactl were passed in. The cap files are the capabilities of the MIG instance; if they're not there your compute instance won't function.

The magic of the nvidia hook is that you don't have to identify which /dev/nvidia[0-9] you need to pass in, or which two /dev/nvidia-caps/nvidia-cap* files you need. Instead you pass the MIG UUID and it hooks in each file you need automatically. I believe you could make this work without the nvidia hook; you'd just have to pass the devices in by hand, as sketched below.
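For reference, a by-hand version would presumably look something like this in the container config, using the major numbers and cap files from the ls output above (yours will differ, and you would have to work out the right nvidia-caps entries for your MIG instance yourself):

Code:
# allow the nvidia character devices (major numbers from the ls output above)
lxc.cgroup2.devices.allow: c 195:* rwm
lxc.cgroup2.devices.allow: c 507:* rwm
lxc.cgroup2.devices.allow: c 510:* rwm
# bind the device nodes into the container
lxc.mount.entry: /dev/nvidia0 dev/nvidia0 none bind,optional,create=file
lxc.mount.entry: /dev/nvidiactl dev/nvidiactl none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm dev/nvidia-uvm none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm-tools dev/nvidia-uvm-tools none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-caps/nvidia-cap66 dev/nvidia-caps/nvidia-cap66 none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-caps/nvidia-cap67 dev/nvidia-caps/nvidia-cap67 none bind,optional,create=file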
 
Thanks for sharing your notes, dosmage! I have now got MIG working in my test Proxmox container and I'm sure it would've taken me a lot longer to get it configured without them.

I would like to correct sargreal and the official NVIDIA docs, because the above commands for installing nvidia-container-toolkit didn't work for me. Under Proxmox 8 I had to run:

Code:
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/debian11/libnvidia-container.list | \
            sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
            sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
apt update
apt-get install -y nvidia-container-toolkit

For anyone else new to nvidia-smi, the command to list the UUIDs of your MIG instances is:

Code:
nvidia-smi -L
 
Thanks for correcting me @danboid !

I have also found my issue; see the GitHub issue. Following another guide somewhere on the internet, I had the following udev rule:

Code:
KERNEL=="nvidia", RUN+="/bin/bash -c '/usr/bin/nvidia-smi -L && /bin/chmod 666 /dev/nvidia* && /usr/bin/nvidia-modprobe -c0 -u && /bin/chmod 0666 /dev/nvidia-uvm*'"

After removing that and just restarting it worked!

So the takeaway is: don't fiddle with the nvidia devices and don't run nvidia-modprobe yourself.
 
