Good afternoon! I'm in need of running a specific LXC command and I'm not certain how to do this.
I need to attach a GPU to a container using gputype=mig, although I cannot find any examples of even gputype=physical for Proxmox. The Proxmox articles I've read seem to want me to bind device majors and minors, but the LXD documentation wants me to set gputype=mig and pass the mig.uuid value to LXD. I think the major/minor approach would work fine if I were only trying to pass a raw device into a container, but in my case I need to use a feature of LXD that was added in April 2021. Here is the command I need to run in order to make this work:
lxc config device add slurmgpu0 gpu0 gpu gputype=mig mig.uuid=MIG-124c8a43-daa3-575a-8196-99e3e57e8828 pci=01:00.0
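In case it helps, the MIG UUID in that command came from nvidia-smi. Roughly, the steps look like this (a sketch; it assumes MIG mode still needs enabling, and uses the 1g.5gb profile as an example, so adjust the profile IDs for your own layout):

```shell
# Enable MIG mode on GPU 0 (may require a GPU reset or reboot to take effect)
sudo nvidia-smi -i 0 -mig 1

# Create seven GPU instances, each with its own compute instance
# (profile ID 19 = 1g.5gb on the A100 40GB; an assumption for illustration)
sudo nvidia-smi mig -cgi 19,19,19,19,19,19,19 -C

# List the resulting devices; MIG instances show up with their MIG-... UUIDs
nvidia-smi -L
```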
This command should pass one of the seven MIG (Multi-Instance GPU) instances on my Nvidia A100 into the container, where its userland can be isolated from the other instances. My purpose is to create a container that runs one slurmd daemon per MIG instance, in order to isolate the processors, memory, and Nvidia compute engines from the other environments.
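For the Slurm side, my plan inside each container is roughly this (a sketch, assuming slurmd is built with NVML support so it can auto-detect the single MIG device it sees):

```
# gres.conf inside each container (sketch, not verified)
AutoDetect=nvml

# slurm.conf fragment (sketch)
GresTypes=gpu
```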
I could easily be missing or overlooking something, so any assistance would be very welcome. In the meantime I have half the cluster configured for PCI passthrough of the Nvidia A100s and the other half configured for LXC. Until this is sorted out, I should be able to convert one of the VMs running with PCI passthrough to enable MIG inside the VM and configure LXD there, so we can keep testing our Slurm configuration with MIG.
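For completeness, the major/minor approach from the Proxmox articles looks roughly like this in /etc/pve/lxc/&lt;vmid&gt;.conf (a sketch; the device major 195 is the usual Nvidia character device major, but it is an assumption here and should be checked with ls -l /dev/nvidia* on the host):

```
# /etc/pve/lxc/<vmid>.conf -- raw device passthrough (sketch)
lxc.cgroup2.devices.allow: c 195:* rwm
lxc.mount.entry: /dev/nvidia0 dev/nvidia0 none bind,optional,create=file
lxc.mount.entry: /dev/nvidiactl dev/nvidiactl none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm dev/nvidia-uvm none bind,optional,create=file
```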
Thank you in advance for helping me figure out how to properly configure Proxmox!