IOMMU 4 NVIDIA GPUs with NCCL

Nathan Stratton

Well-Known Member
Dec 28, 2018
I have a VM with four 3090 GPUs passed through to it. The GPUs work and I can run things like gpuburn, but when I try to train my models with NCCL I run into errors. My board is a Supermicro H12SSL and I don't have an ACS option in the BIOS (I believe it is off now, so no option is shown), but I do have IOMMU on so I can export the cards to the VM.

https://docs.nvidia.com/deeplearnin...I switches have ACS,IO virtualization or VT-d.

This article suggests disabling IOMMU, but I can't do that because I need to export the GPUs to a VM. How are other people doing GPUDirect without redirecting all PCI point-to-point traffic to the CPU root complex?

I assume that when AWS or GCloud provides GPUs, they are doing it with VMs, so this must be possible.
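
For reference, here is a rough sketch of how ACS redirection could be checked on the host when the BIOS shows no option for it. It assumes `lspci -vvv` is run as root so the capability blocks are printed, and that request or completion redirect shows up as `ReqRedir+` or `CmpltRedir+` on a device's `ACSCtl:` line. The function name `acs_redirecting_devices` is made up for illustration, and this only reports state; it does not change anything.

```python
import re
import subprocess


def acs_redirecting_devices(lspci_text):
    """Return PCI addresses whose ACSCtl line has a redirect flag enabled.

    lspci_text is the text output of `lspci -vvv`. Device header lines start
    in column 0; capability lines below them are indented.
    """
    hits = []
    device = None
    for line in lspci_text.splitlines():
        if line and not line[0].isspace():
            # New device block, e.g. "00:01.1 PCI bridge: ..."
            device = line.split()[0]
        elif "ACSCtl:" in line and device is not None:
            # A trailing "+" means the control bit is currently set.
            if re.search(r"\b(ReqRedir|CmpltRedir)\+", line):
                hits.append(device)
    return hits


if __name__ == "__main__":
    # Needs root, otherwise lspci hides the capability details.
    out = subprocess.run(
        ["lspci", "-vvv"], capture_output=True, text=True, check=True
    ).stdout
    for addr in acs_redirecting_devices(out):
        print(addr, "has ACS redirect enabled")
```

Any bridge this prints that sits between two of the GPUs would be a candidate for the redirection described in the NVIDIA article.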
 
