eGPU passed through to Ubuntu guest is seen by lspci but not by nvidia-smi

1240evnm

Aug 26, 2021
I've recently upgraded my ML machine from an RTX 2080 Ti to an RTX 3090, and I've gotten a Thunderbolt 3 eGPU enclosure to house the 2080 Ti so that I can keep using it alongside the new GPU. I've managed to pass the 2080 Ti through to an Ubuntu guest; at least, it shows up in lspci. The built-in "Software & Updates" tool also seems to find only the 2080 Ti in its "Additional Drivers" tab (the 3090 is not shown there). However, nvidia-smi shows only the 3090, and TensorFlow also sees only a single GPU. The output of dmesg | grep NVRM is:
[ 3.237597] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 470.57.02 Tue Jul 13 16:14:05 UTC 2021
[ 6.224747] NVRM: GPU 0000:04:00.0: RmInitAdapter failed! (0x24:0xffff:1220)
[ 6.228510] NVRM: GPU 0000:04:00.0: rm_init_adapter failed, device minor number 1
[ 6.328556] NVRM: GPU 0000:04:00.0: RmInitAdapter failed! (0x24:0xffff:1220)
[ 6.328791] NVRM: GPU 0000:04:00.0: rm_init_adapter failed, device minor number 1
...

(these errors are repeated many times)
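Because the same failure repeats many times, it can help to collapse the NVRM lines into a count per device before comparing logs between boots or between host and guest. Below is a minimal Python sketch, assuming the log is available as text (for example, saved dmesg output) and that the message format matches the 470.57.02 driver lines quoted in the post. The pattern and function names are hypothetical, and the sketch only extracts the error triple; it does not try to interpret it.

```python
import re
from collections import Counter

# Assumes the message format of the 470.57.02 driver lines quoted above, e.g.:
# [ 6.224747] NVRM: GPU 0000:04:00.0: RmInitAdapter failed! (0x24:0xffff:1220)
RM_INIT_FAILED = re.compile(
    r"NVRM: GPU (?P<addr>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"RmInitAdapter failed! \((?P<code>[^)]*)\)",
    re.IGNORECASE,
)


def summarize_init_failures(dmesg_text):
    """Count RmInitAdapter failures per (PCI address, error triple)."""
    counts = Counter()
    for line in dmesg_text.splitlines():
        m = RM_INIT_FAILED.search(line)
        if m:
            counts[(m.group("addr").lower(), m.group("code"))] += 1
    return counts


# The NVRM lines quoted in the post; on a live system, pass in dmesg output instead.
SAMPLE = """\
[ 3.237597] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 470.57.02 Tue Jul 13 16:14:05 UTC 2021
[ 6.224747] NVRM: GPU 0000:04:00.0: RmInitAdapter failed! (0x24:0xffff:1220)
[ 6.228510] NVRM: GPU 0000:04:00.0: rm_init_adapter failed, device minor number 1
[ 6.328556] NVRM: GPU 0000:04:00.0: RmInitAdapter failed! (0x24:0xffff:1220)
[ 6.328791] NVRM: GPU 0000:04:00.0: rm_init_adapter failed, device minor number 1
"""

for (addr, code), n in summarize_init_failures(SAMPLE).items():
    print(f"{addr}: RmInitAdapter failed {n}x with ({code})")
# prints: 0000:04:00.0: RmInitAdapter failed 2x with (0x24:0xffff:1220)
```

The PCI address it reports (0000:04:00.0 in these lines) is the one to look for in the guest's lspci output.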

I have to admit I'm a bit lost at this point. Ubuntu seems to see the eGPU just fine, but the NVIDIA driver seemingly can't. Any help would be appreciated, because right now I feel like I'm beating my head against my monitor with no progress.
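To pin down exactly which device lspci reports but nvidia-smi does not, the two device lists can be compared by PCI address. Below is a minimal Python sketch, assuming the text output of lspci -Dnn and of nvidia-smi --query-gpu=pci.bus_id --format=csv,noheader has been captured. The helper names are hypothetical, and the sample lines (the 3090's address and both device IDs) are illustrative, not taken from the post.

```python
import re
from typing import Set

# NVIDIA's PCI vendor ID is 10de; class 0300 = VGA controller, 0302 = 3D controller.
# Expects `lspci -Dnn` output, so each line starts with domain:bus:device.function.
LSPCI_NVIDIA_GPU = re.compile(
    r"^(?P<addr>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]) "
    r".*\[(?:0300|0302)\]: .*\[10de:[0-9a-f]{4}\]",
    re.IGNORECASE,
)


def gpus_from_lspci(lspci_text: str) -> Set[str]:
    """PCI addresses of NVIDIA GPUs that lspci reports."""
    found = set()
    for line in lspci_text.splitlines():
        m = LSPCI_NVIDIA_GPU.match(line)
        if m:
            found.add(m.group("addr").lower())
    return found


def gpus_from_nvidia_smi(csv_text: str) -> Set[str]:
    """PCI addresses from nvidia-smi's pci.bus_id query.

    nvidia-smi prints an 8-digit domain (e.g. 00000000:01:00.0); keep the
    last 4 digits so the addresses line up with lspci -D.
    """
    found = set()
    for line in csv_text.splitlines():
        line = line.strip()
        if not line:
            continue
        domain, rest = line.split(":", 1)
        found.add(f"{domain[-4:]}:{rest}".lower())
    return found


# Illustrative sample only: the 3090's address and both device IDs are assumptions.
LSPCI_SAMPLE = """\
0000:01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GA102 [GeForce RTX 3090] [10de:2204] (rev a1)
0000:01:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:1aef] (rev a1)
0000:04:00.0 VGA compatible controller [0300]: NVIDIA Corporation TU102 [GeForce RTX 2080 Ti Rev. A] [10de:1e07] (rev a1)
"""
SMI_SAMPLE = "00000000:01:00.0\n"

missing = gpus_from_lspci(LSPCI_SAMPLE) - gpus_from_nvidia_smi(SMI_SAMPLE)
print(sorted(missing))  # ['0000:04:00.0']
```

The audio function of each card is skipped on purpose, since only the display-class functions are expected to appear in nvidia-smi.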
 
AFAIK, Thunderbolt passthrough is not supported. I believe this has to do with Thunderbolt's out-of-band communication via its own protocol, which is not passed through to the VM.

Does the eGPU work on the host?
 
Did you ever get this working?
 

