I've recently upgraded my ML machine from an RTX 2080 Ti to an RTX 3090, and I've bought a Thunderbolt 3 eGPU enclosure to house the 2080 Ti so that I can keep using it alongside the new GPU. I've managed to pass the 2080 Ti through to an Ubuntu guest; at least, it shows up in the output of the lspci command. The built-in "Software & Updates" tool also seems to detect only the 2080 Ti in its "Additional Drivers" tab (the 3090 is not shown there). However, nvidia-smi shows only the 3090, and TensorFlow sees only a single GPU as well. When using
dmesg | grep NVRM
the output is
[ 3.237597] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 470.57.02 Tue Jul 13 16:14:05 UTC 2021
[ 6.224747] NVRM: GPU 0000:04:00.0: RmInitAdapter failed! (0x24:0xffff:1220)
[ 6.228510] NVRM: GPU 0000:04:00.0: rm_init_adapter failed, device minor number 1
[ 6.328556] NVRM: GPU 0000:04:00.0: RmInitAdapter failed! (0x24:0xffff:1220)
[ 6.328791] NVRM: GPU 0000:04:00.0: rm_init_adapter failed, device minor number 1
...
(these errors are repeated many times)
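To condense the repetition, here is a minimal sketch that counts the RmInitAdapter failures per PCI address and error code. The helper name summarize_nvrm_failures and the regular expression are my own assumptions, based only on the line format shown above; this is not part of any NVIDIA tooling.

```python
import re
import subprocess
from collections import Counter

# Assumed line format, taken from the dmesg output above:
# [ 6.224747] NVRM: GPU 0000:04:00.0: RmInitAdapter failed! (0x24:0xffff:1220)
FAILURE = re.compile(
    r"NVRM: GPU (?P<addr>[0-9a-fA-F:.]+): RmInitAdapter failed! \((?P<code>[^)]*)\)"
)


def summarize_nvrm_failures(log_text):
    """Count RmInitAdapter failures per (PCI address, error code) pair.

    Hypothetical helper: lines that do not match the assumed format are ignored.
    """
    counts = Counter()
    for line in log_text.splitlines():
        match = FAILURE.search(line)
        if match:
            counts[(match["addr"], match["code"])] += 1
    return counts


if __name__ == "__main__":
    # Reading the kernel log may require root on some systems.
    log = subprocess.run(["dmesg"], capture_output=True, text=True).stdout
    for (addr, code), n in summarize_nvrm_failures(log).items():
        print(f"{addr}  {code}  x{n}")
```

Run as a script, it prints one line per failing device and error code with the number of occurrences, which makes it easier to confirm that every failure comes from the same device (0000:04:00.0 in the output above).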
I have to admit I'm a bit lost at this point. Ubuntu seems to see the eGPU just fine, but the NVIDIA driver apparently can't initialize it. Any help would be appreciated, because right now I feel like I'm beating my head against my monitor and making no progress.