Tesla K80 GPU passthrough driver error (device not found by nvidia-smi in VM)

Tutbjun · Aug 4, 2022

Hi everyone

I've undertaken a project of building a Proxmox server for machine learning and remote gaming, but I'm stuck getting the Nvidia drivers to work on my Ubuntu 22.04 VM. The plan is to have a few VMs configured for either machine learning or gaming, each using different GPUs.
I am both new to this forum and a bit new to Linux, so please bear with me for any mistakes, and point out if I'm missing some info.

So far I have successfully passed through my 1060 to a Windows VM following the guide, but I can't seem to get the K80 working properly. The VM has been set up with both of the K80's two GPUs, but I had a similar error before with a VM that had only a single K80 GPU.

My system has an MSI Z590 motherboard, an Intel 10850K, a GTX 1060, and a Tesla K80.

I have mainly used this guide as a reference:
https://3os.org/infrastructure/prox...virtual-machine-gpu-passthrough-configuration

The only thing I have done inside the VM so far is to use the built-in "Software & Updates" tool to install the Nvidia 470 display driver.

The main symptom appears when running the

Code:

nvidia-smi

command:

Code:

No devices were found

Although the GPUs are listed when running

Code:

lspci -nnv

...

Code:

01:00.0 3D controller [0302]: NVIDIA Corporation GK210GL [Tesla K80] [10de:102d] (rev a1)
    Subsystem: NVIDIA Corporation GK210GL [Tesla K80] [10de:106c]
    Physical Slot: 0
    Flags: bus master, fast devsel, latency 0, IRQ 16
    Memory at c2000000 (32-bit, non-prefetchable) [size=16M]
    Memory at 1000000000 (64-bit, prefetchable) [size=32M]
    Capabilities: <access denied>
    Kernel modules: nvidiafb, nouveau

02:00.0 3D controller [0302]: NVIDIA Corporation GK210GL [Tesla K80] [10de:102d] (rev a1)
    Subsystem: NVIDIA Corporation GK210GL [Tesla K80] [10de:106c]
    Physical Slot: 0-2
    Flags: bus master, fast devsel, latency 0, IRQ 16
    Memory at c1000000 (32-bit, non-prefetchable) [size=16M]
    Memory at 1002000000 (64-bit, prefetchable) [size=32M]
    Capabilities: <access denied>
    Kernel modules: nvidiafb, nouveau
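One thing the listing above does not show is a "Kernel driver in use" line; each device only has a "Kernel modules" line. For anyone checking the same thing, here is a minimal sketch that condenses lspci output to one line per device with the driver that is actually bound. The function name driver_in_use is made up for illustration, and the sketch assumes the usual lspci layout (device line at column 0, detail lines indented).

```shell
#!/bin/sh
# Read `lspci -nnk` (or `lspci -nnv`) output from stdin and print
# "<slot> <driver>" per device, or "<slot> none" when the device has no
# "Kernel driver in use" line.
# driver_in_use is a hypothetical helper name, not part of pciutils.
driver_in_use() {
    awk '
        /^[0-9a-f]/ {                  # device header line, e.g. "01:00.0 3D controller ..."
            if (slot != "") print slot, drv
            slot = $1; drv = "none"
        }
        /Kernel driver in use:/ { drv = $NF }   # last field is the driver name
        END { if (slot != "") print slot, drv }
    '
}

# Typical use inside the VM (10de is the NVIDIA vendor ID seen in the listing):
#   lspci -nnk -d 10de: | driver_in_use
```

If both K80 devices come back as "none" after the driver install, the nvidia module has not bound to them, which would be consistent with "No devices were found".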

The best clue I have is this part from the

Code:

dmesg -w

command:

Code:

[    4.749614] resource sanity check: requesting [mem 0xc2700000-0xc36fffff], which spans more than PCI Bus 0000:01 [mem 0xc2000000-0xc2ffffff]
[    4.749619] caller os_map_kernel_space.part.0+0x97/0xa0 [nvidia] mapping multiple BARs
[    4.763172] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x24:0xffff:1211)
[    4.763299] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0

(same error on the other GPU; PCI bus 0000:02)
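To make the "resource sanity check" line easier to read, here is a small sketch of the arithmetic behind it, using only the addresses from the dmesg output above. The helper names range_size and range_overflow are made up for illustration. It only restates what the kernel is complaining about; it does not explain why the driver asks for that range.

```shell
#!/bin/sh
# range_size START END
# Size in bytes of an inclusive address range.
range_size() {
    echo $(( $2 - $1 + 1 ))
}

# range_overflow REQ_START REQ_END WIN_START WIN_END
# Number of bytes by which the requested range runs past the end of the
# window (0 if the request ends inside it). Only the two end addresses are
# compared; the start addresses are accepted to keep the call readable.
range_overflow() {
    req_end=$(( $2 ))
    win_end=$(( $4 ))
    if [ "$req_end" -gt "$win_end" ]; then
        echo $(( req_end - win_end ))
    else
        echo 0
    fi
}

# Values copied from the dmesg line for PCI bus 0000:01:
range_size     0xc2700000 0xc36fffff                          # prints 16777216 (16 MiB)
range_overflow 0xc2700000 0xc36fffff 0xc2000000 0xc2ffffff   # prints 7340032 (7 MiB)
```

So the driver asks for a 16 MiB mapping that starts 7 MiB into a 16 MiB bus window, and the last 7 MiB of the request falls outside the window.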

*full log in .txt file

The best suggestions I could find scouring the internet were to enable "above 4G decoding" and disable "CSM", both of which I have done in the host BIOS.

Any help or clues would be appreciated, as I can't really find much info about this issue.

Update:
I've read through the forums a bit, and I found this very informative post by Lefuneste:
https://forum.proxmox.com/threads/problem-with-gpu-passthrough.55918/post-471013

There it was helpfully pointed out that when running

Code:

cat /proc/iomem

on the host, there should be a line with "vfio-pci" directly under the GPU's PCIe address, which I don't get. Instead, there is nothing on the line under my GPU addresses. In fact, when running

Code:

cat /proc/iomem | grep vfio

, I get nothing. Does this mean that the Nvidia drivers are successfully blocked from grabbing my GPUs, but vfio fails to claim them?
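A more direct way to answer that question than /proc/iomem is to look at which driver the host kernel has bound to the device, which sysfs exposes as a "driver" symlink under the device directory. Below is a minimal sketch of that check. The function name bound_driver and its optional second argument are made up for illustration, and the PCI address in the usage comment is a placeholder, since the host-side addresses are not shown in this post. As far as I understand it, Proxmox may only bind the card to vfio-pci when the VM starts, so it is probably worth checking while the VM is running.

```shell
#!/bin/sh
# bound_driver PCI_ADDR [SYSFS_ROOT]
# Print the name of the kernel driver currently bound to a PCI device, or
# "none" if no driver is bound. SYSFS_ROOT defaults to /sys; it is a
# parameter only so the function can be tried against a mock directory tree.
# bound_driver is a hypothetical helper name.
bound_driver() {
    link="${2:-/sys}/bus/pci/devices/$1/driver"
    if [ -L "$link" ]; then
        # The symlink points at .../bus/pci/drivers/<driver name>.
        basename "$(readlink "$link")"
    else
        echo none
    fi
}

# Example on the Proxmox host. 0000:01:00.0 is a placeholder; use the
# address that lspci reports for the K80 on the host:
#   bound_driver 0000:01:00.0
```

A result of "vfio-pci" would mean the device is held for passthrough, "nouveau" or "nvidia" would mean a host driver grabbed it, and "none" would mean nothing has claimed it at that moment.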

doomonkee · Aug 15, 2022

I wish I had an answer, but I will add myself as a +1 to having this exact issue.

doomonkee · Aug 17, 2022

I have a separate problem, but are you running "i440fx" or "q35"? Do you have PCIe enabled with a UEFI boot for the VM?
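For anyone wanting to answer those questions quickly, here is a rough sketch that filters a VM's configuration down to the relevant lines: machine type, firmware, and the passed-through PCI devices. The function name passthrough_settings is made up for illustration, and the VM ID in the usage comment is a placeholder. As far as I know, a missing machine or bios line just means the default is in use.

```shell
#!/bin/sh
# passthrough_settings
# Filter `qm config <vmid>` output (read from stdin) down to the machine
# type, the firmware setting and the hostpci entries.
# passthrough_settings is a hypothetical helper name.
passthrough_settings() {
    grep -E '^(machine|bios|hostpci[0-9]+):'
}

# Typical use on the Proxmox host (100 is a placeholder VM ID):
#   qm config 100 | passthrough_settings
```

In the output, "machine: q35", "bios: ovmf" and "pcie=1" on the hostpci lines would correspond to q35, UEFI boot and PCIe being enabled.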

bkinigadner · Jul 1, 2024

Does the problem still exist? I ask because I have a working K80 running in Linux VMs on Proxmox.
