[SOLVED] GPU locked after creating VM

Novido

New Member
Dec 22, 2024
After creating a VM with GPU passthrough, none of my containers that use the GPU could access it any longer. Neither could the host/node server.
I tried removing the VM, but the problem persists.

I've tried reinstalling the drivers with no success.

Running
Code:
nvidia-smi
in the container just returns:
Failed to initialize NVML: GPU access blocked by the operating system

Running the same command on the host, I get:
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

Output from dmesg:
root@server:~# dmesg | grep 0000:01
[ 0.373232] pci 0000:01:00.0: [10de:1e87] type 00 class 0x030000 PCIe Legacy Endpoint
[ 0.373246] pci 0000:01:00.0: BAR 0 [mem 0xa3000000-0xa3ffffff]
[ 0.373257] pci 0000:01:00.0: BAR 1 [mem 0x90000000-0x9fffffff 64bit pref]
[ 0.373268] pci 0000:01:00.0: BAR 3 [mem 0xa0000000-0xa1ffffff 64bit pref]
[ 0.373276] pci 0000:01:00.0: BAR 5 [io 0x3000-0x307f]
[ 0.373284] pci 0000:01:00.0: ROM [mem 0xa4000000-0xa407ffff pref]
[ 0.373308] pci 0000:01:00.0: Video device with shadowed ROM at [mem 0x000c0000-0x000dffff]
[ 0.373345] pci 0000:01:00.0: PME# supported from D0 D3hot
[ 0.373401] pci 0000:01:00.0: 32.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s PCIe x16 link at 0000:00:01.0 (capable of 126.016 Gb/s with 8.0 GT/s PCIe x16 link)
[ 0.373510] pci 0000:01:00.1: [10de:10f8] type 00 class 0x040300 PCIe Endpoint
[ 0.373523] pci 0000:01:00.1: BAR 0 [mem 0xa4080000-0xa4083fff]
[ 0.373637] pci 0000:01:00.2: [10de:1ad8] type 00 class 0x0c0330 PCIe Endpoint
[ 0.373653] pci 0000:01:00.2: BAR 0 [mem 0xa2000000-0xa203ffff 64bit pref]
[ 0.373669] pci 0000:01:00.2: BAR 3 [mem 0xa2040000-0xa204ffff 64bit pref]
[ 0.373720] pci 0000:01:00.2: PME# supported from D0 D3hot
[ 0.373777] pci 0000:01:00.3: [10de:1ad9] type 00 class 0x0c8000 PCIe Endpoint
[ 0.373788] pci 0000:01:00.3: BAR 0 [mem 0xa4084000-0xa4084fff]
[ 0.373850] pci 0000:01:00.3: PME# supported from D0 D3hot
[ 0.427949] pci 0000:01:00.0: vgaarb: setting as boot VGA device
[ 0.427949] pci 0000:01:00.0: vgaarb: bridge control possible
[ 0.427949] pci 0000:01:00.0: vgaarb: VGA device added: decodes=io+mem,owns=io+mem,locks=none
[ 0.456170] pci_bus 0000:01: resource 0 [io 0x3000-0x3fff]
[ 0.456173] pci_bus 0000:01: resource 1 [mem 0xa3000000-0xa40fffff]
[ 0.456177] pci_bus 0000:01: resource 2 [mem 0x90000000-0xa20fffff 64bit pref]
[ 0.456698] pci 0000:01:00.1: extending delay after power-on from D3hot to 20 msec
[ 0.456729] pci 0000:01:00.1: D0 power state depends on 0000:01:00.0
[ 0.456780] pci 0000:01:00.2: D0 power state depends on 0000:01:00.0
[ 0.456989] pci 0000:01:00.3: D0 power state depends on 0000:01:00.0
[ 0.457291] pci 0000:01:00.0: Adding to iommu group 1
[ 0.457296] pci 0000:01:00.1: Adding to iommu group 1
[ 0.457302] pci 0000:01:00.2: Adding to iommu group 1
[ 0.457308] pci 0000:01:00.3: Adding to iommu group 1
[ 0.928281] nvidia-gpu 0000:01:00.3: enabling device (0000 -> 0002)
[ 0.948561] xhci_hcd 0000:01:00.2: xHCI Host Controller
[ 0.948567] xhci_hcd 0000:01:00.2: new USB bus registered, assigned bus number 3
[ 0.949162] xhci_hcd 0000:01:00.2: hcc params 0x0180ff05 hci version 0x110 quirks 0x0000000000000010
[ 0.949263] xhci_hcd 0000:01:00.2: xHCI Host Controller
[ 0.949267] xhci_hcd 0000:01:00.2: new USB bus registered, assigned bus number 4
[ 0.949271] xhci_hcd 0000:01:00.2: Host supports USB 3.1 Enhanced SuperSpeed
[ 0.949316] usb usb3: SerialNumber: 0000:01:00.2
[ 0.949511] usb usb4: SerialNumber: 0000:01:00.2
[ 4.016433] snd_hda_intel 0000:01:00.1: enabling device (0000 -> 0002)
[ 4.016504] snd_hda_intel 0000:01:00.1: Disabling MSI
[ 4.016509] snd_hda_intel 0000:01:00.1: Handle vga_switcheroo audio client
[ 4.063613] input: HDA NVidia HDMI/DP,pcm=3 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card1/input16
[ 4.063682] input: HDA NVidia HDMI/DP,pcm=7 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card1/input17
[ 4.064458] input: HDA NVidia HDMI/DP,pcm=8 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card1/input18
[ 4.064560] input: HDA NVidia HDMI/DP,pcm=9 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card1/input19
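
In the grep output above, driver lines appear for 01:00.1 (snd_hda_intel), 01:00.2 (xhci_hcd) and 01:00.3 (nvidia-gpu), but none for 01:00.0. A minimal sketch for checking which kernel driver, if any, is currently bound to each function, assuming the standard sysfs layout under /sys/bus/pci/devices. The function name `list_pci_drivers` and the `SYSFS_PCI` override are hypothetical helpers for illustration only:

```shell
#!/bin/sh
# Print the kernel driver bound to each function of a PCI slot.
# SYSFS_PCI is a hypothetical override (handy for testing); by default
# the usual sysfs location is used.
list_pci_drivers() {
    slot="$1"                                   # e.g. 0000:01:00
    base="${SYSFS_PCI:-/sys/bus/pci/devices}"
    for dev in "$base/$slot".*; do
        [ -e "$dev" ] || continue               # skip if nothing matched
        if [ -L "$dev/driver" ]; then
            # The driver entry is a symlink to the bound driver's directory
            drv=$(basename "$(readlink "$dev/driver")")
        else
            drv="(none)"
        fi
        printf '%s %s\n' "$(basename "$dev")" "$drv"
    done
}

list_pci_drivers "${1:-0000:01:00}"
```

If 0000:01:00.0 shows vfio-pci or (none) instead of nvidia, that would be consistent with the passthrough setup still holding the card, but this is only a way to look, not a confirmed diagnosis.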

Does anyone know what's wrong and how to unlock the GPU again?