Hi everybody,
I configured a VM with GPU passthrough on Proxmox 7 with a Tesla T4 card.
Now nvidia-smi has stopped working and I can't figure out what went wrong.
Here is the output I get:
Code:
root@hyperviser:~# nvidia-smi
Failed to initialize NVML: Unknown Error
And here is some output from dmesg:
Code:
root@hyperviser:~# dmesg | grep 5e:00
[ 1.040993] pci 0000:5e:00.0: [10de:1eb8] type 00 class 0x030200
[ 1.041004] pci 0000:5e:00.0: reg 0x10: [mem 0xb9000000-0xb9ffffff]
[ 1.041014] pci 0000:5e:00.0: reg 0x14: [mem 0xbfe0000000-0xbfefffffff 64bit pref]
[ 1.041023] pci 0000:5e:00.0: reg 0x1c: [mem 0xbff0000000-0xbff1ffffff 64bit pref]
[ 1.041265] pci 0000:5e:00.0: Enabling HDA controller
[ 1.041307] pci 0000:5e:00.0: PME# supported from D0 D3hot D3cold
[ 1.041334] pci 0000:5e:00.0: reg 0xbf0: [mem 0x00000000-0x0003ffff]
[ 1.041335] pci 0000:5e:00.0: VF(n) BAR0 space: [mem 0x00000000-0x003fffff] (contains BAR0 for 16 VFs)
[ 1.041344] pci 0000:5e:00.0: reg 0xbf4: [mem 0x00000000-0x0fffffff 64bit pref]
[ 1.041346] pci 0000:5e:00.0: VF(n) BAR1 space: [mem 0x00000000-0xffffffff 64bit pref] (contains BAR1 for 16 VFs)
[ 1.041354] pci 0000:5e:00.0: reg 0xbfc: [mem 0x00000000-0x01ffffff 64bit pref]
[ 1.041356] pci 0000:5e:00.0: VF(n) BAR3 space: [mem 0x00000000-0x1fffffff 64bit pref] (contains BAR3 for 16 VFs)
[ 1.041425] pci 0000:5e:00.0: 63.008 Gb/s available PCIe bandwidth, limited by 8.0 GT/s PCIe x8 link at 0000:5d:00.0 (capable of 126.016 Gb/s with 8.0 GT/s PCIe x16 link)
[ 1.080219] pnp 00:01: disabling [mem 0xfed1c000-0xfed3ffff] because it overlaps 0000:5e:00.0 BAR 8 [mem 0x00000000-0xffffffff 64bit pref]
[ 1.080223] pnp 00:01: disabling [mem 0xfed45000-0xfed8bfff] because it overlaps 0000:5e:00.0 BAR 8 [mem 0x00000000-0xffffffff 64bit pref]
[ 1.080225] pnp 00:01: disabling [mem 0xff000000-0xffffffff] because it overlaps 0000:5e:00.0 BAR 8 [mem 0x00000000-0xffffffff 64bit pref]
[ 1.080227] pnp 00:01: disabling [mem 0xfee00000-0xfeefffff] because it overlaps 0000:5e:00.0 BAR 8 [mem 0x00000000-0xffffffff 64bit pref]
[ 1.080228] pnp 00:01: disabling [mem 0xfed12000-0xfed1200f] because it overlaps 0000:5e:00.0 BAR 8 [mem 0x00000000-0xffffffff 64bit pref]
[ 1.080230] pnp 00:01: disabling [mem 0xfed12010-0xfed1201f] because it overlaps 0000:5e:00.0 BAR 8 [mem 0x00000000-0xffffffff 64bit pref]
[ 1.080231] pnp 00:01: disabling [mem 0xfed1b000-0xfed1bfff] because it overlaps 0000:5e:00.0 BAR 8 [mem 0x00000000-0xffffffff 64bit pref]
[ 1.080854] pnp 00:04: disabling [mem 0xfd000000-0xfdabffff] because it overlaps 0000:5e:00.0 BAR 8 [mem 0x00000000-0xffffffff 64bit pref]
[ 1.080856] pnp 00:04: disabling [mem 0xfdad0000-0xfdadffff] because it overlaps 0000:5e:00.0 BAR 8 [mem 0x00000000-0xffffffff 64bit pref]
[ 1.080858] pnp 00:04: disabling [mem 0xfdb00000-0xfdffffff] because it overlaps 0000:5e:00.0 BAR 8 [mem 0x00000000-0xffffffff 64bit pref]
[ 1.080860] pnp 00:04: disabling [mem 0xfe000000-0xfe00ffff] because it overlaps 0000:5e:00.0 BAR 8 [mem 0x00000000-0xffffffff 64bit pref]
[ 1.080861] pnp 00:04: disabling [mem 0xfe011000-0xfe01ffff] because it overlaps 0000:5e:00.0 BAR 8 [mem 0x00000000-0xffffffff 64bit pref]
[ 1.080862] pnp 00:04: disabling [mem 0xfe036000-0xfe03bfff] because it overlaps 0000:5e:00.0 BAR 8 [mem 0x00000000-0xffffffff 64bit pref]
[ 1.080864] pnp 00:04: disabling [mem 0xfe03d000-0xfe3fffff] because it overlaps 0000:5e:00.0 BAR 8 [mem 0x00000000-0xffffffff 64bit pref]
[ 1.080865] pnp 00:04: disabling [mem 0xfe410000-0xfe7fffff] because it overlaps 0000:5e:00.0 BAR 8 [mem 0x00000000-0xffffffff 64bit pref]
[ 1.094944] pci 0000:5e:00.0: BAR 8: no space for [mem size 0x100000000 64bit pref]
[ 1.094946] pci 0000:5e:00.0: BAR 8: failed to assign [mem size 0x100000000 64bit pref]
[ 1.094947] pci 0000:5e:00.0: BAR 10: no space for [mem size 0x20000000 64bit pref]
[ 1.094949] pci 0000:5e:00.0: BAR 10: failed to assign [mem size 0x20000000 64bit pref]
[ 1.094950] pci 0000:5e:00.0: BAR 7: no space for [mem size 0x00400000]
[ 1.094951] pci 0000:5e:00.0: BAR 7: failed to assign [mem size 0x00400000]
[ 1.094984] pci 0000:5e:00.0: BAR 1: assigned [mem 0xb000000000-0xb00fffffff 64bit pref]
[ 1.094991] pci 0000:5e:00.0: BAR 8: assigned [mem 0xb010000000-0xb10fffffff 64bit pref]
[ 1.094994] pci 0000:5e:00.0: BAR 3: assigned [mem 0xb110000000-0xb111ffffff 64bit pref]
[ 1.095000] pci 0000:5e:00.0: BAR 10: assigned [mem 0xb112000000-0xb131ffffff 64bit pref]
[ 1.095003] pci 0000:5e:00.0: BAR 0: assigned [mem 0xbb000000-0xbbffffff]
[ 1.095006] pci 0000:5e:00.0: BAR 7: assigned [mem 0xbc000000-0xbc3fffff]
[ 1.101725] pci 0000:5e:00.0: Adding to iommu group 79
[ 9.247094] NVRM: GPU at 0000:5e:00.0 has software scheduler DISABLED with policy BEST_EFFORT.
[ 9.848237] nvidia 0000:5e:00.0: Driver cannot be asked to release device
[ 9.848311] nvidia 0000:5e:00.0: MDEV: Registered
[ 14.185300] nvidia 0000:5e:00.0: MDEV: Unregistering
[ 27.192546] vfio-pci 0000:5e:00.0: vfio_cap_init: hiding cap 0x0@0x68
[ 27.192591] vfio-pci 0000:5e:00.0: vfio_ecap_init: hiding ecap 0x1e@0x258
[ 27.192616] vfio-pci 0000:5e:00.0: vfio_ecap_init: hiding ecap 0x19@0x900
root@hyperviser:~# dmesg | grep vfio
[ 5.212096] vfio_pci: invalid id string "10de:leb8"
[ 10.147214] vfio_mdev d77668c2-36bf-4c12-a6f7-b20b7c214384: Adding to iommu group 143
[ 10.147222] vfio_mdev d77668c2-36bf-4c12-a6f7-b20b7c214384: MDEV: group_id = 143
[ 14.185487] vfio_mdev d77668c2-36bf-4c12-a6f7-b20b7c214384: Removing from iommu group 143
[ 14.185498] vfio_mdev d77668c2-36bf-4c12-a6f7-b20b7c214384: MDEV: detaching iommu
[ 27.192546] vfio-pci 0000:5e:00.0: vfio_cap_init: hiding cap 0x0@0x68
[ 27.192591] vfio-pci 0000:5e:00.0: vfio_ecap_init: hiding ecap 0x1e@0x258
[ 27.192616] vfio-pci 0000:5e:00.0: vfio_ecap_init: hiding ecap 0x19@0x900
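The vfio line above reports `invalid id string "10de:leb8"`, while the PCI line for the same card shows `[10de:1eb8]` (digit 1, not the letter l). PCI vendor and device IDs are hexadecimal, so `leb8` cannot parse as an ID. Below is a minimal sketch for sanity-checking such an ID string. It assumes the vfio-pci `ids=` value is a comma-separated list of `vendor:device` entries with optional further colon-separated fields, all in hex. The helper name `invalid_vfio_ids` is hypothetical and is not part of any Proxmox or kernel tool.

```python
import re

# Each colon-separated field must be a non-empty run of hex digits.
HEX_FIELD = re.compile(r"[0-9a-fA-F]+")


def invalid_vfio_ids(ids: str) -> list[str]:
    """Return the entries of a vfio-pci style ids string that are malformed.

    Assumption: each entry is vendor:device[:subvendor[:subdevice[:class[:class_mask]]]],
    i.e. 2 to 6 colon-separated hexadecimal fields.
    """
    bad = []
    for entry in ids.split(","):
        fields = entry.strip().split(":")
        well_formed = 2 <= len(fields) <= 6 and all(
            HEX_FIELD.fullmatch(f) for f in fields
        )
        if not well_formed:
            bad.append(entry.strip())
    return bad


if __name__ == "__main__":
    print(invalid_vfio_ids("10de:leb8"))  # the string from the dmesg line
    print(invalid_vfio_ids("10de:1eb8"))  # the ID lspci/dmesg shows for the card
```

With the string from the log, the first call flags `10de:leb8` because `l` is not a hex digit. The second call returns an empty list.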