GPU passthrough woes: rm_init_adapter failed, device minor number 0

bubbacub

Member
May 22, 2020
Hi, I'm desperate for any advice!
I have a PVE 6.2 install that went up this week.
I've followed the guides to set up GPU passthrough and have got a VM booting with my GPU passed through.

Hardware is:
Intel S3420GP server board, Xeon X3430, ASUS P106-100 GPU. The GPU is a second-hand mining card, essentially a GTX 1060 6GB without display outputs (UEFI compatible). The idea is to use it for CUDA-accelerated TensorFlow/PyTorch deep learning in an Ubuntu 20.04 LTS guest, i.e. I don't need it to run as a display card.

On the host the card binds to VFIO correctly and is passed through to the guest. In the guest, nouveau loads automatically, but it is unfortunately useless as it doesn't support CUDA.
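One way to double-check the host-side binding is to read the driver symlink the kernel exposes in sysfs. A minimal sketch in Python, assuming the card sits at 0000:02:00.0 on the host (the `02:00` from the hostpci0 line further down, with an assumed domain 0000 and function 0). `bound_driver` is a hypothetical helper name, not part of any Proxmox tooling:

```python
import os

def bound_driver(pci_addr, sysfs_root="/sys/bus/pci/devices"):
    """Return the name of the kernel driver bound to a PCI device, or None."""
    link = os.path.join(sysfs_root, pci_addr, "driver")
    if not os.path.islink(link):
        # No "driver" symlink means no driver is currently bound.
        return None
    # The symlink points at .../drivers/<name>; the last component is the name.
    return os.path.basename(os.readlink(link))

if __name__ == "__main__":
    # Assumed host address; adjust to whatever lspci reports.
    print(bound_driver("0000:02:00.0"))
```

On the host this should print vfio-pci while the card is reserved for passthrough. Run inside the guest with the guest's own address, it shows whether nouveau or nvidia has claimed the card.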

My guest dmesg after I've installed the 440 nvidia drivers from the ubuntu repository shows:

NVRM (a pci address) RmInitAdapter failed!
NVRM (a pci address) rm_init_adapter failed, device minor number 0


my nvidia-smi command returns:

no devices found


I'm thinking that this could be the Linux equivalent of the Code 43 error in Windows, but I'm very open to suggestions from people who know more than me.

my grub addons are:

GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt rd.driver.pre=vfio-pci pcie_acs_override=downstream,multifunction nofb nomodeset video=vesafb:off,efifb:off vfio_iommu_type1.allow_unsafe_interrupts=1"

I'm not sure how much of this I need. The last parameter was crucial, as the methods documented in the guides on the Proxmox website did not work for allowing unsafe interrupts.
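To see which of these parameters actually reached the running kernel, the command line can be split into name/value pairs and compared against a list of expected names. A minimal sketch in Python; `parse_cmdline` and `missing_params` are hypothetical helper names, and the parser is deliberately simple (no handling of quoted values with spaces):

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into {name: value}.

    Bare flags such as "quiet" map to None. Quoted values containing
    spaces are not handled, and a repeated name keeps only its last value.
    """
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def missing_params(cmdline, expected):
    """Return the names from `expected` that do not appear on the command line."""
    present = parse_cmdline(cmdline)
    return [name for name in expected if name not in present]
```

On the host, passing the contents of /proc/cmdline together with names such as intel_iommu and vfio_iommu_type1.allow_unsafe_interrupts should return an empty list if the GRUB change took effect after update-grub and a reboot.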



my vm conf is:


args: -cpu 'host,hv_time,kvm=off,hv_vendor_id=1234567890ab' -machine type=q35,kernel_irqchip=on
bios: ovmf
bootdisk: scsi0
cores: 2
cpu: host,hidden=1,flags=+pcid
efidisk0: tensor:vm-100-disk-1,size=1M
hostpci0: 02:00,pcie=1,romfile=patch_vbios.bin
ide2: none,media=cdrom
machine: q35
memory: 6144
name: tensor
net0: virtio=BA:C7:1D:92:04:5C,bridge=vmbr0,firewall=1
numa: 0
ostype: l26
scsi0: tensor:vm-100-disk-0,size=100G
scsihw: virtio-scsi-pci
smbios1: uuid=17494963-eb27-48a1-998e-d011db8580f7
sockets: 1
vmgenid: 45a59a1f-a9f2-4cf4-b1b0-b0fb1db2fbfb
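Since the config above sets the CPU model and machine type both inside args and through their own keys, it can help to list such overlaps and to break the hostpci0 value into its parts before changing things one at a time. A rough sketch in Python, assuming a plain config without snapshot sections; all three function names are hypothetical and nothing here calls Proxmox itself:

```python
def parse_vm_conf(text):
    """Parse "key: value" lines into a dict; blank lines and # comments are skipped."""
    conf = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if sep:
            conf[key.strip()] = value.strip()
    return conf


def parse_hostpci(value):
    """Split a hostpciN value into the device address and its options."""
    first, *rest = value.split(",")
    # The address may be written bare ("02:00") or as "host=02:00".
    opts = {"host": first.split("=", 1)[-1]}
    for item in rest:
        name, _, val = item.partition("=")
        opts[name] = val
    return opts


def overlapping_args(conf):
    """List QEMU options passed via args that also have a dedicated key."""
    tokens = conf.get("args", "").split()
    return [key for key in ("cpu", "machine") if "-" + key in tokens and key in conf]
```

For this VM the text would come from /etc/pve/qemu-server/100.conf (the ID 100 is inferred from the disk names). An overlap is not necessarily an error, but it makes it harder to tell which options QEMU ends up using.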



I have tried a variety of different configurations, including rombar off, changing the vendor, and a modded BIOS that I downloaded from TechPowerUp.

If anyone has any ideas, please let me know!
 
I've had some passthrough devices that work best when the machine type is not q35.

Have you tried the default i440fx instead of q35?
Then try again with and without the ROM file.

Code:
args: -cpu 'host,hv_time,kvm=off,hv_vendor_id=1234567890ab'
bios: ovmf
bootdisk: scsi0
cores: 2
cpu: host,hidden=1,flags=+pcid
efidisk0: tensor:vm-100-disk-1,size=1M
hostpci0: 02:00
ide2: none,media=cdrom
memory: 6144
name: tensor
net0: virtio=BA:C7:1D:92:04:5C,bridge=vmbr0,firewall=1
numa: 0
ostype: l26
scsi0: tensor:vm-100-disk-0,size=100G
scsihw: virtio-scsi-pci
smbios1: uuid=17494963-eb27-48a1-998e-d011db8580f7
sockets: 1
vmgenid: 45a59a1f-a9f2-4cf4-b1b0-b0fb1db2fbfb
 
Hi,
Thanks for the help. Have tried your suggestions with and without the rom file - no change in the failure of the nvidia driver to load.
Have attached some more dmesg errors that might shed light on the problem.

dmesg | grep NVRM
[ 4.471449] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 440.64 Fri Feb 21 01:17:26 UTC 2020
[ 11.419497] NVRM: GPU 0000:00:10.0: RmInitAdapter failed! (0x31:0xffff:934)
[ 11.419954] NVRM: GPU 0000:00:10.0: rm_init_adapter failed, device minor number 0
[ 16.589480] NVRM: GPU 0000:00:10.0: RmInitAdapter failed! (0x31:0xffff:934)
[ 16.611047] NVRM: GPU 0000:00:10.0: rm_init_adapter failed, device minor number 0
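When comparing attempts across different VM configs, it can be handy to pull just the failure lines out of dmesg and keep the timestamp, guest PCI address and the code in parentheses. A small sketch in Python; `init_failures` is a hypothetical helper, and it does not try to interpret the code itself:

```python
import re

# Matches lines like:
# [ 11.419497] NVRM: GPU 0000:00:10.0: RmInitAdapter failed! (0x31:0xffff:934)
NVRM_FAIL = re.compile(
    r"\[\s*(?P<time>\d+\.\d+)\]\s+NVRM: GPU (?P<addr>[0-9a-fA-F:.]+): "
    r"RmInitAdapter failed! \((?P<code>[^)]*)\)"
)

def init_failures(dmesg_text):
    """Return (timestamp, pci_address, code) for each RmInitAdapter failure line."""
    return [
        (float(m["time"]), m["addr"], m["code"])
        for m in NVRM_FAIL.finditer(dmesg_text)
    ]
```

Feeding it the dmesg output from each configuration makes it easy to see whether the code or the guest address changes between q35 and i440fx, or with and without the ROM file.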



dmesg | grep nvidia


[ 4.206159] nvidia: loading out-of-tree module taints kernel.
[ 4.206170] nvidia: module license 'NVIDIA' taints kernel.
[ 4.233027] nvidia: module verification failed: signature and/or required key missing - tainting kernel
[ 4.250342] nvidia-nvlink: Nvlink Core is being initialized, major device number 239
[ 4.501413] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms 440.64 Fri Feb 21 00:43:19 UTC 2020
[ 4.507954] [drm] [nvidia-drm] [GPU ID 0x00000010] Loading driver
[ 4.507964] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:00:10.0 on minor 1
[ 4.553682] nvidia-uvm: Loaded the UVM driver, major device number 237.