GPU passthrough woes: rm_init_adapter failed, device minor number 0

bubbacub

New Member
May 22, 2020
Hi, I'm desperate for any advice!
I have an install of PVE 6.2 that went up this week.
I've followed the guides to set up GPU passthrough and have a VM booting with my GPU passed through.

Hardware is:
Intel server board S3420GP, Xeon X3430, ASUS P106-100 GPU. The GPU is a second-hand mining card, essentially a GTX 1060 6GB without display outputs (UEFI compatible). The idea is to use it for CUDA-accelerated TensorFlow/PyTorch deep learning in an Ubuntu 20.04 LTS guest, i.e. I don't need it to run as a display card.

The host binds the card to VFIO correctly and passes it through to the guest. Nouveau loads automatically in the guest, but unfortunately it is useless to me as it doesn't support CUDA.
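
One way to double-check the binding on the host is to read the device's driver symlink in sysfs. A minimal Python sketch, assuming the card's full PCI address is 0000:02:00.0 (the 02:00 slot from the VM config further down, with an assumed domain and function number); the helper name bound_driver is made up for illustration:

```python
import os


def bound_driver(pci_address, sysfs_root="/sys/bus/pci/devices"):
    """Return the name of the kernel driver bound to a PCI device, or None.

    pci_address is the full address, e.g. "0000:02:00.0" (assumed here).
    """
    link = os.path.join(sysfs_root, pci_address, "driver")
    if not os.path.islink(link):
        # Either no driver is bound or the device does not exist.
        return None
    # The symlink target ends in the driver name, e.g. .../drivers/vfio-pci
    return os.path.basename(os.readlink(link))
```

On the host, bound_driver("0000:02:00.0") should give "vfio-pci" if the binding worked, and None if nothing has claimed the device.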

My guest dmesg, after installing the 440 NVIDIA drivers from the Ubuntu repository, shows:

NVRM (a pci address) RmInitAdapter failed!
NVRM (a pci address) rm_init_adapter failed, device minor number 0


my nvidia-smi command returns:

no devices found


I'm thinking that this could be the Linux equivalent of the Code 43 error in Windows, but I'm very open to suggestions from people who know more than me.

my grub addons are:

GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt rd.driver.pre=vfio-pci pcie_acs_override=downstream,multifunction nofb nomodeset video=vesafb:off,efifb:off vfio_iommu_type1.allow_unsafe_interrupts=1"

I'm not sure how much of this I need. The last parameter (vfio_iommu_type1.allow_unsafe_interrupts=1) was crucial, as the methods documented in the guides on the Proxmox website did not work for allowing unsafe interrupts.
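
To see which of these parameters actually reached the running kernel, the host's /proc/cmdline can be compared against the list. A rough Python sketch; parse_cmdline and missing_params are hypothetical helper names, and the simple whitespace split does not handle quoted values containing spaces:

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into {name: value}.

    Bare flags such as "quiet" map to None.
    """
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def missing_params(cmdline, wanted):
    """Return the names in `wanted` that are absent from the command line."""
    params = parse_cmdline(cmdline)
    return [name for name in wanted if name not in params]
```

For example, missing_params(open("/proc/cmdline").read(), ["intel_iommu", "vfio_iommu_type1.allow_unsafe_interrupts"]) returns an empty list when both parameters are present on the booted kernel.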



my vm conf is:


args: -cpu 'host,hv_time,kvm=off,hv_vendor_id=1234567890ab' -machine type=q35,kernel_irqchip=on
bios: ovmf
bootdisk: scsi0
cores: 2
cpu: host,hidden=1,flags=+pcid
efidisk0: tensor:vm-100-disk-1,size=1M
hostpci0: 02:00,pcie=1,romfile=patch_vbios.bin
ide2: none,media=cdrom
machine: q35
memory: 6144
name: tensor
net0: virtio=BA:C7:1D:92:04:5C,bridge=vmbr0,firewall=1
numa: 0
ostype: l26
scsi0: tensor:vm-100-disk-0,size=100G
scsihw: virtio-scsi-pci
smbios1: uuid=17494963-eb27-48a1-998e-d011db8580f7
sockets: 1
vmgenid: 45a59a1f-a9f2-4cf4-b1b0-b0fb1db2fbfb



I have tried a variety of different configurations, including rombar off and changing the vendor, and I downloaded a modded BIOS from TechPowerUp.

If anyone has any ideas, please let me know!
 

Republicus

Active Member
Aug 7, 2017
I've had some passthrough devices that work best without machine: q35.

Have you tried the default (i440fx) instead of q35?
Then try again with and without the ROM file.

Code:
args: -cpu 'host,hv_time,kvm=off,hv_vendor_id=1234567890ab'
bios: ovmf
bootdisk: scsi0
cores: 2
cpu: host,hidden=1,flags=+pcid
efidisk0: tensor:vm-100-disk-1,size=1M
hostpci0: 02:00
ide2: none,media=cdrom
memory: 6144
name: tensor
net0: virtio=BA:C7:1D:92:04:5C,bridge=vmbr0,firewall=1
numa: 0
ostype: l26
scsi0: tensor:vm-100-disk-0,size=100G
scsihw: virtio-scsi-pci
smbios1: uuid=17494963-eb27-48a1-998e-d011db8580f7
sockets: 1
vmgenid: 45a59a1f-a9f2-4cf4-b1b0-b0fb1db2fbfb
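
The changes relative to the original config can be listed mechanically, which helps when testing one variation at a time. A small Python sketch, assuming plain "key: value" lines and a config without snapshot sections; parse_conf and diff_conf are illustrative names, not Proxmox tools:

```python
def parse_conf(text):
    """Parse 'key: value' lines of a VM config into a dict."""
    conf = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Split on the first colon only, so values like "02:00" stay intact.
        key, sep, value = line.partition(":")
        if sep:
            conf[key.strip()] = value.strip()
    return conf


def diff_conf(old, new):
    """Return {key: (old_value, new_value)} for keys whose values differ.

    None marks a key that is missing on that side.
    """
    keys = sorted(set(old) | set(new))
    return {k: (old.get(k), new.get(k)) for k in keys if old.get(k) != new.get(k)}
```

Run against the two configs in this thread, the diff should come down to args, hostpci0 and machine.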
 

bubbacub

Hi,
Thanks for the help. I have tried your suggestions with and without the ROM file: no change, the NVIDIA driver still fails to load.
I have attached some more dmesg output that might shed light on the problem.

dmesg | grep NVRM
[ 4.471449] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 440.64 Fri Feb 21 01:17:26 UTC 2020
[ 11.419497] NVRM: GPU 0000:00:10.0: RmInitAdapter failed! (0x31:0xffff:934)
[ 11.419954] NVRM: GPU 0000:00:10.0: rm_init_adapter failed, device minor number 0
[ 16.589480] NVRM: GPU 0000:00:10.0: RmInitAdapter failed! (0x31:0xffff:934)
[ 16.611047] NVRM: GPU 0000:00:10.0: rm_init_adapter failed, device minor number 0



dmesg | grep nvidia


[ 4.206159] nvidia: loading out-of-tree module taints kernel.
[ 4.206170] nvidia: module license 'NVIDIA' taints kernel.
[ 4.233027] nvidia: module verification failed: signature and/or required key missing - tainting kernel
[ 4.250342] nvidia-nvlink: Nvlink Core is being initialized, major device number 239
[ 4.501413] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms 440.64 Fri Feb 21 00:43:19 UTC 2020
[ 4.507954] [drm] [nvidia-drm] [GPU ID 0x00000010] Loading driver
[ 4.507964] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:00:10.0 on minor 1
[ 4.553682] nvidia-uvm: Loaded the UVM driver, major device number 237.
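
For comparing runs across configurations, the failure lines can be pulled out of the dmesg text programmatically. A minimal Python sketch, assuming the line format shown in the NVRM output above; rm_init_failures is a hypothetical helper, and the code makes no attempt to interpret the status triple in parentheses:

```python
import re

# Matches lines like:
# NVRM: GPU 0000:00:10.0: RmInitAdapter failed! (0x31:0xffff:934)
NVRM_FAIL = re.compile(
    r"NVRM: GPU (?P<addr>[0-9a-fA-F:.]+): RmInitAdapter failed! \((?P<code>[^)]*)\)"
)


def rm_init_failures(dmesg_text):
    """Return (pci_address, code) pairs, one per RmInitAdapter failure line."""
    return [(m.group("addr"), m.group("code")) for m in NVRM_FAIL.finditer(dmesg_text)]
```

Feeding it the guest's dmesg text gives one pair per failed initialisation attempt, with the address as seen inside the guest, so it is easy to tell whether a config change altered the reported code.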
 
