Multi GPU Passthrough - 4G decoding error?

Discussion in 'Proxmox VE: Installation and configuration' started by p.lakis, Dec 6, 2018.

  1. p.lakis

    p.lakis New Member

    Joined:
    Jul 11, 2018
    Messages:
    10
    Likes Received:
    0
    I can pass through 4 individual Tesla P100s to 4 VMs, but when I pass more than one of them to a single VM I get the following error when running dmesg | grep NVRM.

    One of the four works, but with any number above one the output below is produced.

    Code:
    admin@gpu-host:~$ dmesg | grep NVRM
    [    4.550588] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
                   NVRM: BAR0 is 0M @ 0x0 (PCI:0000:02:00.0)
    [    4.550589] NVRM: The system BIOS may have misconfigured your GPU.
    [    4.550843] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
                   NVRM: BAR0 is 0M @ 0x0 (PCI:0000:03:00.0)
    [    4.550844] NVRM: The system BIOS may have misconfigured your GPU.
    [    4.551092] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
                   NVRM: BAR0 is 0M @ 0x0 (PCI:0000:04:00.0)
    [    4.551093] NVRM: The system BIOS may have misconfigured your GPU.
    [    4.551108] NVRM: The NVIDIA probe routine failed for 3 device(s).
    [    4.551109] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  410.78  Sat Nov 10 22:09:04 CST 2018 (using threaded interrupts)
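
    The affected devices can be pulled out of that output with a small filter. This is only a sketch: the function name nvrm_invalid_bar_devices is made up here, and it assumes the driver prints the BAR0 message exactly as in the log above.

```shell
# Print the PCI address of every device whose BAR0 the NVIDIA driver rejected.
# Reads dmesg-style text on stdin. The function name is hypothetical.
nvrm_invalid_bar_devices() {
    grep 'NVRM: BAR0 is 0M @ 0x0' |
        sed -n 's/.*(PCI:\([0-9a-fA-F:.]*\)).*/\1/p'
}

# Typical use inside the guest:
#   dmesg | nvrm_invalid_bar_devices
```

    Each printed address can then be checked with lspci -vv -s ADDRESS to see which memory regions the guest firmware left unassigned.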
    
    I have seen this on systems that can't decode above 4G.
    My VMID.conf is attached.

    Code:
    agent: 1
    bios: ovmf
    bootdisk: scsi0
    cores: 12
    cpu: host
    efidisk0: vm1028gq:vm-122-disk-1,size=128K
    hostpci0: 05:00,pcie=1,x-vga=on
    hostpci1: 06:00,pcie=1,x-vga=on
    hostpci2: 84:00,pcie=1,x-vga=on
    hostpci3: 85:00,pcie=1,x-vga=on
    hugepages: 2
    ide2: iso:iso/ubuntu-16.04.4-desktop-amd64.iso,media=cdrom
    machine: q35
    memory: 131072
    name: U16.04-Tensor-Box
    net0: virtio=DE:FC:7F:0B:27:04,bridge=vmbr1
    numa: 1
    ostype: l26
    scsi0: vm1028gq:vm-122-disk-2,cache=writethrough,size=200G
    scsihw: virtio-scsi-pci
    smbios1: uuid=65d62e28-3f97-430b-be89-68567bc9fc2b
    sockets: 2
    args: -machine pc,max-ram-below-4g=4G
    
    I have tried 1G, 2G and 4G. All return the same NVRM errors. Is there something I'm missing?

    Thanks
     
  2. dcsapak

    dcsapak Proxmox Staff Member
    Staff Member

    Joined:
    Feb 1, 2016
    Messages:
    2,924
    Likes Received:
    266
    I guess your 'args' line has no effect, since you specify q35 (so the machine type gets overwritten by us again).
    There is an open bug for this: https://bugzilla.proxmox.com/show_bug.cgi?id=1267

    Until that bug is fixed, you can get the QEMU command line with 'qm showcmd ID --pretty', add ',max-ram-below-4g=X' to the "-machine 'type=q35'" part, and execute the result by hand.
    Alternatively, did you try without q35? (Of course, you then have to remove pcie=1 and x-vga=on.)
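
    The manual edit described above could be scripted roughly as follows. This is a sketch under assumptions: add_max_ram_below_4g is a hypothetical helper name, it expects the -machine argument to appear exactly as "-machine 'type=q35'" in the qm showcmd output, and a command started by hand may skip preparation steps that Proxmox normally performs when it starts a VM itself.

```shell
# Rewrite the -machine argument in 'qm showcmd' output read from stdin.
# add_max_ram_below_4g is a hypothetical helper; $1 is the size, e.g. 1G.
add_max_ram_below_4g() {
    sed "s/-machine 'type=q35'/-machine 'type=q35,max-ram-below-4g=$1'/"
}

# Typical use on the Proxmox host, with the VM stopped and ID replaced
# by the real VM ID (the /tmp path is just an example):
#   qm showcmd ID --pretty | add_max_ram_below_4g 1G > /tmp/vm-start.sh
#   sh /tmp/vm-start.sh
```

    Writing the patched command to a file first makes it easy to confirm that the -machine line was actually changed before anything is started.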
     
  3. p.lakis

    p.lakis New Member

    In my VM config I added the following:

    Code:
    machine: q35,max-ram-below-4g=1G
    Which in turn gives the following QEMU command line:

    Code:
    # qm showcmd 152 --pretty
    /usr/bin/kvm \
      -id 152 \
      -name Base-Window10 \
      -chardev 'socket,id=qmp,path=/var/run/qemu-server/152.qmp,server,nowait' \
      -mon 'chardev=qmp,mode=control' \
      -pidfile /var/run/qemu-server/152.pid \
      -daemonize \
      -smbios 'type=1,uuid=c5b75794-9838-4a19-91c2-0ca588bcd49b' \
      -drive 'if=pflash,unit=0,format=raw,readonly,file=/usr/share/pve-edk2-firmware//OVMF_CODE.fd' \
      -drive 'if=pflash,unit=1,format=raw,id=drive-efidisk0,file=/dev/zvol/vm1028pool/vm-152-disk-2' \
      -smp '16,sockets=2,cores=8,maxcpus=16' \
      -nodefaults \
      -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' \
      -vga none \
      -nographic \
      -no-hpet \
      -cpu 'host,+kvm_pv_unhalt,+kvm_pv_eoi,hv_vendor_id=proxmox,hv_spinlocks=0x1fff,hv_vapic,hv_time,hv_reset,hv_vpindex,hv_runtime,hv_relaxed,kvm=off' \
      -m 64512 \
      -object 'memory-backend-ram,id=ram-node0,size=32256M' \
      -numa 'node,nodeid=0,cpus=0-7,memdev=ram-node0' \
      -object 'memory-backend-ram,id=ram-node1,size=32256M' \
      -numa 'node,nodeid=1,cpus=8-15,memdev=ram-node1' \
      -readconfig /usr/share/qemu-server/pve-q35.cfg \
      -device 'usb-tablet,id=tablet,bus=ehci.0,port=1' \
      -device 'vfio-pci,host=05:00.0,id=hostpci0,bus=ich9-pcie-port-1,addr=0x0' \
      -device 'vfio-pci,host=06:00.0,id=hostpci1,bus=ich9-pcie-port-2,addr=0x0' \
      -chardev 'socket,path=/var/run/qemu-server/152.qga,server,nowait,id=qga0' \
      -device 'virtio-serial,id=qga0,bus=pci.0,addr=0x8' \
      -device 'virtserialport,chardev=qga0,name=org.qemu.guest_agent.0' \
      -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3' \
      -iscsi 'initiator-name=iqn.1993-08.org.debian:01:6b9a7558eb39' \
      -drive 'file=/mnt/pve/iso/template/iso/virtio-win-0.1.149.iso,if=none,id=drive-ide0,media=cdrom,aio=threads' \
      -device 'ide-cd,bus=ide.0,unit=0,drive=drive-ide0,id=ide0,bootindex=200' \
      -drive 'file=/mnt/pve/iso/template/iso/SW_DVD9_Win_Pro_Ent_Edu_N_10_1803_64BIT_English_-4_MLF_X21-87129.ISO,if=none,id=drive-ide2,media=cdrom,aio=threads' \
      -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=201' \
      -device 'virtio-scsi-pci,id=scsihw0,bus=pci.0,addr=0x5' \
      -drive 'file=/dev/zvol/vm1028pool/vm-152-disk-1,if=none,id=drive-scsi0,format=raw,cache=none,aio=native,detect-zeroes=on' \
      -device 'scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=100' \
      -netdev 'type=tap,id=net0,ifname=tap152i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown' \
      -device 'e1000,mac=DE:D6:56:85:F1:53,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300' \
      -rtc 'driftfix=slew,base=localtime' \
      -machine 'type=q35,max-ram-below-4g=1G' \
      -global 'kvm-pit.lost_tick_policy=discard'
    
    Still no luck getting the resources to the VM for the GPUs.

    Is there a way in QEMU to define the following?
    • Above 4G Decoding = Enabled
    • MMIOH Base = 256G
    • MMIO High Size = 128G
    This is how the BIOS is set up on the host system. If we could pass these settings over to QEMU, I'm confident we could get it to work.
    I'm struggling to find any documentation on MMIOH and QEMU.
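
    One knob that may be relevant, offered as an assumption that nothing in this thread confirms: OVMF has an experimental fw_cfg setting, opt/ovmf/X-PciMmio64Mb, that sets the size of the guest's 64-bit PCI MMIO aperture in MiB. It controls only the size, not the base, and whether the OVMF build shipped with Proxmox honors it has not been verified here. The helper name mmio64_args_line is made up for this sketch, and the 128 comes from the "MMIO High Size = 128G" host setting listed above.

```shell
# Build a Proxmox 'args:' line that asks OVMF for a larger 64-bit MMIO aperture.
# mmio64_args_line is a hypothetical helper; $1 is the aperture size in GiB.
# Assumption: the guest firmware honors the experimental X-PciMmio64Mb knob.
mmio64_args_line() {
    size_gib="$1"
    size_mib=$((size_gib * 1024))   # the knob takes its value in MiB
    printf 'args: -fw_cfg name=opt/ovmf/X-PciMmio64Mb,string=%s\n' "$size_mib"
}

mmio64_args_line 128
```

    The printed line would replace the existing args: entry in VMID.conf. After a reboot, dmesg | grep NVRM inside the guest shows whether the BAR0 errors are gone.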

    EDIT:
    I want to add that this issue occurs only with these Tesla cards. Prior testing on a GeForce system allowed all 4 cards to be passed through without issue.

    Thanks
     
    #3 p.lakis, Dec 7, 2018
    Last edited: Dec 7, 2018
  4. dcsapak

    dcsapak Proxmox Staff Member
    Staff Member

    Does it work when you pass the 4 cards to 4 different VMs?
    Since this line appears on the host, maybe the host BIOS really is misconfiguring the system...
     
  5. p.lakis

    p.lakis New Member

    Yes, I can create 4 VMs with a single GPU each, all running concurrently.

    That line appears inside the VM, not on the host, so it relates to the OVMF BIOS.
    If we could emulate the MMIO area in QEMU, it could overcome this.
     