PCI(e) passthrough device in VM is different from the one on the PVE host

urc-ysh

Member
Nov 10, 2021
I installed graphics cards in my PVE hosts and want my VMs to use them, so I configured PCI(e) passthrough on my PVE cluster. The cluster has three nodes: one has a Tesla K40m installed and two have an RTX 2080 Ti. Configuring the first node (Tesla K40m) went smoothly. After I installed CUDA in that node's VM and ran a demo on it, everything was fine. I then configured the second node, which has an RTX 2080 Ti, in the same way as the first. But after installing the GPU driver in its VM, running nvidia-smi displayed the following:
Unable to determine the device handle for GPU 0000:01:00.0: Unknown Error
Running lspci | grep -i nvidia in the VM displayed the following:
Code:
01:00.0 VGA compatible controller: NVIDIA Corporation GV102 (rev a1)
01:00.1 Audio device: NVIDIA Corporation Device 10f7 (rev a1)
01:00.2 USB controller: NVIDIA Corporation Device 1ad6 (rev a1)
01:00.3 Serial bus controller [0c80]: NVIDIA Corporation Device 1ad7 (rev a1)
The VGA device is named GV102, which looks like a Volta-generation codename (the generation of the TITAN V) rather than the TU102 of the RTX 2080 Ti. Running the same command on the second node itself (the host) gives the following:
Code:
82:00.0 VGA compatible controller: NVIDIA Corporation TU102 [GeForce RTX 2080 Ti Rev. A] (rev a1)
82:00.1 Audio device: NVIDIA Corporation TU102 High Definition Audio Controller (rev a1)
82:00.2 USB controller: NVIDIA Corporation TU102 USB 3.1 Host Controller (rev a1)
82:00.3 Serial bus controller [0c80]: NVIDIA Corporation TU102 USB Type-C UCSI Controller (rev a1)
I wonder whether something is wrong with the PCI(e) passthrough configuration on my second node. My cluster information and configuration are below:
Node vendor & model: Dell PowerEdge R730xd
CPU: E5-2682 v4
Kernel: 5.11.22-5-pve
PVE Version: 7.0-13
VM OS: 18.04.5
VM Kernel Version: 4.15.0-140
Host /etc/default/grub content:
Code:
GRUB_DEFAULT=0
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="Proxmox VE"
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on"
GRUB_CMDLINE_LINUX=""
GRUB_DISABLE_OS_PROBER=true
GRUB_DISABLE_RECOVERY="true"
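A change to /etc/default/grub only takes effect after the GRUB configuration has been regenerated and the host rebooted, so it can be worth comparing the configured parameters with what the running kernel reports in /proc/cmdline. The following is a minimal Python sketch of that comparison, assuming the file uses the simple KEY="value" layout shown above. The function names grub_cmdline_params and running_cmdline_params are made up for this illustration.

```python
# Sketch: extract the kernel parameters configured in /etc/default/grub
# and compare them with the parameters of the running kernel.
# Function names are hypothetical; the sample text is copied from the post.

GRUB_SAMPLE = '''\
GRUB_DEFAULT=0
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="Proxmox VE"
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on"
GRUB_CMDLINE_LINUX=""
'''


def grub_cmdline_params(grub_text, key="GRUB_CMDLINE_LINUX_DEFAULT"):
    """Return the space-separated parameters assigned to `key`."""
    for line in grub_text.splitlines():
        line = line.strip()
        if line.startswith(key + "="):
            value = line.split("=", 1)[1]
            # Drop the surrounding quotes, then split on whitespace.
            return value.strip().strip("\"'").split()
    return []


def running_cmdline_params(path="/proc/cmdline"):
    """Return the parameters the running kernel was booted with."""
    with open(path) as handle:
        return handle.read().split()


configured = grub_cmdline_params(GRUB_SAMPLE)
print("configured:", configured)
print("intel_iommu=on configured:", "intel_iommu=on" in configured)
```

On the host itself, checking whether "intel_iommu=on" is also in running_cmdline_params() shows whether the setting reached the booted kernel.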
Host lspci -n -s command output:
Code:
82:00.0 0300: 10de:1e07 (rev a1)
82:00.1 0403: 10de:10f7 (rev a1)
82:00.2 0c03: 10de:1ad6 (rev a1)
82:00.3 0c80: 10de:1ad7 (rev a1)
Host /etc/modprobe.d/vfio.conf content:
Code:
options vfio-pci ids="10de:1e07,10de:10f7,10de:1ad6,10de:1ad7"
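The ids list above is simply the vendor:device pairs from the lspci -n output joined with commas. As a cross-check against typing mistakes, here is a small Python sketch that derives the same line from that output. The helper names vfio_ids and vfio_options_line are invented for this example, and the quoting of the ids value just mirrors the file as posted.

```python
# Sketch: build the vfio-pci "ids" option from `lspci -n` output.
# Helper names are hypothetical; the sample output is copied from the post.
import re

LSPCI_N_OUTPUT = """\
82:00.0 0300: 10de:1e07 (rev a1)
82:00.1 0403: 10de:10f7 (rev a1)
82:00.2 0c03: 10de:1ad6 (rev a1)
82:00.3 0c80: 10de:1ad7 (rev a1)
"""

# <address> <class>: <vendor>:<device> ...
LINE_RE = re.compile(
    r"^(?P<addr>[0-9a-f:.]+)\s+(?P<cls>[0-9a-f]{4}):\s+"
    r"(?P<vendor>[0-9a-f]{4}):(?P<device>[0-9a-f]{4})",
    re.IGNORECASE,
)


def vfio_ids(lspci_n_text):
    """Return the unique vendor:device pairs in order of appearance."""
    ids = []
    for line in lspci_n_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            pair = (match["vendor"] + ":" + match["device"]).lower()
            if pair not in ids:
                ids.append(pair)
    return ids


def vfio_options_line(ids):
    """Format the pairs the same way as the vfio.conf shown in the post."""
    return 'options vfio-pci ids="' + ",".join(ids) + '"'


print(vfio_options_line(vfio_ids(LSPCI_N_OUTPUT)))
```

For the four functions of the card at 82:00 this reproduces the line in my vfio.conf, so the IDs themselves appear to be entered correctly.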
Host /etc/modprobe.d/kvm.conf content:
Code:
options kvm ignore_msrs=1
Host /etc/modules content:
Code:
vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd
Host /etc/modprobe.d/blacklist.conf content:
Code:
blacklist nouveau
blacklist nvidia
blacklist nvidiafb
Host /etc/pve/qemu-server/103.conf content:
Code:
boot: order=scsi0;net0
cores: 16
hostpci0: 0000:82:00,pcie=1
machine: q35
memory: 16384
name: myVM
net0: virtio=F6:50:E0:4F:0B:92,bridge=vmbr0,firewall=1
net1: virtio=12:9A:8B:A0:87:FD,bridge=vmbr1
numa: 1
onboot: 1
ostype: l26
scsi0: ceph_pool0:vm-103-disk-0,size=64G
scsihw: virtio-scsi-pci
smbios1: uuid=3f741261-13eb-42d0-a28d-b6f62f401019
sockets: 1
vmgenid: a1acd298-3e67-4f3a-b18a-4fc635e57993
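To double-check that the hostpci0 entry points at the same card that lspci reports on the host, the VM config can be parsed and its address compared with the host-side function addresses. This is a rough Python sketch that assumes the simple "address,option=value" form used in this config; other forms of the hostpci value are not handled. The names hostpci_entries and matching_functions are made up for illustration.

```python
# Sketch: read hostpci entries from a qemu-server VM config and match them
# against PCI function addresses seen on the host.
# Function names are hypothetical; sample values are copied from the post.

VM_CONF = """\
boot: order=scsi0;net0
cores: 16
hostpci0: 0000:82:00,pcie=1
machine: q35
memory: 16384
"""

HOST_FUNCTIONS = ["82:00.0", "82:00.1", "82:00.2", "82:00.3"]


def hostpci_entries(conf_text):
    """Map each hostpciN key to (address, options dict)."""
    entries = {}
    for line in conf_text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        if sep and key.startswith("hostpci"):
            parts = value.strip().split(",")
            options = dict(p.split("=", 1) for p in parts[1:] if "=" in p)
            entries[key] = (parts[0], options)
    return entries


def matching_functions(address, host_functions):
    """Return the host functions that belong to the given device address."""
    # Drop a leading PCI domain such as "0000:" if one is present.
    if address.count(":") == 2:
        address = address.split(":", 1)[1]
    return [f for f in host_functions if f == address or f.startswith(address + ".")]


for name, (address, options) in hostpci_entries(VM_CONF).items():
    print(name, address, options, matching_functions(address, HOST_FUNCTIONS))
```

With the values above, hostpci0 resolves to all four functions at 82:00, which matches the host lspci output.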
 
Can you post the output of 'pveversion -v', 'lspci -v' and 'dmesg' from both nodes with the 2080 Tis?
 
I'm sorry, I didn't notice the attach-file button. I'll upload the requested output as text files.
 

Attachments

  • k40_dmesg.txt (178.5 KB)
  • 2080_dmsg.txt (143 KB)
  • k40_lspci.txt (53 KB)
  • 2080_lspci.txt (54.4 KB)
And this is the pveversion -v output from the 2080 Ti node:
proxmox-ve: 7.0-2 (running kernel: 5.11.22-5-pve)
pve-manager: 7.0-13 (running version: 7.0-13/7aa7e488)
pve-kernel-helper: 7.1-2
pve-kernel-5.11: 7.0-8
pve-kernel-5.11.22-5-pve: 5.11.22-10
pve-kernel-5.11.22-1-pve: 5.11.22-2
ceph: 16.2.6-1~bpo11+1
ceph-fuse: 16.2.6-1~bpo11+1
corosync: 3.1.5-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.22-pve1
libproxmox-acme-perl: 1.4.0
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.0-5
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.0-10
libpve-guest-common-perl: 4.0-2
libpve-http-server-perl: 4.0-3
libpve-storage-perl: 7.0-12
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.9-4
lxcfs: 4.0.8-pve2
novnc-pve: 1.2.0-3
proxmox-backup-client: 2.0.11-1
proxmox-backup-file-restore: 2.0.11-1
proxmox-mini-journalreader: 1.2-1
proxmox-widget-toolkit: 3.3-6
pve-cluster: 7.0-3
pve-container: 4.1-1
pve-docs: 7.0-5
pve-edk2-firmware: 3.20210831-1
pve-firewall: 4.2-4
pve-firmware: 3.3-2
pve-ha-manager: 3.3-1
pve-i18n: 2.5-1
pve-qemu-kvm: 6.0.0-4
pve-xtermjs: 4.12.0-1
qemu-server: 7.0-16
smartmontools: 7.2-1
spiceterm: 3.2-2
vncterm: 1.7-1
zfsutils-linux: 2.0.5-pve1
And this is the pveversion -v output from the K40m node:
proxmox-ve: 7.0-2 (running kernel: 5.11.22-5-pve)
pve-manager: 7.0-13 (running version: 7.0-13/7aa7e488)
pve-kernel-helper: 7.1-2
pve-kernel-5.11: 7.0-8
pve-kernel-5.11.22-5-pve: 5.11.22-10
pve-kernel-5.11.22-1-pve: 5.11.22-2
ceph: 16.2.6-1~bpo11+1
ceph-fuse: 16.2.6-1~bpo11+1
corosync: 3.1.5-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.22-pve1
libproxmox-acme-perl: 1.4.0
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.0-5
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.0-10
libpve-guest-common-perl: 4.0-2
libpve-http-server-perl: 4.0-3
libpve-storage-perl: 7.0-12
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.9-4
lxcfs: 4.0.8-pve2
novnc-pve: 1.2.0-3
proxmox-backup-client: 2.0.11-1
proxmox-backup-file-restore: 2.0.11-1
proxmox-mini-journalreader: 1.2-1
proxmox-widget-toolkit: 3.3-6
pve-cluster: 7.0-3
pve-container: 4.1-1
pve-docs: 7.0-5
pve-edk2-firmware: 3.20210831-1
pve-firewall: 4.2-4
pve-firmware: 3.3-2
pve-ha-manager: 3.3-1
pve-i18n: 2.5-1
pve-qemu-kvm: 6.0.0-4
pve-xtermjs: 4.12.0-1
qemu-server: 7.0-16
smartmontools: 7.2-1
spiceterm: 3.2-2
vncterm: 1.7-1
zfsutils-linux: 2.0.5-pve1
 
I thought you had 2 nodes with 2080 Tis? I wanted to see the output of those commands from those two nodes, to see what is different...
 
Yes, I do, but I haven't configured PCI(e) passthrough on the other 2080 Ti node at all. Do you mean you recommend that I try it on the other 2080 Ti node?
 
