PCI(e) passthrough device on VM is different from PVE host

urc-ysh

Member
Nov 10, 2021
I installed graphics cards on my PVE hosts and want my VMs to be able to use them, so I configured PCI(e) passthrough on my PVE cluster. There are three nodes in the cluster: one has a Tesla K40m installed and the other two each have an RTX 2080 Ti. Configuring the first (Tesla K40m) node went smoothly; after installing CUDA in that node's VM and running a demo, everything was fine. Then I configured the second node, which has the RTX 2080 Ti, in the same way as the first node. But while installing the GPU driver I found that running nvidia-smi displayed this:
Code:
Unable to determine the device handle for GPU 0000:01:00.0: Unknown Error
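For what it's worth, the guest's kernel log usually has more detail on why the driver could not initialize the card (this assumes the proprietary NVIDIA driver is installed in the guest):
Code:
# In the guest: show NVIDIA kernel driver (NVRM) messages, if any
dmesg | grep -i nvrm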
And when I ran lspci | grep -i nvidia in the VM, it displayed this:
Code:
01:00.0 VGA compatible controller: NVIDIA Corporation GV102 (rev a1)
01:00.1 Audio device: NVIDIA Corporation Device 10f7 (rev a1)
01:00.2 USB controller: NVIDIA Corporation Device 1ad6 (rev a1)
01:00.3 Serial bus controller [0c80]: NVIDIA Corporation Device 1ad7 (rev a1)
The VGA device is reported as GV102, which is the core code of a TITAN V, not an RTX 2080 Ti. When I ran the same command on the second node's host, the result was:
Code:
82:00.0 VGA compatible controller: NVIDIA Corporation TU102 [GeForce RTX 2080 Ti Rev. A] (rev a1)
82:00.1 Audio device: NVIDIA Corporation TU102 High Definition Audio Controller (rev a1)
82:00.2 USB controller: NVIDIA Corporation TU102 USB 3.1 Host Controller (rev a1)
82:00.3 Serial bus controller [0c80]: NVIDIA Corporation TU102 USB Type-C UCSI Controller (rev a1)
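Since the human-readable names in lspci come from each system's local pci.ids database, the numeric vendor:device IDs are the more reliable thing to compare between host and guest. A minimal check, using the host address 82:00.0 and guest address 01:00.0 from the outputs above:
Code:
# On the PVE host: numeric IDs plus the driver bound to the GPU
lspci -nn -k -s 82:00.0
# In the guest: the same, for the passed-through device
lspci -nn -k -s 01:00.0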
I wonder, is there anything wrong with my second node's PCI(e) passthrough configuration? My cluster information and configuration are below:
Node vendor & model: Dell PowerEdge R730xd
CPU: E5-2682 v4
Kernel: 5.11.22-5-pve
PVE Version: 7.0-13
VM OS: Ubuntu 18.04.5
VM Kernel Version: 4.15.0-140
Host /etc/default/grub content:
Code:
GRUB_DEFAULT=0
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="Proxmox VE"
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on"
GRUB_CMDLINE_LINUX=""
GRUB_DISABLE_OS_PROBER=true
GRUB_DISABLE_RECOVERY="true"
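For completeness: this file only takes effect after regenerating the GRUB config and rebooting, and whether the IOMMU actually came up can then be read from the kernel log, e.g.:
Code:
# On the PVE host: apply the GRUB change and reboot
update-grub
reboot

# After the reboot: confirm that the IOMMU was enabled
dmesg | grep -e DMAR -e IOMMU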
Output of lspci -n -s 82:00 on the host:
Code:
82:00.0 0300: 10de:1e07 (rev a1)
82:00.1 0403: 10de:10f7 (rev a1)
82:00.2 0c03: 10de:1ad6 (rev a1)
82:00.3 0c80: 10de:1ad7 (rev a1)
Host /etc/modprobe.d/vfio.conf content:
Code:
options vfio-pci ids="10de:1e07,10de:10f7,10de:1ad6,10de:1ad7"
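Changes under /etc/modprobe.d/ are only picked up after rebuilding the initramfs and rebooting; after that, all four GPU functions should report vfio-pci as the driver in use. A minimal sketch, using the 82:00 address from above:
Code:
# On the PVE host: rebuild the initramfs for all installed kernels, then reboot
update-initramfs -u -k all
reboot

# After the reboot: each function should show "Kernel driver in use: vfio-pci"
lspci -nnk -s 82:00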
Host /etc/modprobe.d/kvm.conf content:
Code:
options kvm ignore_msrs=1
Host /etc/modules content:
Code:
vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd
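Whether these modules really got loaded at boot can be checked with:
Code:
# On the PVE host: the vfio modules listed above should appear here
lsmod | grep vfio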
Host /etc/modprobe.d/blacklist.conf content:
Code:
blacklist nouveau
blacklist nvidia
blacklist nvidiafb
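And to confirm the blacklist took effect, i.e. that the host itself did not grab the card:
Code:
# On the PVE host: ideally no output, meaning neither nouveau nor the NVIDIA driver is loaded
lsmod | grep -e nouveau -e nvidia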
Host /etc/pve/qemu-server/103.conf content:
Code:
boot: order=scsi0;net0
cores: 16
hostpci0: 0000:82:00,pcie=1
machine: q35
memory: 16384
name: myVM
net0: virtio=F6:50:E0:4F:0B:92,bridge=vmbr0,firewall=1
net1: virtio=12:9A:8B:A0:87:FD,bridge=vmbr1
numa: 1
onboot: 1
ostype: l26
scsi0: ceph_pool0:vm-103-disk-0,size=64G
scsihw: virtio-scsi-pci
smbios1: uuid=3f741261-13eb-42d0-a28d-b6f62f401019
sockets: 1
vmgenid: a1acd298-3e67-4f3a-b18a-4fc635e57993
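One more thing that may be worth comparing between the working K40m node and this one is the IOMMU grouping of the GPU, since the passed-through device's group should not contain unrelated devices. A generic sketch to list the groups (standard sysfs paths, nothing node-specific):
Code:
#!/bin/bash
# On the PVE host: print each IOMMU group and the devices it contains
for g in /sys/kernel/iommu_groups/*; do
    echo "IOMMU group ${g##*/}:"
    for d in "$g"/devices/*; do
        lspci -nns "${d##*/}"
    done
done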
 
Can you post the output of 'pveversion -v', 'lspci -v' and 'dmesg' from both nodes with the 2080 Tis?
 
Can you post the output of 'pveversion -v', 'lspci -v' and 'dmesg' from both nodes with the 2080 Tis?
This is the 2080 Ti node's pveversion -v output:
Code:
proxmox-ve: 7.0-2 (running kernel: 5.11.22-5-pve)
pve-manager: 7.0-13 (running version: 7.0-13/7aa7e488)
pve-kernel-helper: 7.1-2
pve-kernel-5.11: 7.0-8
pve-kernel-5.11.22-5-pve: 5.11.22-10
pve-kernel-5.11.22-1-pve: 5.11.22-2
ceph: 16.2.6-1~bpo11+1
ceph-fuse: 16.2.6-1~bpo11+1
corosync: 3.1.5-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.22-pve1
libproxmox-acme-perl: 1.4.0
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.0-5
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.0-10
libpve-guest-common-perl: 4.0-2
libpve-http-server-perl: 4.0-3
libpve-storage-perl: 7.0-12
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.9-4
lxcfs: 4.0.8-pve2
novnc-pve: 1.2.0-3
proxmox-backup-client: 2.0.11-1
proxmox-backup-file-restore: 2.0.11-1
proxmox-mini-journalreader: 1.2-1
proxmox-widget-toolkit: 3.3-6
pve-cluster: 7.0-3
pve-container: 4.1-1
pve-docs: 7.0-5
pve-edk2-firmware: 3.20210831-1
pve-firewall: 4.2-4
pve-firmware: 3.3-2
pve-ha-manager: 3.3-1
pve-i18n: 2.5-1
pve-qemu-kvm: 6.0.0-4
pve-xtermjs: 4.12.0-1
qemu-server: 7.0-16
smartmontools: 7.2-1
spiceterm: 3.2-2
vncterm: 1.7-1
zfsutils-linux: 2.0.5-pve1
And this is the K40m node's pveversion -v output:
Code:
proxmox-ve: 7.0-2 (running kernel: 5.11.22-5-pve)
pve-manager: 7.0-13 (running version: 7.0-13/7aa7e488)
pve-kernel-helper: 7.1-2
pve-kernel-5.11: 7.0-8
pve-kernel-5.11.22-5-pve: 5.11.22-10
pve-kernel-5.11.22-1-pve: 5.11.22-2
ceph: 16.2.6-1~bpo11+1
ceph-fuse: 16.2.6-1~bpo11+1
corosync: 3.1.5-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.22-pve1
libproxmox-acme-perl: 1.4.0
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.0-5
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.0-10
libpve-guest-common-perl: 4.0-2
libpve-http-server-perl: 4.0-3
libpve-storage-perl: 7.0-12
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.9-4
lxcfs: 4.0.8-pve2
novnc-pve: 1.2.0-3
proxmox-backup-client: 2.0.11-1
proxmox-backup-file-restore: 2.0.11-1
proxmox-mini-journalreader: 1.2-1
proxmox-widget-toolkit: 3.3-6
pve-cluster: 7.0-3
pve-container: 4.1-1
pve-docs: 7.0-5
pve-edk2-firmware: 3.20210831-1
pve-firewall: 4.2-4
pve-firmware: 3.3-2
pve-ha-manager: 3.3-1
pve-i18n: 2.5-1
pve-qemu-kvm: 6.0.0-4
pve-xtermjs: 4.12.0-1
qemu-server: 7.0-16
smartmontools: 7.2-1
spiceterm: 3.2-2
vncterm: 1.7-1
zfsutils-linux: 2.0.5-pve1
 
I thought you have 2 nodes with 2080 Tis? I wanted to see the output of those commands from those two nodes, to see what is different...
 
I thought you have 2 nodes with 2080 Tis? I wanted to see the output of those commands from those two nodes, to see what is different...
Yes I do, but I haven't even configured PCI(e) passthrough on the other 2080 Ti node yet. Do you mean you recommend I try it on that other 2080 Ti node as well?