I installed graphics cards in my PVE hosts and want my VMs to use them, so I configured PCI(e) passthrough on my PVE cluster. The cluster has three nodes: one with a Tesla K40m and two with an RTX 2080 Ti. Configuring the first node (Tesla K40m) went smoothly: after installing CUDA in that node's VM and running a demo, everything worked. I then configured the second node, which has an RTX 2080 Ti, the same way as the first. But after installing the GPU driver in its VM, running nvidia-smi failed with the "Unable to determine the device handle" error quoted below.
In the VM, the VGA device is named GV102, which is the core code name of the TITAN V, not the RTX 2080 Ti. Running the same command on the second node itself (the host) gives the result below:
I wonder whether something is wrong with the PCI(e) passthrough configuration on my second node. My cluster information and configuration are below:
Host /etc/default/grub content:
Host lspci -n -s command output:
Host /etc/modprobe.d/vfio.conf content:
Host /etc/modprobe.d/kvm.conf content:
Host /etc/modules content:
Host /etc/modprobe.d/blacklist.conf content:
Host /etc/pve/qemu-server/103.conf content:
The nvidia-smi error in the VM was: Unable to determine the device handle for GPU 0000:01:00.0: Unknown Error
Running lspci | grep -i nvidia in the VM displayed:
Code:
01:00.0 VGA compatible controller: NVIDIA Corporation GV102 (rev a1)
01:00.1 Audio device: NVIDIA Corporation Device 10f7 (rev a1)
01:00.2 USB controller: NVIDIA Corporation Device 1ad6 (rev a1)
01:00.3 Serial bus controller [0c80]: NVIDIA Corporation Device 1ad7 (rev a1)
Code:
82:00.0 VGA compatible controller: NVIDIA Corporation TU102 [GeForce RTX 2080 Ti Rev. A] (rev a1)
82:00.1 Audio device: NVIDIA Corporation TU102 High Definition Audio Controller (rev a1)
82:00.2 USB controller: NVIDIA Corporation TU102 USB 3.1 Host Controller (rev a1)
82:00.3 Serial bus controller [0c80]: NVIDIA Corporation TU102 USB Type-C UCSI Controller (rev a1)
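One way to check whether the different device names matter is to compare the numeric vendor:device IDs instead of the names, since the names come from each system's local PCI ID database. Below is a minimal sketch, assuming the output of `lspci -nn` has been captured as text on both the host and the VM. The function names are hypothetical, and the sample host lines are reconstructed by combining the names and numeric IDs shown in this post, so real `lspci -nn` output may differ slightly.

```python
import re

# Matches the "[vendor:device]" pair that `lspci -nn` prints, e.g. [10de:1e07].
ID_RE = re.compile(r"\[([0-9a-f]{4}):([0-9a-f]{4})\]", re.IGNORECASE)


def ids_by_function(lspci_nn_output):
    """Map each PCI function number (e.g. "0") to its "vendor:device" ID."""
    result = {}
    for line in lspci_nn_output.splitlines():
        line = line.strip()
        if not line:
            continue
        address = line.split()[0]              # e.g. "82:00.0"
        function = address.rsplit(".", 1)[-1]  # e.g. "0"
        matches = ID_RE.findall(line)
        if matches:
            vendor, device = matches[-1]       # the last bracket pair is the ID
            result[function] = f"{vendor}:{device}".lower()
    return result


def differing_functions(host_output, guest_output):
    """Return the function numbers whose IDs differ or are missing on one side."""
    host = ids_by_function(host_output)
    guest = ids_by_function(guest_output)
    return sorted(f for f in set(host) | set(guest) if host.get(f) != guest.get(f))


# Illustrative host lines (reconstructed, see the note above).
HOST_SAMPLE = """\
82:00.0 VGA compatible controller [0300]: NVIDIA Corporation TU102 [GeForce RTX 2080 Ti Rev. A] [10de:1e07] (rev a1)
82:00.1 Audio device [0403]: NVIDIA Corporation TU102 High Definition Audio Controller [10de:10f7] (rev a1)
"""

print(ids_by_function(HOST_SAMPLE))
```

If `differing_functions(host_text, guest_text)` returns an empty list, both sides see the same hardware IDs and only the name lookup differs. The VM's numeric ID for function 0 is not shown in this post, so that comparison still has to be run.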
Node vendor & model | Dell PowerEdge R730xd |
CPU | E5-2682 v4 |
Kernel | 5.11.22-5-pve |
PVE Version | 7.0-13 |
VM OS | 18.04.5 |
VM Kernel Version | 4.15.0-140 |
Code:
GRUB_DEFAULT=0
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="Proxmox VE"
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on"
GRUB_CMDLINE_LINUX=""
GRUB_DISABLE_OS_PROBER=true
GRUB_DISABLE_RECOVERY="true"
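To confirm that intel_iommu=on actually took effect after a reboot, the IOMMU groups can be listed from sysfs. This is a minimal sketch, assuming the standard Linux layout /sys/kernel/iommu_groups/<group>/devices/<pci-address>. The function names are hypothetical, and the address 0000:82:00.0 used in the usage note is taken from the hostpci0 line of the VM config.

```python
from pathlib import Path


def iommu_groups(root="/sys/kernel/iommu_groups"):
    """Return {group_number: [pci_address, ...]} as read from sysfs."""
    groups = {}
    base = Path(root)
    if not base.is_dir():
        # No directory at all: the IOMMU is most likely not enabled.
        return groups
    for group_dir in base.iterdir():
        devices_dir = group_dir / "devices"
        if group_dir.name.isdigit() and devices_dir.is_dir():
            groups[int(group_dir.name)] = sorted(p.name for p in devices_dir.iterdir())
    return groups


def group_of(address, groups):
    """Find the group holding a full PCI address such as 0000:82:00.0."""
    for number, devices in groups.items():
        if address in devices:
            return number
    return None
```

On the host, `group_of("0000:82:00.0", iommu_groups())` returning None would suggest the IOMMU is not active or the address is wrong. If it returns a group number, the other entries in that group show which devices would need to be passed through together.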
Code:
82:00.0 0300: 10de:1e07 (rev a1)
82:00.1 0403: 10de:10f7 (rev a1)
82:00.2 0c03: 10de:1ad6 (rev a1)
82:00.3 0c80: 10de:1ad7 (rev a1)
Code:
options vfio-pci ids="10de:1e07,10de:10f7,10de:1ad6,10de:1ad7"
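The ids value above should list every function of the card as reported by lspci -n. As a cross-check, here is a minimal sketch that derives the options line from that output. The function name is hypothetical, the sample input is the lspci -n output from this post, and the quoted ids="..." form simply mirrors the file shown here.

```python
import re

# One `lspci -n` line looks like: "82:00.0 0300: 10de:1e07 (rev a1)"
LINE_RE = re.compile(
    r"^\S+\s+[0-9a-f]{4}:\s+([0-9a-f]{4}:[0-9a-f]{4})\b", re.IGNORECASE
)


def vfio_options_line(lspci_n_output):
    """Build an 'options vfio-pci ids=...' line from `lspci -n` output."""
    ids = []
    for line in lspci_n_output.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            device_id = match.group(1).lower()
            if device_id not in ids:  # keep order, skip duplicates
                ids.append(device_id)
    return 'options vfio-pci ids="' + ",".join(ids) + '"'


SAMPLE = """\
82:00.0 0300: 10de:1e07 (rev a1)
82:00.1 0403: 10de:10f7 (rev a1)
82:00.2 0c03: 10de:1ad6 (rev a1)
82:00.3 0c80: 10de:1ad7 (rev a1)
"""

print(vfio_options_line(SAMPLE))
# prints: options vfio-pci ids="10de:1e07,10de:10f7,10de:1ad6,10de:1ad7"
```

The generated line matches the one in my vfio.conf, so all four functions of the card are covered.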
Code:
options kvm ignore_msrs=1
Code:
vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd
Code:
blacklist nouveau
blacklist nvidia
blacklist nvidiafb
Code:
boot: order=scsi0;net0
cores: 16
hostpci0: 0000:82:00,pcie=1
machine: q35
memory: 16384
name: myVM
net0: virtio=F6:50:E0:4F:0B:92,bridge=vmbr0,firewall=1
net1: virtio=12:9A:8B:A0:87:FD,bridge=vmbr1
numa: 1
onboot: 1
ostype: l26
scsi0: ceph_pool0:vm-103-disk-0,size=64G
scsihw: virtio-scsi-pci
smbios1: uuid=3f741261-13eb-42d0-a28d-b6f62f401019
sockets: 1
vmgenid: a1acd298-3e67-4f3a-b18a-4fc635e57993