Any ideas what might be wrong with QEMU 6.2? After a PVE version upgrade, passthrough of an NVIDIA A100 stopped working:
Code:
root@gpu-test-vm ~ # nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
root@gpu-test-vm ~ # dmesg
[ 113.406581] nvidia-nvlink: Nvlink Core is being initialized, major device number 244
[ 113.406586] NVRM: The NVIDIA GPU 0000:01:00.0 (PCI ID: 10de:20b0)
NVRM: installed in this system is not supported by the
NVRM: NVIDIA 470.103.01 driver release.
NVRM: Please see 'Appendix A - Supported NVIDIA GPU Products'
NVRM: in this release's README, available on the operating system
NVRM: specific graphics driver download page at www.nvidia.com.
[ 113.409224] nvidia: probe of 0000:01:00.0 failed with error -1
[ 113.409245] NVRM: The NVIDIA probe routine failed for 1 device(s).
[ 113.409246] NVRM: None of the NVIDIA devices were initialized.
[ 113.409952] nvidia-nvlink: Unregistered the Nvlink Core, major device number 244
...if we roll back to
apt install pve-qemu-kvm=6.1.1-2
and reboot the VM, passthrough works again.
In dmesg we see address conflicts:
Code:
root@gpu-test-vm ~ # diff -u dmesg_working dmesg_not_working | grep BAR
pci 0000:00:01.0: BAR 0: assigned to efifb
+pci 0000:00:1a.0: can't claim BAR 4 [io 0xd300-0xd31f]: address conflict with PCI Bus 0000:01 [io 0xd000-0xdfff]
+pci 0000:00:1a.1: can't claim BAR 4 [io 0xd2e0-0xd2ff]: address conflict with PCI Bus 0000:01 [io 0xd000-0xdfff]
+pci 0000:00:1a.2: can't claim BAR 4 [io 0xd2c0-0xd2df]: address conflict with PCI Bus 0000:01 [io 0xd000-0xdfff]
+pci 0000:00:1d.0: can't claim BAR 4 [io 0xd2a0-0xd2bf]: address conflict with PCI Bus 0000:01 [io 0xd000-0xdfff]
+pci 0000:00:1d.1: can't claim BAR 4 [io 0xd280-0xd29f]: address conflict with PCI Bus 0000:01 [io 0xd000-0xdfff]
+pci 0000:00:1d.2: can't claim BAR 4 [io 0xd260-0xd27f]: address conflict with PCI Bus 0000:01 [io 0xd000-0xdfff]
+pci 0000:00:1f.2: can't claim BAR 4 [io 0xd240-0xd25f]: address conflict with PCI Bus 0000:01 [io 0xd000-0xdfff]
+pci 0000:00:1f.3: can't claim BAR 4 [io 0xd200-0xd23f]: address conflict with PCI Bus 0000:01 [io 0xd000-0xdfff]
pci 0000:01:00.0: can't claim BAR 0 [mem 0xff000000-0xffffffff]: no compatible bridge window
pci 0000:01:00.0: can't claim BAR 1 [mem 0xfffffff000000000-0xffffffffffffffff 64bit pref]: no compatible bridge window
pci 0000:01:00.0: can't claim BAR 3 [mem 0xfffffffffe000000-0xffffffffffffffff 64bit pref]: no compatible bridge window
pci 0000:00:01.0: can't claim BAR 6 [mem 0xffff0000-0xffffffff pref]: no compatible bridge window
pci 0000:06:12.0: can't claim BAR 6 [mem 0xfffc0000-0xffffffff pref]: no compatible bridge window
pci 0000:00:1c.0: BAR 15: assigned [mem 0x1000000000-0x27ffffffff 64bit pref]
-pci 0000:00:1c.0: BAR 14: assigned [mem 0x80000000-0x80ffffff]
-pci 0000:00:01.0: BAR 6: assigned [mem 0x81000000-0x8100ffff pref]
+pci 0000:00:1c.1: BAR 15: assigned [mem 0x800200000-0x8003fffff 64bit pref]
+pci 0000:00:1c.2: BAR 15: assigned [mem 0x800400000-0x8005fffff 64bit pref]
+pci 0000:00:1c.3: BAR 15: assigned [mem 0x800600000-0x8007fffff 64bit pref]
+pci 0000:00:01.0: BAR 6: assigned [mem 0x80000000-0x8000ffff pref]
+pci 0000:00:1f.3: BAR 4: assigned [io 0x1000-0x103f]
+pci 0000:00:1a.0: BAR 4: assigned [io 0x1040-0x105f]
+pci 0000:00:1a.1: BAR 4: assigned [io 0x1060-0x107f]
+pci 0000:00:1a.2: BAR 4: assigned [io 0x1080-0x109f]
+pci 0000:00:1d.0: BAR 4: assigned [io 0x10a0-0x10bf]
+pci 0000:00:1d.1: BAR 4: assigned [io 0x10c0-0x10df]
+pci 0000:00:1d.2: BAR 4: assigned [io 0x10e0-0x10ff]
+pci 0000:00:1f.2: BAR 4: assigned [io 0x1400-0x141f]
pci 0000:01:00.0: BAR 1: assigned [mem 0x1000000000-0x1fffffffff 64bit pref]
pci 0000:01:00.0: BAR 3: assigned [mem 0x2000000000-0x2001ffffff 64bit pref]
-pci 0000:01:00.0: BAR 0: assigned [mem 0x80000000-0x80ffffff]
+pci 0000:01:00.0: BAR 0: no space for [mem size 0x01000000]
+pci 0000:01:00.0: BAR 0: trying firmware assignment [mem 0xff000000-0xffffffff]
+pci 0000:01:00.0: BAR 0: assigned [mem 0xff000000-0xffffffff]
pci 0000:06:12.0: BAR 6: assigned [mem 0xc1640000-0xc167ffff pref]
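To look at the failing boot on its own rather than through the diff, the unclaimed BARs can be listed straight from a saved dmesg dump. This is only a rough sketch: the helper name bar_conflicts is made up for illustration, and it assumes the kernel messages have the same "can't claim BAR" wording as in the output above.

```shell
# Hypothetical helper: read dmesg text on stdin and print one
# "device BAR n" line for every BAR the kernel could not claim.
bar_conflicts() {
    grep "can't claim BAR" |
        sed -E "s/.*pci ([0-9a-f:.]+): can't claim BAR ([0-9]+).*/\1 BAR \2/"
}

# Usage (file name taken from the diff command above):
#   bar_conflicts < dmesg_not_working
```

Running it against both dumps gives two short lists that are easier to compare than the full diff.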
Virtual machine configuration:
Code:
agent: 1
args: -global q35-pcihost.pci-hole64-size=2048G
bios: ovmf
boot: order=scsi0;ide2;net0
cores: 16
cpu: host
efidisk0: ceph-vm:vm-107-disk-0,efitype=4m,pre-enrolled-keys=1,size=528K
hostpci0: 0000:01:00,pcie=1
ide2: none,media=cdrom
machine: q35
memory: 16384
meta: creation-qemu=6.1.0,ctime=1642774395
name: gpu-test
net0: virtio=8E:C7:92:8F:2D:4B,bridge=vmbr0,firewall=1,tag=1234
numa: 1
ostype: l26
scsi0: ceph-vm:vm-107-disk-1,discard=on,iothread=1,size=50G
scsihw: virtio-scsi-single
smbios1: uuid=3e37cffc-6534-4a81-a0c5-bec22ac4b228
snaptime: 1652715711
sockets: 1
vmgenid: 1ee48706-9eca-4477-b98a-2313901a473e
EDIT: It seems that forcing the machine type to
pc-q35-6.1
also fixes the issue, and the BAR address conflicts disappear. Any ideas why pc-q35-6.2 is not working?
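For anyone wanting to reproduce the workaround: as far as I can tell it comes down to a single line in the VM configuration posted above. A sketch, assuming nothing else in the config is changed and the VM is fully stopped and started again afterwards:

```
machine: pc-q35-6.1
```

This replaces the unversioned `machine: q35` entry, which presumably resolves to the newest machine version the installed QEMU provides (pc-q35-6.2 after the upgrade).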