So, is it safe to assume that passthrough issues on non-Skylake systems have been resolved with the current 4.4.6-1-pve PVE kernel?
The reason I ask is that I have been fighting Nvidia GPU passthrough on my LGA1366 Xeon system, in both OVMF and Seabios mode, and I am about to tear my hair out. I have been following all the guides and it looks like it should be working, but it just doesn't... At this point I'm wondering to what extent anyone has this working in Proxmox with the current kernel...
After struggling for a bit, I have it working reliably on my i7-3770 / Z77 / GTX 960 / GT 730 / Intel integrated GPU system using the pvetest repos. I pass through the 960 to a Windows 10 VM, and the 730 to an Ubuntu 16.04 Gnome VM. The onboard GPU is used for Proxmox. There were a few tips and tricks that I learned along the way:
- use the acs=downstream option if your IOMMU groups don't split correctly (see Wiki)
- use q35 machine type (I had to under Proxmox, using Ubuntu host passthrough never needed this, don't know why they differ)
- use the new vfio-pci.ids boot option instead of the older pci-stub.ids (i.e. "vfio-pci.ids=8086:0151,10de:1401,10de:0fba" in grub cmdline in my case)
- use OVMF; forget Seabios, as it's older tech and doesn't work with vfio-pci.ids
- pass through the card and the sound hardware separately, as the Proxmox feature where specifying 01:00 meant 01:00.0 plus any other 01:00.x functions is broken
- if you get a Code 43 error in Windows 10, add a custom arg to your vm's .conf file ("args: -cpu host,kvm=off,hv_time,hv_relaxed,hv_vapic,hv_spinlocks=0x1fff,hv_vendor_id=Nvidia43FIX")
- I didn't use the Wiki's suggested vfio.conf settings; I used kernel boot options instead
- the "options kvm ignore_msrs=1" modprobe setting was needed as well
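For reference, the ignore_msrs option from the last tip can be made persistent like this (a minimal sketch; /etc/modprobe.d/kvm.conf is just an assumed filename, any .conf file in that directory works):

```shell
# Persist the KVM option so Nvidia driver MSR probes don't trip up the guest.
echo "options kvm ignore_msrs=1" > /etc/modprobe.d/kvm.conf

# Rebuild the initramfs so the option applies at early boot.
update-initramfs -u

# After a reboot, confirm the running value (prints Y when active):
cat /sys/module/kvm/parameters/ignore_msrs
```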
I know the above is a little vague, so here are some actual files/details from my working system so you can compare/test/tweak. Good luck:
vm.conf:
agent: 1
args: -cpu host,kvm=off,hv_time,hv_relaxed,hv_vapic,hv_spinlocks=0x1fff,hv_vendor_id=Nvidia43FIX
bios: ovmf
boot: c
bootdisk: virtio0
cores: 4
cpu: host
hostpci0: 01:00.0,x-vga=on,pcie=1
hostpci1: 01:00.1
machine: q35
memory: 8192
name: win10
net0: bridge=vmbr0,virtio=36:63:63:30:35:34
numa: 0
ostype: win8
smbios1: uuid=e0c4e97a-a4d7-4695-bd29-7ed3ee464bd8
sockets: 1
usb0: 2-1.5
usb1: 2-1.6
usb2: 1-1.4
usb3: 3-1.12
vga: std
virtio0: zvols:vm-100-disk-2,size=200G
virtio1: zvols:vm-100-disk-1,discard=on,iothread=1,size=512G
cat /etc/default/grub
GRUB_DEFAULT=0
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="Proxmox Virtual Environment"
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on pcie_acs_override=downstream vfio-pci.ids=8086:0151,10de:1401,10de:0fba"
GRUB_CMDLINE_LINUX=""
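For completeness, after editing /etc/default/grub the changes still have to be applied and verified; a minimal sketch (the 01:00 device address is taken from the lspci output in this post, substitute your own):

```shell
# Regenerate the grub config so the new kernel cmdline takes effect,
# then reboot for the IOMMU/vfio options to apply.
update-grub

# After the reboot, confirm the IOMMU came up...
dmesg | grep -i -e DMAR -e IOMMU

# ...and that vfio-pci (not nouveau/nvidia) claimed the GPU and its
# HDMI audio function. Look for "Kernel driver in use: vfio-pci"
# under both 01:00.0 and 01:00.1:
lspci -nnk -s 01:00
```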
lspci:
00:00.0 Host bridge: Intel Corporation Xeon E3-1200 v2/3rd Gen Core processor DRAM Controller (rev 09)
00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v2/3rd Gen Core processor PCI Express Root Port (rev 09)
00:01.1 PCI bridge: Intel Corporation Xeon E3-1200 v2/3rd Gen Core processor PCI Express Root Port (rev 09)
00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v2/3rd Gen Core processor Graphics Controller (rev 09)
00:14.0 USB controller: Intel Corporation 7 Series/C210 Series Chipset Family USB xHCI Host Controller (rev 04)
00:16.0 Communication controller: Intel Corporation 7 Series/C210 Series Chipset Family MEI Controller #1 (rev 04)
00:1a.0 USB controller: Intel Corporation 7 Series/C210 Series Chipset Family USB Enhanced Host Controller #2 (rev 04)
00:1b.0 Audio device: Intel Corporation 7 Series/C210 Series Chipset Family High Definition Audio Controller (rev 04)
00:1c.0 PCI bridge: Intel Corporation 7 Series/C210 Series Chipset Family PCI Express Root Port 1 (rev c4)
00:1c.4 PCI bridge: Intel Corporation 7 Series/C210 Series Chipset Family PCI Express Root Port 5 (rev c4)
00:1c.5 PCI bridge: Intel Corporation 7 Series/C210 Series Chipset Family PCI Express Root Port 6 (rev c4)
00:1c.6 PCI bridge: Intel Corporation 7 Series/C210 Series Chipset Family PCI Express Root Port 7 (rev c4)
00:1c.7 PCI bridge: Intel Corporation 7 Series/C210 Series Chipset Family PCI Express Root Port 8 (rev c4)
00:1d.0 USB controller: Intel Corporation 7 Series/C210 Series Chipset Family USB Enhanced Host Controller #1 (rev 04)
00:1f.0 ISA bridge: Intel Corporation Z77 Express Chipset LPC Controller (rev 04)
00:1f.2 SATA controller: Intel Corporation 7 Series/C210 Series Chipset Family 6-port SATA Controller [AHCI mode] (rev 04)
00:1f.3 SMBus: Intel Corporation 7 Series/C210 Series Chipset Family SMBus Controller (rev 04)
01:00.0 VGA compatible controller: NVIDIA Corporation GM206 [GeForce GTX 960] (rev a1)
01:00.1 Audio device: NVIDIA Corporation Device 0fba (rev a1)
02:00.0 VGA compatible controller: NVIDIA Corporation GK208 [GeForce GT 730] (rev a1)
02:00.1 Audio device: NVIDIA Corporation GK208 HDMI/DP Audio Controller (rev a1)
04:00.0 Ethernet controller: Broadcom Corporation NetLink BCM57781 Gigabit Ethernet PCIe (rev 10)
05:00.0 PCI bridge: ASMedia Technology Inc. ASM1083/1085 PCIe to PCI Bridge (rev 03)
07:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
08:00.0 USB controller: ASMedia Technology Inc. ASM1042 SuperSpeed USB Host Controller
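A quick way to sanity-check the acs=downstream tip above: list every IOMMU group with its devices and make sure the GPU functions (01:00.0/01:00.1 here) aren't grouped together with unrelated hardware. A minimal sketch:

```shell
#!/bin/sh
# Walk /sys/kernel/iommu_groups and print each group's devices.
# The directory is absent/empty unless intel_iommu=on is active.
for group in /sys/kernel/iommu_groups/*; do
    [ -d "$group" ] || continue
    n=${group##*/}
    for dev in "$group"/devices/*; do
        printf 'IOMMU group %2s: %s\n' "$n" "$(lspci -nns "${dev##*/}")"
    done
done
```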
pveversion -v:
proxmox-ve: 4.2-49 (running kernel: 4.4.8-1-pve)
pve-manager: 4.2-4 (running version: 4.2-4/2660193c)
pve-kernel-4.4.6-1-pve: 4.4.6-48
pve-kernel-4.2.6-1-pve: 4.2.6-36
pve-kernel-4.4.8-1-pve: 4.4.8-49
lvm2: 2.02.116-pve2
corosync-pve: 2.3.5-2
libqb0: 1.0-1
pve-cluster: 4.0-39
qemu-server: 4.0-74
pve-firmware: 1.1-8
libpve-common-perl: 4.0-60
libpve-access-control: 4.0-16
libpve-storage-perl: 4.0-50
pve-libspice-server1: 0.12.5-2
vncterm: 1.2-1
pve-qemu-kvm: 2.5-16
pve-container: 1.0-63
pve-firewall: 2.0-26
pve-ha-manager: 1.0-31
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u1
lxc-pve: 1.1.5-7
lxcfs: 2.0.0-pve2
cgmanager: 0.39-pve1
criu: 1.6.0-1
zfsutils: 0.6.5-pve9~jessie
Not sure that it will help your specific issue, but after struggling for a while trying to get PCI passthrough NICs to work reliably, I found it essential to have memory ballooning disabled.
With memory ballooning enabled, about 95% of packets were lost somewhere (the kernel found and initialised the devices fine, but even ARP was hit and miss). Interrupts appeared to be getting through, so I'm assuming it causes some kind of DMA corruption.
06:03.0 Unclassified device [00ff]: Red Hat, Inc Virtio memory balloon
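For anyone wanting to try the same workaround: ballooning can be turned off per VM either in the GUI (fixed memory) or from the CLI. A sketch, assuming VMID 100:

```shell
# Disable the virtio balloon device for VM 100; equivalent to putting
# "balloon: 0" in /etc/pve/qemu-server/100.conf. Takes effect on the
# next VM start, after which the balloon device should disappear from
# lspci inside the guest.
qm set 100 --balloon 0
```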
Not needed anymore, Proxmox already does this since January:

- if you get a Code 43 error in Windows 10, add a custom arg to your vm's .conf file ("args: -cpu host,kvm=off,hv_time,hv_relaxed,hv_vapic,hv_spinlocks=0x1fff,hv_vendor_id=Nvidia43FIX")
Much obliged for this information, unfortunately still a no go for me.
I duplicated your configs as much as possible. The only differences are:
1.) I'm on the Enterprise repo, not the pvetest repo
2.) I've used the vfio.conf method instead of the grub commands method.
q35 + pcie causes my Ubuntu 16.04 VM to never come up (or at least not such that I can SSH into it, and nothing displays on the screen attached to the passed-through adapter). Removing q35 and pcie=1, the VM comes up; however, I get the RmInitAdapter failure in dmesg, and nvidia-smi is unable to communicate with the adapter.
Does your working ubuntu config differ at all from the Windows one above?
The kicker is, passthrough APPEARS to work on a 16.04 guest using Seabios. I get console output on the attached screen, and the nouveau module appears to load just fine, yet when I install the Nvidia binary driver it fails with an "RmInitAdapter failed" message in dmesg.
I'd use nouveau, but the purpose of this passthrough is for VDPAU output, and nouveau still lists VP6 support (what the GT 720 has) as "TO DO".
I presume this has something to do with Nvidia's passthrough blocking. Their pettiness in this regard just astounds me.
I also use the latest 4.4.8 kernel from the pvetest repo. I don't have an Enterprise subscription, so I'm not sure what version is installed there (I'm guessing an older, more stable release, maybe?). Perhaps our difference lies there.
That could be, yes.
Current enterprise repo kernel is 4.4.6-1-pve... hmm.. Trying to decide if I want to try the 4.4.8 kernel.
The changelog on kernel.org DOES mention a patch related to IOMMU errors in the 4.4.8 kernel, but its significance goes above my level of understanding.
Is there a way to tell apt to just install the 4.4.8 kernel from the pvetest repo, but keep everything else the way it is? I guess I could download the kernel deb from the repo and install it...
Thank you for that suggestion.
I don't have "allocate memory dynamically within this range" enabled in the UI for the VM (it's set to static), but this still shows up in lspci inside the guest:
06:03.0 Unclassified device [00ff]: Red Hat, Inc Virtio memory balloon
Is there something else I need to disable?
You can test the 4.4.8 kernel from pve-no-subscription (either by temporarily enabling that repository and installing only that single package, or by wgetting the .deb and installing it).
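Roughly, that single-package route could look like this (a sketch; the repository line shown is for PVE 4.x on Debian jessie, adjust to your release, and the sources.list.d filename is just an assumption):

```shell
# Temporarily add the no-subscription repo...
echo "deb http://download.proxmox.com/debian jessie pve-no-subscription" \
    > /etc/apt/sources.list.d/pve-no-sub.list
apt-get update

# ...install just the one kernel package...
apt-get install pve-kernel-4.4.8-1-pve

# ...then drop the repo again so everything else stays on enterprise.
rm /etc/apt/sources.list.d/pve-no-sub.list
apt-get update
```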