Hi there,
I have an issue getting two Quadros running in separate VMs. On one Quadro I get Code 43.
First, a heads-up on what I did, what was working, and what isn't anymore after trying various fixes. I fixed it into the ground, hehe.
Hardware:
Fujitsu CELSIUS R920Power (no monitor attached)
Quadro K4000
Quadro FX1800
I was running a Proxmox setup on that machine for more than 2 years, on Proxmox V3.0-20/0428106c with QEMU 1.7. GPU passthrough with the K4000 was working well.
I needed another GPU for another VM, so I plugged in an FX570. I did not get it to work. I thought the NVIDIA driver was the blocking point, as this card is not on NVIDIA's passthrough list. Here I probably made my first mistake (never change a running system) and did a "dist-upgrade" to get the "kvm=off" option for the CPU in SeaBIOS. That did not help either.
Finally I got an FX1800, which is officially supported by NVIDIA. Et voilà, it worked: I could pass through both Quadros. BUT VMs under heavy load became unstable. I even upgraded to a 3.1xxxx pve-kernel. That did not work out, so I downgraded back to 2.6xxxx. I figured it might have something to do with caching, as the VMs hung on disk writes but did not crash. I disabled all caches in the VMs and passed the FX1800 through to a Windows machine, which worked fine. I did not test the K4000 again at that point.
After two weeks I needed the K4000 in a Linux VM, but I could not install it. As soon as X tried to access the driver module, the VM crashed. I looked for the problem in the VM, because the card had worked in Windows VMs before.
I removed the card and tested it outside the server. All good.
I plugged it back in, physically removed the FX1800, and passed the K4000 through to a Windows VM: Code 43 (what???). That was working before the kernel up-/downgrade.
I swapped PCI slots, everything. Another strange thing happened: when I plugged the FX1800 back in and installed the driver in Windows, I got a BSOD. After I enabled writeback cache on virtio again, the BSOD went away and the card was recognised, all good. If I now disable the cache, everything still works, even after shutdown/power-up of the VM. But only for the FX1800; the K4000 still throws Code 43.
I also tried different Windows VMs.
Sorry for the fairytale at the beginning. I just want to make sure you understand that the K4000 was working, even after the "dist-upgrade".
But after that I really tried a lot to bring stability back to the VMs before I narrowed it down to the cache thing.
So this is my setup now:
Code:
proxmox-ve-2.6.32: 3.4-166 (running kernel: 2.6.32-43-pve)
pve-manager: 3.4-11 (running version: 3.4-11/6502936f)
pve-kernel-2.6.32-20-pve: 2.6.32-100
pve-kernel-2.6.32-43-pve: 2.6.32-166
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.7-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.10-3
pve-cluster: 3.0-19
qemu-server: 3.4-6
pve-firmware: 1.1-5
libpve-common-perl: 3.0-24
libpve-access-control: 3.0-16
libpve-storage-perl: 3.0-34
pve-libspice-server1: 0.12.4-3
vncterm: 1.1-8
vzctl: 4.0-1pve6
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 2.2-14
ksm-control-daemon: 1.1-1
glusterfs-client: 3.5.2-1
lspci -nn | grep "NVIDIA"
Code:
02:00.0 VGA compatible controller [0300]: NVIDIA Corporation G94GL [Quadro FX 1800] [10de:0638] (rev a1)
84:00.0 VGA compatible controller [0300]: NVIDIA Corporation GK106GL [Quadro K4000] [10de:11fa] (rev a1)
84:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:0e0b] (rev a1)
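For reference, the vendor:device IDs in this output can be collected into the comma-separated form that a `pci-stub.ids=` kernel parameter expects. This is only a minimal sketch: the helper name `nvidia_ids` is hypothetical, it assumes a `grep` that supports `-o`, and it is not taken from the setup described in this post.

```shell
# Hypothetical helper: read `lspci -nn` output on stdin and print the
# NVIDIA vendor:device IDs (vendor 10de) as one comma-separated line.
nvidia_ids() {
  grep 'NVIDIA' \
    | grep -o '10de:[0-9a-f]\{4\}' \
    | sort -u \
    | paste -sd, -
}

# Usage on the host (prints nothing useful if lspci is missing):
#   lspci -nn | nvidia_ids
```

Fed the three lines above, this should give one ID for the FX1800 and two for the K4000 (VGA and audio function).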
dmesg | grep "IOMMU"
Code:
Intel-IOMMU: enabled
dmar: IOMMU 0: reg_base_addr fbffe000 ver 1:0 cap d2078c106f0462 ecap f020fe
dmar: IOMMU 1: reg_base_addr bfffc000 ver 1:0 cap d2078c106f0462 ecap f020fe
IOMMU 0xfbffe000: using Queued invalidation
IOMMU 0xbfffc000: using Queued invalidation
IOMMU: Setting RMRR:
IOMMU: Setting identity map for device 0000:00:1d.0 [0x3cf56000 - 0x3cf63000]
IOMMU: Setting identity map for device 0000:00:1a.0 [0x3cf56000 - 0x3cf63000]
IOMMU: Prepare 0-16MiB unity mapping for LPC
IOMMU: Setting identity map for device 0000:00:1f.0 [0x0 - 0x1000000]
dmesg | grep "claimed"
Code:
pci-stub 0000:02:00.0: claimed by stub
pci-stub 0000:84:00.0: claimed by stub
pci-stub 0000:84:00.1: claimed by stub
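The same binding can also be cross-checked at runtime through sysfs, where each PCI device has a `driver` symlink while a driver is bound to it. A minimal sketch, assuming the standard `/sys/bus/pci/devices/<address>/driver` layout; the helper name `bound_driver` and its optional second argument (an alternative sysfs root) are made up for illustration:

```shell
# Hypothetical helper: print the name of the driver bound to a PCI device,
# or "none" if no driver is bound.
# $1 = full PCI address (e.g. 0000:84:00.0), $2 = sysfs root (default /sys)
bound_driver() {
  addr=$1
  sysfs_root=${2:-/sys}
  link="$sysfs_root/bus/pci/devices/$addr/driver"
  if [ -L "$link" ]; then
    basename "$(readlink "$link")"
  else
    echo "none"
  fi
}

# The three functions listed in the dmesg output above.
for dev in 0000:02:00.0 0000:84:00.0 0000:84:00.1; do
  printf '%s %s\n' "$dev" "$(bound_driver "$dev")"
done
```

While a VM is using a device, the name shown may differ from the one claimed at boot, so this is best run with the VMs stopped.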
seabios config of Windows 7 VM
Code:
boot: cdn
bootdisk: virtio3
cores: 8
hostpci0: 84:00.0
ide2: none,media=cdrom
memory: 32768
name: TEMP
net0: e1000=3E:FB:4F:27:6E:45,bridge=vmbr0
numa: 0
ostype: other
sockets: 1
virtio0: local:108/vm-108-disk-1.vmdk,format=vmdk,cache=none,size=10G
virtio3: local:108/vm-108-disk-3.vmdk,format=vmdk,cache=none,size=100G
"cpu: host,kvm=off" makes no difference.
I have blacklisted "nvidia" and "nouveau" on the host, even though it shouldn't matter with pci-stub enabled. It doesn't make a difference.
/etc/default/grub
Code:
GRUB_DEFAULT=0
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="Proxmox Virtual Environment"
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on"
GRUB_CMDLINE_LINUX=""
I have no other options set currently, as I had no interrupt problems before.
vfio is not working with my kernel, and with kernel 3.1xx I had problems with the IOMMU groups: the folders were not found or the modules were not loaded.
The NVIDIA driver in the VM is "309.08-quadro-tesla-win8-win7-winvista-64bit-international-whql.exe", which was working with both cards before.
Currently I have no clue at all why the FX1800 works without a problem and the K4000 does not.
I am a little reluctant to upgrade to Proxmox 4.x, as I might run into other, more severe problems, e.g. the NVIDIA reset problem. Rebooting the host on a regular basis is not possible.
I really hope that you can help me out. Maybe going back to my initial setup is the way, but how do I do that?
Cheers,
Stefan