[SOLVED] VM start timeout - PCI passthrough related

glitch · Aug 23, 2016

Hi everyone,

I've successfully passed my LSI 9211-8i through to my FreeNAS VM, running Proxmox 4.2-17/e1400248. My issue is that when I start the VM from the GUI or with the qm start 100 command, I get the following:

Code:

root@artemis:~# qm start 100
start failed: command '/usr/bin/kvm -id 100 -chardev 'socket,id=qmp,path=/var/run/qemu-server/100.qmp,server,nowait' -mon 'chardev=qmp,mode=control' -pidfile /var/run/qemu-server/100.pid -daemonize -smbios 'type=1,uuid=b9d43e1e-586a-40bd-9fe1-190015671575' -name Akashic -smp '4,sockets=1,cores=4,maxcpus=4' -nodefaults -boot 'menu=on,strict=on,reboot-timeout=1000' -vga cirrus -vnc unix:/var/run/qemu-server/100.vnc,x509,password -cpu kvm64,+lahf_lm,+sep,+kvm_pv_unhalt,+kvm_pv_eoi,enforce -m 98304 -k en-us -readconfig /usr/share/qemu-server/pve-q35.cfg -device 'usb-tablet,id=tablet,bus=ehci.0,port=1' -device 'vfio-pci,host=09:00.0,id=hostpci0,bus=pci.0,addr=0x10' -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3' -iscsi 'initiator-name=iqn.1993-08.org.debian:01:9cdddb3d2cf5' -drive 'file=/dev/zvol/rpool/data/vm-100-disk-1,if=none,id=drive-virtio0,format=raw,cache=none,aio=native,detect-zeroes=on' -device 'virtio-blk-pci,drive=drive-virtio0,id=virtio0,bus=pci.0,addr=0xa,bootindex=100' -netdev 'type=tap,id=net0,ifname=tap100i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' -device 'virtio-net-pci,mac=3A:63:30:35:37:35,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300' -machine 'type=q35'' failed: got timeout

However, if I run the /usr/bin/kvm... command from that error directly, the VM starts up fine. The delay seems to come from the passthrough, because if I remove the hostpci entry, it works.

My question is: can this timeout be modified easily? My Google searches have been fruitless. I'm fine with it taking an extra 5 seconds to start, but currently I have to SSH in and start it manually.

Barring that, is there anything I can tweak to make the passthrough quicker?

The system is:
2x Xeon E5645
Supermicro X8DTN+
144 GB DDR3 ECC registered

/etc/pve/qemu-server/100.conf (the vm in question):

Code:

balloon: 49152
bootdisk: virtio0
cores: 4
hostpci0: 09:00.0
machine: q35
memory: 98304
name: Akashic
net0: bridge=vmbr0,virtio=3A:63:30:35:37:35
numa: 0
ostype: other
smbios1: uuid=b9d43e1e-586a-40bd-9fe1-190015671575
sockets: 1
virtio0: local-zfs:vm-100-disk-1,size=64G

dmesg | grep -e DMAR -e IOMMU

Code:

root@artemis:~# dmesg | grep -e DMAR -e IOMMU
[    0.000000] Warning: PCIe ACS overrides enabled; This may allow non-IOMMU protected peer-to-peer DMA
[    0.000000] ACPI: DMAR 0x00000000BF77E0E0 000144 (v01 AMI    OEMDMAR  00000001 MSFT 00000097)
[    0.000000] DMAR: IOMMU enabled
[    0.170074] DMAR: Host address width 40
[    0.170075] DMAR: DRHD base: 0x000000fbffe000 flags: 0x1
[    0.170082] DMAR: dmar0: reg_base_addr fbffe000 ver 1:0 cap c90780106f0462 ecap f020fe
[    0.170083] DMAR: RMRR base: 0x000000000e6000 end: 0x000000000e9fff
[    0.170085] DMAR: RMRR base: 0x000000bf7ec000 end: 0x000000bf7fffff
[    0.170086] DMAR: ATSR flags: 0x0
[    0.170088] DMAR-IR: IOAPIC id 6 under DRHD base  0xfbffe000 IOMMU 0
[    0.170090] DMAR-IR: IOAPIC id 7 under DRHD base  0xfbffe000 IOMMU 0
[    0.170092] DMAR-IR: IOAPIC id 8 under DRHD base  0xfbffe000 IOMMU 0
[    0.170094] DMAR-IR: IOAPIC id 9 under DRHD base  0xfbffe000 IOMMU 0
[    0.170568] DMAR-IR: Enabled IRQ remapping in xapic mode
[    1.116536] DMAR: dmar0: Using Queued invalidation
[    1.116555] DMAR: Setting RMRR:
[    1.116585] DMAR: Setting identity map for device 0000:00:1a.0 [0xbf7ec000 - 0xbf7fffff]
[    1.116623] DMAR: Setting identity map for device 0000:00:1a.1 [0xbf7ec000 - 0xbf7fffff]
[    1.116658] DMAR: Setting identity map for device 0000:00:1a.2 [0xbf7ec000 - 0xbf7fffff]
[    1.116692] DMAR: Setting identity map for device 0000:00:1a.7 [0xbf7ec000 - 0xbf7fffff]
[    1.116726] DMAR: Setting identity map for device 0000:00:1d.0 [0xbf7ec000 - 0xbf7fffff]
[    1.116763] DMAR: Setting identity map for device 0000:00:1d.1 [0xbf7ec000 - 0xbf7fffff]
[    1.116797] DMAR: Setting identity map for device 0000:00:1d.2 [0xbf7ec000 - 0xbf7fffff]
[    1.116832] DMAR: Setting identity map for device 0000:00:1d.7 [0xbf7ec000 - 0xbf7fffff]
[    1.116854] DMAR: Setting identity map for device 0000:00:1a.0 [0xe6000 - 0xe9fff]
[    1.116868] DMAR: Setting identity map for device 0000:00:1a.1 [0xe6000 - 0xe9fff]
[    1.116883] DMAR: Setting identity map for device 0000:00:1a.2 [0xe6000 - 0xe9fff]
[    1.116897] DMAR: Setting identity map for device 0000:00:1a.7 [0xe6000 - 0xe9fff]
[    1.116912] DMAR: Setting identity map for device 0000:00:1d.0 [0xe6000 - 0xe9fff]
[    1.116926] DMAR: Setting identity map for device 0000:00:1d.1 [0xe6000 - 0xe9fff]
[    1.116940] DMAR: Setting identity map for device 0000:00:1d.2 [0xe6000 - 0xe9fff]
[    1.116955] DMAR: Setting identity map for device 0000:00:1d.7 [0xe6000 - 0xe9fff]
[    1.117014] DMAR: Prepare 0-16MiB unity mapping for LPC
[    1.117031] DMAR: Setting identity map for device 0000:00:1f.0 [0x0 - 0xffffff]
[    1.117168] DMAR: Intel(R) Virtualization Technology for Directed I/O
[   11.497445] AMD IOMMUv2 driver by Joerg Roedel <jroedel@suse.de>
[   11.497447] AMD IOMMUv2 functionality not available on this system

Thanks for any and all help, it is very much appreciated.

adamb · Aug 23, 2016

I have run into similar issues. Just as a test, try reducing the amount of RAM you give the VM to something like 10G.

glitch · Aug 23, 2016

adamb said:
I have run into similar issues. Just as a test, try reducing the amount of RAM you give the VM to something like 10G.

Interesting, that also resolves the issue, but obviously it is not a real solution for FreeNAS.

Btw, nothing else is running on this machine right now, I'm just getting it set up.

Thanks again

adamb · Aug 23, 2016

glitch said:
Interesting, that also resolves the issue, but obviously it is not a real solution for FreeNAS.

Btw, nothing else is running on this machine right now, I'm just getting it set up.

Thanks again

Yeah, I've yet to find a workaround for the issue. I can typically run 60-70G with PCI passthrough; anything more results in the issue you are seeing.

StarkWiz · Aug 23, 2016

I noticed you are using the q35 machine type, for which adding pcie=1 is important.
Try changing the hostpci0 line to the following:

Code:

hostpci0: 09:00.0,pcie=1

Also, I hope the balloon parameter is not actually missing the "b" in your configuration, as it is in what you posted here.

adamb · Aug 23, 2016

Just for reference, this is where I left this issue, as I got busy with other things.

https://forum.proxmox.com/threads/pci-passthrough-issues-proxmox4.25504/#post-127864

glitch · Aug 23, 2016

StarkWiz said:
I noticed you are using the q35 machine type, for which adding pcie=1 is important.
Try changing the hostpci0 line to the following:

Code:

hostpci0: 09:00.0,pcie=1

Also, I hope the balloon parameter is not actually missing the "b" in your configuration, as it is in what you posted here.

Apologies, I missed a character in my copy and paste. I've added pcie=1 (I had tried this previously, but removed it while trying to get the VM to boot at all).

Code:

balloon: 49152
bootdisk: virtio0
cores: 4
hostpci0: 09:00.0,pcie=1
machine: q35
memory: 98304
name: Akashic
net0: bridge=vmbr0,virtio=3A:63:30:35:37:35
numa: 0
ostype: other
smbios1: uuid=b9d43e1e-586a-40bd-9fe1-190015671575
sockets: 1
virtio0: local-zfs:vm-100-disk-1,size=64G

This still throws the same error, but reducing to 8 GB of RAM works:

Code:

balloon: 4096
bootdisk: virtio0
cores: 4
hostpci0: 09:00.0,pcie=1
machine: q35
memory: 8096
name: Akashic
net0: bridge=vmbr0,virtio=3A:63:30:35:37:35
numa: 0
ostype: other
smbios1: uuid=b9d43e1e-586a-40bd-9fe1-190015671575
sockets: 1
virtio0: local-zfs:vm-100-disk-1,size=64G

StarkWiz · Aug 23, 2016

I would suggest removing the balloon parameter and updating the config as follows:

Code:

numa: 1
hugepages: 2

glitch · Aug 23, 2016

I've made the change you suggested:

Code:

bootdisk: virtio0
cores: 4
hostpci0: 09:00.0,pcie=1
machine: q35
memory: 98304
name: Akashic
net0: bridge=vmbr0,virtio=3A:63:30:35:37:35
numa: 1
hugepages: 2
ostype: other
smbios1: uuid=b9d43e1e-586a-40bd-9fe1-190015671575
sockets: 1
virtio0: local-zfs:vm-100-disk-1,size=64G

However, this is the result of qm start 100:

Code:

root@artemis:~# qm start 100
start failed: hugepage allocation failed at /usr/share/perl5/PVE/QemuServer/Memory.pm line 483.

glitch · Aug 23, 2016

I've tried a few other values for hugepages, not knowing exactly what it does, and I get either the same error or one similar to:

Code:

your system doesn't support hugepages of 16384kB at /usr/share/perl5/PVE/QemuServer/Memory.pm line 370.

Without hugepages (only enabling NUMA), it times out.

Thanks again for your help!

StarkWiz · Aug 23, 2016

I see. I think the hugepages should be allocated by default unless there isn't enough memory available.
Can you please post the output of this command:
"cat /proc/meminfo | grep -i huge"

Also, just to be sure, can you confirm that you've configured the no-subscription or enterprise repository accordingly?
https://pve.proxmox.com/wiki/Package_repositories
Then run apt-get update and pveupgrade to make sure all the packages are up to date.

glitch · Aug 23, 2016

The output of /proc/meminfo:

Code:

root@artemis:~# cat /proc/meminfo | grep -i huge
AnonHugePages:         0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB

I'm using the community repo, as this is just a home lab and I can't afford a monthly subscription.

My system was already up to date:

Code:

root@artemis:~# apt-get update
Ign http://ftp.ca.debian.org jessie InRelease
Hit http://security.debian.org jessie/updates InRelease
Ign http://download.proxmox.com jessie InRelease
Hit http://ftp.ca.debian.org jessie Release.gpg
Hit http://download.proxmox.com jessie Release.gpg
Hit http://ftp.ca.debian.org jessie Release
Hit http://download.proxmox.com jessie Release
Hit http://ftp.ca.debian.org jessie/main amd64 Packages
Hit http://security.debian.org jessie/updates/main amd64 Packages
Hit http://ftp.ca.debian.org jessie/contrib amd64 Packages
Hit http://security.debian.org jessie/updates/contrib amd64 Packages
Hit http://ftp.ca.debian.org jessie/contrib Translation-en
Hit http://security.debian.org jessie/updates/contrib Translation-en
Hit http://ftp.ca.debian.org jessie/main Translation-en
Hit http://download.proxmox.com jessie/pve-no-subscription amd64 Packages
Hit http://security.debian.org jessie/updates/main Translation-en
Ign http://download.proxmox.com jessie/pve-no-subscription Translation-en_US
Ign http://download.proxmox.com jessie/pve-no-subscription Translation-en
Reading package lists... Done
root@artemis:~# pveupgrade
Starting system upgrade: apt-get dist-upgrade
Reading package lists... Done
Building dependency tree
Reading state information... Done
Calculating upgrade... Done
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.

Your System is up-to-date

StarkWiz · Aug 23, 2016

I too recently started using Proxmox in a home lab, without a subscription.

Unfortunately, I haven't added enough memory yet to test your scenario.

Update the line below in /etc/default/grub (back up this file first if necessary, and let me know if you have made any customizations to it before):

Code:

GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on hugepagesz=1G default_hugepagesz=2M"

Then run "update-grub" so the new settings take effect on the next boot.

Restart the server and make sure hugepages is set to 2 in the VM config file; 2 means a 2 MB page size.
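The relationship between the hugepages value and the page size can be sketched as below. This is a hypothetical helper, not Proxmox's own code: it assumes, as described above, that the value is the page size in MB, and it borrows the /run/hugepages/kvm/<size>kB directory pattern from the 1048576kB path that shows up in a kvm error later in this thread. The 2048kB directory is assumed by analogy.

```python
# Hypothetical helper, not Proxmox code: map the "hugepages:" value from the
# VM config (page size in MB) to the kernel's kB notation and to the
# hugetlbfs directory pattern seen in the kvm command line in this thread.

# The two page sizes discussed in this thread: 2 MB and 1 GB.
SUPPORTED_PAGE_SIZES_MB = (2, 1024)


def hugepage_size_kb(hugepages_mb: int) -> int:
    """Convert the config value (MB) to the size in kB."""
    if hugepages_mb not in SUPPORTED_PAGE_SIZES_MB:
        # e.g. hugepages: 16 would mean 16384 kB pages, which x86-64 does not offer
        raise ValueError(f"unsupported hugepage size: {hugepages_mb * 1024} kB")
    return hugepages_mb * 1024


def hugepage_mem_path(hugepages_mb: int) -> str:
    """Backing-store directory, following the pattern seen in the kvm log."""
    return f"/run/hugepages/kvm/{hugepage_size_kb(hugepages_mb)}kB"


print(hugepage_mem_path(2))     # /run/hugepages/kvm/2048kB
print(hugepage_mem_path(1024))  # /run/hugepages/kvm/1048576kB
```

This also explains the "doesn't support hugepages of 16384kB" error above: a value of 16 asks for 16 MB pages, which is not a size the kernel offers.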
Try using less memory at first.
16GB memory: 16384
64GB memory: 65536
If this works with hugepages, then go for 96 GB (memory: 98304).
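The memory values above are GiB multiplied by 1024, because the memory option is given in MiB. With 2 MiB pages, each value divided by 2 gives the number of hugepages needed to back the guest. The arithmetic can be sketched like this (illustrative only, the function names are hypothetical):

```python
# Illustrative arithmetic: GiB -> value for the "memory:" option (MiB),
# and the number of 2 MiB hugepages needed to back that much guest RAM.

HUGEPAGE_SIZE_MIB = 2  # hugepages: 2 (Hugepagesize: 2048 kB)


def memory_option_mib(gib: int) -> int:
    """The memory option is expressed in MiB."""
    return gib * 1024


def hugepages_needed(memory_mib: int, page_mib: int = HUGEPAGE_SIZE_MIB) -> int:
    """Round up so that a partial page still counts as a whole page."""
    return -(-memory_mib // page_mib)


for gib in (16, 64, 96):
    mib = memory_option_mib(gib)
    print(f"{gib} GiB -> memory: {mib} -> {hugepages_needed(mib)} pages of 2 MiB")
```

The resulting page counts (8192, 32768 and 49152) match the HugePages_Total values and the nr_hugepages value posted in this thread.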

This is the output I get with 16 GB of memory assigned to the VM and hugepages enabled.
I run a Windows VM with GPU passthrough, though.

Code:

root@thunderbolt:~# cat /proc/meminfo | grep -i huge
AnonHugePages:         0 kB
HugePages_Total:    8192
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:     8192
Hugepagesize:       2048 kB

glitch · Aug 23, 2016

I made the changes as suggested. Up to 64 GB of RAM works, but 96 GB fails with:

Code:

start failed: hugepage allocation failed at /usr/share/perl5/PVE/QemuServer/Memory.pm line 483.

I've made the GRUB config match what you provided; previously, it only went up to the IOMMU option.

With 16 GB, the output was:

Code:

root@artemis:~# cat /proc/meminfo | grep -i huge
AnonHugePages:         0 kB
HugePages_Total:    8192
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:     8192
Hugepagesize:       2048 kB

With 64 GB:

Code:

root@artemis:~# cat /proc/meminfo | grep -i huge
AnonHugePages:         0 kB
HugePages_Total:   32768
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:    32768
Hugepagesize:       2048 kB
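To read these numbers, note that HugePages_Total is a page count while Hugepagesize is in kB, so the pool size is their product. A small parser, sketched here with the 64 GB output above as sample input (on the host you would pass open("/proc/meminfo").read() instead); the helper names are hypothetical:

```python
# Parse the lines that "cat /proc/meminfo | grep -i huge" prints and
# compute how much RAM the hugepage pool represents.

SAMPLE_64GB = """\
AnonHugePages:         0 kB
HugePages_Total:   32768
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:    32768
Hugepagesize:       2048 kB
"""


def parse_hugepage_info(meminfo_text: str) -> dict[str, int]:
    """Keep only the hugepage-related fields (like grep -i huge)."""
    info = {}
    for line in meminfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and "huge" in key.lower():
            # Values are either a bare page count or "<n> kB"; keep the number.
            info[key.strip()] = int(value.split()[0])
    return info


def pool_size_gib(info: dict[str, int]) -> float:
    """HugePages_Total is a page count; Hugepagesize is in kB."""
    return info["HugePages_Total"] * info["Hugepagesize"] / (1024 * 1024)


info = parse_hugepage_info(SAMPLE_64GB)
print(pool_size_gib(info))  # 64.0
```

In both outputs above, HugePages_Surp equals HugePages_Total, meaning the kernel counts all of these pages as surplus pages beyond the persistent pool size.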

glitch · Aug 23, 2016

I just tried allocating more hugepages before starting the VM and got the same result:

Code:

root@artemis:~# echo 49152 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
root@artemis:~# cat /proc/meminfo | grep -i huge
AnonHugePages:         0 kB
HugePages_Total:   49152
HugePages_Free:    49152
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
root@artemis:~# qm start 100
start failed: hugepage allocation failed at /usr/share/perl5/PVE/QemuServer/Memory.pm line 483.

I did try a larger number of pages (53248) and got the same result.
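One possible reason a global HugePages_Free of 49152 is not enough: on a NUMA host the kernel spreads the persistent pool over the allowed nodes, and each node exposes its own counters under /sys/devices/system/node/nodeN/hugepages/. The sketch below assumes (following the NUMA suggestion later in this thread, not confirmed) that all of the VM's pages must come from a single host node, and uses a hypothetical even split of the pool:

```python
# Illustrative sketch (hypothetical helpers): the global HugePages_Free figure
# can hide the fact that the pool is spread across NUMA nodes.


def node_hugepages_path(node: int, page_kb: int = 2048,
                        field: str = "free_hugepages") -> str:
    """Per-node hugepage counter exposed by the kernel in sysfs."""
    return (f"/sys/devices/system/node/node{node}/hugepages/"
            f"hugepages-{page_kb}kB/{field}")


def nodes_that_fit(free_pages_per_node: dict[int, int],
                   pages_needed: int) -> list[int]:
    """Return the host nodes whose own free pages cover the whole request."""
    return [node for node, free in sorted(free_pages_per_node.items())
            if free >= pages_needed]


# Assumed example: 49152 pages spread evenly over two nodes.
free = {0: 24576, 1: 24576}
print(sum(free.values()))           # 49152, what HugePages_Free would report
print(nodes_that_fit(free, 49152))  # [] -> no single node can back 96 GiB
print(node_hugepages_path(0))
```

On the host, the real per-node values can be read from those sysfs files instead of the assumed dictionary.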

StarkWiz · Aug 23, 2016

Trying random numbers for hugepages won't work; only page sizes supported by your system will. On 64-bit (x86-64) Linux that is mostly either 2 MB or 1 GB, and 1 GB pages are only available if the CPU supports them (the pdpe1gb flag in /proc/cpuinfo).

I'm not exactly sure what causes it to fail above 64 GB; it may be related to NUMA and the Proxmox code.
It may also depend on how much memory is directly accessible to each CPU, as you appear to have a dual-socket configuration.

Can you change hugepages as below? This will try to allocate 1 GB hugepages instead of 2 MB ones.

Code:

hugepages: 1024
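To see which page sizes a host actually offers, the kernel lists one directory per registered size under /sys/kernel/mm/hugepages (names like hugepages-2048kB), and 1 GB pages depend on the pdpe1gb CPU flag. A sketch with hypothetical helper names, fed with sample strings here; on the host you would pass os.listdir("/sys/kernel/mm/hugepages") and open("/proc/cpuinfo").read():

```python
import re

# Hypothetical helpers: list the hugepage sizes the running kernel exposes
# and check the CPU flag that 1 GiB pages depend on.


def supported_hugepage_sizes_kb(dir_names: list[str]) -> list[int]:
    """Extract sizes from directory names like 'hugepages-2048kB'."""
    sizes = []
    for name in dir_names:
        match = re.fullmatch(r"hugepages-(\d+)kB", name)
        if match:
            sizes.append(int(match.group(1)))
    return sorted(sizes)


def cpu_supports_1g_pages(cpuinfo_text: str) -> bool:
    """True if the first 'flags' line of /proc/cpuinfo lists pdpe1gb."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            return "pdpe1gb" in line.split(":", 1)[1].split()
    return False


print(supported_hugepage_sizes_kb(["hugepages-1048576kB", "hugepages-2048kB"]))
# [2048, 1048576]
print(cpu_supports_1g_pages("flags\t\t: fpu pse pdpe1gb lm\n"))  # True
```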

glitch · Aug 23, 2016

Updated to hugepages: 1024

First run at 64 GB, then upped to 96 GB:

Code:

root@artemis:~# qm start 100
root@artemis:~# cat /proc/meminfo | grep -i huge
AnonHugePages:         0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
root@artemis:~# qm stop 100
root@artemis:~# nano /etc/pve/qemu-server/100.conf
root@artemis:~# qm start 100
start failed: hugepage allocation failed at /usr/share/perl5/PVE/QemuServer/Memory.pm line 483.

Then I changed the GRUB config to default_hugepagesz=1G and repeated the test with 64 GB:

Code:

root@artemis:~# qm start 100
kvm: -object memory-backend-file,id=ram-node0,size=65536M,mem-path=/run/hugepages/kvm/1048576kB,share=on,prealloc=yes: can't open backing store /run/hugepages/kvm/1048576kB for guest RAM: No such file or directory
start failed: command '/usr/bin/kvm -id 100 -chardev 'socket,id=qmp,path=/var/run/qemu-server/100.qmp,server,nowait' -mon 'chardev=qmp,mode=control' -pidfile /var/run/qemu-server/100.pid -daemonize -smbios 'type=1,uuid=b9d43e1e-586a-40bd-9fe1-190015671575' -name Akashic -smp '4,sockets=1,cores=4,maxcpus=4' -nodefaults -boot 'menu=on,strict=on,reboot-timeout=1000' -vga cirrus -vnc unix:/var/run/qemu-server/100.vnc,x509,password -cpu kvm64,+lahf_lm,+sep,+kvm_pv_unhalt,+kvm_pv_eoi,enforce -m 65536 -object 'memory-backend-file,id=ram-node0,size=65536M,mem-path=/run/hugepages/kvm/1048576kB,share=on,prealloc=yes' -numa 'node,nodeid=0,cpus=0-3,memdev=ram-node0' -k en-us -readconfig /usr/share/qemu-server/pve-q35.cfg -device 'usb-tablet,id=tablet,bus=ehci.0,port=1' -device 'vfio-pci,host=09:00.0,id=hostpci0,bus=ich9-pcie-port-1,addr=0x0' -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3' -iscsi 'initiator-name=iqn.1993-08.org.debian:01:9cdddb3d2cf5' -drive 'file=/dev/zvol/rpool/data/vm-100-disk-1,if=none,id=drive-virtio0,format=raw,cache=none,aio=native,detect-zeroes=on' -device 'virtio-blk-pci,drive=drive-virtio0,id=virtio0,bus=pci.0,addr=0xa,bootindex=100' -netdev 'type=tap,id=net0,ifname=tap100i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' -device 'virtio-net-pci,mac=3A:63:30:35:37:35,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300' -machine 'type=q35'' failed: exit code 1

StarkWiz · Aug 23, 2016

I am running out of suggestions, but here is one last one.
Try changing sockets from 1 to 2 in the VM config and test with 64 GB and 96 GB, with hugepages set to 2 and to 1024.

You normally don't need to change the default hugepage size; we had already added the 1 GB parameter, so it should be available when we want to use it.

StarkWiz · Aug 23, 2016

Regarding the memory-backend error: it seems 1 GB hugepages are not allocated at boot time for some reason; there is a different process for doing that.
Anyway, the page size doesn't appear to be important in this case, so let's stick to 2 MB hugepages for now.
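For reference, the kernel's hugetlbpage documentation describes reserving 1 GB pages at boot by following hugepagesz=1G with a hugepages=N count; the GRUB line used earlier in this thread sets hugepagesz=1G but no count, so no 1 GB pages get reserved. A hypothetical helper that builds such a line (pages reserved this way are no longer available to the host, so N has to leave room for Proxmox itself):

```python
# Hypothetical helper: build a GRUB_CMDLINE_LINUX_DEFAULT line that reserves
# a fixed number of 1 GiB pages at boot. Per the kernel's hugetlbpage docs,
# "hugepages=N" applies to the size given by the preceding "hugepagesz=".


def grub_cmdline_1g(base_params: str, one_gib_pages: int) -> str:
    params = base_params.split() + [
        "default_hugepagesz=2M",        # keep 2 MiB as the default size, as in this thread
        "hugepagesz=1G",
        f"hugepages={one_gib_pages}",   # number of 1 GiB pages to reserve at boot
    ]
    return f'GRUB_CMDLINE_LINUX_DEFAULT="{" ".join(params)}"'


print(grub_cmdline_1g("quiet intel_iommu=on", 96))
```

As with the earlier GRUB change, update-grub and a reboot are needed before the reservation takes effect.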

StarkWiz · Aug 23, 2016

I was just checking the QEMU command line and noticed this parameter:
-numa 'node,nodeid=0,cpus=0-3,memdev=ram-node0'
It seems only node0 is being assigned to the VM, which means only CPU1 and the memory directly attached to CPU1 are available.
Hopefully, changing to a dual-socket configuration will fix the issue.
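A back-of-the-envelope check of this theory, using the hardware from the first post (2x Xeon E5645, 144 GB). Assumptions, not confirmed Proxmox behaviour: the RAM is populated evenly so each host node has 72 GiB (minus what the host itself uses), each guest NUMA node is backed by one host node, and with numa: 1 the VM memory is split evenly over the virtual sockets:

```python
# Back-of-the-envelope NUMA check (assumptions above, not Proxmox code).
HOST_RAM_GIB = 144
HOST_NODES = 2  # 2x Xeon E5645
# Assumption: RAM populated evenly -> 72 GiB per node (before host usage).
NODE_MIB = HOST_RAM_GIB * 1024 // HOST_NODES  # 73728


def per_guest_node_mib(memory_mib: int, sockets: int) -> int:
    """Assumption: with numa: 1, memory is split evenly over the sockets."""
    return memory_mib // sockets


def fits_per_node(memory_mib: int, sockets: int, node_mib: int = NODE_MIB) -> bool:
    """Does each guest node's share fit into one host node?"""
    return per_guest_node_mib(memory_mib, sockets) <= node_mib


for memory, sockets in ((65536, 1), (98304, 1), (98304, 2)):
    print(memory, sockets, fits_per_node(memory, sockets))
```

Under these assumptions, 64 GB on one socket fits, 96 GB on one socket does not, and 96 GB split over two sockets (48 GB per node) fits, which lines up with what was observed above.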
