PCIe passthrough fails and VM drops into EFI shell

mimesot

Hi everyone!
I hope you can help me with a problem that I cannot solve myself.
Since the last Proxmox VE update, which I performed a few days ago, I can no longer boot my Windows 10 VM (it had worked flawlessly for over a year). Instead, I drop into the EFI shell with no keyboard available. All I can do is stop the VM from outside.

My hardware:
  • CPU: Intel Xeon E3-1245 v6 (quad core, Hyper-Threading, 3.7 GHz, VT-x, VT-d, ECC)
  • RAM: 16GB ECC Crucial 2133MHz DDR4
  • Mainboard: MSI C236a
  • System SSD: Crucial MX300 525GB
What uses PCIe-Lanes (lspci excerpt):
  • 01:00 VGA Nvidia GTX 950 (passed through successfully to Win 10 VM, still works)
  • 02:00 USB controller: Renesas Technology Corp. uPD720202 USB 3.0 Host Controller
    (passed through to Win 10 VM, successfully until recently,
    does not work any more, but still works when not being passed through)
  • 03:00 USB controller: ASMedia Technology Inc. ASM1142 USB 3.1 (on Mainboard, not in use)
  • 04:00 SATA controller: ASMedia Technology Inc. ASM1062 (not in use)
  • 05:00 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411
  • 06:00 Non-Volatile memory controller: Samsung Electronics (256GB passed through to Win 10 VM)
Operating Systems:
  • Debian 9.4 with Proxmox 5.1-51, Kernel 4.13.16-2-pve
  • Windows 10 Pro 64bit (with a passed through Nvidia-Card, USB-3-card, NVME-SSD)
This is my config of the Windows 10 VM (vm number 121):
Code:
balloon: 4096
bios: ovmf
boot: c
bootdisk: virtio0
cores: 6
efidisk0: local:121/vm-121-disk-3.qcow2,size=128K
hostpci0: 01:00,x-vga=on,pcie=1
hostpci1: 02:00,pcie=1
hostpci2: 06:00,pcie=1
ide2: local:iso/virtio-win-0.1.126.iso,media=cdrom,size=152204K
keyboard: de
machine: q35
memory: 12288
name: zirkon-vm-win
net0: virtio=9E:76:4B:34:E3:CE,bridge=vmbr0
numa: 0
ostype: win10
sata2: local:iso/Win10_1703_German_x64.iso,media=cdrom,size=4277286K
scsihw: virtio-scsi-pci
smbios1: uuid=2879b8af-d4a5-48d1-ba07-61130e1fddd4
sockets: 1
tablet: 0
usb0: host=1-9
virtio0: local:121/vm-121-disk-1.qcow2,cache=writeback,size=240G
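
For readers who want to see at a glance which devices such a config hands to the guest, here is a minimal Python sketch. It assumes the config is the plain `key: value` text shown above (on a Proxmox host this normally lives in /etc/pve/qemu-server/<vmid>.conf). The helper names `parse_vm_config` and `passthrough_devices` are invented for this illustration and are not part of any Proxmox tool.

```python
def parse_vm_config(text):
    """Parse 'key: value' lines of a VM config into a dict (hypothetical helper)."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split at the first colon only; values such as 'local:121/...' contain colons.
        key, sep, value = line.partition(":")
        if sep:
            config[key.strip()] = value.strip()
    return config


def passthrough_devices(config):
    """Return {option: PCI address} for every hostpciN entry."""
    devices = {}
    for key, value in config.items():
        if key.startswith("hostpci"):
            # The address is the first comma-separated field, e.g. '01:00'.
            devices[key] = value.split(",")[0]
    return devices


# Excerpt of the config posted above.
SAMPLE = """\
bios: ovmf
bootdisk: virtio0
efidisk0: local:121/vm-121-disk-3.qcow2,size=128K
hostpci0: 01:00,x-vga=on,pcie=1
hostpci1: 02:00,pcie=1
hostpci2: 06:00,pcie=1
virtio0: local:121/vm-121-disk-1.qcow2,cache=writeback,size=240G
"""

config = parse_vm_config(SAMPLE)
print(passthrough_devices(config))
print("boot disk:", config["bootdisk"], "->", config[config["bootdisk"]])
```

The last line shows that the configured boot disk is the qcow2 image on `virtio0`, not the passed-through NVMe device on `hostpci2`. Where the guest's EFI boot entry actually points is a separate question that the config alone does not answer.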

Can I provide any more useful information?

Thanks a lot in advance!
Kind regards
Mimesot
 
Does the syslog/dmesg say anything? Was there a BIOS/firmware update? Are the IOMMU groups OK?
 
Hi!
I found something interesting: I removed all PCIe passthroughs in order to regain control through the web interface. That worked, meaning that the keyboard works again, but I still drop into the EFI shell. If I now enter the EFI BIOS by typing exit and open the boot order, no drives appear. I then detached the storage and added it back as an IDE drive, after which my disk appeared in the EFI BIOS boot options. It didn't boot, though.

My IOMMU groups didn't change, and I did not perform a BIOS update.
Code:
find /sys/kernel/iommu_groups/ -type l | sort
/sys/kernel/iommu_groups/0/devices/0000:00:00.0
/sys/kernel/iommu_groups/10/devices/0000:00:1d.0
/sys/kernel/iommu_groups/11/devices/0000:00:1f.0
/sys/kernel/iommu_groups/11/devices/0000:00:1f.2
/sys/kernel/iommu_groups/11/devices/0000:00:1f.3
/sys/kernel/iommu_groups/11/devices/0000:00:1f.4
/sys/kernel/iommu_groups/12/devices/0000:00:1f.6
/sys/kernel/iommu_groups/13/devices/0000:03:00.0
/sys/kernel/iommu_groups/14/devices/0000:04:00.0
/sys/kernel/iommu_groups/15/devices/0000:05:00.0
/sys/kernel/iommu_groups/16/devices/0000:06:00.0
/sys/kernel/iommu_groups/1/devices/0000:00:01.0
/sys/kernel/iommu_groups/1/devices/0000:00:01.1
/sys/kernel/iommu_groups/1/devices/0000:01:00.0
/sys/kernel/iommu_groups/1/devices/0000:01:00.1
/sys/kernel/iommu_groups/1/devices/0000:02:00.0
/sys/kernel/iommu_groups/2/devices/0000:00:02.0
/sys/kernel/iommu_groups/3/devices/0000:00:08.0
/sys/kernel/iommu_groups/4/devices/0000:00:14.0
/sys/kernel/iommu_groups/4/devices/0000:00:14.2
/sys/kernel/iommu_groups/5/devices/0000:00:16.0
/sys/kernel/iommu_groups/6/devices/0000:00:17.0
/sys/kernel/iommu_groups/7/devices/0000:00:1c.0
/sys/kernel/iommu_groups/8/devices/0000:00:1c.2
/sys/kernel/iommu_groups/9/devices/0000:00:1c.4
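
A flat listing like this is easier to read once it is grouped. The following Python sketch collects the device addresses per IOMMU group and looks up which group a given passed-through device belongs to. It only parses text in the format printed by the `find` command above; the function names are made up for illustration.

```python
import re
from collections import defaultdict

# Matches paths such as /sys/kernel/iommu_groups/1/devices/0000:02:00.0
GROUP_PATH = re.compile(r"/sys/kernel/iommu_groups/(\d+)/devices/([0-9a-f:.]+)$")


def group_devices(listing):
    """Return {group number: [PCI addresses]} from the find output."""
    groups = defaultdict(list)
    for line in listing.splitlines():
        match = GROUP_PATH.search(line.strip())
        if match:
            groups[int(match.group(1))].append(match.group(2))
    return dict(groups)


def groups_for(groups, short_address):
    """Return the groups that contain a device at e.g. '02:00' (any function)."""
    prefix = "0000:" + short_address + "."
    return {
        group: devices
        for group, devices in groups.items()
        if any(device.startswith(prefix) for device in devices)
    }


# Excerpt of the listing posted above.
LISTING = """\
/sys/kernel/iommu_groups/1/devices/0000:00:01.0
/sys/kernel/iommu_groups/1/devices/0000:00:01.1
/sys/kernel/iommu_groups/1/devices/0000:01:00.0
/sys/kernel/iommu_groups/1/devices/0000:01:00.1
/sys/kernel/iommu_groups/1/devices/0000:02:00.0
/sys/kernel/iommu_groups/16/devices/0000:06:00.0
"""

groups = group_devices(LISTING)
for address in ("01:00", "02:00", "06:00"):
    print(address, groups_for(groups, address))
```

Grouped this way, the listing shows that the GPU (01:00.0 and 01:00.1) and the Renesas USB controller (02:00.0) sit in group 1 together with 00:01.0 and 00:01.1, while the NVMe device (06:00.0) has group 16 to itself. Whether that grouping plays any role in the failure is not established in this thread.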

This is the Syslog, when I start up the Win10 VM (system name zirkon, vm id 121):
Code:
May 06 08:18:00 zirkon-pve systemd[1]: Starting Proxmox VE replication runner...
May 06 08:18:00 zirkon-pve systemd[1]: Started Proxmox VE replication runner.
May 06 08:18:02 zirkon-pve caribou[2314]: daemon.vala:120: error in focus handler: The process appears to be hung.
May 06 08:18:04 zirkon-pve pvedaemon[7055]: start VM 121: UPID:zirkon-pve:00001B8F:00B1749D:5AEE9E1C:qmstart:121:root@pam:
May 06 08:18:04 zirkon-pve pvedaemon[2353]: <root@pam> starting task UPID:zirkon-pve:00001B8F:00B1749D:5AEE9E1C:qmstart:121:root@pam:
May 06 08:18:04 zirkon-pve systemd[1]: Started 121.scope.
May 06 08:18:04 zirkon-pve systemd-udevd[7067]: Could not generate persistent MAC address for tap121i0: No such file or directory
May 06 08:18:04 zirkon-pve NetworkManager[1009]: <info> [1525587484.2694] manager: (tap121i0): new Tun device (/org/freedesktop/NetworkManager/Devices/10)
May 06 08:18:04 zirkon-pve NetworkManager[1009]: <info> [1525587484.2764] devices added (path: /sys/devices/virtual/net/tap121i0, iface: tap121i0)
May 06 08:18:04 zirkon-pve NetworkManager[1009]: <info> [1525587484.2764] device added (path: /sys/devices/virtual/net/tap121i0, iface: tap121i0): no ifupdown configuration found.
May 06 08:18:04 zirkon-pve kernel: device tap121i0 entered promiscuous mode
May 06 08:18:04 zirkon-pve NetworkManager[1009]: <info> [1525587484.5521] device (tap121i0): state change: unmanaged -> unavailable (reason 'connection-assumed') [10 20 41]
May 06 08:18:04 zirkon-pve NetworkManager[1009]: <info> [1525587484.5526] device (tap121i0): state change: unavailable -> disconnected (reason 'none') [20 30 0]
May 06 08:18:04 zirkon-pve kernel: vmbr0: port 2(tap121i0) entered blocking state
May 06 08:18:04 zirkon-pve kernel: vmbr0: port 2(tap121i0) entered disabled state
May 06 08:18:04 zirkon-pve kernel: vmbr0: port 2(tap121i0) entered blocking state
May 06 08:18:04 zirkon-pve kernel: vmbr0: port 2(tap121i0) entered forwarding state
May 06 08:18:04 zirkon-pve pvedaemon[2353]: <root@pam> end task UPID:zirkon-pve:00001B8F:00B1749D:5AEE9E1C:qmstart:121:root@pam: OK
May 06 08:18:05 zirkon-pve caribou[2314]: daemon.vala:120: error in focus handler: The process appears to be hung.
May 06 08:18:30 zirkon-pve systemd[1]: dev-disk-by\x2duuid-6E97\x2d17DA.device: Job dev-disk-by\x2duuid-6E97\x2d17DA.device/start timed out.
May 06 08:18:30 zirkon-pve systemd[1]: Timed out waiting for device /dev/disk/by-uuid/6E97-17DA.
May 06 08:18:30 zirkon-pve systemd[1]: Dependency failed for File System Check on /dev/disk/by-uuid/6E97-17DA.
May 06 08:18:30 zirkon-pve systemd[1]: Dependency failed for /boot/efi.
May 06 08:18:30 zirkon-pve systemd[1]: boot-efi.mount: Job boot-efi.mount/start failed with result 'dependency'.
May 06 08:18:30 zirkon-pve systemd[1]: systemd-fsck@dev-disk-by\x2duuid-6E97\x2d17DA.service: Job systemd-fsck@dev-disk-by\x2duuid-6E97\x2d17DA.service/start failed with result 'dependency'.
May 06 08:18:30 zirkon-pve systemd[1]: dev-disk-by\x2duuid-6E97\x2d17DA.device: Job dev-disk-by\x2duuid-6E97\x2d17DA.device/start failed with result 'timeout'.
May 06 08:18:33 zirkon-pve caribou[2314]: daemon.vala:120: error in focus handler: The process appears to be hung.
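
Most of this log is routine network setup for the tap device. To pull out only the systemd lines that report a problem, a small filter such as the following Python sketch can help. It assumes the two message formats seen above ("Timed out waiting for device ..." and "Dependency failed for ..."); the function name is invented for this example.

```python
import re

# Patterns for two systemd messages that appear in the log above.
TIMED_OUT = re.compile(r"Timed out waiting for device (\S+)")
DEP_FAILED = re.compile(r"Dependency failed for (.+)$")


def summarize_boot_problems(log_text):
    """Return (timed_out_devices, failed_units) found in syslog text."""
    timed_out, failed = [], []
    for line in log_text.splitlines():
        line = line.strip()
        match = TIMED_OUT.search(line)
        if match:
            # systemd ends the sentence with a period; drop it from the path.
            timed_out.append(match.group(1).rstrip("."))
        match = DEP_FAILED.search(line)
        if match:
            failed.append(match.group(1).rstrip("."))
    return timed_out, failed


# Three lines copied from the log above.
LOG = """\
May 06 08:18:30 zirkon-pve systemd[1]: Timed out waiting for device /dev/disk/by-uuid/6E97-17DA.
May 06 08:18:30 zirkon-pve systemd[1]: Dependency failed for File System Check on /dev/disk/by-uuid/6E97-17DA.
May 06 08:18:30 zirkon-pve systemd[1]: Dependency failed for /boot/efi.
"""

devices, units = summarize_boot_problems(LOG)
print("timed out:", devices)
print("failed:", units)
```

Note that these messages come from systemd on the host, so they concern the host's own /boot/efi mount rather than anything inside the VM.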

Thanks a lot!
 
May 06 08:18:04 zirkon-pve NetworkManager[1009]: <info> [1525587484.2694] manager: (tap121i0): new Tun device (/org/freedesktop/NetworkManager/Devices/10)
Why is NetworkManager installed? I guess this has nothing to do with your problem, but it is an interesting choice.

May 06 08:18:05 zirkon-pve caribou[2314]: daemon.vala:120: error in focus handler: The process appears to be hung.
There seems to be a problem with this, but I am not sure whether it is related to your issue.

May 06 08:18:30 zirkon-pve systemd[1]: dev-disk-by\x2duuid-6E97\x2d17DA.device: Job dev-disk-by\x2duuid-6E97\x2d17DA.device/start timed out.
Here it seems that systemd is missing an EFI disk.

Are you sure you are not booting from that passed-through disk? (i.e., do you use the EFI partition of that disk for the host?)
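
One way to check this on the host is to see which device /etc/fstab names for /boot/efi and compare it with the UUID from the timeout message. Below is a rough Python sketch of that comparison. The fstab content is an invented example with a placeholder UUID, not the poster's file, and `mount_source` is a hypothetical helper.

```python
def mount_source(fstab_text, mount_point):
    """Return the device spec (first field) that fstab lists for mount_point."""
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) >= 2 and fields[1] == mount_point:
            return fields[0]
    return None


# Invented example; a real check would read /etc/fstab on the host instead.
EXAMPLE_FSTAB = """\
# <file system>  <mount point>  <type>  <options>          <dump>  <pass>
/dev/pve/root    /              ext4    errors=remount-ro  0       1
UUID=0000-AAAA   /boot/efi      vfat    umask=0077         0       1
"""

# UUID taken from the systemd timeout message in the log.
UUID_FROM_LOG = "6E97-17DA"

source = mount_source(EXAMPLE_FSTAB, "/boot/efi")
print("fstab source for /boot/efi:", source)
print("matches timed-out UUID:", source == "UUID=" + UUID_FROM_LOG)
```

If the two match on a real system, the next step would be to find out which physical disk carries that UUID, for example with `blkid` while the disk is still visible to the host.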
 
Thanks for your answer!

Why is NetworkManager installed? I guess this has nothing to do with your problem, but it is an interesting choice.
Well, that was definitely not a deliberate choice. Perhaps it is a relic of my Debian installation? It regularly causes trouble with resolv.conf, which is annoying to fix each time something changes in the network configuration.

Are you sure you are not booting from that passed-through disk? (i.e., do you use the EFI partition of that disk for the host?)
I once reinstalled Windows 10. Maybe the NVMe disk was already passed through at the time and Windows chose to put an EFI partition there. As far as I know, there is no way to control the location of the bootloader from within the Windows installer. But this was long ago, and I have used my Windows VM many times since. Anyway, to get things working again, is the best solution to detach the NVMe drive, reinstall Windows, and then attach the NVMe drive again?

Greetings
mimesot
 
I just wanted to mention: I set up a new Windows 10 VM and passed through that USB 3 card again. It works. This still leaves the question of why passing a USB keyboard through to the EFI shell and BIOS failed both ways, via PCIe passthrough and via USB port passthrough, when the EFI disk is missing. Does the boot process freeze when no EFI disk is detected, so that the search for USB devices is never performed?
 
