Black Screen Win 10 VM

jzuck74

Member
May 16, 2022
Hello,

I have a few Windows 10 VMs that randomly get stuck and show only a black screen when I open the console; I'm then forced to reboot the machine to get it back up. I've been searching the forum and found a couple of things. Are there any other solutions I should look at to stop this from happening?

I just installed the newest VirtIO drivers and also switched the NIC from the Intel e1000 to VirtIO to see if that helps.

Thank you for any help!
 
Hi,
please post the current config of the VM (qm config <VMID>) and the output of pveversion -v.
 
qm config 311
bios: ovmf
boot: order=ide0;ide2
cores: 2
efidisk0: storage:vm-311-disk-0,efitype=4m,pre-enrolled-keys=1,size=528K
ide0: storage:vm-311-disk-1,cache=writethrough,size=100G
ide2: none,media=cdrom
machine: pc-i440fx-6.2
memory: 8192
meta: creation-qemu=6.2.0,ctime=1659391227
name: 3.11
net0: virtio=02:14:8A:95:75:C8,bridge=vmbr0,firewall=1
numa: 0
onboot: 1
ostype: win10
scsihw: virtio-scsi-pci
smbios1: uuid=47061853-d05d-489e-b789-582b4575c986
sockets: 1
vmgenid: 95557f23-398f-4984-89c5-1e27bf182313

pveversion -v
proxmox-ve: 7.2-1 (running kernel: 5.15.30-2-pve)
pve-manager: 7.2-3 (running version: 7.2-3/c743d6c1)
pve-kernel-helper: 7.2-2
pve-kernel-5.15: 7.2-1
pve-kernel-5.15.30-2-pve: 5.15.30-3
ceph-fuse: 15.2.16-pve1
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.22-pve2
libproxmox-acme-perl: 1.4.2
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.1-8
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.1-6
libpve-guest-common-perl: 4.1-2
libpve-http-server-perl: 4.1-1
libpve-storage-perl: 7.2-2
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.12-1
lxcfs: 4.0.12-pve1
novnc-pve: 1.3.0-3
proxmox-backup-client: 2.1.8-1
proxmox-backup-file-restore: 2.1.8-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.4-10
pve-cluster: 7.2-1
pve-container: 4.2-1
pve-docs: 7.2-2
pve-edk2-firmware: 3.20210831-2
pve-firewall: 4.2-5
pve-firmware: 3.4-1
pve-ha-manager: 3.3-4
pve-i18n: 2.7-1
pve-qemu-kvm: 6.2.0-5
pve-xtermjs: 4.16.0-1
qemu-server: 7.2-2
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.7.1~bpo11+1
vncterm: 1.7-1
zfsutils-linux: 2.1.4-pve1
 
A few things to consider:
I have a few VMs Windows 10 VM that will randomly get stuck and only show a black screen
Is the VM still reachable over the network in that case (does it answer a ping, are its services still running, ...)?
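
A minimal sketch of that check, run from the Proxmox host. The guest IP is a placeholder, the function name is made up for illustration, and the agent check assumes the QEMU guest agent is installed and enabled in the VM, which the config above does not show.

```shell
# Sketch: tell a hung guest apart from a display-only problem.
# Usage: check_guest VMID GUEST_IP   (both values are placeholders)
check_guest() {
    vmid=$1
    ip=$2

    # What does Proxmox/QEMU report for the VM? The verbose output is
    # expected to contain "status:" and "qmpstatus:" lines.
    qm status "$vmid" --verbose | grep -E '^(status|qmpstatus):'

    # Does the guest still answer on the network?
    if ping -c 3 -W 2 "$ip" >/dev/null 2>&1; then
        echo "network: reachable"
    else
        echo "network: unreachable"
    fi

    # Only meaningful if the QEMU guest agent runs inside the VM.
    if qm agent "$vmid" ping >/dev/null 2>&1; then
        echo "agent: responding"
    else
        echo "agent: no response"
    fi
}
```

For example, `check_guest 311 192.0.2.10`. If the guest still answers while the console is black, that points at the display rather than at a full guest hang.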
 
@Chris: all my Win10 VMs started doing this recently, and it only affects the OVMF ones. The console shows "Guest has not initialized the display (yet)", then the usual resolution change happens and the screen goes black. After starting, the VMs run for some time and then crash silently and become unresponsive.
 
Hi,
did you change anything that could coincide with the "recently"? It seems that you are using packages that are nearly a year old. I can't tell whether the problem will go away by upgrading, but applying (security) updates is certainly recommended, see: https://pve.proxmox.com/pve-docs/chapter-sysadmin.html#sysadmin_package_repositories
 
Hm, did I miss something? I was under the impression that I'm not using any obsolete packages, besides not being on 7.4 yet:


Code:
# pveversion
pve-manager/7.3-6/723bb6ec (running kernel: 6.1.15-1-pve)
root@hv-chi:~# apt update
Hit:1 http://ftp.us.debian.org/debian bullseye InRelease
Get:2 http://ftp.us.debian.org/debian bullseye-updates InRelease [44.1 kB]
Get:3 http://security.debian.org bullseye-security InRelease [48.4 kB]
Get:4 http://download.proxmox.com/debian/pve bullseye InRelease [2,661 B]
Get:5 http://security.debian.org bullseye-security/main amd64 Packages [236 kB]
Get:6 http://download.proxmox.com/debian/pve bullseye/pve-no-subscription amd64 Packages [377 kB]
Fetched 708 kB in 1s (1,105 kB/s)
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
19 packages can be upgraded. Run 'apt list --upgradable' to see them.

# apt list --upgradable
Listing... Done
libpve-access-control/stable 7.4-2 all [upgradable from: 7.3-2]
libpve-cluster-api-perl/stable 7.3-3 all [upgradable from: 7.3-2]
libpve-cluster-perl/stable 7.3-3 all [upgradable from: 7.3-2]
libpve-common-perl/stable 7.3-3 all [upgradable from: 7.3-2]
libpve-guest-common-perl/stable 4.2-4 all [upgradable from: 4.2-3]
libpve-http-server-perl/stable 4.2-1 all [upgradable from: 4.1-6]
libpve-rs-perl/stable 0.7.5 amd64 [upgradable from: 0.7.3]
libpve-storage-perl/stable 7.4-2 all [upgradable from: 7.3-2]
proxmox-ve/stable 7.4-1 all [upgradable from: 7.3-1]
proxmox-widget-toolkit/stable 3.6.3 all [upgradable from: 3.5.5]
pve-cluster/stable 7.3-3 amd64 [upgradable from: 7.3-2]
pve-container/stable 4.4-3 all [upgradable from: 4.4-2]
pve-docs/stable 7.4-2 all [upgradable from: 7.3-1]
pve-edk2-firmware/stable 3.20221111-2 all [upgradable from: 3.20221111-1]
pve-firewall/stable 4.3-1 amd64 [upgradable from: 4.2-7]
pve-ha-manager/stable 3.6.0 amd64 [upgradable from: 3.5.1]
pve-i18n/stable 2.11-1 all [upgradable from: 2.8-3]
pve-manager/stable 7.4-3 amd64 [upgradable from: 7.3-6]
qemu-server/stable 7.4-2 amd64 [upgradable from: 7.3-4]

As for changes: I was recently testing the v6 kernel line. v6.1 was working great for me. During the last cycle of updates I grabbed v6.2 and a few packages (see below). I have since reverted the kernel to 6.1, so that's most likely not it. My main suspect, if any, would be edk2, but I haven't had time to test reverting it yet:

Code:
# grep -E '(install|upgrade)\s' /var/log/dpkg.log
2023-03-02 02:34:31 upgrade curl:amd64 7.74.0-1.3+deb11u5 7.74.0-1.3+deb11u7
2023-03-02 02:34:31 upgrade libcurl4:amd64 7.74.0-1.3+deb11u5 7.74.0-1.3+deb11u7
2023-03-02 02:34:31 upgrade libcurl3-gnutls:amd64 7.74.0-1.3+deb11u5 7.74.0-1.3+deb11u7
2023-03-02 02:34:31 upgrade pve-qemu-kvm:amd64 7.1.0-4 7.2.0-5
2023-03-02 03:47:30 install cpuid:amd64 <none> 20201006-1
2023-03-04 19:05:42 install pve-kernel-6.1.10-1-pve:amd64 <none> 6.1.10-1
2023-03-04 19:05:53 install pve-kernel-6.1:all <none> 7.3-4
2023-03-18 18:21:43 install p7zip:amd64 <none> 16.02+dfsg-8
2023-03-18 18:21:43 install p7zip-full:amd64 <none> 16.02+dfsg-8
2023-03-18 19:38:21 install libhugetlbfs-bin:amd64 <none> 2.23-4
2023-03-19 02:11:08 install pve-kernel-6.2.6-1-pve:amd64 <none> 6.2.6-1
2023-03-19 02:11:20 install pve-kernel-6.2:all <none> 7.3-8
2023-03-22 00:12:22 upgrade libproxmox-acme-plugins:all 1.4.3 1.4.4
2023-03-22 00:12:23 upgrade libproxmox-acme-perl:all 1.4.3 1.4.4
2023-03-22 00:12:23 upgrade libpve-access-control:all 7.3-1 7.3-2
2023-03-22 00:12:23 upgrade libpve-http-server-perl:all 4.1-5 4.1-6
2023-03-22 00:12:23 upgrade lxc-pve:amd64 5.0.2-1 5.0.2-2
2023-03-22 00:12:23 upgrade novnc-pve:all 1.3.0-3 1.4.0-1
2023-03-22 00:12:24 upgrade pve-edk2-firmware:all 3.20220526-1 3.20221111-1
2023-03-22 00:12:24 upgrade pve-firmware:all 3.6-3 3.6-4
2023-03-22 00:12:31 upgrade pve-i18n:all 2.8-2 2.8-3
2023-03-22 00:12:31 install pve-kernel-5.15.102-1-pve:amd64 <none> 5.15.102-1
2023-03-22 00:12:42 upgrade pve-kernel-5.15:all 7.3-2 7.3-3
2023-03-22 00:12:42 install pve-kernel-6.1.15-1-pve:amd64 <none> 6.1.15-1
2023-03-22 00:12:54 upgrade pve-kernel-6.1:all 7.3-4 7.3-6
2023-03-22 00:12:54 upgrade pve-kernel-helper:all 7.3-4 7.3-8
2023-03-22 00:12:54 upgrade pve-qemu-kvm:amd64 7.2.0-5 7.2.0-8
2023-03-22 00:12:56 upgrade swtpm-tools:amd64 0.8.0~bpo11+2 0.8.0~bpo11+3
2023-03-22 00:12:56 upgrade swtpm:amd64 0.8.0~bpo11+2 0.8.0~bpo11+3
2023-03-22 00:12:56 upgrade swtpm-libs:amd64 0.8.0~bpo11+2 0.8.0~bpo11+3
2023-03-22 00:12:56 upgrade qemu-server:amd64 7.3-3 7.3-4
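
To make the suspects easier to spot, the same log can be narrowed down to packages that take part in starting a VM. This is only a sketch: the function name is made up, and the package list in the pattern is an assumption about what is relevant, not an official list.

```shell
# Sketch: show only installs/upgrades of packages involved in running VMs.
# The package pattern is an assumption; adjust it as needed.
# Usage: vm_related_changes [LOGFILE]   (defaults to /var/log/dpkg.log)
vm_related_changes() {
    log=${1:-/var/log/dpkg.log}
    grep -E ' (install|upgrade) (pve-edk2-firmware|pve-qemu-kvm|qemu-server|pve-kernel-[0-9][^: ]*):' "$log"
}
```

Run against the log above, this would keep the edk2, QEMU, qemu-server, and versioned kernel lines and drop things like curl or p7zip.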

I narrowed the black screen issue down to GPU passthrough. The GPU only works once; after that, attaching it to any VM makes the VM unbootable, with a black screen and no errors in the logs. The only way to fix that is to do:

Code:
# echo 1 > /sys/bus/pci/devices/0000\:0b\:00.0/remove
# echo 1 > /sys/bus/pci/rescan

Normally I would blame the GPU's firmware, but nothing has changed in that regard.
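
For repeated use, the two commands can be wrapped in a small function. This is a sketch, not a fix: the function name is made up, the optional second argument exists only so it can be tried against a scratch directory, and it assumes you run it as root while no VM is using the device.

```shell
# Sketch: detach a PCI device from the host and let the kernel rediscover it.
# Usage: pci_reprobe 0000:0b:00.0
# The second argument is an illustrative override of the sysfs location;
# it defaults to the real tree.
pci_reprobe() {
    addr=$1
    root=${2:-/sys/bus/pci}

    # Refuse to continue if the address does not exist.
    if [ ! -e "$root/devices/$addr/remove" ]; then
        echo "no such PCI device: $addr" >&2
        return 1
    fi

    # Same two writes as above: remove the device, then rescan the bus.
    echo 1 > "$root/devices/$addr/remove"
    echo 1 > "$root/rescan"
}
```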
 
Hm, did I miss something? I was under the impression that I'm not using any obsolete packages, besides not being on 7.4 yet:
Oh, sorry, I completely missed that you were not the original poster (and that @Chris already suggested upgrading for @jzuck74).

However, my suspect would be edk2, if any - I didn't have time to test reverting that yet:
Yes, that is a likely candidate. How much RAM do your VMs with the issue have assigned? We had some reports where more RAM was needed with the new version (but for VMs that only had very little to begin with).

I narrowed the black screen issue down to GPU passthrough. The GPU only works once; after that, attaching it to any VM makes the VM unbootable, with a black screen and no errors in the logs. The only way to fix that is to do:

Code:
# echo 1 > /sys/bus/pci/devices/0000\:0b\:00.0/remove
# echo 1 > /sys/bus/pci/rescan

Normally I would blame the GPU's firmware, but nothing has changed in that regard.
Good find! Yes, sounds like something goes wrong with releasing the device again.
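
One thing that might help narrow this down is checking which host driver the device is bound to before the first VM start, while the VM runs, and after it is shut down. A rough sketch, with a made-up function name; which driver counts as "correct" after shutdown depends on the host setup and is not something this check can decide.

```shell
# Sketch: print the kernel driver a PCI device is currently bound to.
# Usage: pci_driver 0000:0b:00.0
# The second argument is an illustrative override of the sysfs location.
pci_driver() {
    addr=$1
    root=${2:-/sys/bus/pci}
    link="$root/devices/$addr/driver"

    # sysfs exposes the bound driver as a symlink named "driver".
    if [ -L "$link" ]; then
        basename "$(readlink "$link")"
    else
        echo "none"
    fi
}
```

Comparing the output at the three points in time, together with `dmesg | grep -i vfio` on the host, could show whether the state differs between the first and the second start.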
 
Yes, that is a likely candidate. How much RAM do your VMs with the issue have assigned? We had some reports where more RAM was needed with the new version (but for VMs that only had very little to begin with).
That's most likely not it - one VM that does this has 1 GB assigned, but the other has 32 GB. The bigger one uses static huge pages while the small one does not.

Good find! Yes, sounds like something goes wrong with releasing the device again.
Anything I can try to debug this? I'm guessing it's not the VFIO modules themselves, as these are tied to the kernel. I can try reverting EDK and, separately, QEMU. I also see there's "pve-firmware" - could that be a suspect as well?
 
Anything I can try to debug this? I'm guessing it's not the VFIO modules themselves, as these are tied to the kernel. I can try reverting EDK and, separately, QEMU.
I'd first try downgrading the EDK package. AFAIK, the problem with debugging EDK2/OVMF is that debug output needs to be enabled at build time, and that is not done for the release build. There will likely be a newer version of the package in the not-too-distant future (it still needs to go through internal testing). I'll try to remember to post here when it's out, but you can also keep an eye on the pvetest repository.
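
A possible way to do the downgrade, as a sketch only. The version 3.20220526-1 is taken from the dpkg.log above; whether it can still be installed from the configured repository or the local apt cache is an assumption. The function name and the RUN variable are made up for illustration.

```shell
# Sketch: install an earlier version of a package and keep apt from
# upgrading it again. Set RUN=echo to print the commands instead of
# running them.
# Usage: downgrade_and_hold PACKAGE VERSION
downgrade_and_hold() {
    pkg=$1
    ver=$2
    ${RUN:-} apt-get install --allow-downgrades "$pkg=$ver" &&
        ${RUN:-} apt-mark hold "$pkg"
}
```

For a dry run: `RUN=echo downgrade_and_hold pve-edk2-firmware 3.20220526-1`. The affected VMs most likely need a full stop and start afterwards to pick up the firmware, and `apt-mark unhold pve-edk2-firmware` undoes the hold once a fixed version is out.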

I also see there's "pve-firmware" - could that be a suspect as well?
That is the firmware for the host kernel.
 
