VM Reboot Issue - VM stuck on Proxmox start boot option screen

Jul 10, 2023
Hey All,

So this problem I've been having for a little while now: sometimes I'll reboot a virtual machine, it will fully shut down and come back, and then it just gets stuck about 9/10ths of the way along the "Start boot option" bar. A lot of the time, if it's say a Windows update scheduled at 12 AM, I'll get in at 8 AM and it's still sitting there like this.

Usually a forceful stop or reset will get it working again, but I have no clue why this happens.

Another time, a colleague of mine rebooted their alarm server (which I host on a Proxmox Ceph cluster) during the day at 11 AM, and an hour later they said their server never came back up. Logging into the cluster I just saw this same screen; I force reset it and it came back alive.
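
For reference, recovering from the host CLI amounts to something like this (a rough sketch; 103 here is just the example VM whose config is shown further down):

Code:
qm reset 103                  # hard reset of the stuck VM
# or, if the reset doesn't help:
qm stop 103 && qm start 103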

[Screenshot: VM console stuck near the end of the "Start boot option" progress bar]

Here's the output of my pveversion


Code:
pveversion -v
proxmox-ve: 8.2.0 (running kernel: 6.8.12-2-pve)
pve-manager: 8.2.6 (running version: 8.2.6/414ce79a1d42d6bc)
proxmox-kernel-helper: 8.1.0
proxmox-kernel-6.8: 6.8.12-2
proxmox-kernel-6.8.12-2-pve-signed: 6.8.12-2
ceph: 18.2.2-pve1
ceph-fuse: 18.2.2-pve1
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx9
intel-microcode: 3.20231114.1~deb12u1
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-4
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.5.1
libproxmox-backup-qemu0: 1.4.1
libproxmox-rs-perl: 0.3.4
libpve-access-control: 8.1.4
libpve-apiclient-perl: 3.3.2
libpve-cluster-api-perl: 8.0.7
libpve-cluster-perl: 8.0.7
libpve-common-perl: 8.2.2
libpve-guest-common-perl: 5.1.4
libpve-http-server-perl: 5.1.0
libpve-network-perl: 0.9.8
libpve-rs-perl: 0.8.10
libpve-storage-perl: 8.2.4
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 6.0.0-1
lxcfs: 6.0.0-pve2
novnc-pve: 1.4.0-4
proxmox-backup-client: 3.2.7-1
proxmox-backup-file-restore: 3.2.7-1
proxmox-firewall: 0.5.0
proxmox-kernel-helper: 8.1.0
proxmox-mail-forward: 0.2.3
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.7
proxmox-widget-toolkit: 4.2.3
pve-cluster: 8.0.7
pve-container: 5.2.0
pve-docs: 8.2.3
pve-edk2-firmware: 4.2023.08-4
pve-esxi-import-tools: 0.7.2
pve-firewall: 5.0.7
pve-firmware: 3.13-2
pve-ha-manager: 4.0.5
pve-i18n: 3.2.3
pve-qemu-kvm: 9.0.2-3
pve-xtermjs: 5.3.0-3
qemu-server: 8.2.4
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.6-pve1

Code:
ceph status
  cluster:
    id:     5fd31f6c-3f31-4fe2-bcc5-1f73aa608f8f
    health: HEALTH_OK
 
  services:
    mon: 3 daemons, quorum A,B,C (age 44h)
    mgr: A(active, since 44h), standbys: B, C
    osd: 9 osds: 9 up (since 44h), 9 in (since 6w)
 
  data:
    pools:   2 pools, 129 pgs
    objects: 335.86k objects, 1.3 TiB
    usage:   3.7 TiB used, 4.2 TiB / 7.9 TiB avail
    pgs:     129 active+clean
 
  io:
    client:   304 KiB/s rd, 432 KiB/s wr, 33 op/s rd, 48 op/s wr

And lastly the specific VM details:

Code:
cat /etc/pve/qemu-server/103.conf
agent: 1
bios: ovmf
boot: order=scsi0;ide0;net0;scsi1
cores: 4
cpu: host
efidisk0: cluster-storage:vm-103-disk-2,efitype=4m,pre-enrolled-keys=1,size=528K
machine: pc-q35-8.1
memory: 16384
meta: creation-qemu=8.1.2,ctime=1704766310
name: 103
net0: virtio=BC:24:11:8C:5A:75,bridge=vmbr0,firewall=1
numa: 1
onboot: 1
ostype: win10
scsi0: cluster-storage:vm-103-disk-0,cache=writeback,discard=on,iothread=1,serial='C-drive',size=60G,ssd=1
scsi1: cluster-storage:vm-103-disk-4,cache=writeback,discard=on,iothread=1,serial='D-drive',size=150G,ssd=1
scsi2: cluster-storage:vm-103-disk-3,cache=writeback,discard=on,iothread=1,serial='E-drive',size=150G,ssd=1
scsi3: cluster-storage:vm-103-disk-5,cache=writeback,discard=on,iothread=1,serial='F-drive',size=210G,ssd=1
scsi4: cluster-storage:vm-103-disk-6,cache=writeback,discard=on,iothread=1,serial='G-drive',size=140G,ssd=1
scsihw: virtio-scsi-single
smbios1: uuid=66c36bf2-6ee8-43d5-b62c-a34c93befbc8
sockets: 2
vmgenid: 215fc570-a1b1-402f-8c93-09eb20805feb

I've removed any information that is considered sensitive from these. I'm open to any suggestions and potential troubleshooting steps.
 
Another thing I can add: sometimes when I reset or stop the VM and start it back up, it will just sit on this screen

[Screenshot: VM console sitting at the boot screen]
After around two minutes it comes right and boots correctly.
 
This is also happening to me on my Windows VMs.
It looks like it only happens on VMs that use EFI/OVMF boot.
Any ideas?
 
Check if it helps to use an older machine type...
[Screenshot: Machine type selection in the VM's hardware options]
 
This is also happening to me on my Windows VMs.
It looks like it only happens on VMs that use EFI/OVMF boot.
Any ideas?
Yeah, I agree with this sentiment; none of my BIOS-boot machines have this issue, it only seems to be the EFI-based VMs (a quick way to check which VMs use OVMF is in the snippet below).
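
In case it helps anyone narrow things down, this lists the VM configs on a node that use OVMF (just a quick sketch; run it on each node):

Code:
grep -l "bios: ovmf" /etc/pve/qemu-server/*.conf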

Check if it helps to use an older machine type...

For me, I have a fair number of virtual machines whose machine versions range from 7.0 up to the latest, 9.0, depending on when the VM was installed. I don't notice any difference; they have all at some point gotten stuck at the boot screen.

The thing is, I don't know how to replicate it; it just happens sometimes and I deal with it on a case-by-case basis. It would be nice to know how to reproduce the issue.
 
The thing is, I don't know how to replicate it; it just happens sometimes and I deal with it on a case-by-case basis. It would be nice to know how to reproduce the issue.

Same here. This is frustrating because you don't know when it will happen, but it happens when you least want it to, especially with Windows Server. I will try to find a way to monitor the VM and get a notification when it doesn't boot properly. I think ping doesn't respond when it's stuck like that, so that might be a good way of monitoring it, and maybe even automating a restart when it happens (something like the rough sketch below).
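
In case it's useful, this is the kind of watchdog I have in mind, run from cron on the host (a rough, untested sketch; the VMID, guest IP, and log path are just placeholders):

Code:
#!/bin/bash
# Reset a VM when Proxmox reports it running but the guest stops answering ping.
VMID=103                 # placeholder VMID
GUEST_IP=192.168.1.50    # placeholder guest IP

# Only act if Proxmox thinks the VM is running
if qm status "$VMID" | grep -q "status: running"; then
    # 5 pings, 2-second timeout each
    if ! ping -c 5 -W 2 "$GUEST_IP" > /dev/null 2>&1; then
        echo "$(date): VM $VMID not answering ping, resetting" >> /var/log/vm-watchdog.log
        qm reset "$VMID"
    fi
fi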

This is also happening to me on my Windows VMs.
It looks like it only happens on VMs that use EFI/OVMF boot.
Any ideas?

Just like @complexplaster27 said, my VMs have different pc-q35 machine versions (7 to 9), and it happens randomly to all of them every once in a while.
 
Bumping this thread; the issue is still happening, even on freshly created VMs after a reboot. I'd love some further guidance on how to troubleshoot this.
 
Try to delete and recreate your EFI disk... this might help in case there is an issue with the Secure Boot config in Windows...
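
For reference, roughly how to do that from the CLI (a sketch assuming VMID 103 and the cluster-storage pool from the config above; this discards the stored EFI variables, so boot entries may need to be re-created):

Code:
# Detach the existing EFI disk (the old volume is kept as an "unused" disk
# and can be removed later once the VM boots again)
qm set 103 --delete efidisk0

# Allocate a fresh EFI disk on the same storage
qm set 103 --efidisk0 cluster-storage:1,efitype=4m,pre-enrolled-keys=1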
 
Try to delete and recreate your EFI disk... this might help in case there is an issue with the Secure Boot config in Windows...
That's not it. That's one of the things I have already tried, and it didn't work.
 
Same here, fortunately it is happening on a virtualized test cluster.

I've restored a Windows 2019 VM from a backup to local-zfs and it booted the first time. I then moved it to Ceph storage, and from the second boot onwards it got stuck at "Start boot option...", with the VM's CPU at 100%.

Just to make sure, I restored it again to local-zfs, booted it (first boot OK), stopped it, and booted it again, and it got stuck at "Start boot option..." once more.

That's a weird issue...
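
For anyone who wants to try reproducing it, my test boiled down to roughly this (a sketch; the backup archive name and VMID are just placeholders for whatever you restore):

Code:
# Restore the Windows backup to local-zfs (placeholder archive name)
qmrestore /var/lib/vz/dump/vzdump-qemu-104-backup.vma.zst 104 --storage local-zfs

qm start 104    # first boot: OK
qm stop 104
qm start 104    # second boot: stuck at "Start boot option..."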

Code:
root@pve02:~# cat /etc/pve/qemu-server/104.conf 
#Windows 2019
agent: 1
bios: ovmf
boot: order=scsi0;ide0;net0
cores: 2
cpu: host
efidisk0: vmstorage-local:vm-104-disk-0,efitype=4m,pre-enrolled-keys=1,size=1M
ide0: none,media=cdrom
machine: pc-q35-8.1
memory: 2048
meta: creation-qemu=8.1.5,ctime=1711640555
name: Windows2022-b
net0: virtio=BC:24:11:3A:69:A5,bridge=vmbr0,firewall=1
numa: 0
ostype: win11
scsi0: vmstorage-local:vm-104-disk-1,discard=on,iothread=1,size=32G,ssd=1
scsihw: virtio-scsi-single
smbios1: uuid=851361f8-c636-4aac-ba08-07bb506809b3
sockets: 1
tpmstate0: vmstorage-local:vm-104-disk-2,size=4M,version=v2.0
vmgenid: cfb90c14-7df0-4c33-9f30-9fc8e6cd325c

Code:
proxmox-ve: 8.2.0 (running kernel: 6.8.4-2-pve)
pve-manager: 8.2.2 (running version: 8.2.2/9355359cd7afbae4)
proxmox-kernel-helper: 8.1.0
proxmox-kernel-6.8: 6.8.4-2
proxmox-kernel-6.8.4-2-pve-signed: 6.8.4-2
proxmox-kernel-6.5.13-5-pve-signed: 6.5.13-5
proxmox-kernel-6.5: 6.5.13-5
proxmox-kernel-6.5.13-3-pve-signed: 6.5.13-3
proxmox-kernel-6.5.11-8-pve-signed: 6.5.11-8
ceph: 18.2.2-pve1
ceph-fuse: 18.2.2-pve1
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx8
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-4
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.5.1
libproxmox-backup-qemu0: 1.4.1
libproxmox-rs-perl: 0.3.3
libpve-access-control: 8.1.4
libpve-apiclient-perl: 3.3.2
libpve-cluster-api-perl: 8.0.6
libpve-cluster-perl: 8.0.6
libpve-common-perl: 8.2.1
libpve-guest-common-perl: 5.1.1
libpve-http-server-perl: 5.1.0
libpve-network-perl: 0.9.8
libpve-rs-perl: 0.8.8
libpve-storage-perl: 8.2.1
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 6.0.0-1
lxcfs: 6.0.0-pve2
novnc-pve: 1.4.0-3
openvswitch-switch: 3.1.0-2+deb12u1
proxmox-backup-client: 3.2.2-1
proxmox-backup-file-restore: 3.2.2-1
proxmox-kernel-helper: 8.1.0
proxmox-mail-forward: 0.2.3
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.6
proxmox-widget-toolkit: 4.2.3
pve-cluster: 8.0.6
pve-container: 5.0.11
pve-docs: 8.2.2
pve-edk2-firmware: 4.2023.08-4
pve-esxi-import-tools: 0.7.0
pve-firewall: 5.0.6
pve-firmware: 3.11-1
pve-ha-manager: 4.0.4
pve-i18n: 3.2.2
pve-qemu-kvm: 8.1.5-6
pve-xtermjs: 5.3.0-3
qemu-server: 8.2.1
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.3-pve2
 
As @itNGO said, I've recreated the EFI disk and it worked. I also tried recreating the TPM state, but recreating the EFI disk was enough.

The only thing that bothers me is why the VM boots once after being restored from backup and then refuses to boot...
 
As @itNGO said, I've recreated the EFI disk and it worked. I also tried recreating the TPM state, but recreating the EFI disk was enough.

The only thing that bothers me is why the VM boots once after being restored from backup and then refuses to boot...
I wonder why this would work. Would that mean that, across 4 different clusters, I would need to re-create the EFI disk for all the virtual machines? It seems a bit strange to have to do that. @Max Carrara, could you advise whether this is the best way to work around this?
 
I recreated the EFI disk on all my VMs and it looked like it fixed it, but about a week later it happened again.
The thing is that since it happens rarely, you think it's fixed... until it happens again.
 
Also seeing this issue. Following.
I will say this: I noticed this happen after I was fussing with the ostype. By default, when I restored this VM it set the type to 'other' even though it is Windows 10. So I shut the VM down, switched it to win10, and tried to boot it. After that it was a total loss; it just hangs at the same spot as the original poster's. My VM details:

Code:
agent: 1
audio0: device=ich9-intel-hda,driver=none
bios: ovmf
boot: order=sata0;net0;ide2
cores: 2
efidisk0: datastore1:104/vm-104-disk-0.raw,efitype=4m,pre-enrolled-keys=1,size=528K
ide2: Mirror:iso/virtio-win-0.1.262.iso,media=cdrom,size=708140K
localtime: 1
machine: pc,viommu=virtio
memory: 12288
meta: creation-qemu=9.0.2,ctime=1730478599
name: robot
net0: virtio=BC:24:11:19:2E:81,bridge=vmbr0,firewall=1
sata0: datastore1:104/vm-104-disk-1.raw,size=60G
smbios1: uuid=be311465-e553-43ae-bd9e-ba99bca81130
sockets: 2
vga: std
vmgenid: 6e2b63ab-7db6-49f8-9090-942cafff3f0a
 
Just thought I would drop a note and say that restoring from a backup taken before I tweaked the OS type from 'other' to 'Win 10' has worked, and the VM is now usable again. I have no idea what that tweak did to break the VM, but even switching it back did not correct the boot problem.
 
It would be nice for an official Proxmox staff member to chime in; it doesn't seem like we have a solution for this problem, and it may potentially affect any user.
 
I was following the official Proxmox Documentation for setting up the Windows VM and ran into the same issue (https://pve.proxmox.com/wiki/Windows_2022_guest_best_practices). After changing my cache type from "write-back" to "no-cache" the issue has been resolved. Hope that helps.
Hmm, very interesting. Since you mentioned it, I've noticed that none of my virtual machines set to Default (No cache) have ever had this issue; based on the task history, they have no stop/reset events going all the way back to May of this year.

I'll try this on some of the virtual machines that most often experience this behavior and change the cache mode (roughly as in the sketch below).
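
For reference, the CLI equivalent would be something like this (a sketch using VM 103's scsi0 from my config above, with the VM shut down; omitting the cache option falls back to the default "No cache", and the other options should match your existing disk):

Code:
qm set 103 --scsi0 cluster-storage:vm-103-disk-0,discard=on,iothread=1,serial=C-drive,size=60G,ssd=1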
 
