I think I might have found a solution for 3 of our VMs. The VMs run Windows Server 2022 and had the following configuration when the problem was observed:
Code:
agent: 1
bios: ovmf
boot: order=scsi0;net0
cores: 2
cpu: host
machine: pc-q35-5.1
memory: 4096
meta: creation-qemu=6.2.0,ctime=1648638637
name: Win2022Test
net0: virtio=6A:78:4E:4E:5D:43,bridge=vmbr0,firewall=1
numa: 0
onboot: 1
ostype: win11
scsi0: proxmox:102/vm-102-disk-0.qcow2,size=50G
scsihw: virtio-scsi-pci
smbios1: uuid=047aac44-58ec-48c9-a917-957bbbcdf899
sockets: 1
tpmstate0: proxmox:102/vm-102-disk-0.raw,size=4M,version=v2.0
vmgenid: e66d0391-63d2-4e55-b933-426803a8f8d6
I found the following Windows system log entries, in sequence, during the reboot process of the VM:
- The operating system is shutting down at system time 2022-03-30T14:03:53.680178600Z.
- The operating system started at system time 2022-03-30T23:10:58.500000000Z.
- The last shutdown's success status was true. The last boot's success status was true.
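To put a number on the gap between the first two entries, here is a minimal Python sketch that parses the two timestamps quoted above and subtracts them. The helper name parse_event_time and the one-minute threshold are illustrative assumptions only, not part of Windows or Proxmox.

```python
from datetime import datetime, timedelta, timezone


def parse_event_time(stamp: str) -> datetime:
    """Parse a timestamp like 2022-03-30T14:03:53.680178600Z as UTC.

    Hypothetical helper: the log uses 9 fractional digits, but datetime
    only stores microseconds, so the fraction is truncated to 6 digits.
    """
    base, _, frac = stamp.rstrip("Z").partition(".")
    micros = int(frac[:6].ljust(6, "0")) if frac else 0
    parsed = datetime.strptime(base, "%Y-%m-%dT%H:%M:%S")
    return parsed.replace(microsecond=micros, tzinfo=timezone.utc)


# Timestamps copied from the two log entries above.
shutdown = parse_event_time("2022-03-30T14:03:53.680178600Z")
startup = parse_event_time("2022-03-30T23:10:58.500000000Z")

gap = startup - shutdown
# Assumed threshold: a normal reboot should log these well under a minute apart.
suspicious = gap > timedelta(minutes=1)

print(gap)         # 9:07:04.819822
print(suspicious)  # True
```

The same helper can be pointed at the shutdown and startup timestamps of any other affected VM to see whether its logs show a similar jump.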
These logs are from a VM that was rebooted from within Windows and got stuck during the reboot. Windows seems to think the reboot was successful, but notice the time difference between the log for "system is shutting down" and "system started". In reality these logs should have been produced a couple of seconds apart. Following up on this strange difference in time, I decided to try disabling RTC by adding the following to my VM configuration:
This can also be achieved by modifying the setting "Use local time for RTC" in the Proxmox GUI (the "localtime" option in the VM configuration).
After disabling RTC, I did the following:
- Started up the VM with RTC disabled
- Configured the correct timezone for the VM (same as my Proxmox nodes use)
- Forced Windows to sync time with NTP
- Waited for 5 minutes before rebooting the VM (it does not seem to work if I reboot the VM immediately after the NTP sync)
Following these steps seems to have resolved the reboot issues for the 3 VMs I have tried it on so far. I will try the same process on a couple more VMs tomorrow to verify that this actually solves the problem completely for us.
It seems like somehow the RTC time for our VMs is 2 hours behind, even though all our nodes report the correct local time. However, as long as the clock in our VMs runs on local time, and not the time reported by RTC in this case, they don't seem to have any issues rebooting.