Host randomly freezing after Win11Pro VM install, cannot access until hard reboot

jrock12

New Member
Dec 13, 2024
Hello - I am relatively new to Proxmox, but I had it running stably for quite a while with no VMs.

I recently installed a Windows 11 VM, and now the host randomly freezes. When it does, the server gets really hot and the fan runs at full speed until I hard reset.

I have no idea what is causing this; it seems to happen overnight. I originally did not have the QEMU agent running, but it is now, and that didn't seem to help.

I copied below the last few log entries I can see from before it froze and I rebooted. I appreciate any help you can offer.

Dec 13 06:21:22 proxmox01 systemd[1]: Starting apt-daily-upgrade.service - Daily apt upgrade and clean activities...
Dec 13 06:21:23 proxmox01 systemd[1]: apt-daily-upgrade.service: Deactivated successfully.
Dec 13 06:21:23 proxmox01 systemd[1]: Finished apt-daily-upgrade.service - Daily apt upgrade and clean activities.
Dec 13 06:21:39 proxmox01 pvedaemon[250814]: <root@pam> successful auth for user 'root@pam'
Dec 13 06:24:44 proxmox01 pveproxy[255062]: worker exit
Dec 13 06:24:44 proxmox01 pveproxy[1162]: worker 255062 finished
Dec 13 06:24:44 proxmox01 pveproxy[1162]: starting 1 worker(s)
Dec 13 06:24:44 proxmox01 pveproxy[1162]: worker 263045 started
Dec 13 06:25:01 proxmox01 CRON[263108]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Dec 13 06:25:01 proxmox01 CRON[263109]: (root) CMD (test -x /usr/sbin/anacron || { cd / && run-parts --report /etc/cron.daily; })
Dec 13 06:25:01 proxmox01 CRON[263108]: pam_unix(cron:session): session closed for user root
Dec 13 06:31:00 proxmox01 pveproxy[255389]: worker exit
Dec 13 06:31:00 proxmox01 pveproxy[1162]: worker 255389 finished
Dec 13 06:31:00 proxmox01 pveproxy[1162]: starting 1 worker(s)
Dec 13 06:31:00 proxmox01 pveproxy[1162]: worker 264401 started
Dec 13 06:37:39 proxmox01 pvedaemon[250814]: <root@pam> successful auth for user 'root@pam'
Dec 13 06:53:39 proxmox01 pvedaemon[124373]: <root@pam> successful auth for user 'root@pam'
Dec 13 07:05:12 proxmox01 pveproxy[260282]: worker exit
Dec 13 07:05:12 proxmox01 pveproxy[1162]: worker 260282 finished
Dec 13 07:05:12 proxmox01 pveproxy[1162]: starting 1 worker(s)
Dec 13 07:05:12 proxmox01 pveproxy[1162]: worker 270610 started
-- Reboot --
 

The Win11 VM has 4 CPUs and 12 GB of RAM. I saved screenshots of its options and hardware. The host is an HP EliteDesk G5 with 64 GB of RAM.

1734095821581.png

1734095834595.png
 
How much memory does your Proxmox host have?

What happens if you reduce the Win 11 machine's memory down to 4 GB?

Does the host freeze when the Windows 11 machine is not running?

I would turn off the Win 11 machine, install a Linux VM with 4 GB of RAM, run that, and see how it goes. If it runs well without crashing the Proxmox host, I would shut down the Linux VM, reduce the Win 11 VM's memory down to 4 GB, and see whether running Win 11 with 4 GB causes the crash.

Please let us know what happens

Ronney
 
The host has 64 GB of RAM, with only one VM running, which has 12 GB. I haven't had an issue with the host freezing prior to running the Win11 VM. Based on your suggestion, I'll first reduce the Win11 VM from 12 GB down to 4 GB and see how that goes. On the side I'll install the Linux VM so I can work towards step 2. Thanks for your help.
 
If you're using ZFS, limit your ARC size. I ran into the oom-killer with 24 GB of RAM on the host until I applied this fix.
I am using ZFS - also, on the Win11 VM the disk cache is set to write through.

As to the ARC size, what would you suggest as a max (and/or min)? I'm still figuring out how to change that. I have 64 GB total on the host, only running the one VM. I wonder if it would be better just to take the ZFS drive off the Windows 11 VM and use the local storage. I don't think that was an option when I set it up though...

1734115619195.png
 
Are you monitoring the CPU temperature? It may be a hardware problem.
Does any process on the host side take much CPU, especially KVM/QEMU? Check with top.
Are you running programs that need much CPU in the Windows VM, an anti-virus for example?
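
For anyone following along, a minimal sketch of how those two checks could be done from the host shell. It assumes the kernel exposes temperatures under /sys/class/thermal, which not every board does; the helper name millideg_to_c is made up for this example.

```shell
#!/bin/sh
# Convert a sysfs reading in millidegrees Celsius to whole degrees.
millideg_to_c() {
    echo $(( $1 / 1000 ))
}

# Print every thermal zone the kernel exposes (may be none on some boards).
for zone in /sys/class/thermal/thermal_zone*; do
    [ -r "$zone/temp" ] || continue
    t=$(cat "$zone/temp" 2>/dev/null) || continue
    [ -n "$t" ] || continue
    echo "$(cat "$zone/type" 2>/dev/null): $(millideg_to_c "$t") C"
done

# One-shot, non-interactive view of the busiest processes:
#   top -b -n 1 | head -n 15
```

If no zones are printed, the `sensors` command from the lm-sensors package is the usual alternative.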
 
I just checked my temps and everything is between 30 and 40 C, which seems good. I haven't noticed any heat except when it freezes.

The host and the VM use little if any CPU....see below

The Windows 11 VM is pretty much vanilla. I only installed it because I wanted to be able to remote into it from work in case I wanted to mess around in a browser or something.


1734120976612.png

1734121024906.png

1734121321731.png
 
It all depends. With spinning disks you generally want a bit more ARC for speed, but with SSDs I set it all the way down to 1-2 GB.
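
A rough sketch of how a 2 GB ceiling could be set, assuming the usual OpenZFS module parameter zfs_arc_max, which takes a value in bytes. The helper names arc_bytes and arc_conf_line are made up for this example, and 2 GiB is only the figure mentioned above, not a recommendation for every workload.

```shell
#!/bin/sh
# Convert a size in GiB to bytes, the unit zfs_arc_max expects.
arc_bytes() {
    echo $(( $1 * 1024 * 1024 * 1024 ))
}

# Print the modprobe option line for an ARC ceiling given in GiB.
arc_conf_line() {
    echo "options zfs zfs_arc_max=$(arc_bytes "$1")"
}

# Prints: options zfs zfs_arc_max=2147483648
arc_conf_line 2

# To apply as root, add that line to /etc/modprobe.d/zfs.conf, then:
#   update-initramfs -u -k all   # needed when the root filesystem is on ZFS
#   reboot
# To change the limit on the running system without a reboot:
#   echo "$(arc_bytes 2)" > /sys/module/zfs/parameters/zfs_arc_max
```

The current ARC size can be compared against the limit with `arc_summary` or by reading /proc/spl/kstat/zfs/arcstats.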
 
I do have WireGuard running directly on the Windows VM...but I assume that wouldn't cause an issue.
 
I assume you are not using tricks like PCIe passthrough of the (integrated) GPU, which can cause problems when incorrectly configured or expose bugs in the hardware.
If you are running out of memory, processes may be killed, but the kernel should keep responding. Processes not running as root should not accidentally kill the kernel.
So you have either triggered a rare bug or have flaky hardware. As it seems to "happen overnight", I would try disabling some of the power management first.
You can also run memtest86+ to check the RAM. A long time ago my laptop had problems, and the memory test only found errors after 2 days.
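
A quick way to confirm whether passthrough is configured is to look for hostpci entries in the VM config. A minimal sketch, where the VM ID 100 and the helper name has_passthrough are placeholders for this example:

```shell
#!/bin/sh
# Reads VM config text on stdin; succeeds if any PCI passthrough entry exists.
has_passthrough() {
    grep -Eq '^hostpci[0-9]+:'
}

# On the Proxmox host (100 is a hypothetical VM ID):
#   qm config 100 | has_passthrough && echo "passthrough configured"

# Demonstration with sample config text instead of a real VM:
printf 'cores: 4\nhostpci0: 0000:00:02.0\n' | has_passthrough \
    && echo "passthrough configured"
```

If an entry shows up, removing it from the VM's hardware list and watching whether the freezes stop is a cheap test.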
 
Actually, I'm pretty sure I did set up the PCIe passthrough, haha. I won't be able to access the server again for a couple of weeks (sadly I haven't put it behind a VPN yet), but that will be the first thing I look at when I get back.
 
Hi jrock12;
I assume the crash has one of a few possible causes: backups of the machine, the server being a SQL Server database server, or backups running inside the server itself. All of these put tremendous stress on the disk and can cause the virtual machine (VM) to crash. You can address this issue as follows:

- The SCSI controller: VirtIO SCSI (or VirtIO SCSI Single)
- The disk should be of type VirtIO Block, with the following settings applied to each disk:
cache=writethrough (safe for a database server)
IO thread enabled (iothread=1)
Async IO set to threads (aio=threads)

You can also try disabling 'Memory Ballooning' (optional).

Install the stable version of the Windows VirtIO drivers.
https://fedorapeople.org/groups/virt/virtio-win/direct-downloads/stable-virtio/virtio-win.iso

From the shell, edit the file /etc/kernel/cmdline (used when the host boots via systemd-boot, as on a ZFS root):
Add mitigations=off at the end of the line (not recommended for security reasons, but it increases performance).

1. # nano /etc/kernel/cmdline
root=ZFS=rpool/ROOT/pve-1 boot=zfs mitigations=off

2. # proxmox-boot-tool refresh

3. # reboot

This issue can be reproduced using the CrystalDiskMark tool inside Windows VM (in the Settings menu) for higher random load (up to Q32 T16).
Another solution is to use a CT (Container) instead of a VM for SQL Server on a Linux-based operating system, rather than on Windows.
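
The controller and disk settings listed above can also be applied from the host shell with `qm set`. The following is a dry-run sketch that only prints the commands for review. The VM ID 100, the disk key scsi0, and the volume name local-zfs:vm-100-disk-0 are placeholders; the real values come from `qm config` for the VM in question.

```shell
#!/bin/sh
VMID=100                          # hypothetical VM ID
DISK_KEY=scsi0                    # assumption: virtio0 for a VirtIO Block disk
VOLUME=local-zfs:vm-100-disk-0    # hypothetical volume; copy yours from qm config

# Append the cache, IO thread and async IO settings to a volume name.
disk_spec() {
    echo "$1,cache=writethrough,iothread=1,aio=threads"
}

# Print the commands instead of running them, so they can be checked first.
echo "qm set $VMID --scsihw virtio-scsi-single"
echo "qm set $VMID --$DISK_KEY $(disk_spec "$VOLUME")"
echo "qm set $VMID --balloon 0"
```

The last command corresponds to disabling memory ballooning. Changes to the controller or disk options take effect after the VM has been fully stopped and started again.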
 
I appreciate the feedback. I'm not running a SQL server though. It is Proxmox on bare metal, with only one VM, which is Windows 11. I'm also not running backups, or if they are running, it must be a default, because I hadn't gotten around to setting that up yet.

When I get back, per your direction I will change the SCSI controller, as well as run those other commands and reboot.
 
If the previous solution doesn't work, try changing the machine type to pc-i440fx instead of pc-q35.

Ensure that the computer's BIOS is up-to-date (Proxmox Host Server). Linux Kernel 6.8 may cause issues with older machines using Intel CPUs, but the chances are higher that a BIOS update will resolve it eventually.

Finally, if it’s still not working, try updating the CPU microcode within Debian (for Intel or AMD CPUs):

apt-get install intel-microcode

Note: Before installing the microcode update packages on your computer for the first time, it is recommended to check your system's vendor support site for BIOS/UEFI updates and apply those. Ensuring that the computer's BIOS/UEFI is up-to-date will reduce the chances of problems with the microcode update (although the chances are very low, they are not zero) and may also resolve other firmware bugs unrelated to the microcode.

Please install the amd64-microcode package for systems with AMD processors or the intel-microcode package for systems with Intel processors. You will need to enable both the contrib and non-free repositories in /etc/apt/sources.list (on Debian 12 based releases, the microcode packages are in the non-free-firmware component).

Microcode updates are applied only at boot, so you will need to reboot your system to activate them. Keep the packages installed as described above to ensure that the microcode updates are reapplied at each boot.
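
A small sketch of picking the right package from the CPU vendor string in /proc/cpuinfo. The helper name microcode_pkg is made up for this example, and the script only prints the install command rather than running it.

```shell
#!/bin/sh
# Map the vendor_id string from /proc/cpuinfo to the Debian package name.
microcode_pkg() {
    case "$1" in
        GenuineIntel) echo intel-microcode ;;
        AuthenticAMD) echo amd64-microcode ;;
        *) return 1 ;;
    esac
}

# Read the vendor of the first CPU listed (empty on non-x86 systems).
vendor=$(awk -F': ' '/^vendor_id/ {print $2; exit}' /proc/cpuinfo 2>/dev/null)

if pkg=$(microcode_pkg "$vendor"); then
    echo "apt-get install $pkg"
else
    echo "CPU vendor not recognized: ${vendor:-none}"
fi

# After installing and rebooting, the loaded revision can be checked with:
#   grep -m1 microcode /proc/cpuinfo
```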
 
Thank you - I will try this out !
 
Skimming through your logs - I see literally NO errors. So we can assume your server simply hard-crashed.
This almost certainly means a hardware problem.
Where did you get this i5-9500T? Is the CPU firmly seated in its socket? Are the CPU cooler and thermal paste correctly installed?
That 35 W i5-9500T should never really get "very hot" in optimal conditions.
I'd also check the PSU, as overheating can also be caused by a faulty or inadequate power supply.
 
