Hi everyone,
I have an issue with VMs randomly freezing. It happens multiple times a day at seemingly random times. Not all my VMs freeze at once, but they are all affected at one time or another. All my VMs run Linux, with different distros and different kernels. There is no relevant information in the host syslog or in any of the guest logs.
Failure mode:
The VM becomes totally unresponsive and CPU usage goes up to 100% on a single core. Proxmox reports 50% usage on VMs with 2 cores, 25% on VMs with 4 cores, and so on.
All disk and network activity stops for that VM (according to my monitoring). The host has never frozen or become unstable.
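To make the numbers above concrete, here is a rough sketch of the pattern a monitoring check could look for: one core pegged (so the reported usage is about 100 / cores percent) together with zero disk and network activity. The function names, the tolerance value, and the idea of passing in byte counters are illustrative assumptions, not anything Proxmox provides.

```python
def expected_hang_cpu_percent(cores: int) -> float:
    """Reported CPU usage when exactly one core is pegged at 100%."""
    if cores < 1:
        raise ValueError("a VM needs at least one core")
    return 100.0 / cores


def looks_frozen(cpu_percent: float, cores: int,
                 disk_bytes: int, net_bytes: int,
                 tolerance: float = 2.0) -> bool:
    """Heuristic: one core pegged and no disk or network I/O in the sample window.

    cpu_percent is the usage as the hypervisor reports it (0-100 for the whole VM).
    disk_bytes and net_bytes are the bytes transferred during the sample window.
    tolerance is a hypothetical margin in percentage points.
    """
    pegged = abs(cpu_percent - expected_hang_cpu_percent(cores)) <= tolerance
    return pegged and disk_bytes == 0 and net_bytes == 0
```

A single sample matching this pattern is not proof of a hang, so a real check would presumably require it to hold over several consecutive samples.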
Recovery:
Forcefully stop the VM, then start it again.
Hardware:
Intel NUC 11 (NUC11ATKC4)
16 GB DDR4 RAM
M.2 SSD (ext4 filesystem, qcow2 virtual disks)
Realtek NIC chipset.
Things I've already tried:
- Updating BIOS.
- Disabling C-states.
- All combinations of power management features from the BIOS.
- Disabling Intel SpeedStep and TurboBoost.
- Disabling suspend for PCIe devices.
- Changing the machine type (to q35).
- VM BIOS: both legacy and UEFI.
- Updating the kernels on both the host and the guests. This has been happening from the moment I installed Proxmox a few months ago, and I always keep my kernels updated.
- memtest (a long shot, since everything else keeps working while a VM is frozen).
As a workaround, I am now using a watchdog to restart the VMs after they hang. This works, but it's not a long-term solution, as I don't want my VMs to restart several times a day.
I have now spent many days troubleshooting this, so any information or ideas you have would be highly appreciated.
Thanks