Random freezes for VM

ccorbin · Jun 25, 2024

Hi,

Issue Description:
For several months now I have had a problem with a VM in Proxmox running Windows Server 2019 with SQL Server. I first noticed that pings to the VM would randomly go unanswered or be delayed by several seconds (10-20 s, sometimes longer depending on the workload of ~150-250 users). Replacing the NIC and doubling the RAM did not fix it. I also had a Linux bond with LACP for redundancy, which I have since removed to simplify troubleshooting. When none of this resolved the issue, I found that swap usage was at 100% (7.9 of 8 GB used). I fixed that by lowering swappiness and taking some RAM away from the VM to leave more for the host, but the delays continued.

I have since been monitoring the syslog and the machine's performance to find which process causes the stalls. At the moment the system completely freezes (every process stops and disk reads/writes drop to 0), the ps command is using 100% system and CPU. From what I have monitored this does not seem to correlate with the VM's activity, but the two could be related.
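To line up the dropped or delayed pings with host-side samples, the ping output needs timestamps. A minimal sketch, assuming the iputils `ping` on a Linux client, whose `-D` flag prefixes each reply with a Unix timestamp. The function name `ping_spikes` and the 1000 ms default threshold are illustrative choices:

```sh
# ping_spikes: read `ping -D` output on stdin and print one line per
# slow reply (RTT >= threshold ms) and per gap in the icmp_seq numbers.
# $1 = threshold in milliseconds (illustrative default: 1000).
ping_spikes() {
  awk -v limit="${1:-1000}" '
    /icmp_seq=/ && /time=/ {
      # -D prefixes each line with [unix_time.microseconds]
      ts = "-"
      if ($1 ~ /^\[/) ts = substr($1, 2, length($1) - 2)
      seq = -1; rtt = ""
      for (i = 1; i <= NF; i++) {
        if ($i ~ /^icmp_seq=/) seq = substr($i, 10) + 0
        if ($i ~ /^time=/) rtt = substr($i, 6)
      }
      # A jump past the highest sequence seen so far means replies went missing
      if (n++ && seq > max + 1)
        printf "%s no reply for icmp_seq %d-%d\n", ts, max + 1, seq - 1
      if (seq > max) max = seq
      if (rtt + 0 >= limit + 0)
        printf "%s slow icmp_seq %d time=%s ms\n", ts, seq, rtt
    }'
}
```

For example, record with `ping -D <vm-ip> > ping.log` during working hours, then run `ping_spikes 1000 < ping.log`. GNU `date -d @<timestamp>` converts each timestamp to local time for comparison with the host's logs.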

Troubleshooting steps taken:
I used "pidstat -u 1 200" to find that the "ps" command was related to the freeze (please see the attached image for the output of this command). The ps process starts every 1-2 minutes and then dies off completely.
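A rough sketch for pulling just the spikes out of a long pidstat capture, so their timestamps can be compared with the times the pings dropped. It assumes the default `pidstat -u` layout, with a header row containing `UID`, `PID`, `%system` and `Command`; the function name `pidstat_spikes` and the 90% default threshold are hypothetical:

```sh
# pidstat_spikes: read `pidstat -u` output on stdin and print the timestamp,
# PID and %system of every sample where the named command is at or above
# the threshold. $1 = command (default ps), $2 = %system threshold (default 90).
pidstat_spikes() {
  awk -v cmd="${1:-ps}" -v limit="${2:-90}" '
    $1 == "Average:" { next }      # skip the summary block at the end
    /UID/ && /Command/ {
      # Header row: note how many tokens the timestamp uses (1 or 2,
      # depending on locale) and the position of each named column
      for (i = 1; i <= NF; i++) {
        if ($i == "UID") ntime = i - 1
        col[$i] = i
      }
      next
    }
    ntime && $NF == cmd && $(col["%system"]) + 0 >= limit + 0 {
      ts = $1
      for (i = 2; i <= ntime; i++) ts = ts " " $i
      printf "%s pid=%s %%system=%s\n", ts, $(col["PID"]), $(col["%system"])
    }'
}
```

Usage: `LC_ALL=C pidstat -u 1 200 > pidstat.log`, then `pidstat_spikes ps 90 < pidstat.log`. The `LC_ALL=C` is there on the assumption that some locales print decimal commas, which the numeric comparison would misread.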

Normally I would not be so concerned about a single missed ping; however, everyone who uses the application this SQL server serves is reporting that their sessions hang or freeze. I have witnessed this myself and have correlated it with these random freezes.
I have also:
Set up auditd to track invocations of ps.
Searched for scripts and cron jobs that might be invoking these commands (none found).
Checked that no systemd timers are running these commands excessively (none found).
Used tools like iotop and atop to monitor disk I/O, since high I/O from the SQL Server VM could be related.
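For the auditd step, one way to get from "something runs ps" to an actual parent process is to count the `ppid=` values in the audit records. A sketch, assuming auditd is installed on the PVE host and `ps` lives at `/usr/bin/ps`; the rule key `ps_exec` and the function `ps_parents` are hypothetical names:

```sh
# Run as root on the PVE host first (the key name ps_exec is arbitrary;
# check the real path of ps with `command -v ps`):
#   auditctl -w /usr/bin/ps -p x -k ps_exec
#   ausearch -k ps_exec -i > ps_exec.log
#
# ps_parents: count the parent PIDs (ppid=) in the SYSCALL records of that
# log, most frequent first. Reads the named file(s), or stdin if none given.
ps_parents() {
  grep 'type=SYSCALL' "$@" |
    grep -o 'ppid=[0-9]*' |
    sort | uniq -c | sort -rn
}
```

Usage: `ps_parents ps_exec.log`. A PPID that shows up around every freeze can then be looked up with `ps -o pid,args -p <ppid>` while that parent is still running.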

Hardware:
The server chassis is a Dell R730xd.
Storage controller: PERC H730 Mini

6 Samsung 870 EVO SSDs in RAID 10 (on the H730 Mini) are used for data (all new drives).
1 Samsung 870 EVO SSD is passed through as the Proxmox OS drive (also a new drive).

I have also run the Dell diagnostic tools on the server to make sure none of the hardware is failing or reporting an issue; every component passed.
We previously ran this server VM on the same hardware without issues. The only difference is that it now runs on a PVE hypervisor, which is our standard practice; before, it ran on ESXi.

I tried uploading the logs, but the files could not be processed even when they were under 3 MB. Please let me know which logs you need and I will be happy to pull them.

Petbotson · Sep 14, 2024

Hey @ccorbin,

Have you done any further troubleshooting on this?
Can you post the VM config with:

Code:

qm config <vmid>

kickfliph · May 22, 2025

Hello there,

I'm having the same problem. Did you find a way to fix it?

Regards,