Problem Description
I'm experiencing systematic VM crashes on multiple Proxmox hosts when VMs come under memory pressure. The crashes occur across different hosts and workloads and seem to follow the same pattern.
Environment
- Proxmox VE version: 8.4.14 and 9.1.1
- Affected VMs: multiple VMs across different hosts (GitLab CI and MariaDB, running Debian 12 and 13)
- Storage backend: NVMe RAID2 (2 TB and 512 GB)
- Disk controller: VirtIO SCSI
- Host specifications: Intel Xeon E-2388G, 64 GB RAM
Symptoms
Crash pattern:
- VM experiences high memory usage
- System starts heavy swapping
- I/O operations slow down dramatically
- SCSI timeouts appear in logs (not always)
- System becomes unresponsive and crashes; a VM restart is required
Kernel logs show (full logs omitted for brevity):
Code:
sd 1:0:0:0: [sda] tag#XXX ABORT operation started
sd 1:0:0:0: ABORT operation timed-out
sd 1:0:0:0: BUS RESET operation started
sym0: SCSI BUS reset detected
task:kswapd0 blocked for more than 120 seconds
task:khugepaged blocked for more than 120 seconds
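For context, the 120 seconds in the "blocked for more than" messages is the kernel's hung-task watchdog interval, which can be inspected (and tuned as root) inside the guest; the file only exists when the kernel was built with hung-task detection:

```shell
# The 120 s in "blocked for more than 120 seconds" is the hung-task
# watchdog interval; readable via /proc, tunable as root via sysctl.
if [ -e /proc/sys/kernel/hung_task_timeout_secs ]; then
    cat /proc/sys/kernel/hung_task_timeout_secs
fi
```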
Timeline:
- MariaDB VM : crashes every ~12 hours during heavy operations
- GitLab CI VM : crashes during concurrent job execution (a few times a week)
- Other VMs on same hosts: no issues so far
Current Workaround
On the GitLab CI VM, allocating more RAM resolved the issue. My MariaDB VM already had plenty of free RAM; setting vm.swappiness=10 on it has completely stopped the crashes, but this feels like treating symptoms rather than the root cause.
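For anyone hitting the same pattern, this is how the swappiness change can be checked, applied, and persisted inside the guest (the sysctl.d filename is arbitrary):

```shell
# Check the current value (Debian's default is 60)
cat /proc/sys/vm/swappiness

# Apply at runtime (as root):
#   sysctl -w vm.swappiness=10
#
# Persist across reboots (as root):
#   echo 'vm.swappiness = 10' > /etc/sysctl.d/99-swappiness.conf
```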
Questions
- Is this a known interaction between swap pressure and VirtIO SCSI?
- Why would swap activity cause SCSI controller timeouts?
- Is there a timeout configuration that could be adjusted?
- Storage backend configuration:
  - Are there specific storage settings that could prevent this?
  - Should I consider different disk controller types (SATA/IDE) for swap-heavy workloads?
- Host-level optimizations:
  - Any Proxmox-specific tuning to handle VM swap better?
  - Should host swappiness also be reduced?
- Long-term solution:
  - Is low swappiness the recommended approach, or are there better alternatives?
  - Should I simply increase VM RAM to avoid swap entirely?
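Regarding the timeout question: the per-command SCSI timeout is exposed per device in sysfs and can be raised inside the guest. A sketch, assuming the device that logged the ABORTs is sda (adjust as needed; the udev rule filename and the 180 s value are my own choices, not a recommendation):

```shell
DEV=sda  # the device that logged the ABORT/BUS RESET messages

# Current per-command timeout in seconds (kernel default: 30)
if [ -e "/sys/block/$DEV/device/timeout" ]; then
    cat "/sys/block/$DEV/device/timeout"
fi

# Raise it at runtime (as root); lost on reboot:
#   echo 180 > /sys/block/sda/device/timeout
#
# Persist via a udev rule, e.g. /etc/udev/rules.d/99-scsi-timeout.rules:
#   ACTION=="add", SUBSYSTEM=="block", KERNEL=="sd[a-z]", ATTR{device/timeout}="180"
```

A longer timeout only papers over the stall, of course; the I/O is still slow, the guest just aborts less aggressively.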
Additional Context
- The issue is reproducible across different Proxmox hosts
- Only affects VMs under memory pressure, not regular operation
- ~15 production VMs - I would like to deploy preventive measures fleet-wide
What I've tried
- Reducing swappiness (works, but feels incomplete)
- Monitoring to identify memory-hungry processes
- Considering RAM increases for affected VMs
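The monitoring step can be as simple as a procps one-liner inside the guest to see which processes drive the VM into swap:

```shell
# Top 10 processes by resident memory (RSS, in KiB), assuming standard procps ps
ps -eo pid,rss,comm --sort=-rss | head -n 10
```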