[SOLVED] Stop Mode Backup: "Dirty Shutdown" on Windows VMs with Database Services - Complete Timeout Chain Fix

felbad

The Problem: Stop Mode backups on Windows VMs running databases (SQL Server, MongoDB, Elasticsearch, Redis, etc.) fail with:

ERROR: VM quit/powerdown failed
timeout waiting on systemd

After backup, services won't start → Event Viewer shows "Service did not respond in a timely fashion"

---

Root Cause

Cascading timeout failure across 3 layers:

Proxmox (60s) → kills VM before Windows finishes

Windows shutdown (20s or less by default, WaitToKillServiceTimeout) → kills services before they close properly

Windows startup (30s, ServicesPipeTimeout) → kills services during recovery after restart

Result: Dirty shutdown → corrupted files → recovery timeout → service fails

---

✅ The Solution

Increase timeouts at all 3 layers to give services time to close cleanly and recover properly.

Layer 1: Proxmox

VM → Options → Start/Shutdown order → Shutdown timeout: 300 seconds

Layer 2: Windows Shutdown Timeout

Registry: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control
Name: WaitToKillServiceTimeout
Type: REG_SZ (String)
Value: 120000 (milliseconds = 2 minutes; the default is 20s or less, depending on the Windows version)

Layer 3: Windows Service Startup Timeout

Registry: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control
Name: ServicesPipeTimeout
Type: REG_DWORD
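Value: 300000 (milliseconds = 5 minutes; enter it as Decimal in regedit. The value does not exist by default, in which case Windows uses 30s)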

Reboot VM after registry changes.

---

Quick Registry Commands

Run as Administrator:

reg add "HKLM\SYSTEM\CurrentControlSet\Control" /v WaitToKillServiceTimeout /t REG_SZ /d 120000 /f
reg add "HKLM\SYSTEM\CurrentControlSet\Control" /v ServicesPipeTimeout /t REG_DWORD /d 300000 /f
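
Both registry values are in milliseconds, which is easy to get wrong when adapting the numbers to your own services. The following is a minimal sketch, not part of the original fix: reg_commands is a hypothetical helper that takes the timeouts in seconds and prints the same two reg add lines as above.

```python
# Hypothetical helper: builds the two `reg add` command lines from timeouts
# given in seconds. Both registry values are expressed in milliseconds.
CONTROL_KEY = r"HKLM\SYSTEM\CurrentControlSet\Control"


def reg_commands(stop_seconds: int, start_seconds: int) -> list[str]:
    stop_ms = stop_seconds * 1000    # WaitToKillServiceTimeout (REG_SZ)
    start_ms = start_seconds * 1000  # ServicesPipeTimeout (REG_DWORD)
    return [
        f'reg add "{CONTROL_KEY}" /v WaitToKillServiceTimeout /t REG_SZ /d {stop_ms} /f',
        f'reg add "{CONTROL_KEY}" /v ServicesPipeTimeout /t REG_DWORD /d {start_ms} /f',
    ]


if __name__ == "__main__":
    # 120 s shutdown wait and 300 s startup wait, as used in this post.
    for line in reg_commands(120, 300):
        print(line)
```

Paste the printed lines into an elevated command prompt on the VM. The script only generates text and does not touch the registry itself.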

---

Linux Equivalent (systemd)

For Linux VMs with similar issues, edit the service unit:

systemctl edit <service-name>

Add:

[Service]
TimeoutStopSec=300
TimeoutStartSec=300

Then (only needed if you wrote the drop-in file by hand, since systemctl edit reloads automatically): systemctl daemon-reload
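
For anyone rolling this out to several VMs with a script rather than interactively: systemctl edit stores the override as a drop-in file under /etc/systemd/system/<service-name>.service.d/override.conf. A small sketch of what ends up where, assuming the 300 second values from this post; dropin_path and dropin_text are hypothetical helper names.

```python
from pathlib import PurePosixPath


def dropin_path(service: str) -> PurePosixPath:
    """Location where `systemctl edit <service>` saves its override file."""
    unit = service if service.endswith(".service") else f"{service}.service"
    return PurePosixPath("/etc/systemd/system") / f"{unit}.d" / "override.conf"


def dropin_text(stop_sec: int = 300, start_sec: int = 300) -> str:
    """Contents of the override file; systemd reads these values as seconds."""
    return f"[Service]\nTimeoutStopSec={stop_sec}\nTimeoutStartSec={start_sec}\n"


if __name__ == "__main__":
    # "elasticsearch" is only an example service name.
    print(dropin_path("elasticsearch"))
    print(dropin_text(), end="")
```

If you write the file this way instead of using systemctl edit, run systemctl daemon-reload afterwards so systemd picks it up.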

---

⚠️ Important Notes

- Test in non-production first
- Values can be adjusted based on your service needs (monitor Event Viewer Event IDs: 7000, 7001, 7009)
- This only extends maximum wait time - clean shutdowns still complete normally
- Alternative: Use Snapshot mode with QEMU Guest Agent if storage supports it

---

Tested Successfully On

- SQL Server 2016/2019/2022 (Windows Server 2019/2022)
- Wazuh Indexer 4.x (Amazon Linux 2023)
- Elasticsearch 7.x/8.x (Ubuntu 22.04)
- MongoDB 6.x (Windows Server 2022)

---

Credits

Problem Detection & Analysis:
- Antigravity (Claude) - AI assistant for troubleshooting and pattern recognition
- Zabbix - Monitoring tool that helped identify the timeout patterns and service failure cascade

Solution Development:
Community-driven troubleshooting combining Proxmox VM management, Windows OS internals, and database service behavior analysis.

---

Why This Works

Before:
PBS stops VM → Proxmox waits 60s → kills VM → SQL corrupts files
→ VM restarts → SQL recovery → Windows waits 30s → kills SQL → FAIL

After:
PBS stops VM → Proxmox waits 300s → SQL closes cleanly
→ VM restarts → SQL starts (or recovers with 300s available) → SUCCESS
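
The before/after chain can also be written as a tiny model. This is a simplified sketch, not a measurement: the three limits come from this post, while the service durations (90s to stop, 10s for a clean start, 120s for a start with recovery) are made-up illustration values, and the Proxmox limit is compared against the service alone rather than the whole guest shutdown.

```python
from dataclasses import dataclass


@dataclass
class Timeouts:
    hypervisor_stop: int  # seconds Proxmox waits for the guest to power off
    service_stop: int     # WaitToKillServiceTimeout, in seconds
    service_start: int    # ServicesPipeTimeout, in seconds


def simulate(t: Timeouts, stop_needed: int, clean_start: int, recovery_start: int) -> str:
    # The shutdown is clean only if the service finishes within both limits.
    clean = stop_needed <= t.service_stop and stop_needed <= t.hypervisor_stop
    # A dirty shutdown forces the slower recovery path on the next start.
    start_needed = clean_start if clean else recovery_start
    if start_needed > t.service_start:
        return "FAIL: service start timed out"
    if clean:
        return "OK: clean shutdown"
    return "OK: dirty shutdown, recovery finished in time"


if __name__ == "__main__":
    before = Timeouts(hypervisor_stop=60, service_stop=20, service_start=30)
    after = Timeouts(hypervisor_stop=300, service_stop=120, service_start=300)
    # Hypothetical service: 90 s to stop, 10 s clean start, 120 s recovery start.
    print("before:", simulate(before, 90, 10, 120))
    print("after: ", simulate(after, 90, 10, 120))
```

With the old limits the service is killed during shutdown and again during recovery. With the new limits it never reaches the recovery path.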

---

Related Threads

This solution addresses incomplete fixes in:
- Thread #50723 - SQL Server backup timeout
- Thread #139077 - Windows VM won't start after backup
- Thread #135547 - Stop mode backup fails
- Multiple "timeout waiting on systemd" threads

Key difference: Previous threads only addressed Proxmox timeout. This fixes the entire chain including Windows internal limits.

---

Community Contribution

If this worked for you, please share:
- Your service type and version
- OS version
- Proxmox version
- Timeout values you used

This helps others gauge applicability to their setup.

---

Note: This is a root cause fix, not a workaround. It addresses the actual timing mismatch between Proxmox, Windows, and database services during Stop Mode operations.

Cartagena.co, February 20 2026

Tags: backup, windows, stop-mode, timeout