Exchange Server 2019 VSS Writer: retryable error

kevdpate

Member
Sep 7, 2022
I am backing up my Exchange Server 2019 daily with PBS 4.1.4. However, the Exchange log files never get truncated and continue to grow. After some investigation, I discovered that the VSS writer was failing with:

Writer Name: 'Microsoft Exchange Writer'
Writer ID: {..............}
Writer Instance ID: {.............}
State: [1] Stable
Last error: Retryable error

Event ID 2034:
"The Microsoft Exchange Replication service VSS Writer failed with error FFFFFFFC when processing the backup completion event."

I have tried increasing the VSS writer timeout on Exchange and installing a newer Cumulative Update. Every time I restart the Microsoft Exchange Replication service, the writer returns to a normal state, but as soon as I start a backup job from PVE (to the PBS), it immediately goes back into the error state.
I have qemu-guest-agent version 0.1.271 installed on the Exchange VM. Any ideas?
 
I ran into the exact same "Microsoft Exchange Replication service VSS Writer failed with error FFFFFFFC" retryable error on Exchange 2019 with PBS 4.1.4. The logs never truncate because the backup-completion event never reaches the Exchange writer cleanly; this is a very common gotcha with qemu-guest-agent VSS on Exchange.

Quick checks & fixes that often resolve it:

Are you in a DAG?
This error is classic when backing up a passive copy (Microsoft doc: VSS_E_WRITERERROR_RETRYABLE). Apply this registry tweak on all DAG members:
HKLM\SOFTWARE\Microsoft\ExchangeServer\v15\Replay\Parameters → DWORD QueryLogRangeTimeoutInMsec = 120000 (120 seconds)
Restart the Microsoft Exchange Replication service.

Also, set KeepAliveTime under TCP/IP parameters to 900000+ ms if you have any network latency/firewalls between DAG nodes.
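If you want to apply both tweaks in one go, they can be expressed as a .reg file (a sketch based on the values above; 120000 = 0x1d4c0 and 900000 = 0xdbba0; import it on each DAG member and then restart the Replication service):

```
Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\ExchangeServer\v15\Replay\Parameters]
"QueryLogRangeTimeoutInMsec"=dword:0001d4c0

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters]
"KeepAliveTime"=dword:000dbba0
```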

Standalone or still failing? Run vssadmin list writers right before and right after a test backup.
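To spot just the relevant lines in the (fairly long) vssadmin output, this Windows one-liner filters it down to writer names, states, and last errors:

```
vssadmin list writers | findstr /i /c:"Writer name" /c:"State" /c:"Last error"
```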

Stop both services (in this order), then start them again in reverse order:

Code:
net stop MSExchangeIS
net stop MSExchangeRepl
net start MSExchangeRepl
net start MSExchangeIS

Just restarting Replication isn't enough; the Information Store writer also needs to come back clean.
Reboot the VM once, then immediately test a backup (sometimes the QEMU VSS provider leaves the Exchange writers in a weird state).
Make sure you're on the latest qemu-guest-agent (0.1.271 is a bit old — newer builds handle the Freeze/Thaw timing better).
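Before testing another backup, you can also confirm from the PVE host that the agent actually responds (assuming the Exchange guest is VM 101; adjust the ID to yours):

```
qm guest cmd 101 ping   # errors out if the agent is not responding
qm guest cmd 101 info   # shows agent version and supported commands
```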

If the writer still goes retryable the moment the PBS job starts
This is, unfortunately, a known limitation: Proxmox + qemu-guest-agent does VSS_FULL, but the Exchange Replication writer often fails the "backup completion" event with FFFFFFFC. PBS is great for VM consistency, but it isn't a true Exchange-aware backup tool, so truncation fails even when the snapshot succeeds.

Reliable workaround until you fix or migrate:
Use a Proxmox hook script (/var/lib/vz/dump-hook-script) or a scheduled task inside the VM that runs a simple DiskShadow script right after the backup finishes. It forces Exchange to treat the backup as complete and truncates the logs safely.

Example: save this as C:\Exchange-Truncate.ps1 and call it post-backup:

Code:
diskshadow.exe /s C:\truncate.txt

C:\truncate.txt content:

Code:
set context persistent
begin backup
# add your DB + log volume(s) here
add volume X:
create
end backup
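On the Proxmox side, the hook that triggers the in-guest script could be a small bash script. This is only a sketch under a few assumptions: the Exchange guest is VM 101 (adjust it), C:\Exchange-Truncate.ps1 already exists in the guest, and the script is registered via "script: /var/lib/vz/dump-hook-script" in /etc/vzdump.conf:

```shell
#!/bin/bash
# vzdump hook script sketch: after a successful backup of the Exchange VM,
# run the in-guest truncation script through qemu-guest-agent.
# vzdump invokes the hook as: <script> <phase> <mode> <vmid>

run_truncate_hook() {
    local phase="$1" mode="$2" vmid="$3"
    local exchange_vmid="101"   # assumption: your Exchange VM ID

    if [ "$phase" = "backup-end" ] && [ "$vmid" = "$exchange_vmid" ]; then
        echo "triggering in-guest log truncation for VM $vmid"
        # Execute the PowerShell wrapper inside the guest via the agent
        qm guest exec "$vmid" -- \
            powershell.exe -ExecutionPolicy Bypass -File 'C:\Exchange-Truncate.ps1'
    fi
}

run_truncate_hook "$@"
```

All other phases (job-start, backup-start, log-end, ...) fall through silently, so the same hook can stay attached to a backup job that covers other VMs too.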

Long-term recommendation
These VSS + log-growth issues with on-prem Exchange + PBS are super common and eat up admin time. If you want to get rid of them forever, the cleanest move is migrating to Microsoft 365.

I've been recommending SysTool Exchange to Office 365 Migrator to several Exchange admins in the same boat: it does incremental, zero-downtime mailbox/public-folder migration with full data integrity, supports large environments, and handles Exchange versions 2010–2019. No more VSS writers, no more log bloat, and you get all the modern M365 features.

If you ever need to recover a damaged EDB (from log growth or corruption), their SysTool Exchange Recovery Tool is excellent — it repairs corrupted databases and exports to PST/live Exchange/O365 without any data loss.

Both tools have saved multiple orgs from exactly this nightmare.

Let me know if you're in a DAG or standalone — I can give you the exact steps/script tailored to your setup. Happy to help!