Guest agent not running on Windows 11

Yes, since the backup is at 01:00.
I think that the system gets shut down for backup, boots up again, and then for some reason these two services won't start.
The Windows event log should have an entry if they fail to start or crash.
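
To pull the relevant entries quickly, something like this from an elevated command prompt should work. Treat it as a sketch: the event IDs (7000, 7009 and 7011 are the usual Service Control Manager start and timeout events) and the limit of 20 entries are my assumptions, so adjust as needed.

```bat
rem Sketch only: the event ID list and the count of 20 are assumptions.
rem Shows the newest matching System log entries first, as plain text.
wevtutil qe System /q:"*[System[(EventID=7000 or EventID=7009 or EventID=7011)]]" /c:20 /rd:true /f:text
```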

You could always create a Scheduled Task in Windows that starts the services at startup by running "sc start servicename".
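
A rough sketch of how such tasks could be created from an elevated command prompt. The task names are made up, the 2-minute delay is a guess, and I am assuming the service names are QEMU-GA and SQLWriter (check with "sc query" first).

```bat
rem Sketch only: task names and the 2-minute delay are hypothetical.
rem Service names QEMU-GA and SQLWriter are assumed; verify with "sc query".
rem /delay uses the mmmm:ss format; /f overwrites an existing task of the same name.
schtasks /create /tn "Start QEMU-GA at boot" /tr "sc.exe start QEMU-GA" /sc onstart /delay 0002:00 /ru SYSTEM /rl highest /f
schtasks /create /tn "Start SQLWriter at boot" /tr "sc.exe start SQLWriter" /sc onstart /delay 0002:00 /ru SYSTEM /rl highest /f
```

The delay is there so the start attempt happens after the boot-time load has settled. This only works around the failed start, it does not explain it.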
 
The Windows 11 QEMU guest agent stops because during stop-mode backups, the VM pauses and if Proxmox Backup Server is slower than the guest’s I/O, services like QEMU-GA and SQLWriter can fail to start, causing timeouts. Downgrading Virtio guest tools to 0.1.271 or using snapshot/suspend mode for backups usually fixes it, and the agent may need a manual restart if it doesn’t start automatically.
 
If it happens during OS startup, wouldn't extending the timeout be the solution?

reg add HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control /v ServicesPipeTimeout /t REG_DWORD /d 120000

*We don't recommend extending the timeout for services that may hang indefinitely while starting, because it then takes longer to notice when something goes wrong. However, if you're certain the service you're using isn't one of those, changing it shouldn't cause any issues.

If 120 seconds doesn't work, try specifying 360 seconds (360000) and see how long it takes to boot up.
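
To check that the value was actually written and whether the services come up afterwards, something along these lines should do. As far as I know the setting only takes effect after a reboot. The service names QEMU-GA and SQLWriter are assumptions, so verify them on your system.

```bat
rem Confirm the value was written (reg shows it in hex: 0x1d4c0 = 120000).
reg query HKLM\SYSTEM\CurrentControlSet\Control /v ServicesPipeTimeout
rem After the reboot, check whether the services are running.
rem Service names are assumed; adjust to what "sc query" lists on your VM.
sc query QEMU-GA
sc query SQLWriter
```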
 
Do you back up directly over WAN?
WAN
Schedule a guest shutdown, back up offline, then start the guest again with a hook script once the backup is done.
Which is what basically happens now already. I don't know how this would help the service being able to start.
VM pauses and if Proxmox Backup Server is slower than the guest’s I/O
That is definitely not the case, but the Upload might be a bottleneck.
Downgrading Virtio guest tools to 0.1.271 or using snapshot/suspend mode for backups usually fixes it
First one I am doing already, second one is not really an option, since I don't want to take the inconsistency risk.
If it happens during OS startup, wouldn't extending the timeout be the solution?
Ahh so you think that maybe there is so much IO pressure from backing up the next VM, that it fails to start VM100 in time?

Still all looks like wonky workarounds to me. QEMU should be able to handle even a 5min boot IMHO.
 
Again, sorry for my English wording...
Which is what basically happens now already. I don't know how this would help the service being able to start.
Do you get the point of a "stop mode" backup?
A "stop mode" backup is still "live": the VM is started again while the backup continues in the background.

The fleecing option is mandatory to mitigate this.
 
Ahh so you think that maybe there is so much IO pressure from backing up the next VM, that it fails to start VM100 in time?

The SCM sends some kind of request to the service, and I believe error 7009 occurs because the service does not respond within the 45-second timeout.

*Multiple services from different vendors (SQLWriter and qemu-ga) are timing out at 45 seconds, so I believe the cause is not the services themselves but some kind of load or general unresponsiveness of the server.

The 45-second timeout can be extended in the settings. Increasing this value should allow it to remain in the started state.

The fundamental issue of it not starting within 45 seconds requires investigation into the Windows OS and the service itself.

If the VM is paused, the service might be exceeding the timeout during that pause.
 
The fundamental issue of it not starting within 45 seconds requires investigation into the Windows OS and the service itself.
Gave it a try based on this. Still think that this is just a workaround and even 120s won't help. It even says:
However, we recommend that you research this problem to determine whether it is a symptom of another problem.


Fleecing is mandatory for PBS over WAN.
I don't think so, since my other PVE can backup just fine without fleecing. There is also basically no io pressure during that time, and the VM does not get unresponsive or anything like in your link.
 
Still think that this is just a workaround
If that's an unavoidable problem in your PVE/PBS, then there's no other way around it.

If you're investigating the issue, it's on the Windows OS side, so I don't think this is the place to ask. You need to figure out why the service isn't responding to SCM.
 
it's on the Windows OS side, so I don't think this is the place to ask.
I mean, it is the QEMU guest agent service having the issue, so I would argue it is not a Windows issue, but a QEMU on Windows issue. Just like the high DB load issue with VirtIO is not Windows issue but a QEMU on Windows issue.
You need to figure out why the service isn't responding to SCM.
Sure, that is why I am bringing it up here. My suspicion is the same as with the high DB load VirtIO issue; a driver issue of QEMU.
 
I don't think so, since my other PVE can backup just fine without fleecing. There is also basically no io pressure during that time, and the VM does not get unresponsive or anything like in your link.
Once you discover how it works under the hood, you will follow the recommended way: back up to a local PBS, then let the remote PBS across the WAN sync from it.
Give it a try. I bet the QEMU guest agent will no longer end up stopped with a local PBS.