Hi everyone!
I created an account to report that I've hit a similar problem and could not figure out what was going on until I found this topic. Yesterday I reinstalled my PBS, going from 3.x to the newest 4.x: a simple default install with an empty datastore on a second VM disk formatted as XFS.
Facts:
Dell R640, datacenter SSDs, hardware RAID.
Proxmox without a subscription:
PVE 8.x with all the latest updates
PBS 4.x with all the latest updates
Currently the only problematic VM is Win2k22: VirtIO SCSI single controller with 2 disks (800 GB and 1 TB), latest virtio drivers, discard on, iothread on. However, I believe it could happen to any other VM as well; this Windows VM is just the only one I have with such large disks.
After installing the fresh PBS, the first backup run of my Win2k22 VM got stuck at 34%, at which point Zabbix started alerting that the VM was offline.
The first thing I noticed was that the backup was still running. I tried to stop the task; it more or less stopped, but the VM stayed locked.
Task log:
Code:
....
INFO: 31% (567.1 GiB of 1.8 TiB) in 1h 17m 26s, read: 852.3 MiB/s, write: 79.8 MiB/s
INFO: 32% (583.7 GiB of 1.8 TiB) in 1h 17m 50s, read: 706.3 MiB/s, write: 82.7 MiB/s
INFO: 33% (603.1 GiB of 1.8 TiB) in 1h 19m 8s, read: 255.4 MiB/s, write: 93.9 MiB/s
INFO: 34% (621.4 GiB of 1.8 TiB) in 1h 19m 35s, read: 693.0 MiB/s, write: 87.3 MiB/s
ERROR: interrupted by signal
INFO: aborting backup job
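If anyone wants to compare where different runs stall, here is a minimal Python sketch that turns progress lines like the ones above into numbers. The function and field names are my own (hypothetical, nothing from PVE itself), and the regex only assumes the line format shown in this log.

```python
import re

# Byte multipliers for the units seen in the task log (assumed binary units).
UNIT = {"B": 1, "KiB": 2**10, "MiB": 2**20, "GiB": 2**30, "TiB": 2**40}

# Matches e.g.:
# INFO: 34% (621.4 GiB of 1.8 TiB) in 1h 19m 35s, read: 693.0 MiB/s, write: 87.3 MiB/s
PROGRESS = re.compile(
    r"INFO:\s+(\d+)% \(.*?\) in (.+?), "
    r"read: ([\d.]+) (\w+)/s, write: ([\d.]+) (\w+)/s"
)


def to_seconds(elapsed):
    """Convert an elapsed string such as '1h 17m 26s' to seconds."""
    factor = {"h": 3600, "m": 60, "s": 1}
    return sum(int(n) * factor[u] for n, u in re.findall(r"(\d+)([hms])", elapsed))


def parse_progress(line):
    """Return percent, elapsed seconds and read/write rates in bytes/s, or None."""
    m = PROGRESS.search(line)
    if not m:
        return None  # not a progress line (e.g. 'ERROR: interrupted by signal')
    pct, elapsed, r, r_unit, w, w_unit = m.groups()
    return {
        "percent": int(pct),
        "seconds": to_seconds(elapsed),
        "read_bps": float(r) * UNIT[r_unit],
        "write_bps": float(w) * UNIT[w_unit],
    }


if __name__ == "__main__":
    log = [
        "INFO: 33% (603.1 GiB of 1.8 TiB) in 1h 19m 8s, read: 255.4 MiB/s, write: 93.9 MiB/s",
        "INFO: 34% (621.4 GiB of 1.8 TiB) in 1h 19m 35s, read: 693.0 MiB/s, write: 87.3 MiB/s",
        "ERROR: interrupted by signal",
    ]
    for line in log:
        rec = parse_progress(line)
        if rec:
            print(rec["percent"], rec["seconds"], round(rec["write_bps"] / 2**20, 1))
```

The last progress line before the error gives the percentage and elapsed time at which the run froze, which makes it easier to line up several failed attempts against the syslog timestamps.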
PVE syslog:
Code:
Dec 31 20:06:40 pve-1 pvedaemon[1635155]: VM 108 qmp command failed - VM 108 qmp command 'query-backup' failed - got timeout
Dec 31 20:08:54 pve-1 pvedaemon[1520992]: <root@pam> successful auth for user 'root@pam'
Dec 31 20:09:05 pve-1 pvedaemon[1520991]: VM 108 qmp command failed - VM 108 qmp command 'guest-ping' failed - got timeout
Dec 31 20:09:29 pve-1 pvedaemon[1520991]: VM 108 qmp command failed - VM 108 qmp command 'guest-ping' failed - got timeout
Dec 31 20:09:48 pve-1 pvedaemon[1520993]: VM 108 qmp command failed - VM 108 qmp command 'guest-ping' failed - unable to connect to VM 108 qga socket - timeout after 31 retries
Dec 31 20:10:10 pve-1 pvedaemon[1520991]: VM 108 qmp command failed - VM 108 qmp command 'guest-ping' failed - unable to connect to VM 108 qga socket - timeout after 31 retries
Dec 31 20:10:51 pve-1 pvedaemon[1520991]: VM 108 qmp command failed - VM 108 qmp command 'guest-ping' failed - unable to connect to VM 108 qga socket - timeout after 31 retries
Dec 31 20:11:02 pve-1 pvedaemon[1520992]: <root@pam> starting task UPID:pve-1:001A2EAF:01A98813:69556736:vncproxy:108:root@pam:
Dec 31 20:11:02 pve-1 pvedaemon[1715887]: starting vnc proxy UPID:pve-1:001A2EAF:01A98813:69556736:vncproxy:108:root@pam:
Dec 31 20:11:03 pve-1 pvedaemon[1715890]: starting vnc proxy UPID:pve-1:001A2EB2:01A9889B:69556737:vncproxy:108:root@pam:
Dec 31 20:11:03 pve-1 pvedaemon[1520991]: <root@pam> starting task UPID:pve-1:001A2EB2:01A9889B:69556737:vncproxy:108:root@pam:
Dec 31 20:11:08 pve-1 qm[1715889]: VM 108 qmp command failed - VM 108 qmp command 'set_password' failed - unable to connect to VM 108 qmp socket - timeout after 51 retries
Dec 31 20:11:08 pve-1 pvedaemon[1715887]: Failed to run vncproxy.
Dec 31 20:11:08 pve-1 pvedaemon[1520992]: <root@pam> end task UPID:pve-1:001A2EAF:01A98813:69556736:vncproxy:108:root@pam: Failed to run vncproxy.
Dec 31 20:11:09 pve-1 qm[1715892]: VM 108 qmp command failed - VM 108 qmp command 'set_password' failed - unable to connect to VM 108 qmp socket - timeout after 51 retries
Dec 31 20:11:09 pve-1 pvedaemon[1715890]: Failed to run vncproxy.
Dec 31 20:11:09 pve-1 pvedaemon[1520991]: <root@pam> end task UPID:pve-1:001A2EB2:01A9889B:69556737:vncproxy:108:root@pam: Failed to run vncproxy.
Dec 31 20:11:13 pve-1 pvedaemon[1520993]: VM 108 qmp command failed - VM 108 qmp command 'guest-ping' failed - unable to connect to VM 108 qga socket - timeout after 31 retries
Dec 31 20:11:23 pve-1 pveproxy[1641711]: worker exit
Dec 31 20:11:23 pve-1 pveproxy[1488]: worker 1641711 finished
Dec 31 20:11:23 pve-1 pveproxy[1488]: starting 1 worker(s)
Dec 31 20:11:23 pve-1 pveproxy[1488]: worker 1716020 started
Dec 31 20:11:25 pve-1 pvedaemon[1635155]: VM 108 qmp command failed - VM 108 qmp command 'backup-cancel' failed - interrupted by signal
Dec 31 20:11:30 pve-1 pvedaemon[1520992]: <root@pam> end task UPID:pve-1:0018F353:0196BA03:69553712:vzdump::root@pam: unexpected status
At this point no more active tasks were running on PVE; the VM was locked but still powered on and running, just not responsive.
While trying to figure out what was happening, I noticed that the backup task was still running on PBS (even though I had stopped it from PVE). So I gave it a try and stopped that task as well, and voilà: after a few seconds my VM started to respond again. Nothing crashed and everything continued to work from there on, apart from the downtime caused by the freeze.
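For a quick overview of which QMP commands failed in a syslog excerpt like the one above, something along these lines could help. It is only a rough sketch: the function name is hypothetical, and the regex assumes the message format seen in my log ("VM <id> qmp command '<cmd>' failed - <reason>").

```python
import re
from collections import Counter

# Matches the inner part of pvedaemon/qm messages, for example:
#   VM 108 qmp command 'guest-ping' failed - got timeout
QMP_FAIL = re.compile(r"VM (\d+) qmp command '([\w-]+)' failed - (.+)$")


def summarize_qmp_failures(lines, vmid):
    """Count failed QMP commands for one VM, keyed by (command, reason)."""
    counts = Counter()
    for line in lines:
        m = QMP_FAIL.search(line.rstrip())
        if m and int(m.group(1)) == vmid:
            counts[(m.group(2), m.group(3))] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Dec 31 20:06:40 pve-1 pvedaemon[1635155]: VM 108 qmp command failed - "
        "VM 108 qmp command 'query-backup' failed - got timeout",
        "Dec 31 20:09:05 pve-1 pvedaemon[1520991]: VM 108 qmp command failed - "
        "VM 108 qmp command 'guest-ping' failed - got timeout",
        "Dec 31 20:09:29 pve-1 pvedaemon[1520991]: VM 108 qmp command failed - "
        "VM 108 qmp command 'guest-ping' failed - got timeout",
        "Dec 31 20:11:25 pve-1 pvedaemon[1635155]: VM 108 qmp command failed - "
        "VM 108 qmp command 'backup-cancel' failed - interrupted by signal",
    ]
    for (command, reason), n in summarize_qmp_failures(sample, 108).items():
        print(f"{n}x {command}: {reason}")
```

In my excerpt the first failure is the 'query-backup' timeout, followed by 'guest-ping' timeouts, so the QEMU process itself seems to stop answering before the guest agent does.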
Windows VM eventlogs:
Code:
vioscsi: Reset to device, \Device\RaidPort1, was issued.
Kernel PNP: The application \Device\HarddiskVolume3\Program Files\Qemu-ga\qemu-ga.exe with process id 3540 stopped the removal or ejection for the device PCI\VEN_1AF4&DEV_1003&SUBSYS_00031AF4&REV_00\5&2490727a&0&4008F0. Process command line: "C:\Program Files\Qemu-ga\qemu-ga.exe" -d --retry-path
Storahci: Reset to device, \Device\RaidPort0, was issued.
What I tried:
1) chkdsk all disks - everything fine
2) Updated everything I could to the latest version (including virtio drivers on Windows)
3) Multiple backup attempts ended the same way, with a freeze at a different percentage each time
In the end I shut the VM down completely and the backup was successful. The VM is now running again with an incremental backup task (snapshot mode) in progress; we'll see how that ends.
Edit: The incremental backup in snapshot mode finished successfully. However, only a little data changed overnight, so not much was actually written.
Code:
INFO: 98% (1.7 TiB of 1.8 TiB) in 1h 4m 18s, read: 3.0 GiB/s, write: 0 B/s
INFO: 99% (1.8 TiB of 1.8 TiB) in 1h 4m 24s, read: 3.0 GiB/s, write: 0 B/s
INFO: 100% (1.8 TiB of 1.8 TiB) in 1h 4m 31s, read: 2.3 GiB/s, write: 75.4 KiB/s
INFO: backup is sparse: 713.79 GiB (39%) total zero data
INFO: backup was done incrementally, reused 1.78 TiB (99%)
INFO: transferred 1.78 TiB in 3871 seconds (482.5 MiB/s)
INFO: adding notes to backup
INFO: Finished Backup of VM 108 (01:04:38)
INFO: Backup finished at 2026-01-01 11:30:57
INFO: Backup job finished successfully
INFO: skipping disabled target 'mail-to-root'
In my case, I can reproduce the bug only with a full backup of *large* disks (1.8T total) while the VM is actually running.