Hello,
I have a 6-node Proxmox cluster running version 7.2-4.
Each node has a daily backup task:
- node-1 at 2:00am
- node-2 at 2:30am
...
- node-6 at 4:30am
The target storage is an NFS share with about 2 TB of free space.
Every time, 2 to 3 of the backup tasks get stuck, with this as the last log line: INFO: include disk 'scsi0' 'pve-shared:502/vm-502-disk-0.qcow2' 32G
Here is the full log from one of these failing backup tasks:
Code:
INFO: starting new backup job: vzdump --node pve-a84d07 --mode snapshot --prune-backups 'keep-last=7' --mailnotification failure --all 1 --compress zstd --quiet 1 --notes-template '{{guestname}}' --storage pve-shared
INFO: Starting Backup of VM 203 (qemu)
INFO: Backup started at 2022-05-27 02:30:03
INFO: status = running
INFO: VM Name: Docker-Host-16E229
INFO: include disk 'scsi0' 'pve-shared:203/vm-203-disk-1.qcow2' 32G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: skip unused drive 'pve-shared:203/vm-203-disk-0.qcow2' (not included into backup)
INFO: creating vzdump archive '/mnt/pve/pve-shared/dump/vzdump-qemu-203-2022_05_27-02_30_03.vma.zst'
INFO: issuing guest-agent 'fs-freeze' command
INFO: issuing guest-agent 'fs-thaw' command
INFO: started backup task 'cf8e3543-d959-4756-a400-a57b28cf8c53'
INFO: resuming VM again
INFO: 1% (369.0 MiB of 32.0 GiB) in 3s, read: 123.0 MiB/s, write: 90.2 MiB/s
[...]
INFO: 100% (32.0 GiB of 32.0 GiB) in 5m 39s, read: 95.5 MiB/s, write: 0 B/s
INFO: backup is sparse: 22.48 GiB (70%) total zero data
INFO: transferred 32.00 GiB in 339 seconds (96.7 MiB/s)
INFO: archive file size: 3.45GB
INFO: adding notes to backup
INFO: prune older backups with retention: keep-last=7
INFO: removing backup 'pve-shared:backup/vzdump-qemu-203-2022_05_20-15_32_10.vma.zst'
INFO: pruned 1 backup(s) not covered by keep-retention policy
INFO: Finished Backup of VM 203 (00:05:41)
INFO: Backup finished at 2022-05-27 02:35:44
INFO: Starting Backup of VM 502 (qemu)
INFO: Backup started at 2022-05-27 02:35:44
INFO: status = running
INFO: VM Name: K3S-Master-2
INFO: include disk 'scsi0' 'pve-shared:502/vm-502-disk-0.qcow2' 32G
##################################################
##################################################
Was stuck here, asked for a reboot of this node
##################################################
##################################################
interrupted by signal
could not parse qemu-img info command output for '/mnt/pve/pve-shared/images/502/vm-502-disk-0.qcow2' - malformed JSON string, neither tag, array, object, number, string or atom, at character offset 0 (before "(end of string)") at /usr/share/perl5/PVE/Storage/Plugin.pm line 894.
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating vzdump archive '/mnt/pve/pve-shared/dump/vzdump-qemu-502-2022_05_27-02_35_44.vma.zst'
ERROR: got timeout
INFO: aborting backup job
ERROR: VM 502 qmp command 'backup-cancel' failed - interrupted by signal
INFO: resuming VM again
ipcc_send_rec[1] failed: Connection refused
ipcc_send_rec[2] failed: Connection refused
ipcc_send_rec[3] failed: Connection refused
Connection refused
ERROR: Backup of VM 502 failed - VM 502 qmp command 'cont' failed - unable to connect to VM 502 qmp socket - timeout after 449 retries
INFO: Failed at 2022-05-27 09:43:52
ERROR: Backup job failed - Connection refused
TASK ERROR: Connection refused
As you can see, it successfully backed up the first VM and then got stuck on the second (out of 3 in total).
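To spot this state quickly in a task log, something like the following could help. It is only a rough sketch: the function name `summarize` and the status labels are made up for illustration, and it assumes the log lines look exactly like the "Starting Backup", "Finished Backup" and "Backup of VM ... failed" lines in the log above.

```python
import re

# Patterns taken from the vzdump log lines shown in the log above.
START = re.compile(r"INFO: Starting Backup of VM (\d+) \(\w+\)")
FINISH = re.compile(r"INFO: Finished Backup of VM (\d+) \(")
FAILED = re.compile(r"ERROR: Backup of VM (\d+) failed")


def summarize(log_text):
    """Map each VM ID to 'ok', 'failed' or 'unfinished' (hypothetical labels)."""
    status = {}
    for line in log_text.splitlines():
        line = line.strip()
        if m := START.match(line):
            # A started backup stays 'unfinished' until a later line says otherwise.
            status[m.group(1)] = "unfinished"
        elif m := FINISH.match(line):
            status[m.group(1)] = "ok"
        elif m := FAILED.match(line):
            status[m.group(1)] = "failed"
    return status


# Shortened excerpt of the log above, cut at the point where the task hangs.
SAMPLE = """\
INFO: Starting Backup of VM 203 (qemu)
INFO: Finished Backup of VM 203 (00:05:41)
INFO: Starting Backup of VM 502 (qemu)
INFO: include disk 'scsi0' 'pve-shared:502/vm-502-disk-0.qcow2' 32G
"""

print(summarize(SAMPLE))
```

On the excerpt, VM 203 is reported as 'ok' and VM 502 as 'unfinished', which matches the hang described above. This only classifies what is already in the log; it does not explain why the task stops.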
In the morning, I get everything working again by rebooting each failing node.
Does anyone have an idea what is going on?
Is there a way to get DEBUG log level for backup tasks?
Thank you for reading!