I just debugged an error message that I had received for the last couple of days.
The regular daily backup job on our Promox kept reporting
for one of our VMs, while all other VMs were backed up correctly.
Before this message suddenly appeared (about a week ago), backups had worked correctly for many weeks.
I logged into the pbs dashboard and looked at the log message for the respective backup. It showed another error which was more accurate but did not really help me either:
I then saw that the "Gabage collections" task had also generated errors which read
From then on I at least knew where the problem was located. I have to admit that I neither tried to fix the backup from the third of February, nor did I try to find the root cause for this problem.
Instead I just deleted the incremental backup for this date and tried a manual backup of the VM. This time it worked like a charm.
I think the vzdump backup notifications created by PVE should probably be changed to more accurately reflect the real issue (the previous backup is not missing, but it is somehow "corrupted" or unusable).
The regular daily backup job on our Promox kept reporting
Code:
The regular daily backup job on our Promox kept reporting "Backup of VM 103 failed - VM 103 qmp command 'backup' failed - backup register image failed: command error: no previous backup found, cannot do incremental backup" for one of our VMs, while all other VMs were backed up correctly.
Before this message suddenly appeared (about a week ago), backups had worked correctly for many weeks.
I logged into the pbs dashboard and looked at the log message for the respective backup. It showed another error which was more accurate but did not really help me either:
Code:
2021-02-08T12:53:09+01:00: starting new backup on datastore 'REDACTED': "vm/103/2021-02-08T11:53:10Z"
2021-02-08T12:53:09+01:00: download 'index.json.blob' from previous backup.
2021-02-08T12:53:09+01:00: register chunks in 'drive-scsi0.img.fidx' from previous backup.
2021-02-08T12:53:09+01:00: download 'drive-scsi0.img.fidx' from previous backup.
2021-02-08T12:53:09+01:00: backup ended and finish failed: backup ended but finished flag is not set.
2021-02-08T12:53:09+01:00: removing unfinished backup
2021-02-08T12:53:09+01:00: TASK ERROR: backup ended but finished flag is not set.
I then saw that the "Gabage collections" task had also generated errors which read
Code:
2021-02-08T00:00:00+01:00: starting garbage collection on store REDACTED
2021-02-08T00:00:00+01:00: task triggered by schedule 'daily'
2021-02-08T00:00:00+01:00: Start GC phase1 (mark used chunks)
2021-02-08T00:00:08+01:00: percentage done: phase1 2% (1 of 46 index files)
[...]
2021-02-08T00:00:59+01:00: TASK ERROR: can't open index /PATH/TO/DATASTORE/vm/103/2021-02-03T02:00:02Z/drive-scsi0.img.fidx - Invalid argument (os error 22)
From then on I at least knew where the problem was located. I have to admit that I neither tried to fix the backup from the third of February, nor did I try to find the root cause for this problem.
Instead I just deleted the incremental backup for this date and tried a manual backup of the VM. This time it worked like a charm.
I think the vzdump backup notifications created by PVE should probably be changed to more accurately reflect the real issue (the previous backup is not missing, but it is somehow "corrupted" or unusable).