Misleading error "no previous backup found" when in reality there is an issue with the last backup state

audioPhil · Feb 8, 2021

I just debugged an error message that I had received for the last couple of days.
The regular daily backup job on our Promox kept reporting

Code:

The regular daily backup job on our Promox kept reporting "Backup of VM 103 failed - VM 103 qmp command 'backup' failed - backup register image failed: command error: no previous backup found, cannot do incremental backup" for one of our VMs, while all other VMs were backed up correctly.

for one of our VMs, while all other VMs were backed up correctly.
Before this message suddenly appeared (about a week ago), backups had worked correctly for many weeks.

I logged into the pbs dashboard and looked at the log message for the respective backup. It showed another error which was more accurate but did not really help me either:

Code:

2021-02-08T12:53:09+01:00: starting new backup on datastore 'REDACTED': "vm/103/2021-02-08T11:53:10Z"
2021-02-08T12:53:09+01:00: download 'index.json.blob' from previous backup.
2021-02-08T12:53:09+01:00: register chunks in 'drive-scsi0.img.fidx' from previous backup.
2021-02-08T12:53:09+01:00: download 'drive-scsi0.img.fidx' from previous backup.
2021-02-08T12:53:09+01:00: backup ended and finish failed: backup ended but finished flag is not set.
2021-02-08T12:53:09+01:00: removing unfinished backup
2021-02-08T12:53:09+01:00: TASK ERROR: backup ended but finished flag is not set.

I then saw that the "Gabage collections" task had also generated errors which read

Code:

2021-02-08T00:00:00+01:00: starting garbage collection on store REDACTED
2021-02-08T00:00:00+01:00: task triggered by schedule 'daily'
2021-02-08T00:00:00+01:00: Start GC phase1 (mark used chunks)
2021-02-08T00:00:08+01:00: percentage done: phase1 2% (1 of 46 index files)
[...]
2021-02-08T00:00:59+01:00: TASK ERROR: can't open index /PATH/TO/DATASTORE/vm/103/2021-02-03T02:00:02Z/drive-scsi0.img.fidx - Invalid argument (os error 22)

From then on I at least knew where the problem was located. I have to admit that I neither tried to fix the backup from the third of February, nor did I try to find the root cause for this problem.
Instead I just deleted the incremental backup for this date and tried a manual backup of the VM. This time it worked like a charm.

I think the vzdump backup notifications created by PVE should probably be changed to more accurately reflect the real issue (the previous backup is not missing, but it is somehow "corrupted" or unusable).

audioPhil · Feb 12, 2021

The above-described issue just happened again for me. This time, vmid 104 was affected (the second vmid when ordering them numerically), while last time vmid 103 (the first one) had the issue.
However, as the sample size is still quite low, this could also be a coincidence.

Does anyone have an idea what the root cause could be? I just started a verification run for all snapshots to identify potential further issues.

rada · Feb 12, 2021

Hi! We are receiving the same message in a few backups in one of our datastores:

Code:

2021-02-12T02:21:26-03:00: starting new backup on datastore '[removed]': "vm/723/2021-02-12T05:21:25Z"
2021-02-12T02:21:26-03:00: download 'index.json.blob' from previous backup.
2021-02-12T02:21:26-03:00: GET /previous: 400 Bad Request: Unable to open fixed index "/backup/[removed]/vm/723/2021-02-10T05:04:17Z/drive-scsi0.img.fidx" - got unknown magic number
2021-02-12T02:21:26-03:00: backup ended and finish failed: backup ended but finished flag is not set.
2021-02-12T02:21:26-03:00: removing unfinished backup
2021-02-12T02:21:26-03:00: TASK ERROR: backup ended but finished flag is not set.

We will also run verifications for snapshots, but we are not aware about the root cause.

Thank you!

fabian · Feb 12, 2021

on what kind of storage/file systems are your datastores? any special mount options?

the index files are written to a temporary file and then atomically renamed into place..

fabian · Feb 12, 2021

rada said:

Hi! We are receiving the same message in a few backups in one of our datastores:

Code:

2021-02-12T02:21:26-03:00: starting new backup on datastore '[removed]': "vm/723/2021-02-12T05:21:25Z"
2021-02-12T02:21:26-03:00: download 'index.json.blob' from previous backup.
2021-02-12T02:21:26-03:00: GET /previous: 400 Bad Request: Unable to open fixed index "/backup/[removed]/vm/723/2021-02-10T05:04:17Z/drive-scsi0.img.fidx" - got unknown magic number
2021-02-12T02:21:26-03:00: backup ended and finish failed: backup ended but finished flag is not set.
2021-02-12T02:21:26-03:00: removing unfinished backup
2021-02-12T02:21:26-03:00: TASK ERROR: backup ended but finished flag is not set.

We will also run verifications for snapshots, but we are not aware about the root cause.

Thank you!

could you also post the client and server-side task logs for that snapshot with the unknown magic number?

audioPhil · Feb 12, 2021

fabian said:
on what kind of storage/file systems are your datastores? any special mount options?

the index files are written to a temporary file and then atomically renamed into place..

For testing I mounted our existing file server via a cifs mount in /etc/fstab and used this directory as datastore in the pbs GUI.
I already read on this forum that remote storages via SMB/NFS are not really recommended, but it worked quite good until just recently.
In /etc/fstab the mount options are credentials=/root/.smbcreds,iocharset=utf8,vers=3.0,noperm,uid=34,gid=34 and most of the time, this seems to work just fine.

However, I just checked the syslog and there I found, right next to the backup messages of "proxmox-backup-proxy", the following line:

Code:

Feb 11 03:00:08 pbs kernel: [223944.175412] CIFS VFS: No writable handle in writepages rc=-22

Interestingly, pbs seems to think that everything went fine and reports a successful backup right after this message.

So I guess for me the issue is having an unsupported storage system mounted via SMB/CIFS. The recommendation probably is to use local storage?

Update: I just checked the fileserver and could identify the root cause: Apparently, a recent upgrade of our TrueNAS installation introduced a bug (https://jira.ixsystems.com/browse/NAS-108664) which, for whatever reasons, has until now only made itself visible during the update process of pbs.

So, while this is not an issue in pbs (at least for me, still need more details on @rada 's issue) my inital remark is still valid I think: The error message shown by PVE is misleading.

rada · Feb 18, 2021

In our case, we had an issue with our PSU (hardware) in PBS. We solved it and had to backup again some containers, now issues are gone.

Search

Search

Misleading error "no previous backup found" when in reality there is an issue with the last backup state

audioPhil

New Member

audioPhil

New Member

rada

Member

fabian

Proxmox Staff Member

fabian

Proxmox Staff Member

audioPhil

New Member

rada

Member