'Backup is sparse' warning, but why?

Proximate

I got to work this morning to find that all of the backups failed across 4 nodes.
I read that 'backup is sparse' means something like a VM that takes 100GiB of space but contains 50GiB of zeros, and that this is not unusual, maybe related to thin provisioning.
However, looking at the errors, it's not clear why they occurred: the NFS share is only 7% used, yet I received a message that all backups failed.

What really happened?

Code:
INFO: starting new backup job: vzdump 201 120 121 198 199 1009 --mailnotification always --storage nfs_vmstore --prune-backups 'keep-weekly=1' --notes-template '{{guestname}}' --mailto support@aaa.com --quiet 1 --mode snapshot --compress zstd
INFO: skip external VMs: 120, 121, 198, 201, 1009
ERROR: Backup of VM 199 failed - unable to find VM '199'
INFO: Failed at 2022-12-11 01:00:10
INFO: Backup job finished with errors
TASK ERROR: job errors

-----------------------
This was a longer list with one entry per task; I'm showing only the last one for brevity.


Datacenter summary (Proxmox Virtual Environment 7.2-7):
Cluster: proclust, Quorate: Yes
Nodes: 4 online, 0 offline
Virtual Machines: 16 running, 2 stopped
LXC Containers: 0 running, 0 stopped
CPU: 128 CPU(s)
Memory: 219.94 GiB of 503.41 GiB
Storage: 605.41 GiB of 13.80 TiB
Subscription: at least one node without subscription

Logs:
Code:
INFO: starting new backup job: vzdump 201 120 121 198 199 1009 --prune-backups 'keep-weekly=1' --compress zstd --notes-template '{{guestname}}' --mailto support@aaa.com --mode snapshot --mailnotification always --quiet 1 --storage nfs_vmstore
INFO: skip external VMs: 120, 121
**SNIP**
INFO: Starting Backup of VM 1009 (qemu)
INFO: Backup started at 2022-12-11 01:17:50
INFO: status = running
INFO: VM Name: c8-devzub
INFO: include disk 'scsi0' 'local-zfs:vm-1009-disk-0' 32G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating vzdump archive '/mnt/pve/nfs_vmstore/dump/vzdump-qemu-1009-2022_12_11-01_17_50.vma.zst'
INFO: issuing guest-agent 'fs-freeze' command
INFO: issuing guest-agent 'fs-thaw' command
INFO: started backup task 'c0a4d2e5-828b-4672-b119-fb92e4ecd605'
INFO: resuming VM again
INFO:   2% (899.0 MiB of 32.0 GiB) in 3s, read: 299.7 MiB/s, write: 60.8 MiB/s
**SNIP**
INFO: 100% (32.0 GiB of 32.0 GiB) in 5m 20s, read: 617.9 MiB/s, write: 37.6 MiB/s
INFO: backup is sparse: 10.26 GiB (32%) total zero data
INFO: transferred 32.00 GiB in 320 seconds (102.4 MiB/s)
INFO: archive file size: 14.72GB
INFO: adding notes to backup
INFO: prune older backups with retention: keep-weekly=1
INFO: removing backup 'nfs_vmstore:backup/vzdump-qemu-1009-2022_12_04-01_23_59.vma.zst'
INFO: pruned 1 backup(s) not covered by keep-retention policy
INFO: Finished Backup of VM 1009 (00:05:23)
INFO: Backup finished at 2022-12-11 01:23:13
INFO: Backup job finished with errors
TASK ERROR: job errors
 
Hi,
> I got to work this morning to find that all of the backups failed across 4 nodes.
> I read that 'backup is sparse' means something like a VM that takes 100GiB of space but contains 50GiB of zeros, and that this is not unusual, maybe related to thin provisioning.
Yes, but it's not related to thin provisioning. When reading the VM image, sections of all zeroes are treated differently to save space in the backup. You just need to remember how long the section of zeroes is, not all the zeroes individually :)
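If you want to see the same idea outside of a backup, here is a quick, purely illustrative demo with a throwaway file (the path is made up); a filesystem handles a sparse file the same way, recording how long the zero region is instead of storing the zeros themselves:
Code:
# create a 1 GiB file that consists entirely of "zero data"
truncate -s 1G /tmp/sparse-demo.img
ls -lh /tmp/sparse-demo.img    # apparent size: 1.0G
du -h /tmp/sparse-demo.img     # allocated size: ~0, only the length of the zero region is tracked
rm /tmp/sparse-demo.img
Roughly speaking, vzdump does something comparable while writing the archive: blocks that turn out to be all zeroes are counted (that's the "backup is sparse: ... total zero data" line) rather than stored, so a thinly used disk produces a much smaller archive than its virtual size.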

> However, looking at the errors, it's not clear why they occurred: the NFS share is only 7% used, yet I received a message that all backups failed.
The message only says that the Backup job finished with errors, i.e. the job as a whole had errors. It does not mean that all backups failed.

Code:
INFO: starting new backup job: vzdump 201 120 121 198 199 1009 --mailnotification always --storage nfs_vmstore --prune-backups 'keep-weekly=1' --notes-template '{{guestname}}' --mailto support@aaa.com --quiet 1 --mode snapshot --compress zstd
INFO: skip external VMs: 120, 121, 198, 201, 1009
ERROR: Backup of VM 199 failed - unable to find VM '199'
INFO: Failed at 2022-12-11 01:00:10
INFO: Backup job finished with errors
TASK ERROR: job errors
The error is here, prefixed by ERROR. Does VM 199 actually exist? Otherwise you should adapt the job to not include it ;)

The 'backup is sparse' message is just an INFO line, as indicated, not an error.
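To check the node that reported the error, something like this should be enough (qm and pct only list the guests on the local node, so run it on the node where the error appeared):
Code:
# does this node know a guest with VMID 199 at all?
qm list | grep -w 199
pct list | grep -w 199
If nothing shows up, the ID is most likely just left over in the backup job selection.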
 
Hi, thanks for your reply.

The thing is, those VMs listed above are not set to be backed up, and they aren't in the list of backups either.
It's as if they are 'stuck' somewhere in the config or something.

This morning, I see something odd with the errors.
The backups, across the four nodes, all complain about one vm called 'VM 150'.

That was a test VM, and it existed on node 2 only, yet the backups on all four nodes complain about it.
Again, there was only one VM called 'VM 150'; it resided only on node 2 and it's long gone.

BTW, I do see that the other VMs got backed up.

>You just need to remember how long the section of zeroes is, not all the zeroes individually :)

I'm not sure what this means, that I have to remember how long the section of zeroes is?
 
> The thing is, those VMs listed above are not set to be backed up, and they aren't in the list of backups either.
> It's as if they are 'stuck' somewhere in the config or something.
You can check /etc/pve/jobs.cfg and /etc/pve/vzdump.cron.
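For example, taking the VMID 150 from your post (and since /etc/pve is shared across the cluster, checking on one node is enough):
Code:
grep -n -w 150 /etc/pve/jobs.cfg /etc/pve/vzdump.cron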
> That was a test VM, and it existed on node 2 only, yet the backups on all four nodes complain about it.
> Again, there was only one VM called 'VM 150'; it resided only on node 2 and it's long gone.
With the current implementation, the vzdump invocation on each node is independent, i.e. each one backs up the guests that are on it, skips guests that live on other nodes in the cluster, and complains about guests it cannot find at all.

> >You just need to remember how long the section of zeroes is, not all the zeroes individually :)
>
> I'm not sure what this means, that I have to remember how long the section of zeroes is?
Sorry if that wasn't clear. I didn't mean "you" literally, but the "you who's following along/thinking through the backup algorithm". You can also just replace "you" with "the backup algorithm" directly.
 
I understood you didn't mean that literally, but I still needed to understand your comment better :).
Anyhow, it has somehow fixed itself, and backups are now working again across the nodes.
Could it be something that was 'stuck' for a couple of cycles or something?
 
So there are no references to the old IDs in the configuration files? Can you check if there is only a single instance of pvescheduler running? I'd also recommend upgrading to the latest version.
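Something along these lines should cover both checks (standard tools, plus pveversion for the package overview):
Code:
# is more than one scheduler instance running?
pgrep -a pvescheduler
systemctl status pvescheduler
# installed Proxmox VE package versions, useful before deciding on the upgrade
pveversion -v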
 
Looking at /etc/pve/jobs.cfg on the first node, I see the list of daily and weekly jobs.
I can see at least one VM that is no longer even in the VM inventory, running or stopped.
I see only one instance of pvescheduler running.

As for upgrading, that was a nightmare the last time and I'm nervous about doing it again. I don't recall exactly, but something got out of sync and I had to rebuild every single node. That took me several days, as I needed to migrate all VMs to one node, re-install each node, etc.

I doubt the problem was the upgrade itself; more likely something just happened to go wrong at the same time I started the upgrade.
 
> Looking at /etc/pve/jobs.cfg on the first node, I see the list of daily and weekly jobs.
> I can see at least one VM that is no longer even in the VM inventory, running or stopped.
I guess you might want to remove the old ID from the configuration then ;)
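A minimal sketch of how I'd do that, assuming the guest selection sits in a vmid line of the job entry (check what your file actually contains; the GUI equivalent is Datacenter -> Backup -> Edit):
Code:
# show which job entries still reference guests explicitly
grep -n vmid /etc/pve/jobs.cfg
# then remove the stale ID from the comma-separated list in that entry
nano /etc/pve/jobs.cfg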

> As for upgrading, that was a nightmare the last time and I'm nervous about doing it again. I don't recall exactly, but something got out of sync and I had to rebuild every single node. That took me several days, as I needed to migrate all VMs to one node, re-install each node, etc.
That is unfortunate :/ Non-major upgrades are usually smooth and we try our best to avoid issues with major upgrades too.

> I doubt the problem was the upgrade itself; more likely something just happened to go wrong at the same time I started the upgrade.
I suggested upgrading because there were some improvements to the scheduler and a bug was fixed to remove the old IDs from backup configs when a guest is purged.
 
>I guess you might want to remove the old ID from the configuration then ;)

I wasn't aware that this had to be done manually due to a bug. I've not looked at the other nodes but I assume all will have the same entries.

>That is unfortunate :/ Non-major upgrades are usually smooth and we try our best to avoid issues with major upgrades too.

I'll give it another try; I'm just very nervous about doing it.

>bug was fixed to remove the old IDs from backup configs when a guest is purged.

Ah, so what I've seen is known about then. Good to know.
 
