'Backup is sparse' warning, but why?

Proximate

I got to work this morning to find that all of the backups failed across 4 nodes.
I read that 'backup is sparse' means something like a VM that takes 100GiB of space but contains 50GiB of zeros, and that this is not unusual, maybe related to thin provisioning.
However, looking at the errors, it's not clear why they occurred: the NFS share is only 7% used, yet I received a message that all backups failed.

What really happened?

Code:
INFO: starting new backup job: vzdump 201 120 121 198 199 1009 --mailnotification always --storage nfs_vmstore --prune-backups 'keep-weekly=1' --notes-template '{{guestname}}' --mailto support@aaa.com --quiet 1 --mode snapshot --compress zstd
INFO: skip external VMs: 120, 121, 198, 201, 1009
ERROR: Backup of VM 199 failed - unable to find VM '199'
INFO: Failed at 2022-12-11 01:00:10
INFO: Backup job finished with errors
TASK ERROR: job errors

-----------------------
This was a longer list with one entry per task; I'm showing only the last one for brevity.


Datacenter summary (Proxmox Virtual Environment 7.2-7):
Cluster: proclust, Quorate: Yes
Nodes: 4 online, 0 offline
Virtual Machines: 16 running, 2 stopped
LXC Containers: 0 running, 0 stopped
CPU: 128 CPU(s)
Memory: 219.94 GiB of 503.41 GiB
Storage: 605.41 GiB of 13.80 TiB
Subscription: at least one node without subscription

Logs:
Code:
INFO: starting new backup job: vzdump 201 120 121 198 199 1009 --prune-backups 'keep-weekly=1' --compress zstd --notes-template '{{guestname}}' --mailto support@aaa.com --mode snapshot --mailnotification always --quiet 1 --storage nfs_vmstore
INFO: skip external VMs: 120, 121
**SNIP**
INFO: Starting Backup of VM 1009 (qemu)
INFO: Backup started at 2022-12-11 01:17:50
INFO: status = running
INFO: VM Name: c8-devzub
INFO: include disk 'scsi0' 'local-zfs:vm-1009-disk-0' 32G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating vzdump archive '/mnt/pve/nfs_vmstore/dump/vzdump-qemu-1009-2022_12_11-01_17_50.vma.zst'
INFO: issuing guest-agent 'fs-freeze' command
INFO: issuing guest-agent 'fs-thaw' command
INFO: started backup task 'c0a4d2e5-828b-4672-b119-fb92e4ecd605'
INFO: resuming VM again
INFO:   2% (899.0 MiB of 32.0 GiB) in 3s, read: 299.7 MiB/s, write: 60.8 MiB/s
**SNIP**
INFO: 100% (32.0 GiB of 32.0 GiB) in 5m 20s, read: 617.9 MiB/s, write: 37.6 MiB/s
INFO: backup is sparse: 10.26 GiB (32%) total zero data
INFO: transferred 32.00 GiB in 320 seconds (102.4 MiB/s)
INFO: archive file size: 14.72GB
INFO: adding notes to backup
INFO: prune older backups with retention: keep-weekly=1
INFO: removing backup 'nfs_vmstore:backup/vzdump-qemu-1009-2022_12_04-01_23_59.vma.zst'
INFO: pruned 1 backup(s) not covered by keep-retention policy
INFO: Finished Backup of VM 1009 (00:05:23)
INFO: Backup finished at 2022-12-11 01:23:13
INFO: Backup job finished with errors
TASK ERROR: job errors
 
Hi,
> I got to work this morning to find that all of the backups failed across 4 nodes.
> I read that 'backup is sparse' means something like a VM that takes 100GiB of space but contains 50GiB of zeros, and that this is not unusual, maybe related to thin provisioning.
Yes, but it's not related to thin provisioning. When reading the VM image, sections of all zeroes are treated differently to save space in the backup. You just need to remember how long the section of zeroes is, not all the zeroes individually :)
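If you want to see the same idea outside of a backup, here is a quick, purely illustrative demo with a throwaway file (the path is made up); a filesystem handles a sparse file the same way, recording how long the zero region is instead of storing the zeros themselves:
Code:
# create a 1 GiB file that consists entirely of "zero data"
truncate -s 1G /tmp/sparse-demo.img
ls -lh /tmp/sparse-demo.img    # apparent size: 1.0G
du -h /tmp/sparse-demo.img     # allocated size: ~0, only the length of the zero region is tracked
rm /tmp/sparse-demo.img
Roughly speaking, vzdump does something comparable while writing the archive: blocks that turn out to be all zeroes are counted (that's the "backup is sparse: ... total zero data" line) rather than stored, so a thinly used disk produces a much smaller archive than its virtual size.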

> However, looking at the errors, it's not clear why they occurred: the NFS share is only 7% used, yet I received a message that all backups failed.
The message only says that the Backup job finished with errors, i.e. the job as a whole had errors. It does not mean that all backups failed.

Code:
INFO: starting new backup job: vzdump 201 120 121 198 199 1009 --mailnotification always --storage nfs_vmstore --prune-backups 'keep-weekly=1' --notes-template '{{guestname}}' --mailto support@aaa.com --quiet 1 --mode snapshot --compress zstd
INFO: skip external VMs: 120, 121, 198, 201, 1009
ERROR: Backup of VM 199 failed - unable to find VM '199'
INFO: Failed at 2022-12-11 01:00:10
INFO: Backup job finished with errors
TASK ERROR: job errors
The error is here, prefixed by ERROR. Does VM 199 actually exist? Otherwise you should adapt the job to not include it ;)

The 'backup is sparse' message is just an INFO line, as indicated, not an error.
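To check the node that reported the error, something like this should be enough (qm and pct only list the guests on the local node, so run it on the node where the error appeared):
Code:
# does this node know a guest with VMID 199 at all?
qm list | grep -w 199
pct list | grep -w 199
If nothing shows up, the ID is most likely just left over in the backup job selection.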
 
Hi, thanks for your reply.

The thing is, those VMs listed above are not set to be backed up, and they aren't in the list of backups either.
It's as if they are 'stuck' somewhere in the config or something.

This morning, I see something odd with the errors.
The backups, across the four nodes, all complain about one vm called 'VM 150'.

That was a test VM, and it existed on node 2 only, yet the backups on all four nodes complain about it.
Again, there was only one VM called 'VM 150'; it resided only on node 2 and it's long gone.

BTW, I do see that the other VMs got backed up.

>You just need to remember how long the section of zeroes is, not all the zeroes individually :)

I'm not sure what this means, that I have to remember how long the section of zeroes is?
 
> The thing is, those VMs listed above are not set to be backed up, and they aren't in the list of backups either.
> It's as if they are 'stuck' somewhere in the config or something.
You can check /etc/pve/jobs.cfg and /etc/pve/vzdump.cron.
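For example, taking the VMID 150 from your post (and since /etc/pve is shared across the cluster, checking on one node is enough):
Code:
grep -n -w 150 /etc/pve/jobs.cfg /etc/pve/vzdump.cron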
> That was a test VM, and it existed on node 2 only, yet the backups on all four nodes complain about it.
> Again, there was only one VM called 'VM 150'; it resided only on node 2 and it's long gone.
With the current implementation, the vzdump invocation on each node is independent, i.e. each one backs up the guests that are on it, skips guests that live on other nodes in the cluster, and complains about guests it cannot find at all.

> >You just need to remember how long the section of zeroes is, not all the zeroes individually :)
>
> I'm not sure what this means, that I have to remember how long the section of zeroes is?
Sorry if that wasn't clear. I didn't mean "you" literally, but the "you who's following along/thinking through the backup algorithm". You can also just replace "you" with "the backup algorithm" directly.
 
I understood you didn't mean that literally, but I still needed to understand your comment better :).
Anyhow, it has somehow fixed itself, and backups are now working again across the nodes.
Could it be something that was 'stuck' for a couple of cycles or something?
 
So there are no references to the old IDs in the configuration files? Can you check if there is only a single instance of pvescheduler running? I'd also recommend upgrading to the latest version.
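Something along these lines should cover both checks (standard tools, plus pveversion for the package overview):
Code:
# is more than one scheduler instance running?
pgrep -a pvescheduler
systemctl status pvescheduler
# installed Proxmox VE package versions, useful before deciding on the upgrade
pveversion -v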
 
Looking at /etc/pve/jobs.cfg on the first node, I see the list of daily and weekly jobs.
I can see at least one VM that is no longer even in the VM inventory, running or stopped.
I see only one instance of pvescheduler running.

As for upgrading, that was a nightmare the last time and I'm nervous about doing it again. I don't recall exactly, but something got out of sync and I had to rebuild every single node. That took me several days, as I needed to migrate all VMs to one node, re-install each node, etc.

I doubt the problem was the upgrade itself; more likely something just happened to go wrong at the same time I started the upgrade.
 
> Looking at /etc/pve/jobs.cfg on the first node, I see the list of daily and weekly jobs.
> I can see at least one VM that is no longer even in the VM inventory, running or stopped.
I guess you might want to remove the old ID from the configuration then ;)
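A minimal sketch of how I'd do that, assuming the guest selection sits in a vmid line of the job entry (check what your file actually contains; the GUI equivalent is Datacenter -> Backup -> Edit):
Code:
# show which job entries still reference guests explicitly
grep -n vmid /etc/pve/jobs.cfg
# then remove the stale ID from the comma-separated list in that entry
nano /etc/pve/jobs.cfg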

> As for upgrading, that was a nightmare the last time and I'm nervous about doing it again. I don't recall exactly, but something got out of sync and I had to rebuild every single node. That took me several days, as I needed to migrate all VMs to one node, re-install each node, etc.
That is unfortunate :/ Non-major upgrades are usually smooth and we try our best to avoid issues with major upgrades too.

> I doubt the problem was the upgrade itself; more likely something just happened to go wrong at the same time I started the upgrade.
I suggested upgrading because there were some improvements to the scheduler and a bug was fixed to remove the old IDs from backup configs when a guest is purged.
 
>I guess you might want to remove the old ID from the configuration then ;)

I wasn't aware that this had to be done manually due to a bug. I've not looked at the other nodes but I assume all will have the same entries.

>That is unfortunate :/ Non-major upgrades are usually smooth and we try our best to avoid issues with major upgrades too.

I'll give it another try; I'm just very nervous about doing it.

>bug was fixed to remove the old IDs from backup configs when a guest is purged.

Ah, so what I've seen is known about then. Good to know.
 
