Backup misconfiguration?

Daniel-San
Hello guys,

I have a problem that I cannot explain to myself technically.
I have set up a single Proxmox VE host with two running containers, plus a PBS instance as a VM on separate hardware, where the hypervisor stores the backups of both containers every night.

One container has a 20 GB rootfs and, additionally, a 20 GB mount point.
Since I am using a Proxmox VE setup with a 1 TB ZFS RAID 1 pool, the rootfs and the mount point were each created as a subvolume on the local ZFS pool for virtual machines and LXCs.
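For reference, the extra mount point was added roughly like this (a sketch from memory, using my CT ID and path):

Bash:
# Allocate a new 20 GB subvolume on local-zfs and mount it at /var/piler
# inside the CT; backup=1 tells vzdump to include the mount point in backups.
pct set 101 -mp0 local-zfs:20,mp=/var/piler,backup=1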

The other container has a 15 GB rootfs only.

Now here is the thing:
The backup of the container with the additional mount point takes approximately 2 hours, and nearly 3 TB is crawled (bear in mind that I have only 1 TB of physical disk space overall) on every run of the backup job.
The duration and the amount of crawled data do not decrease after several days, as the deduplication functionality of PBS would suggest.

The other container finishes within two minutes after a few days and uploads approximately 65 megabytes to the PBS datastore every night. This growth is expected, since both LXCs are not yet in production use but system logs are still being written.

Both LXCs are based on the Debian 12 template from the Proxmox library.

I have attached the webhook notification of the last backup job.

Do you have some hints on where I can start troubleshooting?
It is absolutely weird that an LXC of at most 40 GB overall takes such a long time to back up, with nearly 3 TB of data crawled for it.

Thank you and best,
Daniel
 

Attachments

  • IMG_1905.jpeg (237 KB)
Give this a try: https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_ct_change_detection_mode
And give this a read: https://pbs.proxmox.com/docs/backup-client.html#change-detection-mode
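For containers this can be set per backup job under the advanced options, or tested in a one-off run, roughly like this going by the linked docs (CTIDHERE and the storage name are placeholders):

Bash:
# One-off backup run from the PVE host; metadata mode only pays off from the
# second run onward, since it needs a previous snapshot to compare against.
vzdump CTIDHERE --storage PBSSTORAGEHERE --mode snapshot --pbs-change-detection-mode metadata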
If you want more input, please share your config(s) via pct config CTIDHERE, cat /etc/pve/storage.cfg and cat /etc/pve/jobs.cfg.
Thank you for the information.

Here is the requested output:


Bash:
arch: amd64
cores: 2
description:
features: keyctl=1,nesting=1
hostname: <$hostname>
memory: 2048
mp0: local-zfs:subvol-101-disk-1,mp=/var/piler,size=20G,backup=1
net0: name=eth0,bridge=lxcnet0,firewall=1,hwaddr=<$mac-address>,ip=dhcp,ip6=dhcp,type=veth
onboot: 1
ostype: debian
rootfs: local-zfs:subvol-101-disk-0,size=20G
swap: 1024
tags: debian;mailpiler
unprivileged: 1

And here is the storage config:

Bash:
dir: local
        path /var/lib/vz
        content vztmpl,iso,backup

zfspool: local-zfs
        pool rpool/data
        content rootdir,images
        sparse 1

pbs: PBS
        datastore <$storage-reponame>
        server <$pbshostname>
        content backup
        prune-backups keep-all=1
        username <$username>@pbs

I have read the docs. I will try setting the change detection mode to “metadata”.

Best
Daniel
 
A quick update:
Changing the detection mode in the PVE backup job configuration and recreating the datastore on PBS has no effect.
Almost 3 TB of data is still crawled and the job still takes almost 2 hours.

I also tried doing a full clone of the problematic LXC on the PVE host for testing purposes.
Creating the full clone crawls 2.67 TB of data from the origin LXC, too.

Could it be that PBS is not the problem, but the LXC is? That the LXC somehow got broken with regard to its allocated space?
 
It will not take effect on the first backup, since metadata mode needs a previous snapshot to compare against, and re-creating the datastore wasn't needed. What does the job config look like now? I recommend using metadata.
Can you share the CT config for the one whose backup crawls 3 TB?

Regarding the full clone you tried: this is about PBS backups, though.
 
The backup job has been changed to metadata via the PVE backup job configuration, under the advanced settings.
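For reference, the job entry in /etc/pve/jobs.cfg now looks roughly like this (job ID, schedule and VMIDs are placeholders; the last line is the actual change):

Bash:
vzdump: backup-xxxxxxxx-xxxx
        schedule 02:00
        storage PBS
        mode snapshot
        vmid 100,101
        pbs-change-detection-mode metadata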

The CT config is still the same as the one pasted a couple of posts above (first code block).

My thinking is as follows:
A full clone of my working container crawls only 15 GB of data (the 15 GB rootfs) according to the clone job summary, which is expected.
The problematic container has a 20 GB rootfs and a 20 GB mount point assigned; according to the logs, both the backup job and the full clone crawl 2.67 TB of data.
The container config for the problematic one shows 40 GB of assigned space in total (rootfs plus mount point).
So my theory is that there is a problem with space allocation, maybe in the ZFS subvolumes, and not a problem with the backup job itself.
I don't know how or why this could happen.
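One rough check from inside the container would be to look for big sparse files, i.e. files whose apparent size far exceeds their on-disk usage (a sketch; the 1 GB threshold is arbitrary and /var/piler is the mount point from my config):

Bash:
# List files larger than 1 GB (apparent size) on the rootfs and the mount
# point; ls -lsh prints the allocated size first, so a file with a tiny
# allocation but a huge apparent size would explain the crawled terabytes.
find / /var/piler -xdev -type f -size +1G -exec ls -lsh {} \;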

Best,
Daniel
 
Trimming logs is always hard unless you know exactly what you are looking for, and I kind of want to see it all. Just .zip it and it should compress well.
 
Now it makes sense why the log is so large. I'm sorry, I meant the PVE-side task log of the backup job. I'm curious why it would crawl ~3 TB of data when your CT only has ~40 GB in total.
 
Yeah, I see what you mean now. Sorry it took me this long to understand. To be honest, I'm a bit puzzled where these ~2.7 TB come from, too.
This is mostly for my curiosity, but would you mind sharing this from the PVE side?
Bash:
zfs list -rt all -o name,used,avail,refer,mountpoint,refquota,refreservation,logicalused,compressratio | grep -E "NAME|101"
I saw a similar issue here recently: https://forum.proxmox.com/threads/abnormally-large-processed-data-on-small-lxc.163340/
Make sure to follow the links there and in the linked issue too.
 
Here is the output:

Bash:
NAME                           USED  AVAIL  REFER  MOUNTPOINT                     REFQUOTA  REFRESERV  LUSED  RATIO
rpool/data/subvol-101-disk-0  1.01G  19.0G  1.01G  /rpool/data/subvol-101-disk-0       20G       none  2.00G  2.12x
rpool/data/subvol-101-disk-1  4.46M  20.0G  4.46M  /rpool/data/subvol-101-disk-1       20G       none  5.75M  1.93x
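At the ZFS level this looks sane to me: both subvolumes together use only about 1 GB. To compare allocated versus apparent sizes from the host, I could run something like this (a sketch, assuming the default subvolume mount points):

Bash:
# Allocated vs. apparent size of the CT's datasets, seen from the PVE host;
# a huge gap between the two would point at sparse files being read in full.
du -sh /rpool/data/subvol-101-disk-0 /rpool/data/subvol-101-disk-1
du -sh --apparent-size /rpool/data/subvol-101-disk-0 /rpool/data/subvol-101-disk-1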
 
I was already thinking about recreating the LXC like @veehexx has done it in his post.
That will take some time, because lots of system customizations were made around the piler application installation to get it running with current versions of its package dependencies.

Best,
Daniel
 