Hey Matrix, to help better, please provide your storage config (cat /etc/pve/storage.cfg) and your container config.
rbd: Default
        content images,rootdir
        krbd 0
        pool Default
Yes, all VMs and LXCs are on the Ceph storage.

Other VMs are on the same storage?
Hey Fabian,

because for containers, PBS needs to read and chunk all the data to find out what to upload; for (running) VMs there is a shortcut, because we can tell QEMU to keep track of which chunks changed.
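The difference described above can be sketched in a few lines of Python. This is purely illustrative (the function names, chunk size, and data structures are assumptions for the sketch, not the actual PBS implementation): the container path must read and hash every chunk just to discover what is unchanged, while the running-VM path uses QEMU's dirty bitmap to skip reading unchanged chunks entirely.

```python
import hashlib

CHUNK_SIZE = 4 * 1024 * 1024  # illustrative fixed chunk size

def backup_container(path, known_digests):
    """Container case: every chunk is read and hashed to decide
    whether it needs uploading -- reads scale with total data size."""
    uploaded = 0
    with open(path, "rb") as f:
        while chunk := f.read(CHUNK_SIZE):
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in known_digests:   # only new chunks get uploaded
                known_digests.add(digest)
                uploaded += 1
    return uploaded

def backup_vm(path, dirty_chunks, known_digests):
    """Running-VM case: a dirty bitmap tells us which chunk indexes
    changed, so unchanged chunks are never even read from disk."""
    uploaded = 0
    with open(path, "rb") as f:
        for idx in dirty_chunks:              # only touched chunks
            f.seek(idx * CHUNK_SIZE)
            digest = hashlib.sha256(f.read(CHUNK_SIZE)).hexdigest()
            if digest not in known_digests:
                known_digests.add(digest)
                uploaded += 1
    return uploaded
```

Note that in both cases the upload itself is incremental; the cost difference is in how much data must be *read* to find out what changed.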
How can one check/change this?

Of course, increasing your read/chunk performance might also be an option, depending on your current hardware.
Besides the big difference (dirty bitmaps with running VMs allow skipping the read operations for unchanged chunks), there is also far less complexity involved in processing the backup data for VMs.

So I'm in the same boat. Before implementing PBS in production, I deployed it in a test cluster. A test LXC (4 TB) takes 2 hours to back up, even though nothing changed. The strange thing is that during the 2 hours the backup takes to finish, there isn't much disk activity on the PBS or on the machine running the LXC... the CPU is even cold.
Test VMs of 128 GB, 300 GB and 512 GB take seconds to back up. One thing doesn't make sense: all the Proxmox staff keep saying that only running VMs can have the diff done quickly, but one machine with only VMs comes up periodically and does its backup right after startup, and that one finishes within a few seconds. All this while the LXC on a machine that runs 24/7 takes forever.
What do you mean by verify? A PBS verification happens on the PBS side and consists of reading all chunks of the snapshot and hashing their contents, so it's both read- and CPU-intensive, but it's basically the same for fidx and didx backups, except that the index format is a bit different. The contained data is not logically analysed or verified at all, so the difference in contents doesn't matter.

To add insult to injury, VM verification takes a second or two for small incremental backups, while the LXC takes 7 hours to verify an incremental backup (which is a few MB, might I add), and for the LXC that hogs the disks like crazy.
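Conceptually, what a verification does can be sketched as follows (a minimal illustration, assuming a snapshot index that is simply a list of expected chunk digests and a `read_chunk` callback that fetches raw chunk bytes; these names are not the actual PBS API): every referenced chunk is re-read and its hash recomputed, so the cost scales with the total referenced data, not with what was recently uploaded.

```python
import hashlib

def verify_snapshot(index, read_chunk):
    """Re-read every chunk listed in a snapshot's index and recompute
    its digest; corruption shows up as a digest mismatch. The chunk
    *contents* are never interpreted, only hashed."""
    corrupt = []
    for digest in index:
        data = read_chunk(digest)
        if hashlib.sha256(data).hexdigest() != digest:
            corrupt.append(digest)
    return corrupt
```

This is why verification load tracks the full size of the snapshot's referenced data rather than the size of the last incremental upload.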
Yes, you are absolutely right. Though you (i.e., the Proxmox operating system) already know what the filesystem of the underlying storage is, because I select it from a drop-down in the GUI: ZFS. There could be separate logic per storage type; it's not that complicated to encapsulate different diff functionality within a switch case.

You can probably guess the latter is affected by how your file system performs w.r.t. directory and metadata operations.
Yes, this point - I meant on PBS. Still, this one is a bit strange: even though backups of the CT are incremental (judging by the amount reported as sent from PVE to PBS), the verify (periodic job or on receive - I tested both) takes 6-7 hours, and PBS seems to scratch through all 4 TB of data on the disk. And yes, the previous backups of this CT are already verified... one would assume that only the 10 MB incremental backup that was sent to PBS would get verified... as, you know, an increment? And although there are now 10 backups of the 4 TB CT in PBS, it only reports that 4 TB (and a bit more) of storage is occupied, so I don't presume every single backup is the whole 4 TB of data. Even zfs list shows that it's not that big:
zfs list
NAME USED AVAIL REFER MOUNTPOINT
rpool 12.7G 1.74T 96K /rpool
rpool/ROOT 12.7G 1.74T 96K /rpool/ROOT
rpool/ROOT/pve-1 12.7G 1.74T 12.7G /
rpool/data 96K 1.74T 96K /rpool/data
storage_cold 3.39T 49.3T 3.39T /mnt/datastore/storage_cold
No, the backup client is file-system agnostic; it uses regular file/directory operations, there is no diffing at that level.
If you verify a single snapshot, all the chunks of that snapshot will be verified. A snapshot is always complete; there are no incremental or full snapshots - all snapshots are "equal". Only the uploading part is incremental, in the sense that it skips re-uploading chunks that are already there on the server; the chunk store used by the datastore takes care of the deduplication. So yes, in your case, (re)verifying a single snapshot will read and verify 4 TB of data. (Re)verifying ten snapshots of the same CT with little churn between the snapshots will cause only a little more load than verifying a single one of them, since chunks already verified within a single verification task will not be verified a second time, for obvious reasons.
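The point about verifying ten snapshots costing little more than verifying one can be sketched like this (an illustrative model, not the actual PBS code: each "index" here is assumed to be a list of chunk digests, and `read_chunk` a callback fetching chunk bytes). Because deduplicated chunks are shared between snapshots, a single verification task only needs to read and hash each unique chunk once.

```python
import hashlib

def verify_datastore(snapshot_indexes, read_chunk):
    """Verify several snapshots in one task. Chunks shared between
    snapshots (deduplication) are read and hashed only once; repeat
    references are skipped via the `verified`/`failed` sets."""
    verified, failed, reads = set(), set(), 0
    for index in snapshot_indexes:
        for digest in index:
            if digest in verified or digest in failed:
                continue                # already checked in this task
            reads += 1
            data = read_chunk(digest)
            if hashlib.sha256(data).hexdigest() == digest:
                verified.add(digest)
            else:
                failed.add(digest)
    return reads, failed
```

So ten snapshots of a 4 TB CT with little churn still mean roughly 4 TB of reads for one task, not 40 TB, which also matches the ~4 TB of occupied space reported above.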
Side point (third one):
What I was thinking (more hoping) was that PBS would be more aware of its storage, and that if incremental backups were performed on storage like ZFS, they would be based on its built-in snapshots.