Tape Backup - Large VM snapshots pause for hours before writing

Jan 21, 2022
We have two 11TB VMs (among many other smaller VMs) backing up to a PBS server.

Admittedly, the hardware PBS is running on is not ideal. It's HDD-based. Twenty-four Seagate ST10000NM0086 10TB SATA drives operating in a RAIDZ2 datastore. There are no SSD-based zfs "special devices" and we have noticed that Proxmox's pleas to use SSDs when possible are well-founded.

However, all that aside, since I'm not sure it's related... my question is this:

Is there some type of verification process or some other task that PBS performs before writing to tape for each snapshot? Because we have noticed that for these two large (again, 11TB) VMs, for each of their snapshots, the PBS server stops writing to tape, the tape device (a Quantum SuperLoader 3) goes completely idle, and the PBS server just accesses the disks for 2-3 hours before writing resumes. Then this process repeats when the next snapshot for one of these large VMs is written. Obviously, this can add many hours or even days to a tape job where anything but the latest snapshots are written.

Code:
2022-02-14T06:14:44-08:00: percentage done: 33.65% (63/190 groups, 13/14 snapshots in group #64)
2022-02-14T06:14:44-08:00: backup snapshot vm/162/2022-02-04T02:25:39Z
2022-02-14T06:14:55-08:00: wrote 487 chunks (1097.60 MB at 174.89 MB/s)
2022-02-14T06:14:59-08:00: end backup pbs1-primary:vm/162/2022-02-04T02:25:39Z
2022-02-14T06:14:59-08:00: percentage done: 33.68% (64/190 groups)
2022-02-14T06:14:59-08:00: backup snapshot vm/163/2021-11-06T17:18:10Z
2022-02-14T07:39:39-08:00: wrote 5011 chunks (4296.02 MB at 0.85 MB/s)
2022-02-14T07:40:10-08:00: wrote 1847 chunks (4296.02 MB at 140.13 MB/s)
2022-02-14T07:40:43-08:00: wrote 1810 chunks (4295.49 MB at 135.61 MB/s)
2022-02-14T07:41:21-08:00: wrote 1989 chunks (4297.06 MB at 124.38 MB/s)
2022-02-14T07:41:55-08:00: wrote 1824 chunks (4296.54 MB at 136.88 MB/s)
2022-02-14T07:42:31-08:00: wrote 1885 chunks (4297.85 MB at 129.07 MB/s)
2022-02-14T07:43:04-08:00: wrote 1793 chunks (4298.90 MB at 136.89 MB/s)

That's the prior VM finishing, then the first snapshot of one of our large VMs beginning to be written. There is a delay from 06:14:59, when the snapshot backup starts, to 07:39:39, when the first chunks are written to tape. That's an ~85-minute delay.

Its next snapshot:

Code:
2022-02-14T18:55:56-08:00: wrote 2470 chunks (4297.59 MB at 111.09 MB/s)
2022-02-14T18:56:35-08:00: wrote 2032 chunks (4297.59 MB at 127.29 MB/s)
2022-02-14T18:56:41-08:00: wrote 316 chunks (696.25 MB at 178.56 MB/s)
2022-02-14T18:56:47-08:00: end backup pbs1-primary:vm/163/2021-11-06T17:18:10Z
2022-02-14T18:56:47-08:00: percentage done: 33.82% (64/190 groups, 1/4 snapshots in group #65)
2022-02-14T18:56:47-08:00: backup snapshot vm/163/2021-12-02T12:00:02Z
2022-02-14T21:47:04-08:00: wrote 2783 chunks (4298.64 MB at 0.42 MB/s)
2022-02-14T21:47:46-08:00: wrote 2045 chunks (4298.90 MB at 109.25 MB/s)
2022-02-14T21:48:30-08:00: wrote 2071 chunks (4295.75 MB at 105.60 MB/s)

The prior snapshot finished at 18:56:47, and the job to write the same VM's next snapshot to tape begins at the same timestamp (18:56:47). But writing doesn't actually begin until 21:47:04. That's an ~171-minute delay, approaching three hours! And again, while this is happening, only disk I/O is taking place, with no tape drive activity.

So, to restate the question: Is this normal? Is there some type of verification/preparation job/task running on the snapshot prior to writing it to tape that could account for this? Any advice on speeding it up? We anticipate changing to a raid10-based datastore (from raidz2) once our snapshots are all on tape. So that might help. But we'd like to better understand exactly what is happening during these long pauses taking place in the middle of a tape job.

Thanks!

--Brian
 
ok, i think what happens is the following: when we start a backup of a snapshot, we first sort the chunks by inode number, because in many cases reading the chunks in inode order works better (for SSDs it doesn't matter, but for HDDs it can make a massive difference). in your case, an 11TiB backup of ~2MiB chunks (from the logs) results in >5 million chunks that get stat'ed at the start, which is probably very slow on your setup.

the immediate "fix" would be to add ssd special devices, since that speeds up the stat'ing of the chunks (the sorting shouldn't take very long)
i'll also send a patch (today, maybe tomorrow) so that we can enable/disable that behaviour per datastore
(i'd tell you when that would reach the pbs-test repository and maybe you can test it)

also there are a few internal optimizations that i can implement that may make that faster by not needing to stat every chunk
(this would optimize it when backing up multiple snapshots of the same vm if they share chunks)
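The shared-chunk optimization described above amounts to tracking which chunk digests have already been written in the current media set and only processing the new ones. A hedged Python sketch of that idea (`plan_snapshot_writes` is a hypothetical name, and digests are plain strings here, not PBS's real data structures):

```python
def plan_snapshot_writes(snapshots):
    """Given snapshots as lists of chunk digests, return per snapshot
    only the digests not yet written in this media set, so chunks
    shared between consecutive snapshots of the same VM are stat'ed
    and written once instead of once per snapshot.
    """
    written = set()  # digests already written to tape in this media set
    plan = []
    for snap in snapshots:
        new = [d for d in snap if d not in written]
        written.update(new)
        plan.append(new)
    return plan

# Two snapshots sharing chunk "b": only the first occurrence is planned.
# plan_snapshot_writes([["a", "b"], ["b", "c"]]) -> [["a", "b"], ["c"]]
```

For mostly-unchanged 11TB VMs, successive snapshots share almost all chunks, so skipping already-seen digests avoids repeating the expensive per-chunk metadata work for every snapshot in the group.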

changing to a raid10 could make that faster too, but i cannot say if or how much
 
Thank you for that awesome reply! It is very informative and we're very appreciative that dev time will be put into the noted behavior!

Meanwhile, our tape job has finished and we can now nuke our datastore and begin testing/experimenting.
 
just fyi, there are now two things committed:

a fix for skipping stat'ing already backed up chunks in the media set:
https://git.proxmox.com/?p=proxmox-...ff;h=dcd9c17ffff50df4552ae4a789590aa95787f956

and an option in the datastore for disabling sorting by inode altogether:
https://git.proxmox.com/?p=proxmox-...ff;h=fef61684b447dfb18eaea3cefc2cca315fb14d02
(this can then be enabled with 'proxmox-backup-manager datastore update <DATASTORE> --tuning chunk-order=none')

both should be available with proxmox-backup-server 2.1.6-1 (or higher), i'll post again when this lands in the pbs-test repository
 