vzdump Output Explained? Finding bottleneck

tufkal

New Member
Feb 3, 2018
I have one particular VM that is taking a huge amount of time to back up, and I'm trying to figure out what the problem is. To start, it would be helpful if I understood the backup log. Example:

INFO: status: 90% (1932821856256/2147483648000), sparse 61% (1311810846720), duration 52643, 585/0 MB/s

I understand everything but the XXX/XXX MB/s. At some points near the beginning I see around 14/14 MB/s, but then at around 30% it goes to around 600/0 MB/s for the rest of the backup.

What are those numbers referencing? I tried to interpret them as current/average, current/max, or some other combination of measurements, but nothing I could think of would scale the way they do.

What the heck are those numbers in that speed measurement?

BACKGROUND: Backup goes over gigabit to an NFS share on a NAS. I have watched iotop, iftop, and top during the backup process and see nothing out of the ordinary compared to usual operation.
 
Look at the start of the vzdump log, there is something like "read/write". Once vzdump reaches sparse areas there are no blocks to write, so you only get fast reads.
 
Unfortunately, the start of the log doesn't show anything like read/write, and based on the numbers I don't think that's it. Here's a full log of a very small VM for reference.

pastebin.ca/3968200 (add https:// yourself, the forum won't let me post links)

If I knew what those numbers meant, maybe it would tell me why that one VM takes so long. None of the documentation I searched on vzdump explains its output (which I find odd), so that's why I'm asking here.
 
5 minutes for a backup of a 70 GB disk isn't bad.

The two values at the end of a log line separated by a slash are the read/write speeds as @HBO already mentioned.

Code:
INFO: status: 0% (182452224/68719476736), sparse 0% (139419648), duration 3, 60/14 MB/s

The backup process reads at 60 MB/s from the underlying storage (ZFS) and compresses the data on the fly, which results in 14 MB/s written to the backup storage.

Code:
INFO: status: 16% (11276124160/68719476736), sparse 14% (10116689920), duration 113, 710/0 MB/s

If the difference between read and write speed is really big, then a lot of zeroes (sparse blocks) were found on the disk. They compress down to nearly nothing, which means almost nothing is written to the backup storage.
This typically occurs when you have a big disk (like 70 GB) but the filesystem in the VM has only allocated 1-2 GB of data. The rest of the virtual disk is empty and therefore contains only zeroes, which means there is nothing to back up. Nevertheless, the backup process still needs to read the whole disk to check for data, which results in a long backup even if not much of the disk is actually used. Proxmox isn't aware of what is allocated inside the VM and always backs up the whole disk.
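
You can see this gap between provisioned and allocated space directly on the host. As an illustration (the path and VM ID are placeholders, and this assumes file-based qcow2 storage; on a ZFS zvol, `zfs list` shows similar usage information):

Code:
# virtual size = what vzdump has to read, disk size = what is actually allocated
qemu-img info /var/lib/vz/images/100/vm-100-disk-0.qcow2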

If something is unclear or you have further questions, feel free to ask.

Note that you shouldn't make your disks much bigger than the data you actually store in the VM, otherwise you will have long backup times. On top of that, data deleted inside the VM will, over time, still be backed up, because it is not removed from the virtual disk itself. To avoid this, enable the Discard option in the disk options and run `fstrim -a` inside the VM, or set up a cron job that does this. Most distributions ship a timer for this by default, which you can enable with `systemctl enable fstrim.timer`.
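
For reference, a rough sketch of those steps on the command line (the VM ID 100, the scsi0 disk and its storage name are placeholders, and this assumes the disk is attached via a SCSI controller that supports discard):

Code:
# on the Proxmox host: enable the Discard option on the VM's disk
qm set 100 --scsi0 local-zfs:vm-100-disk-0,discard=on
# inside the VM: trim all mounted filesystems once
fstrim -a
# or enable the periodic timer where the distribution ships one
systemctl enable --now fstrim.timer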
 
Hello Phinitris,
regarding the fstrim setup:

1. In a KVM VM, does an adjustment need to be made to /etc/fstab inside the guest?

2. For LXC: does fstrim work?

3. fstrim.timer does not seem to exist on the PVE host or in a KVM guest using systemd:
Code:
# systemctl status  fstrim.timer
Unit fstrim.timer could not be found.
Is there another step needed to set up fstrim in systemd?

PS: thank you for the write-up on fstrim.
 
Thank you. While you and HBO both said the same thing, your answer laid out the process much better and is what I needed to know. The VM I am having the 'too long' backup problem with is indeed provisioned much larger than the data that resides on it, so I now see it spends a lot of time processing empty areas. This also explains why the final measured MB/s at the end (which is just calculated from total size and time) is so low (40 MB/s) on that machine: averaged out, it spends a lot of time not writing.

My typical machines with small provisioned disks end up with a final measured speed of around 200 MB/s, which is what I would expect to see going over gigabit to a RAID10 NFS NAS. It is now clear both why that one takes so long and why its final measurement is so low.

I will trim that disk up as you suggested. Problem solved. Thank you!
 
@RobFantini
1. Not necessarily. You can add the discard option in the fstab file, which trims blocks immediately when they are deleted, but you will probably take a performance hit. It's better to trim once a week, like the systemd timer does (see the examples after this list).

2. I have never used LXC, but I guess so. `fstrim /` should report whether or not it's supported.

3. Not every distro has this timer. See: https://www.digitalocean.com/commun...eriodic-trim-for-ssd-storage-on-linux-servers
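
To make points 1 and 3 more concrete, here are two minimal sketches; the UUID, mount point and script path are placeholders, not taken from this thread. The fstab line shows what the continuous-discard variant would look like:

Code:
# /etc/fstab inside the VM -- continuous discard on the root filesystem (placeholder UUID)
UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx  /  ext4  defaults,discard  0  1

And if the distribution ships neither fstrim.timer nor a cron job, a simple weekly cron script is enough (remember to make it executable):

Code:
#!/bin/sh
# /etc/cron.weekly/fstrim -- trim all mounted filesystems that support it
/sbin/fstrim --all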

@tufkal
You're welcome. But note that trimming won't make it faster, as the backup is currently only reading zeroes: the disk is still mostly uninitialised. You need to make the (virtual) disk itself smaller. As you write and delete more data on the disk, trimming will come in handy, but right now you effectively already have a trimmed disk.
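
Only as a very rough outline, and not something spelled out in this thread: if the partitions and filesystem inside the VM have already been shrunk well below the new size (and you have a backup at hand), a file-based image could then be reduced with `qemu-img resize --shrink`; the path and size here are placeholders:

Code:
# on the host, with the VM stopped, AFTER shrinking the guest filesystem/partitions first
qemu-img resize --shrink /var/lib/vz/images/100/vm-100-disk-0.qcow2 32G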
 
Bad choice of words, heh! When I said I'd trim it down, I meant I was going to shrink it down, not just run the literal TRIM process.

If I am reading the documentation properly, once I have run fstrim and all the free space is trimmed, I simply use 'qemu-img convert' and the resulting file will be shrunk down without all of that empty space. Then I can grow it as needed to keep the backup times in check. (Correct me on these points if I am wrong!)

All great information, thanks again!
 
@tufkal
Just wanted to make sure we're both on the same page :)
There is no need to convert the file after TRIM; it has no effect on the storage used on the hypervisor or on the backup time. A simple TRIM is enough to use minimal space on the hypervisor and get fast backups.
 
If I am reading the documentation properly, once I have run fstrim and all the free space is trimmed, I simply use 'qemu-img convert' and the resulting file will be shrunk down without all of that empty space. Then I can grow it as needed to keep the backup times in check. (Correct me on these points if I am wrong!)

First, I assume you use QCOW2 or ZFS (and thin LVM to some extent), otherwise you will not be able to use TRIM correctly. The same goes for KVM VMs without virtio-scsi and the discard option enabled on the disk. Only if all these requirements are met will TRIM work; the underlying storage system must support it. If it does not work, you can always write zeros to your virtual disk to "empty" or "zeroize" the free space, which also speeds up the backup. I use that all the time on shared storage or plain LVM volumes.
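
A minimal sketch of that zero-fill approach, run inside the VM (the file name is just an example; leave yourself some free-space headroom and remove the file immediately afterwards):

Code:
# fill the free space with zeroes until the disk is full, then free it again
dd if=/dev/zero of=/zerofill bs=1M status=progress || true
sync
rm -f /zerofill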

You do not need to convert the QCOW2 files to get better backup times. The backup process automatically skips zeros on the disk, so the only benefit is to (possibly defragment and) shrink the size of the file on your file-based storage.
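
If you do want to compact the image file anyway, something along these lines should work (the file names are placeholders, the VM must be shut down, and keep the original file until the new one is verified):

Code:
# rewrite the image, dropping zero/unallocated clusters; -p shows progress
qemu-img convert -p -O qcow2 vm-100-disk-0.qcow2 vm-100-disk-0-compact.qcow2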

In general, backup times are influenced by the read speed of the source disk, the compression time (most compression is single-threaded, so it is often the weak point on a multi-core CPU), and the write speed to your NFS storage.
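
If compression turns out to be the limiting factor, vzdump can be told to use the multi-threaded pigz instead of gzip via /etc/vzdump.conf; the exact meaning of the value depends on your PVE version, so check the vzdump documentation first, and the pigz package has to be installed:

Code:
# /etc/vzdump.conf -- a value > 0 enables pigz for gzip-compressed backups
pigz: 1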
 
