High I/O delay during backups on ZFS

decibel83

Hi,
I am encountering some problems making backups on many servers running ZFS.

The PVE cluster is made of 4 nodes. Every node has a ZFS mirror pool on 2 NVMe SSD drives, and a full backup is made once per week at 00:30, on a different day for each node:

Screenshot 2020-09-14 at 17.08.58.png

Screenshot 2020-09-14 at 17.08.50.png

Every time a backup starts I see outages on some virtual machines, which hang for a while. Backups normally take about 2 hours.
This happens on all nodes when a backup is made.

Last night a backup on one node took four times longer than usual (about 8 hours!), and I saw an IO delay of 8 to ~12 during the backup job (from 2:30 to 8:30). During this period some virtual machines hung at random:

Screenshot 2020-09-14 at 17.04.23.png

I see the same behaviour when restoring a virtual machine from backup (the 5.34 IO delay peak at 16:30 in the above graph).

Could you help me to understand what's going on, please?

Thank you very much!
 
Hi,

Try to trim your VMs and your ZFS pool. What are the ZFS properties of the dataset where you store your backups?
 
Try to trim your VMs and your ZFS pool.

Thanks for your reply!

I know that I can trim my ZFS pool with zpool trim <poolname>, but what about trimming my VMs? What do you mean exactly?

Thank you very much!
 
I also see high I/O delay during backups, however:
It does not create problems for virtual machines or containers, probably because the backups are to a directory on another ZFS pool and disks (and reads are RAID-1).
It does create problems when I use cache=writeback: for unknown reasons Proxmox starts swapping (not a problem in itself), and the swapping causes synchronous writes that serialize all writes to disk (expected to be fixed in a future version of ZFS). That makes backups take ten times longer.

Sorry for not giving a solution, but I wonder what kind of caching you selected for your VM and whether you are writing the backups to the same ZFS pool (or same disks)?

You can throttle the backup and restore bandwidth (See also this post). Maybe that can (temporarily) mitigate this issue for you?
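
For what it's worth, here is a rough sketch of how the throttling could look from the shell. Both vzdump and qmrestore accept a --bwlimit value in KiB/s. The VM ID 100, the storage name backup-store and the 50 MiB/s figure are placeholders I made up, and <archive> stands for your own backup file, so adjust all of them to your setup. The script only prints the commands so you can check them before running anything.

```shell
#!/bin/sh
# Convert a limit given in MiB/s to the KiB/s value that --bwlimit expects.
mib_to_kib() {
    echo $(( $1 * 1024 ))
}

LIMIT_KIB=$(mib_to_kib 50)   # 50 MiB/s is an arbitrary example value

# Dry run: print the commands instead of executing them.
# VM ID 100 and the storage name "backup-store" are hypothetical.
echo "vzdump 100 --storage backup-store --mode snapshot --bwlimit ${LIMIT_KIB}"
echo "qmrestore <archive> 100 --bwlimit ${LIMIT_KIB}"
```

If a limit turns out to help, it can be made permanent for scheduled backups with a bwlimit: line in /etc/vzdump.conf.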
 
Hi again,

Inside the VM/CT (Linux) you must run:

fstrim -va

It is also wise to run this from a daily cron job (search online for examples). You can also set the noatime option for all your mounted filesystems in /etc/fstab.
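
Something along these lines should work as a starting point. The file name /etc/cron.d/fstrim-daily, the 03:00 schedule, the UUID placeholder and the ext4 filesystem type are only examples, not taken from your setup. It also assumes the virtual disk has the Discard option enabled in Proxmox, otherwise the trim inside the guest never reaches the pool.

```
# /etc/cron.d/fstrim-daily (hypothetical file name)
# Run fstrim on all mounted filesystems that support it, every day at 03:00.
PATH=/usr/sbin:/usr/bin:/sbin:/bin
0 3 * * * root fstrim -va

# /etc/fstab entry with noatime added to the mount options
# (replace the UUID placeholder and filesystem type with your own)
UUID=<your-uuid>  /  ext4  defaults,noatime  0  1
```

Many distributions also ship a systemd fstrim.timer unit, which may be simpler than a cron job if your guest has it.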


Good luck / Bafta
 
