High I/O delay during backups on ZFS

decibel83 · Sep 14, 2020

Hi,
I'm running into problems when making backups on several servers running ZFS.

The PVE cluster has 4 nodes. Each node has a ZFS mirror pool on 2 NVMe SSDs, and a full backup runs once a week at 00:30, on a different day for each node.

Every time a backup starts, some virtual machines stop responding and stay hung for a while. A backup normally takes about 2 hours.
This happens on every node when its backup runs.

Last night a backup on one node took four times as long as usual (about 8 hours!). I saw an IO delay of about 8 to 12% during the backup job (from 2:30 to 8:30), and during this period some virtual machines hung at random.

I see the same behaviour when restoring a virtual machine from backup (the 5.34% IO delay peak at 16:30 on the graph above).
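To see where the I/O goes while a backup or restore is running, one option is to watch per-device throughput and latency on the node. A minimal sketch, assuming a ZFS version whose zpool iostat supports -l (latency columns); the function name watch_pool_io is hypothetical and rpool is only an example pool name:

```shell
#!/bin/sh
# Hypothetical helper: sample per-vdev I/O statistics for a pool.
#   $1 = pool name (e.g. rpool; adjust to your pool)
#   $2 = interval in seconds (default 5)
#   $3 = number of samples (default 12, i.e. one minute at 5 s)
watch_pool_io() {
    pool="$1"
    interval="${2:-5}"
    count="${3:-12}"
    # -v: break the numbers down per vdev and per disk
    # -l: add average latency (wait time) columns for reads and writes
    zpool iostat -v -l "$pool" "$interval" "$count"
}
```

For example, watch_pool_io rpool 5 60 samples for five minutes. Comparing the output with and without a backup running shows how much load the job adds to each disk.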

Could you help me understand what's going on, please?

Thank you very much!

guletz · Sep 15, 2020

Hi,

Try trimming your VMs and your ZFS pool. What are the properties of the ZFS dataset where you store your backups?
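The dataset properties can be listed with zfs get. A sketch, where the dataset name rpool/backup and the particular properties chosen are assumptions for illustration, and show_backup_props is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: print properties that commonly affect write
# performance for one ZFS dataset.
#   $1 = dataset name, e.g. rpool/backup (example only)
show_backup_props() {
    dataset="$1"
    # -o property,value,source: one row per property, plus whether it is
    # set locally, inherited from a parent, or a default
    zfs get -o property,value,source \
        compression,recordsize,sync,atime,logbias "$dataset"
}
```

Usage: show_backup_props rpool/backup.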

decibel83 · Sep 15, 2020

guletz said:
Try to trim your vms and your zfs pool.

Thanks for your reply!

I know I can trim my ZFS pool with zpool trim <poolname>, but what about trimming my VMs? What exactly do you mean?
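On the pool side, a minimal sketch of a manual TRIM followed by a progress check, assuming ZFS 0.8 or later (where zpool trim and the autotrim pool property are available); the function names are hypothetical:

```shell
#!/bin/sh
# Start a manual TRIM of every device in the pool, then show its progress.
#   $1 = pool name
trim_pool() {
    pool="$1"
    zpool trim "$pool"        # returns immediately; TRIM runs in the background
    zpool status -t "$pool"   # -t: show per-vdev TRIM state and progress
}

# Optionally let ZFS issue TRIMs continuously as blocks are freed,
# instead of (or in addition to) periodic manual runs.
enable_autotrim() {
    zpool set autotrim=on "$1"
}
```

Trimming the pool only tells the SSDs about blocks that ZFS itself has freed. Space a guest deletes inside its own filesystem stays allocated on its zvol until the guest issues discards, which is what trimming the VMs refers to.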

Thank you very much!

leesteken · Sep 15, 2020

I also see high I/O delay during backups, however:
It does not cause problems for virtual machines or containers, probably because the backups go to a directory on a different ZFS pool on separate disks (and reads are served by a RAID-1 mirror).
It does cause problems when I use cache=writeback: for unknown reasons Proxmox starts swapping (not a problem in itself), which causes synchronous writes that serialize all writes to disk (expected to be fixed in a future version of ZFS), and this makes backups take ten times longer.

Sorry for not offering a solution, but which cache mode did you select for your VMs, and are you writing the backups to the same ZFS pool (or the same disks)?
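Both questions can be answered from the node's shell. A sketch, assuming VM ID 100 and the local storage's default dump directory /var/lib/vz/dump (both examples; substitute your own VM ID and backup path); the function names are hypothetical:

```shell
#!/bin/sh
# Print only the virtual-disk lines of a VM's configuration. A cache mode,
# if one was set, appears on these lines as cache=... (none shown = default).
show_vm_disks() {
    vmid="$1"
    qm config "$vmid" | grep -E '^(ide|sata|scsi|virtio)[0-9]+:'
}

# Print the filesystem (for ZFS: the dataset) that holds a backup directory,
# to see whether backups land on the same pool as the VM disks.
show_backup_target() {
    findmnt -n -o SOURCE -T "$1"
}
```

For example, show_vm_disks 100 and show_backup_target /var/lib/vz/dump. If the second command prints a dataset on the pool that also holds the VM disks, the backup reads from and writes to the same pair of NVMe drives.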

You can throttle the backup and restore bandwidth (see also this post). Maybe that can (temporarily) mitigate the issue for you?
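A sketch of such throttling from the command line. vzdump and qmrestore both accept --bwlimit, expressed in KiB/s; the MiB/s conversion helper, the function names, the VM IDs and the storage name are illustrative assumptions:

```shell
#!/bin/sh
# Hypothetical helper: --bwlimit is given in KiB/s, so convert from MiB/s.
mib_to_kib() {
    echo $(( $1 * 1024 ))
}

# Back up one guest to the given storage, capped at $3 MiB/s.
throttled_backup() {
    vmid="$1"; storage="$2"; limit_mib="$3"
    vzdump "$vmid" --storage "$storage" --bwlimit "$(mib_to_kib "$limit_mib")"
}

# Restore a VM backup archive to the given VM ID, capped at $3 MiB/s.
throttled_restore() {
    archive="$1"; vmid="$2"; limit_mib="$3"
    qmrestore "$archive" "$vmid" --bwlimit "$(mib_to_kib "$limit_mib")"
}
```

For example, throttled_backup 100 backup 100 caps the job at 102400 KiB/s. To apply a limit to scheduled jobs as well, vzdump reads a default from a bwlimit: line (also in KiB/s) in /etc/vzdump.conf.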

guletz · Sep 15, 2020

Hi again,

Inside the VM/CT (Linux) you need to run:

fstrim -va

It is also wise to run it from a daily cron job (google it). You can also set the noatime option for all your mounted filesystems in /etc/fstab.

Good luck / Bafta
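A minimal sketch of the daily job, to run as root inside a Debian-style Linux guest. The script path /etc/cron.daily/fstrim is an assumption (run-parts skips file names containing a dot, so keep it dot-free) and install_daily_fstrim is a hypothetical name. Trims issued by a VM only reach the zvol on the host if the virtual disk has the Discard option enabled (discard=on):

```shell
#!/bin/sh
# Hypothetical helper: install a daily cron script that trims every
# mounted filesystem supporting discard.
#   $1 = target path (default: /etc/cron.daily/fstrim)
install_daily_fstrim() {
    target="${1:-/etc/cron.daily/fstrim}"
    cat > "$target" <<'EOF'
#!/bin/sh
# Trim all mounted filesystems that support it (no -v, so quiet on success).
exec /sbin/fstrim -a
EOF
    chmod 755 "$target"
}
```

On distributions that ship util-linux's fstrim.timer, systemctl enable --now fstrim.timer is an alternative (weekly by default). The noatime option goes in the fourth (options) field of each /etc/fstab entry, e.g. defaults,noatime.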
