Any issues to be aware of when deleting large files under ZFS?

Pyromancer

Member
Jan 25, 2021
I have two large Proxmox 7.0-11 hosts in a cluster, with 50TB ZFS local storage each, and most VMs replicating between them.

On one of these hosts are two large (5TB) VMs, both shut down and out of use, which were nested hypervisors, one Xen, one VMware. One VM is replicated to the other host; the other exists only on one host.

As disk space is getting tight, I want to remove both of these VMs. However, we have had bad experiences removing 1TB VMs from VMware hosts in the past, where the delete process locked the disks entirely. Most recently, a host had to be power-cycled after three days to free the disk, and the delete was found to have removed only ~500GB of the 1TB used.

So my current plan is to migrate all the active VMs to the other host first, and then do the deletes on the empty host, so that if anything goes awry with it, services won't be affected. But I've a few questions:

1. Is there any risk of any kind of disk lock-up when deleting very large files from a ZFS HDD array? It's RaidZ1, 9 x 6TB disks plus spares.

2. If I turn off replication for the VM of the pair that's being replicated, will the system attempt to delete the 5TB of data from the remote host, or just stop replicating to it but leave the disk file where it is?

3. Any other caveats or issues I should be aware of before proceeding?

I tried googling "zfs large disk delete" and related terms and didn't come across anything concrete, though I did find people reporting that deletes took a long time.
 
  1. There is a small risk of a disk lock-up when deleting large files from a ZFS HDD array, as with any file system. However, ZFS is designed to handle large file deletions efficiently and the risk of lock-up should be minimal.
  2. If you turn off replication for the VM of the pair that is being replicated, the system will stop replicating to the remote host, but will not delete the 5TB of data from the remote host. You will need to manually delete the data from the remote host if you wish to remove it.
  3. Some other things to consider when deleting large files from a ZFS array include:
  • It may take a significant amount of time for the deletion process to complete, depending on the size of the files and the speed of your storage system.
  • If you are deleting a large number of files at once, it may be more efficient to delete them in smaller batches, as this can reduce the load on the storage system and speed up the deletion process.
  • If you are deleting a large number of files and have other VMs running on the same storage array, it may be a good idea to temporarily pause or migrate these VMs to another host to reduce the load on the storage system during the deletion process.
  • If you are deleting a large number of files and have replication enabled, it may be a good idea to temporarily disable replication to reduce the load on the storage system during the deletion process.
 

After posting the query I decided to run a test. There's an old dev VM on one of the hosts, shut down and only 32GB, so I set it up to replicate and saw that the data was copied across and the disk file then existed on both hosts.

I let the process complete fully, with the log showing everything finished, and left it for 10 minutes or so. I then removed replication, and the remote copy of the disk file disappeared.

This would appear to suggest that removing replication does in fact delete the remote copy?

Given this, and with the two large out-of-use (OOU) VMs on Host B, one of which is replicated to Host A, my current plan is:

1. Migrate all the VMs to Host B.
2. Remove replication from the OOU VM.
3. Wait for Host A to fully delete its replicated copy of the file.
4. Migrate all the VMs back to Host A.
5. Delete the two large VMs from Host B.
6. Migrate some of the VMs back to Host B to balance the load across the systems.
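
For the "wait for Host A to fully delete its replicated copy" step, one possible way to tell when the space has actually been reclaimed is to poll the pool's `freeing` property, which OpenZFS reports as the space still waiting to be freed after a destroy. The following is only a rough sketch under assumptions: the pool name `rpool` is hypothetical, the helper names are made up for illustration, and it assumes the pool exposes `freeing` as a number (a pool without asynchronous destroy may not).

```python
import subprocess
import time


def parse_freeing(output: str) -> int:
    """Parse the output of `zpool get -Hp -o value freeing <pool>` into bytes."""
    value = output.strip()
    if not value.isdigit():
        # For example "-" if the property is not available on this pool.
        raise ValueError(f"unexpected 'freeing' value: {value!r}")
    return int(value)


def read_freeing(pool: str) -> int:
    """Ask ZFS how many bytes of the pool are still queued to be freed."""
    result = subprocess.run(
        ["zpool", "get", "-Hp", "-o", "value", "freeing", pool],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_freeing(result.stdout)


def wait_until_freed(pool: str, interval_s: float = 30.0, read=read_freeing) -> None:
    """Poll until the pool reports that nothing is left to free."""
    while True:
        remaining = read(pool)
        if remaining == 0:
            print(f"{pool}: nothing left to free")
            return
        print(f"{pool}: {remaining / 2**30:.1f} GiB still being freed")
        time.sleep(interval_s)
```

Calling `wait_until_freed("rpool")` on the host in question (with the real pool name substituted) would block until the pool reports zero bytes left to free, which could serve as the signal to start migrating VMs back. The `read` parameter exists only so the polling loop can be tried out without a real pool.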
 
