VM deletion freezes all VMs on NFS

Aug 12, 2020
Hi,

Our setup is a two-server Proxmox cluster connected to a storage server via NFS over a dedicated network. Every time we try to delete a VM whose disk is on the NFS storage, the other VMs with disks on the NFS storage freeze. This makes it practically impossible for us to delete VMs during the workday.

We had the same problem with cloning and all the other read/write-intensive operations until we set a bandwidth limit in Proxmox that fits our network. Now only delete operations still cause problems. I would have expected deleting not to be read/write-intensive at all, or at least to adhere to the bandwidth limit.

Any idea what we can do to mitigate this?
 
Maybe not on the network but on the NFS server. You can check with iostat or atop what's going on.
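For readers who want to script that check, here is a minimal Python sketch that approximates the "%util" figure iostat reports by sampling /proc/diskstats twice on the NFS server. The function names (parse_diskstats, utilization, sample) are made up for this illustration, and it assumes a Linux kernel with the usual /proc/diskstats field layout; treat it as a rough aid, not a replacement for iostat or atop.

```python
import time


def parse_diskstats(text):
    """Map device name -> (sectors_read, sectors_written, ms_doing_io)."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 14:
            continue  # skip short or malformed lines
        # Field order: major minor name, then the per-device counters.
        stats[fields[2]] = (int(fields[5]), int(fields[9]), int(fields[12]))
    return stats


def utilization(before, after, interval_s):
    """Percent of the interval each device spent with I/O in flight."""
    result = {}
    for dev, (_, _, busy_after) in after.items():
        if dev in before:
            busy_ms = busy_after - before[dev][2]
            result[dev] = min(100.0, 100.0 * busy_ms / (interval_s * 1000.0))
    return result


def sample(interval_s=1.0, path="/proc/diskstats"):
    """Read the counters twice, interval_s apart, and return busy percentages."""
    with open(path) as fh:
        before = parse_diskstats(fh.read())
    time.sleep(interval_s)
    with open(path) as fh:
        after = parse_diskstats(fh.read())
    return utilization(before, after, interval_s)
```

Calling sample(5.0) on the storage server while a delete is running, and comparing the result with an idle period, should show whether the device behind the export is saturated during the freeze.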
 

Thank you for your response.

As you can see in the attached graphs, when I started the VM destroy process at 09:21, the normal network traffic between the Proxmox servers and the NFS server slowed to a crawl and disk I/O dropped as well. Soon after I force-stopped the destroy process (at ~09:24), everything went back to normal. I checked on the NFS server, and the VM's disk had been deleted from the filesystem even though the VM deletion had not completed.

[Attachment: proxmox-delete-vm.png]


Could it be that our NFS configuration is bad? It does look as if delete operations somehow lock or block normal read/write operations.

The NFS export is configured like this: (rw,no_root_squash,no_subtree_check,crossmnt,fsid=0)
 
That is also a possibility. Is the I/O load graph from the NFS server, and from the disk that holds the export?
 

Yes, that is the I/O load of the storage RAID on the NFS server. We could also reproduce the problem by copying or deleting large files directly on the drive via SSH.
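To put a number on the freeze while reproducing it, one option is a small latency probe. The following is only a sketch: probe_latency and the ".latency_probe" file name are invented for this example, and it assumes you can write to a directory on the affected storage (either locally on the storage server or through the NFS mount). It times a small write plus fsync at regular intervals and returns the durations in seconds.

```python
import os
import time


def probe_latency(directory, samples=10, interval_s=0.5, payload=b"x" * 4096):
    """Return how long each small write+fsync took in `directory`, in seconds."""
    path = os.path.join(directory, ".latency_probe")  # hypothetical scratch file
    latencies = []
    try:
        for _ in range(samples):
            start = time.monotonic()
            with open(path, "wb") as fh:
                fh.write(payload)
                fh.flush()
                os.fsync(fh.fileno())  # force the data out to the storage
            latencies.append(time.monotonic() - start)
            time.sleep(interval_s)
    finally:
        # Clean up the scratch file even if a write failed.
        if os.path.exists(path):
            os.remove(path)
    return latencies
```

Running the probe once while the storage is idle and once while a large file is being deleted gives two lists to compare; a jump from milliseconds to seconds would match the stall the VMs experience.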

I also found out that during the freeze, Proxmox writes the following log messages over and over:
Code:
pvestatd[1479]: got timeout
pvestatd[1479]: unable to activate storage 'xxxxxxx' - directory '/mnt/pve/xxxxxx' does not exist or is unreachable

It's probably not directly a problem in Proxmox, but I still wonder why the storage bandwidth limit mitigates the problem for all operations except delete.

Any idea what we could do differently?
 
The delete operation is a single command, so it stays well below any transfer bandwidth limit. Once the command is issued, the storage on the other side does the work of freeing the allocated blocks. Depending on the filesystem used on the storage, you might be able to tune it. Also, running TRIM regularly inside the VM will free unused blocks beforehand.
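Since the cost lies in freeing blocks all at once, one workaround sometimes used for very large files is to shrink the file in steps before unlinking it, so the filesystem releases blocks in smaller batches. The sketch below illustrates that idea; it is not something this thread or Proxmox prescribes. slow_delete, the step size, and the pause are assumptions to adjust, whether it helps depends on the filesystem behind the export, and it should only be run on the storage server against a disk image that no VM uses any more.

```python
import os
import time


def slow_delete(path, step_bytes=1 << 30, pause_s=1.0):
    """Shrink the file at `path` in steps of step_bytes, then unlink it."""
    size = os.path.getsize(path)
    while size > 0:
        size = max(0, size - step_bytes)
        os.truncate(path, size)  # frees only the tail beyond the new size
        time.sleep(pause_s)      # give other I/O a chance between steps
    os.remove(path)
```

A smaller step or a longer pause spreads the work further at the cost of a slower delete; it would need testing on a scratch file of similar size before being tried on real images.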
 
