VM deletion freezes all VMs on NFS

Limetec · Aug 13, 2020

Hi,

Our setup is a two-server Proxmox cluster connected to a storage server via NFS over a dedicated network. Every time we try to delete a VM whose disk is on the NFS storage, the other VMs with disks on the same NFS storage freeze up. This makes it pretty much impossible for us to delete VMs during the workday.

We had the same problem with cloning and all the other intensive read/write operations until we set a bandwidth limit in Proxmox that fits our network. Now only delete operations still cause problems. I would have expected deleting to not be read/write intensive at all, or at least to adhere to the bandwidth limit.

Any idea what we can do to mitigate this?

Alwin · Aug 13, 2020

Limetec said:
We had the same problem with cloning and all the other intensive read/write operations until we set a bandwidth limit in Proxmox that fits our network. Now only delete operations still cause problems. I would have expected deleting to not be read/write intensive at all, or at least to adhere to the bandwidth limit.

Maybe not on the network but on the NFS server. You can check with iostat or atop what's going on.
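If iostat or atop is not installed on the NFS server, a rough equivalent of iostat's %util figure can be computed from /proc/diskstats. The following is only a minimal sketch, assuming a Linux NFS server; the function names are made up for illustration, and the device names it prints depend on your system.

```python
import time


def parse_diskstats(text):
    """Map device name -> milliseconds spent doing I/O.

    In /proc/diskstats the first three columns are major, minor and device
    name; the 13th column is the time (in ms) the device spent doing I/O.
    """
    busy = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 13:
            continue  # skip malformed or truncated lines
        busy[fields[2]] = int(fields[12])
    return busy


def utilization(before, after, interval_s):
    """Percent of the interval each device was busy (similar to iostat %util)."""
    interval_ms = interval_s * 1000.0
    return {
        dev: 100.0 * (after[dev] - before[dev]) / interval_ms
        for dev in after
        if dev in before
    }


def sample(path="/proc/diskstats", interval_s=1.0):
    """Take two snapshots interval_s apart and return per-device utilization."""
    with open(path) as f:
        before = parse_diskstats(f.read())
    time.sleep(interval_s)
    with open(path) as f:
        after = parse_diskstats(f.read())
    return utilization(before, after, interval_s)


if __name__ == "__main__":
    for dev, pct in sorted(sample().items()):
        print(f"{dev:12s} {pct:6.1f}% busy")
```

Running this in a loop on the NFS server while a VM is being deleted should show whether the disk holding the export is close to 100% busy during the freeze.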

Limetec · Aug 13, 2020

Alwin said:
Maybe not on the network but on the NFS server. You can check with iostat or atop what's going on.

Thank you for your response.

As you can see in the attached graphs, when I started the VM destroy process at 09:21, the normal network traffic between the Proxmox servers and the NFS server slowed to a crawl and disk I/O went down too. Soon after I force-stopped the destroy process (at ~09:24), everything went back to normal. I checked on the NFS server and the VM's disk had been deleted from the filesystem, even though the VM deletion wasn't completed.

Could it be that our NFS configuration is bad? It does look as if delete operations somehow lock/block normal read/write operations.

The NFS export is configured like this: (rw,no_root_squash,no_subtree_check,crossmnt,fsid=0)

Alwin · Aug 17, 2020

Limetec said:
Could it be that our NFS configuration is bad? It does look as if delete operations somehow lock/block normal read/write operations.

That is also a possibility. Is the IO load graph from the NFS server, and from the disk that holds the export?

Limetec · Aug 17, 2020

Alwin said:
That is also a possibility. Is the IO load graph from the NFS server, and from the disk that holds the export?

Yes, that is the IO load of the storage RAID on the NFS server. We could also replicate the problem by copying or deleting large files directly on the drive via SSH.

I also found out that during the freeze, Proxmox writes the following log messages over and over:

Code:

pvestatd[1479]: got timeout
pvestatd[1479]: unable to activate storage 'xxxxxxx' - directory '/mnt/pve/xxxxxx' does not exist or is unreachable
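To get a feel for how long and how often the storage becomes unreachable, the repeated messages can be counted per storage. This is only a rough sketch, assuming the log has been saved to a text file and that the lines look like the two shown above; the `summarize` helper and the counter keys are made up for illustration.

```python
import re
from collections import Counter

# Patterns modelled on the two log lines quoted above.
PVESTATD = re.compile(r"pvestatd\[\d+\]: (?P<msg>.*)")
STORAGE = re.compile(r"unable to activate storage '(?P<name>[^']+)'")


def summarize(lines):
    """Count pvestatd timeouts and 'unable to activate storage' messages."""
    counts = Counter()
    for line in lines:
        match = PVESTATD.search(line)
        if not match:
            continue  # not a pvestatd line
        msg = match.group("msg").strip()
        if msg == "got timeout":
            counts["timeout"] += 1
            continue
        storage = STORAGE.search(msg)
        if storage:
            counts["unreachable:" + storage.group("name")] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python summarize_pvestatd.py saved_log.txt
    with open(sys.argv[1]) as f:
        for key, n in summarize(f).most_common():
            print(f"{n:6d}  {key}")
```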

It's probably not directly a problem in Proxmox, but I still wonder why the storage bandwidth limit mitigates the problem for all operations except delete.

Any idea what we could do differently?

Alwin · Aug 17, 2020

Limetec said:
It's probably not directly a problem in Proxmox, but I still wonder why the storage bandwidth limit mitigates the problem for all operations except delete.

The delete operation is a single command, so it stays well below any transfer bandwidth limit. Once the command is issued, the storage on the other side does the work of freeing the allocated blocks. Depending on the filesystem used on the storage, you might be able to tune it. Also, a regular TRIM inside the VM will free unused blocks beforehand.
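Since the filesystem does the work of freeing blocks all at once when a large file is unlinked, one workaround sometimes used for big files is to shrink the file in steps before removing it, so the blocks are freed in smaller batches. The sketch below is not something Proxmox does itself; it assumes you run it directly on the NFS server against a disk image that no VM is using, and `delete_in_steps`, the step size and the pause are illustrative values. Whether it actually helps depends on the filesystem.

```python
import os
import time


def delete_in_steps(path, step_bytes=1 << 30, pause_s=1.0):
    """Truncate a file from the end in step_bytes chunks, then unlink it.

    Pausing between truncates gives other I/O a chance to run while the
    filesystem frees each batch of blocks. Returns the number of truncate
    calls made.
    """
    size = os.path.getsize(path)
    steps = 0
    while size > 0:
        size = max(0, size - step_bytes)
        os.truncate(path, size)  # frees the blocks beyond the new size
        steps += 1
        if size > 0:
            time.sleep(pause_s)
    os.remove(path)  # the file is empty by now, so this is cheap
    return steps
```

For example, `delete_in_steps("/path/to/unused-disk.raw", step_bytes=1 << 30, pause_s=2.0)` would free roughly 1 GiB every two seconds; the path here is a placeholder.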
