Missing backup NFS share stalls proxmox

Oct 22, 2009
Hi,

One of our servers broke down last Friday because an NFS share used for backup went missing. The log file is rather big, but the lost connection to the NFS server appears to be involved.
I noticed this is similar to this thread: http://forum.proxmox.com/threads/40...g-quot?highlight=not+responding,+still+trying. Is there a supported way to cancel a backup if an NFS share suddenly becomes unavailable?
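
As an illustration of the kind of guard asked about here, the following is a minimal sketch (not a Proxmox or vzdump feature) that probes a mount point from a child process with a timeout before a backup is started. The function name `mount_responds` and the 10-second default are assumptions made for this example. The probe runs in a separate process because a `stat` on a dead hard-mounted NFS share can block indefinitely.

```python
import subprocess
import sys


def mount_responds(path, timeout_s=10.0):
    """Return True if a stat() of `path` completes within `timeout_s` seconds.

    Hypothetical helper for illustration only. The stat runs in a child
    process so that a hung NFS mount blocks the child, not the caller.
    """
    proc = subprocess.Popen(
        [sys.executable, "-c", "import os, sys; os.stat(sys.argv[1])", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        # Exit code 0 means the stat call returned without error.
        return proc.wait(timeout=timeout_s) == 0
    except subprocess.TimeoutExpired:
        # Best-effort kill. We deliberately do not wait again: a child
        # stuck in uninterruptible sleep may never exit.
        proc.kill()
        return False
```

A wrapper script could call `mount_responds("/mnt/pve/<storage>")` and skip the backup job when it returns False. This only helps before the backup starts; it does not rescue a job that is already blocked on the share.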

The other issue in this case is that the backup NFS server is (for the time being) a local OpenVZ container exporting an NFS share. So I'm actually mounting an NFS share on the host that is exported by a local OpenVZ VM using unfs3. Could this be the root cause of the trouble? Is anybody else running a similar setup? As the log file shows, a lot more went wrong than just a missing NFS share. Previously the backup VM was placed on another Proxmox host. Part of the log file is attached.

Code:
pve-manager: 1.7-10 (pve-manager/1.7/5323)
running kernel: 2.6.18-4-pve
proxmox-ve-2.6.18: 1.7-10
pve-kernel-2.6.32-3-pve: 2.6.32-14
pve-kernel-2.6.18-4-pve: 2.6.18-10
qemu-server: 1.1-25
pve-firmware: 1.0-9
libpve-storage-perl: 1.0-16
vncterm: 0.9-2
vzctl: 3.0.24-1pve4
vzdump: 1.2-9
vzprocps: 2.0.11-1dso2
vzquota: 3.0.11-1
pve-qemu-kvm-2.6.18: 0.9.1-10
Thanks in advance,
Bo
 

I just had the same problem today, while trying to restart my brand-new VM hosting the NFS backup.

The GUI was frozen, and there was no way to run qm list or to list /mnt/pve.
The VMs kept running without issue, by the way.

The only way out I found was to modify storage.cfg on one node where nothing was running, reboot that node,
scp the config file from the cluster master, and start the NFS server.

=> The cluster was OK after that.

So it was a "not so good idea" to put the NFS backup in a VM.
Too bad.
 
I can confirm the same issue. I was not able to find anything useful on Google, except vague hints that NFS writes from a host node to a VM are not throttled and might over-consume kernel memory. I'm not sure I believe that. Whatever went wrong was on the NFS server: the VM itself was perfectly responsive, but anything at all that touched the file or directory would hang the process on the VM in 'D' state. In my case, I was trying to delete a backup file of several GB on the NFS share, on the server. The delete hung. Running 'ls -l' on the NFS server VM in the directory holding that backup file also hung in 'D' state. I had to reboot everything to clear this up. Ouch...
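
For anyone trying to confirm the same symptom, here is a rough sketch of how processes in 'D' (uninterruptible sleep) state could be listed on Linux by reading /proc/<pid>/stat. The function names `parse_stat_line` and `blocked_processes` are made up for this example, and the sketch only reports which processes are stuck. It does nothing to unblock them.

```python
import os


def parse_stat_line(line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat.

    The command name is wrapped in parentheses and may itself contain
    spaces or parentheses, so we split on the first '(' and the last ')'.
    """
    left = line.index("(")
    right = line.rindex(")")
    pid = int(line[:left].strip())
    comm = line[left + 1:right]
    state = line[right + 1:].split()[0]
    return pid, comm, state


def blocked_processes(proc_root="/proc"):
    """Return a list of (pid, comm) for processes currently in 'D' state."""
    found = []
    if not os.path.isdir(proc_root):
        return found  # not a Linux-style /proc; nothing to report
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state = parse_stat_line(fh.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited meanwhile, or the entry is unreadable
        if state == "D":
            found.append((pid, comm))
    return found
```

Calling `blocked_processes()` on the NFS server VM while the hang is in progress should show the stuck `rm` or `ls` processes, which at least narrows down which side is blocked.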