Hi guys,
I noticed an issue with one of our VZ containers over the weekend and tracked it back to a frozen backup.
We have a FreeNAS server serving an NFS share for backups. The scheduled backup froze while it was starting, which has caused a number of issues:
1) The NFS mount is stuck: running "df -h" or "mount", or navigating to the share, freezes the console session immediately.
2) Attempting to unmount results in "umount.nfs: /mnt/pve/xxxxxxx: device is busy".
3) Can no longer log into the Proxmox web interface. I have no idea why: the login dialog sits on "Please wait..." for a minute and then reports "login failed". SSH still works. Tried Chrome, IE and Firefox.
4) Can't run "qm list". Responds with:
ipcc_send_rec failed: Resource temporarily unavailable
ipcc_send_rec failed: Resource temporarily unavailable
ipcc_send_rec failed: Resource temporarily unavailable
5) /var/lib/vz got very full. Could this cause everything to die even though the backups weren't landing there? (They were going to the NFS share.)
6) Had to "kill -9" all the vzdump processes associated with the frozen NFS share.
7) Can't run "lsof" to see whether any other processes are tying up the NFS share, so I can't umount it. It just sits at an empty command line.
8) I have to run commands in Screen sessions just in case they freeze up the terminal.
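In case it helps anyone in the same spot: since "lsof" hangs, here is a rough sketch of a way to list processes stuck in uninterruptible sleep (state D) by reading /proc/<pid>/stat directly. As far as I know this should not touch the NFS mount itself. It assumes Python 3 is available on the host. The function names (parse_stat, find_blocked) are my own and not part of any Proxmox tooling, and a process in state D is not necessarily blocked on NFS, so treat the output as a hint only.

```python
import os


def parse_stat(stat_line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the first '(' and the last ')'.
    open_paren = stat_line.index("(")
    close_paren = stat_line.rindex(")")
    pid = int(stat_line[:open_paren].strip())
    comm = stat_line[open_paren + 1:close_paren]
    # The process state is the first field after the command name.
    state = stat_line[close_paren + 1:].split()[0]
    return pid, comm, state


def find_blocked(proc_root="/proc"):
    """List (pid, comm) for processes in uninterruptible sleep (state D)."""
    blocked = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state = parse_stat(fh.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited mid-scan, or the line was unparsable
        if state == "D":
            blocked.append((pid, comm))
    return sorted(blocked)
```

Calling find_blocked() with no arguments scans the live /proc and returns a sorted list of (pid, command name) pairs, which can then be compared against the vzdump processes that were killed.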
Apart from all this, the VMs are all functioning. I'm hesitant to reboot during the day as this is a client's production machine. On top of that, I've run into similar issues before where the machine could not reboot cleanly because the NFS shares could not be unmounted. This is the one reason I hate NFS with a passion: no matter how stable my network seems to be, it's the one protocol which, when it dies, goes down in a huge ball of flames.