During some testing tonight, sending backups to a CIFS volume, we hit a problem with Samba on the target server (it consumed over 80GB of RAM and swap, which trashed the server). The unavailability of the CIFS server affected the backups that were running. I'd expected the backup tasks to fail, but they didn't. They just blocked.
Trying to stop the backups through the GUI did not work. Trying to kill the vzdump process on the pve nodes did not work either. Everything was locked up because of the CIFS volume. Looking back through the logs, there were lots of messages about hung tasks etc. Restarting smbd and even rebooting the CIFS server didn't help the situation.
We ended up stopping the VMs and rebooting the nodes. But even a reboot wouldn't complete, as things were still hung up trying to unmount the CIFS volume. We ended up having to do a hard reset of the pve node to get it functional again. A hard reset just to get over a memory leak in Samba on the box we're sending backups to.
Is this expected behaviour or is there something wrong with our setup? I thought CIFS mounts were soft or interruptible by default. Shouldn't all mounts for volumes like CIFS and NFS be soft or at least interruptible in case something goes wrong? Having to crash a node just because a fileserver had issues is pretty drastic for a production environment.
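In case it helps anyone checking their own nodes, here is a minimal sketch of how the mount mode could be inspected. It assumes the kernel lists `soft` or `hard` in the options column of /proc/mounts for CIFS and NFS mounts; the function name `network_mount_modes` is just something I made up for illustration, and I haven't verified this against every kernel version.

```python
import os


def network_mount_modes(mounts_text):
    """Map each CIFS/NFS mount point to 'soft', 'hard' or 'unspecified'.

    mounts_text is expected to be in /proc/mounts format:
    device mount_point fs_type options dump pass
    """
    network_types = {"cifs", "smb3", "nfs", "nfs4"}
    modes = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # Skip malformed lines and anything that is not a network filesystem.
        if len(fields) < 4 or fields[2] not in network_types:
            continue
        options = fields[3].split(",")
        if "soft" in options:
            modes[fields[1]] = "soft"
        elif "hard" in options:
            modes[fields[1]] = "hard"
        else:
            # Assumption: neither keyword listed, so the mode cannot be told from here.
            modes[fields[1]] = "unspecified"
    return modes


if __name__ == "__main__" and os.path.exists("/proc/mounts"):
    with open("/proc/mounts") as f:
        for mount_point, mode in network_mount_modes(f.read()).items():
            print(f"{mount_point}: {mode}")
```

Running it on a pve node should print one line per CIFS or NFS mount, which would at least confirm whether the backup volume is really mounted the way I assumed.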
Thanks
David