All of our servers are in production use, so these lockups during vzdump backups are disasters. For most of our LXC-based VMs I have been forced to switch backup methods. Across all 3 of our Proxmox VE servers I have only one LXC VM left using vzdump for backups, and that's only because I haven't found any other method for creating a snapshot of the 400GB B+ data tree with billions of very small files (without taking the server offline).
Could you try the updated lxcfs packages in pve-test ( http://download.proxmox.com/debian/dists/jessie/pvetest/binary-amd64/lxcfs_2.0.0-pve1_amd64.deb and http://download.proxmox.com/debian/dists/jessie/pvetest/binary-amd64/lxcfs-dbg_2.0.0-pve1_amd64.deb )? This is a new upstream version which does not fix the issue completely (yet), but should move it from "occurs rarely" to "occurs almost never" territory. A complete fix is in the works and will hopefully follow soon.
Note that the new lxcfs binary is only used once you have stopped your affected containers after updating, and you should probably try this on non-production systems first (I have not encountered issues so far, but better safe than sorry!).
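For anyone who wants to script the stop-and-start step, here is a rough sketch. It assumes the Proxmox `pct` CLI (`pct stop <vmid>`, `pct start <vmid>`) is available on the host. The function names are made up for illustration, and the VMIDs are only examples taken from the backup logs in this thread. It only prints the commands unless `dry_run=False` is passed.

```python
import subprocess


def restart_commands(vmids):
    """Build the `pct stop` / `pct start` command pairs for each container."""
    commands = []
    for vmid in vmids:
        commands.append(["pct", "stop", str(vmid)])
        commands.append(["pct", "start", str(vmid)])
    return commands


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(" ".join(argv))
        if not dry_run:
            # check=True stops at the first command that fails
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Example VMIDs only (assumption); replace with your affected containers.
    run(restart_commands([2228, 7597]))
```

Keeping the default at dry-run means the script can be checked on a production host before anything is actually stopped.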
Sounds good! Please keep us posted if you experience any issues with the updated packages.
I have tried your new lxcfs package on all 3 of our Proxmox VE servers. I followed your advice and first tested whether everything worked on one of the servers. When I didn't see any problems, I installed the package on the other two Proxmox VE servers.
The servers have now been running with the new lxcfs package for a few days, and I have re-enabled scheduled vzdumps for all of the LXC-based VMs. So far I haven't seen any lockups… Before the lxcfs package update, the VMs locked up about one third of the time.
So I would like to say thanks… From my experience, it looks like your changes are in the right area.
lxc-pve: 1.1.5-7
lxcfs: 2.0.0-pve1
2228: Feb 25 22:26:30 INFO: Starting Backup of VM 2228 (lxc)
2228: Feb 25 22:26:30 INFO: status = running
2228: Feb 25 22:26:30 INFO: found old vzdump snapshot (force removal)
2228: Feb 25 22:26:30 ERROR: Backup of VM 2228 failed - Can't delete snapshot: 2228 vzdump zfs error: could not find any snapshots to destroy; check snapshot names.
INFO: Starting Backup of VM 7597 (lxc)
INFO: status = running
INFO: found old vzdump snapshot (force removal)
ERROR: Backup of VM 7597 failed - Can't delete snapshot: 7597 vzdump zfs error: could not find any snapshots to destroy; check snapshot names.
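Failures like the two above can be picked out of a longer vzdump log with a small script. This is a minimal sketch that assumes the `ERROR: Backup of VM <id> failed - <reason>` wording seen in these logs. The function name and the regular expression are my own and not part of vzdump.

```python
import re

# Log lines copied from the posts above; some carry a "VMID: date" prefix, some do not.
LOG = """\
2228: Feb 25 22:26:30 INFO: Starting Backup of VM 2228 (lxc)
2228: Feb 25 22:26:30 INFO: status = running
2228: Feb 25 22:26:30 INFO: found old vzdump snapshot (force removal)
2228: Feb 25 22:26:30 ERROR: Backup of VM 2228 failed - Can't delete snapshot: 2228 vzdump zfs error: could not find any snapshots to destroy; check snapshot names.
INFO: Starting Backup of VM 7597 (lxc)
INFO: status = running
INFO: found old vzdump snapshot (force removal)
ERROR: Backup of VM 7597 failed - Can't delete snapshot: 7597 vzdump zfs error: could not find any snapshots to destroy; check snapshot names.
"""

# Matches the error line with or without the leading prefix.
FAILED = re.compile(r"ERROR: Backup of VM (\d+) failed - (.+)$")


def failed_backups(log_text):
    """Return {vmid: reason} for every failed backup found in the log text."""
    failures = {}
    for line in log_text.splitlines():
        match = FAILED.search(line)
        if match:
            failures[int(match.group(1))] = match.group(2)
    return failures


if __name__ == "__main__":
    for vmid, reason in sorted(failed_backups(LOG).items()):
        print(vmid, "->", reason)
```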
VMID  STATUS  TIME      SIZE    FILENAME
2100  ok      00:00:28  298MB   /bkup2/dump/vzdump-lxc-2100-2016_02_27-16_32_02.tar.lzo
2214  ok      00:00:52  1.03GB  /bkup2/dump/vzdump-lxc-2214-2016_02_27-16_32_30.tar.lzo
2217  ok      00:04:20  3.17GB  /bkup2/dump/vzdump-lxc-2217-2016_02_27-16_33_22.tar.lzo
2219  ok      00:01:57  959MB   /bkup2/dump/vzdump-lxc-2219-2016_02_27-16_37_42.tar.lzo
2227  ok      00:03:34  2.62GB  /bkup2/dump/vzdump-lxc-2227-2016_02_27-16_39_39.tar.lzo
2228  ok      00:01:41  960MB   /bkup2/dump/vzdump-lxc-2228-2016_02_27-16_43_13.tar.lzo
2249  ok      00:01:54  985MB   /bkup2/dump/vzdump-lxc-2249-2016_02_27-16_44_54.tar.lzo
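To check a whole run at a glance, the summary above can be totalled with a few lines of Python. This is a sketch only: the helper names are hypothetical, and it assumes the sizes use 1GB = 1024MB, which the output itself does not state.

```python
from datetime import timedelta

# Summary rows copied from the vzdump output above.
SUMMARY = """\
VMID STATUS TIME SIZE FILENAME
2100 ok 00:00:28 298MB /bkup2/dump/vzdump-lxc-2100-2016_02_27-16_32_02.tar.lzo
2214 ok 00:00:52 1.03GB /bkup2/dump/vzdump-lxc-2214-2016_02_27-16_32_30.tar.lzo
2217 ok 00:04:20 3.17GB /bkup2/dump/vzdump-lxc-2217-2016_02_27-16_33_22.tar.lzo
2219 ok 00:01:57 959MB /bkup2/dump/vzdump-lxc-2219-2016_02_27-16_37_42.tar.lzo
2227 ok 00:03:34 2.62GB /bkup2/dump/vzdump-lxc-2227-2016_02_27-16_39_39.tar.lzo
2228 ok 00:01:41 960MB /bkup2/dump/vzdump-lxc-2228-2016_02_27-16_43_13.tar.lzo
2249 ok 00:01:54 985MB /bkup2/dump/vzdump-lxc-2249-2016_02_27-16_44_54.tar.lzo
"""

# Assumption: binary units (1GB = 1024MB).
UNIT_TO_MB = {"KB": 1 / 1024, "MB": 1.0, "GB": 1024.0, "TB": 1024.0 * 1024}


def parse_size_mb(text):
    """Convert a size such as '1.03GB' or '298MB' to megabytes."""
    return float(text[:-2]) * UNIT_TO_MB[text[-2:]]


def parse_duration(text):
    """Convert 'HH:MM:SS' to a timedelta."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def parse_summary(text):
    """Return one dict per data row; the header and malformed lines are skipped."""
    rows = []
    for line in text.splitlines():
        fields = line.split(None, 4)
        if len(fields) != 5 or not fields[0].isdigit():
            continue
        vmid, status, duration, size, filename = fields
        rows.append({
            "vmid": int(vmid),
            "status": status,
            "duration": parse_duration(duration),
            "size_mb": parse_size_mb(size),
            "filename": filename,
        })
    return rows


if __name__ == "__main__":
    rows = parse_summary(SUMMARY)
    total_time = sum((row["duration"] for row in rows), timedelta())
    total_mb = sum(row["size_mb"] for row in rows)
    print(f"{len(rows)} backups, {total_time}, {total_mb:.0f} MB")
    print("not ok:", [row["vmid"] for row in rows if row["status"] != "ok"])
```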