I'm running a CT that mounts a CIFS share and an SSHFS share (not as mount points, but mounted from inside the privileged CT). When the nightly backup runs (mode: snapshot) it stalls; in the morning the CT shows "Config locked (snapshot)" and I can't SSH into the box.

Question-1: Is the problem because of the underlying SSHFS mount?
- UPD: Yes, I've tracked it down to the SSHFS mount: if I mount an SSHFS share, the "creating snapshot" step hangs forever on the host.
Question-2: As all my mounts always go under /mnt/, is it solved by putting exclude-path: /mnt/ into /etc/vzdump.conf?
Question-3: How to unblock this deadlock without rebooting the whole host?
I've tried:
		Code:
	
	root@prox:~# pct unlock 102
trying to acquire lock...
can't lock file '/run/lock/lxc/pve-config-102.lock' - got timeout
		Code:
	
	root@prox:~# pct listsnapshot 102
    `-> vzdump                      2020-05-17 00:32:57     vzdump backup snapshot
    `-> current                                             You are here!
		Code:
	
	root@prox:~# cat /etc/pve/lxc/102.conf
#mp0%3A /wd1/encrypted,mp=/mnt/wd1
#mp1%3A /wd2/encrypted,mp=/mnt/wd2
arch: amd64
cores: 1
features: mount=cifs
hostname: vm-linuxtasks
lock: snapshot
memory: 1024
mp0: /wd1/encrypted,mp=/mnt/wd1
mp1: /wd2/encrypted,mp=/mnt/wd2
mp2: /wd3/encrypted,mp=/mnt/wd3
mp3: /data8x3/encrypted,mp=/mnt/data8x3
net0: name=eth0,bridge=vmbr0,firewall=1,gw=172.16.77.2,hwaddr=1A:2F:CB:57:43:FC,ip=172.16.77.30/24,type=veth
ostype: ubuntu
rootfs: encrypted-zfs:subvol-102-disk-4,size=50G
swap: 0
[vzdump]
#vzdump backup snapshot
arch: amd64
cores: 1
features: mount=cifs
hostname: vm-linuxtasks
memory: 1024
mp0: /wd1/encrypted,mp=/mnt/wd1
mp1: /wd2/encrypted,mp=/mnt/wd2
mp2: /wd3/encrypted,mp=/mnt/wd3
mp3: /data8x3/encrypted,mp=/mnt/data8x3
net0: name=eth0,bridge=vmbr0,firewall=1,gw=172.16.77.2,hwaddr=1A:2F:CB:57:43:FC,ip=172.16.77.30/24,type=veth
ostype: ubuntu
rootfs: encrypted-zfs:subvol-102-disk-4,size=50G
snapstate: prepare
snaptime: 1589668377
swap: 0
root@prox:~#
Both fuser and lsof hang when I call them: no response, no output, and they never return to the console.
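Regarding Question-3, one thing that might be worth checking before a reboot is the kernel's FUSE control filesystem, since SSHFS is a FUSE filesystem. A minimal sketch, assuming fusectl is mounted at /sys/fs/fuse/connections (the usual location) and that the hung SSHFS mount shows up there; the helper name `list_fuse_connections` is made up for illustration, and I have not confirmed that aborting the connection would release this particular snapshot hang:

```shell
# List every FUSE connection together with its count of pending requests.
# A connection whose "waiting" value stays above 0 is a candidate for the
# mount that everything else is blocked on.
list_fuse_connections() {
    # Optional first argument overrides the control directory (handy for testing).
    root=${1:-/sys/fs/fuse/connections}
    for conn in "$root"/*/; do
        # Skip entries without a readable "waiting" file (also covers an empty dir).
        [ -r "${conn}waiting" ] || continue
        printf '%s waiting=%s\n' "$(basename "$conn")" "$(cat "${conn}waiting")"
    done
}

# Usage on the host, as root:
#   list_fuse_connections
# To abort a stuck connection, write anything into its "abort" file, where
# <id> is a placeholder for the number printed above:
#   echo 1 > /sys/fs/fuse/connections/<id>/abort
```

Aborting makes pending and future requests on that mount fail with an error instead of waiting forever, so anything still using the SSHFS share will see I/O errors afterwards.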
Question-4: What does the 'snapshot' actually mean? Is it a ZFS-snapshot or some kind of vzdump thing?
- I don't see any 'vzdump' snapshot on the underlying ZFS at that subvolume.
- pct listsnapshot gives me some kind of 'snapshot' which doesn't seem to relate to any ZFS-snapshot; I can see it only in the 102.conf but nowhere else.
Question-5: Who exactly (which process) is deadlocked? In the ps -ex I see only a general 'vzdump -a', but not a specific blocking process. I could attach gdb to it and look inside, but it's easier to ask I guess ;-).
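For Question-5, a minimal sketch of how the blocked process might be spotted without gdb, assuming the procps `ps` that ships with a standard Debian-based host. Tasks stuck waiting on a dead network or FUSE mount typically sit in uninterruptible sleep (state `D`), and the WCHAN column hints at the kernel function they are waiting in. The helper name `list_blocked` is made up for illustration:

```shell
# Keep the header line plus every task whose STAT column starts with "D"
# (uninterruptible sleep). Expects `ps` output with STAT as the 2nd column.
list_blocked() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# Usage on the host:
#   ps -eo pid,stat,wchan:32,args | list_blocked
```

If the vzdump child and the hanging fuser/lsof calls all show up in state `D`, that would suggest they are blocked on the same mount rather than on each other.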
UPD: I had to reboot the host in the end, as I was not able to kill the vzdump task stuck at "create storage snapshot 'vzdump'". The added exclude-path didn't help either, as the snapshot step ignores it; I guess it only applies later, to the backup's file enumeration. Today the "create storage" log entry is once again the very last line before the hang. This time I killed the task forked from the perl vzdump process, which let the backup routine continue, but CT 102 remains totally unresponsive: I can't stop it (lxc-stop 102); the command blocks and never returns.
I then tried to kill every task related to CT 102. This made CT 102 show as offline in the GUI, but I was not able to start it anymore:
Script exec /usr/share/lxc/hooks/lxc-pve-prestart-hook 102 lxc pre-start produced output: failed to remove directory '/sys/fs/cgroup/systemd/lxc/102/ns/user.slice/user-0.slice/session-21.scope': Device or resource busy
I will most probably have to reboot once again. Very ugly; it gives me a bad feeling about the whole thing.
Proxmox should definitely detect that a snapshot is taking forever without returning, warn me, allow me to abort, and offer some obvious setting to exclude anything 'dangerous' from snapshotting. As it is, the backup runs, halts while taking the snapshot, and I see no way out.
Question-6: Do users paying for a subscription get support/responses faster? What about weekends? Is there a difference between asking as a free user and as a paying user? I need this only at home and only privately, but it drives me crazy anyway to wait the whole weekend for a response, as the weekend is exactly when one has time for this private stuff!
Thank you very much
			
It should not be possible to break the host by making a mount inside an unprivileged CT, should it?
I'll refactor it to have the host mount everything itself, although it's far from ideal. I'll put another script and separate credential files onto the encrypted ZFS, so the initial script will unlock the volume and create the mounts from the secured area.