Snapshot freezes Debian guest

mike87

Jan 20, 2017
Normally I use pve-autosnap to generate snapshots on a regular basis.
On one cluster I've got a weird problem: whenever I snapshot my Debian system, it instantly freezes. I can only stop the machine. There are absolutely no logs regarding this incident. The CPU load just rises.
At first I thought it was caused by the hard disk encryption in the guest machine, so I did a fresh install without encryption, but exactly the same error happens.
Has anyone experienced a similar problem and found a solution? I support several systems and have never seen this behaviour.

Host System:
CPUs 12 x Intel(R) Xeon(R) CPU E5-1650 v4 @ 3.60GHz (1 Socket)
Kernel Version Linux 4.4.35-2-pve #1 SMP Mon Jan 9 10:21:44 CET 2017
PVE Manager Version pve-manager/4.4-5/c43015a5
Storage: local raid-z1 6x 4TB HDD

Guest:
Debian 8
Linux hostname 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u2 (2016-10-19) x86_64 GNU/Linux
 
Hi,

Do you use the qemu-guest-agent?
If so, check whether the guest filesystem is still frozen.

Code:
qm agent <VMID> fsfreeze-status
 
So I installed the qemu-guest-agent, activated it in Proxmox and ran the script. No freeze.
Maybe I was just lucky, or it solved the problem. I'll keep an eye on it.

But shouldn't the system run normally after a snapshot? Without the agent it should only risk an inconsistent file system, or am I wrong?
 
Last night it happened again.
This morning I found the VM stuck at ~85% CPU usage (4 CPUs). The local console came back from a black screen after I pressed Enter several times and showed me:
BUG: soft lockup - CPU#1 stuck for 4294876409s! [ksoftirqd/1:13]

On the host:
qm agent 10005 fsfreeze-status
VM 10005 qmp command 'guest-fsfreeze-status' failed - got timeout
 
The last two nights it happened again, with just the same result: the guest console is all black, the CPU load is high, guest-fsfreeze-status times out and the system is dead.
Only a stop and start brings me back in business.
 
I've got no solution for it, but a workaround: I added qm shutdown and qm start around the qm snapshot for this specific guest.

Not a real solution, but at least it does the job...
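
For anyone who wants to try the same workaround, it could look roughly like the sketch below. This is only an illustration and not the exact script in use here: the function name snapshot_offline, the QM override variable and the snapshot name "nightly" are assumptions made for the example. The VMID 10005 is the one from the posts above.

```shell
#!/bin/sh
# Sketch of the workaround: shut the guest down cleanly, snapshot it
# while it is off, then start it again.
# QM is a hypothetical override so the script can be dry-run (QM=echo).
QM="${QM:-qm}"

snapshot_offline() {
    vmid="$1"      # VM ID, e.g. 10005
    snapname="$2"  # snapshot name, e.g. nightly

    # Do not snapshot if the guest could not be shut down.
    "$QM" shutdown "$vmid" || return 1

    # Take the snapshot and remember whether it worked.
    "$QM" snapshot "$vmid" "$snapname"
    rc=$?

    # Always try to bring the guest back up, even if the snapshot failed.
    "$QM" start "$vmid" || return 1

    return "$rc"
}

# Example call (commented out so sourcing the file has no side effects):
# snapshot_offline 10005 nightly
```

The guest is offline for the duration of the snapshot, so this only makes sense for machines that can tolerate a short nightly downtime.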