I have a problem with snapshot backups. I run "vzdump 105 --mode snapshot", but the virtual machine stops responding for the entire duration of the backup. It also does not come back after the dump ends; I had to reset the VM to get it working again.
The backup device is the standard /var/lib/vz, an LVM device.
Code:
# vzdump 105 --mode snapshot
INFO: starting new backup job: vzdump 105 --mode snapshot
INFO: Starting Backup of VM 105 (qemu)
INFO: Backup started at 2019-11-02 09:13:22
INFO: status = running
INFO: update VM 105: -lock backup
INFO: VM Name: 3605-EU
INFO: include disk 'scsi0' 'local-lvm:vm-105-disk-0' 200G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating archive '/var/lib/vz/dump/vzdump-qemu-105-2019_11_02-09_13_22.vma'
At this point, after about 30 minutes, the .vma file size is still zero and the VM hangs hard. I can't ssh to it and can't do anything with it.
$ ssh 3605
ssh: connect to host 3605 port 22: Network is unreachable
I then run vzdump -stop, which seems to trigger something, and the backup continues:
Code:
ERROR: interrupted by signal
ERROR: VM 105 qmp command 'guest-fsfreeze-thaw' failed - unable to connect to VM 105 qga socket - timeout after 101 retries
INFO: started backup task '07c365f3-5d7b-4367-b5d6-b6ea07fb16d5'
INFO: status: 1% (3611295744/214748364800), sparse 0% (1058140160), duration 3, read/write 1203/851 MB/s
INFO: status: 2% (4366991360/214748364800), sparse 0% (1061658624), duration 6, read/write 251/250 MB/s
INFO: status: 3% (6603407360/214748364800), sparse 0% (1084887040), duration 16, read/write 223/221 MB/s
INFO: status: 4% (8758231040/214748364800), sparse 0% (1084887040), duration 26, read/write 215/215 MB/s
INFO: status: 5% (10879369216/214748364800), sparse 0% (1084903424), duration 35, read/write 235/235 MB/s
INFO: status: 6% (12886605824/214748364800), sparse 0% (1096822784), duration 45, read/write 200/199 MB/s
INFO: status: 7% (15205138432/214748364800), sparse 0% (1096822784), duration 56, read/write 210/210 MB/s
INFO: status: 8% (17340039168/214748364800), sparse 0% (1097998336), duration 65, read/write 237/237 MB/s
INFO: status: 9% (19419561984/214748364800), sparse 0% (1191800832), duration 74, read/write 231/220 MB/s
INFO: status: 10% (21576351744/214748364800), sparse 0% (1201209344), duration 83, read/write 239/238 MB/s
INFO: status: 11% (23652925440/214748364800), sparse 0% (1219858432), duration 92, read/write 230/228 MB/s
The VM is still completely unresponsive, so I run vzdump -stop again:
Code:
ERROR: interrupted by signal
INFO: aborting backup job
ERROR: Backup of VM 105 failed - interrupted by signal
INFO: Failed at 2019-11-02 09:47:49
ERROR: Backup job failed - interrupted by signal
interrupted by signal
root@server36:~#
Now the backup appears to have stopped, but the VM is still completely unresponsive. I tried everything, but the only way to get it back is to issue a qm reset.
All other VMs on the machine snapshot just fine, but this one is bigger: 200G total, 120G used. /var/lib/vz is 200G. I don't see why that should be the problem, though, as the backup doesn't even start writing anything before the VM freezes.
What is going on here? I need to be able to snapshot this VM. Are there any log files that could give a hint?
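The one line in the output above that looks like a real hint is the failed 'guest-fsfreeze-thaw' command, which points at the QEMU guest agent inside the VM rather than at the backup storage. Below is a minimal sketch for pulling the ERROR lines out of saved vzdump output and flagging the ones that mention the guest agent. The helper name and the regular expression are my own assumptions, not part of vzdump or any Proxmox tool, and the sample text is copied from the output above.

```python
import re

# Hypothetical helper, not part of vzdump or Proxmox.
# The pattern matches the guest agent freeze/thaw commands and the
# "qga socket" wording seen in the error message above.
QGA_PATTERN = re.compile(r"guest-fsfreeze-(freeze|thaw)|qga socket")


def summarize_vzdump_log(text):
    """Return (error_lines, agent_related_lines) from vzdump console output."""
    errors = [
        line.strip()
        for line in text.splitlines()
        if line.startswith("ERROR:")
    ]
    agent = [line for line in errors if QGA_PATTERN.search(line)]
    return errors, agent


# Sample lines copied from the backup output in this post.
sample = """\
INFO: creating archive '/var/lib/vz/dump/vzdump-qemu-105-2019_11_02-09_13_22.vma'
ERROR: interrupted by signal
ERROR: VM 105 qmp command 'guest-fsfreeze-thaw' failed - unable to connect to VM 105 qga socket - timeout after 101 retries
INFO: started backup task '07c365f3-5d7b-4367-b5d6-b6ea07fb16d5'
"""

errors, agent = summarize_vzdump_log(sample)
for line in agent:
    print("guest agent related:", line)
```

If the agent-related list is non-empty, as it is here, that at least suggests the hang may be tied to the filesystem freeze step and not to the size of the disk.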