[SOLVED] Bug or feature? During backup with stop, VM wakes up?

cosmos

Renowned Member
Apr 1, 2013
125
4
83
Fresh pve 3.1 installation here:
Code:
~# pveversion -v
proxmox-ve-2.6.32: 3.1-114 (running kernel: 2.6.32-26-pve)
pve-manager: 3.1-21 (running version: 3.1-21/93bf03d4)
pve-kernel-2.6.32-26-pve: 2.6.32-114
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.5-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.0-2
pve-cluster: 3.0-8
qemu-server: 3.1-8
pve-firmware: 1.0-23
libpve-common-perl: 3.0-8
libpve-access-control: 3.0-7
libpve-storage-perl: 3.0-17
pve-libspice-server1: 0.12.4-2
vncterm: 1.1-4
vzctl: 4.0-1pve4
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 1.4-17
ksm-control-daemon: 1.1-1
glusterfs-client: 3.4.1-1

A fresh install of a single Windows 7 32-bit VM:
Code:
# qm config  100
balloon: 1024
boot: cd
bootdisk: virtio0
cores: 2
ide0: none,media=cdrom
memory: 1536
name: testmachine
net0: virtio=XX:XX:0C:1F:XX:XX,bridge=vmbr1
onboot: 1
ostype: win7
sockets: 1
virtio0: images:vm-100-disk-1,cache=writethrough,size=60G

I have configured a single task to backup this VM, with a schedule. For that purpose, I decided to look at how it behaves. Strange things happen. Backup is configured to run as a "stop" job. It starts correctly at 9:30 and finishes at approximately 9:48, according to the logs:
Code:
Feb 28 09:30:03 INFO: Starting Backup of VM 100 (qemu)
Feb 28 09:30:03 INFO: status = running
Feb 28 09:30:04 INFO: update VM 100: -lock backup
Feb 28 09:30:04 INFO: backup mode: stop
Feb 28 09:30:04 INFO: ionice priority: 7
Feb 28 09:30:04 INFO: stopping vm
Feb 28 09:30:20 INFO: creating archive '/mnt/backups/dump/vzdump-qemu-100-2014_02_28-09_30_03.vma.lzo'
Feb 28 09:30:20 INFO: starting kvm to execute backup task
Feb 28 09:30:21 INFO: started backup task '48a528bd-116d-4a9d-8728-b4fa2eaa510c'
Feb 28 09:30:21 INFO: resume VM
Feb 28 09:30:24 INFO: status: 0% (363266048/64424509440), sparse 0% (34025472), duration 3, 121/109 MB/s
Feb 28 09:30:33 INFO: status: 1% (693174272/64424509440), sparse 0% (40873984), duration 12, 36/35 MB/s
Feb 28 09:30:44 INFO: status: 2% (1303052288/64424509440), sparse 0% (48566272), duration 23, 55/54 MB/s
Feb 28 09:31:06 INFO: status: 3% (1939668992/64424509440), sparse 0% (53215232), duration 45, 28/28 MB/s
Feb 28 09:31:36 INFO: status: 4% (2584412160/64424509440), sparse 0% (58703872), duration 75, 21/21 MB/s
Feb 28 09:31:57 INFO: status: 5% (3238264832/64424509440), sparse 0% (64454656), duration 96, 31/30 MB/s
Feb 28 09:32:26 INFO: status: 6% (3867344896/64424509440), sparse 0% (110977024), duration 125, 21/20 MB/s
Feb 28 09:33:02 INFO: status: 7% (4512350208/64424509440), sparse 0% (120266752), duration 161, 17/17 MB/s
[...]
Feb 28 09:48:42 INFO: status: 98% (63241584640/64424509440), sparse 61% (39877832704), duration 1101, 96/0 MB/s
Feb 28 09:48:48 INFO: status: 99% (63863128064/64424509440), sparse 62% (40499376128), duration 1107, 103/0 MB/s
Feb 28 09:48:53 INFO: status: 100% (64424509440/64424509440), sparse 63% (41058656256), duration 1112, 112/0 MB/s
Feb 28 09:48:53 INFO: transferred 64424 MB in 1112 seconds (57 MB/s)
Feb 28 09:48:53 INFO: archive file size: 14.03GB
Feb 28 09:48:56 INFO: vm is online again after 1132 seconds
Feb 28 09:48:56 INFO: Finished Backup of VM 100 (00:18:53)

And also from /var/log/syslog:
Code:
Feb 28 09:30:01 pve /USR/SBIN/CRON[384894]: (root) CMD (vzdump --quiet 1 --mode stop --mailto receiver@server --all 1 --compress lzo --storage backups)
Feb 28 09:30:03 pve vzdump[384895]: <root@pam> starting task UPID:pve:0005DF80:051416F2:53103AFB:vzdump::root@pam:
Feb 28 09:30:03 pve vzdump[384896]: INFO: starting new backup job: vzdump --quiet 1 --mailto receiver@server --mode stop --compress lzo --storage backups --all 1
Feb 28 09:30:03 pve vzdump[384896]: INFO: Starting Backup of VM 100 (qemu)
Feb 28 09:30:04 pve qm[384902]: <root@pam> update VM 100: -lock backup
Feb 28 09:30:05 pve qm[384910]: <root@pam> starting task UPID:pve:0005DF8F:051417CB:53103AFD:qmshutdown:100:root@pam:
Feb 28 09:30:05 pve qm[384911]: shutdown VM 100: UPID:pve:0005DF8F:051417CB:53103AFD:qmshutdown:100:root@pam:
Feb 28 09:30:19 pve kernel: vmbr1: port 2(tap100i0) entering disabled state
Feb 28 09:30:19 pve kernel: vmbr1: port 2(tap100i0) entering disabled state
Feb 28 09:30:20 pve qm[384910]: <root@pam> end task UPID:pve:0005DF8F:051417CB:53103AFD:qmshutdown:100:root@pam: OK
[...]
Feb 28 09:48:56 pve vzdump[384896]: INFO: Finished Backup of VM 100 (00:18:53)
Feb 28 09:48:56 pve vzdump[384896]: INFO: Backup job finished successfully
[...]
Feb 28 09:48:57 pve vzdump[384895]: <root@pam> end task UPID:pve:0005DF80:051416F2:53103AFB:vzdump::root@pam: OK
Feb 28 09:49:45 pve rrdcached[2157]: flushing old values
Feb 28 09:49:45 pve rrdcached[2157]: rotating journals
Feb 28 09:49:45 pve rrdcached[2157]: started new journal /var/lib/rrdcached/journal/rrd.journal.1393573785.397599
Feb 28 09:49:45 pve rrdcached[2157]: removing old journal /var/lib/rrdcached/journal/rrd.journal.1393566585.382441
Feb 28 09:51:37 pve pvedaemon[384413]: <root@pam> successful auth for user 'root@pam'

One note in this last snippet, is that I do not see where the VM is instructed to power up! I'd expect that would happen immediately after 09:48:57, when the backup job has completed.

Question 1: why are there no vm start commands in the log?

Question 2: does "vzdump --quiet 1 --mailto receiver@server --mode stop" stop the machine (ie plug the power off -> unclean shutdown) or does it shut the VM down cleanly (via ACPI shutdown)? Reading the vzdump man and wiki pages it was not clear to me what actually happens.

Anyways the actual issue I am encountering is the following. After the backup took place, I logged into the Windows 7 machine examining the event logs to see how the system behaved. This is what I saw in the system log (note the VM is also NTP sync'ed):
Code:
Reverse chronological order:

28/2/2014 9:30:36 am    Kernel-General    12    
28/2/2014 9:30:16 am    Kernel-General    13    
28/2/2014 9:30:14 am    Service Control Manager    7036    
28/2/2014 9:30:12 am    Kernel-Power    109    (103)
28/2/2014 9:30:11 am    Service Control Manager    7036    
28/2/2014 9:30:10 am    Service Control Manager    7036    
28/2/2014 9:30:10 am    Service Control Manager    7036    
28/2/2014 9:30:10 am    Service Control Manager    7036    
28/2/2014 9:30:10 am    Service Control Manager    7036    
28/2/2014 9:30:08 am    Service Control Manager    7036    
28/2/2014 9:30:08 am    Service Control Manager    7036    
28/2/2014 9:30:08 am    Service Control Manager    7036    
28/2/2014 9:30:08 am    Service Control Manager    7036    
28/2/2014 9:30:08 am    Service Control Manager    7036    
28/2/2014 9:30:07 am    Service Control Manager    7036    
28/2/2014 9:30:07 am    Service Control Manager    7036    
28/2/2014 9:30:07 am    Service Control Manager    7036    
28/2/2014 9:30:07 am    Service Control Manager    7036    
28/2/2014 9:30:07 am    Service Control Manager    7036    
28/2/2014 9:30:07 am    Service Control Manager    7036    
28/2/2014 9:30:07 am    UserPnp    20010    (7010)
28/2/2014 9:30:07 am    Service Control Manager    7036    
28/2/2014 9:30:07 am    Service Control Manager    7036    
28/2/2014 9:30:07 am    Service Control Manager    7036    
28/2/2014 9:30:07 am    Service Control Manager    7036    
28/2/2014 9:30:07 am    Service Control Manager    7036    
28/2/2014 9:30:07 am    Service Control Manager    7036

I could not c/p the descriptions, but examining the logs, at 9:30:07 a shutdown process seems to start taking place. At 9:30:16 am, the Kernel-General signifies that the operating system was shutdown at ‎2014‎-‎02‎-‎28T07:30:16.734375000Z (+2 hours), that is at 9:30:16.734375000 local time. So far, all is well and correlate with /var/log/system.

And here is the funny thing. I would expect that the system would be powered after the end of the backup, that is about 18 minutes later. This does not happen! Instead, only 20 seconds later the system starts up again, while the backup is in progress (Kernel-general event 13 at 9:30:36 am)!!!

Question (3): What is happening here, is this normal behaviour for a vzdump -stop operation? Or is it a bug?
 
Last edited:
In http://pve.proxmox.com/wiki/Backup_and_Restore the following is mentioned:
stop (KVM): Shutdown the VM to get a consistent state, Then start KVM live backup and restart the VM (short downtime).

Does this mean that after bringing the system down, it is booted almost immediately after (and, as a result, it is alive when the backup is underway)? That would explain what I was experiencing above, that is the whole behaviour is a expected (not a bug).
 
In http://pve.proxmox.com/wiki/Backup_and_Restore the following is mentioned:


Does this mean that after bringing the system down, it is booted almost immediately after (and, as a result, it is alive when the backup is underway)? That would explain what I was experiencing above, that is the whole behaviour is a expected (not a bug).

Yes, vzdump stop mode (KVM) is doing a clean shutdown, starts the backup and boots the VM immediately. So the VM is running and in the background the backup is created, in a clean state.
 
Quite impressive, although I must say that I do not understand how that is possible, does pve take a snapshot of some sort immediately after the VM is powered down and before powering it up? Furthermore, in the stop case, perhaps /var/log/syslog must be corrected:
Code:
Feb 28 09:48:53 INFO: transferred 64424 MB in 1112 seconds (57 MB/s)
Feb 28 09:48:53 INFO: archive file size: 14.03GB
Feb 28 09:48:56 INFO: vm is online again after 1132 seconds
The last line is not correct. The VM actually was offline for less than a minute...


Furthermore, which would you recommend for Windows 7 backup, stop or snapshot? Absolute priority here is to have a reliable backup / maximize the probability to have a fully working and consistent VM when restoring from scratch.
 
Quite impressive, although I must say that I do not understand how that is possible, does pve take a snapshot of some sort immediately after the VM is powered down and before powering it up?

the backup is included in the KVM code and yes, very cool part of Proxmox VE - we introduced this in 2.3.
http://forum.proxmox.com/threads/13019-Proxmox-VE-2-3-released!

Furthermore, in the stop case, perhaps /var/log/syslog must be corrected:
Code:
Feb 28 09:48:53 INFO: transferred 64424 MB in 1112 seconds (57 MB/s)
Feb 28 09:48:53 INFO: archive file size: 14.03GB
Feb 28 09:48:56 INFO: vm is online again after 1132 seconds
The last line is not correct. The VM actually was offline for less than a minute...

thanks for reporting, looks like a (minor) bug.


Furthermore, which would you recommend for Windows 7 backup, stop or snapshot? Absolute priority here is to have a reliable backup / maximize the probability to have a fully working and consistent VM when restoring from scratch.

stop mode (if clean shutdown works for you and you can accept the short downtime/reboot)
 

About

The Proxmox community has been around for many years and offers help and support for Proxmox VE, Proxmox Backup Server, and Proxmox Mail Gateway.
We think our community is one of the best thanks to people like you!

Get your subscription!

The Proxmox team works very hard to make sure you are running the best software and getting stable updates and security enhancements, as well as quick enterprise support. Tens of thousands of happy customers have a Proxmox subscription. Get yours easily in our online shop.

Buy now!