vzdump --mode stop did not restart VM on HA cluster

piccardi

Oct 20, 2012
Hi, I just finished installing and configuring Proxmox 2.1 (by the way, many thanks for your wonderful work) on a cluster of 4 Fujitsu blades. The cluster is up and all 4 nodes seem OK:

Code:
root@lama9:~# pvecm nodes 
Node  Sts   Inc   Joined               Name 
   1   M     40   2012-10-20 11:19:27  lama1 
   2   M     40   2012-10-20 11:19:27  lama2 
   3   M     28   2012-10-20 11:19:27  lama9 
   4   M     40   2012-10-20 11:19:27  lama10

To test everything I created a first KVM virtual machine, put it under HA, and ran it on one of the blades. Everything seemed to work fine, but I ran into problems when I made a backup. I run vzdump to save the VM images to a dedicated /backup directory. Its filesystem lives on a shared logical volume carved out of a SAN, so I'm not using the Web console for the backups. Instead, a cron script on each blade mounts the filesystem, does the dump, and unmounts it, so that it is accessible from the next blade. I also need this because another machine outside the cluster has to access the same filesystem to copy the VM images to tape.

Because I want to be sure that all services are stopped during the backup, and the services are always unavailable at night anyway, I ran the backup with vzdump --mode stop. Since I have just a single VM for now, I launched the script manually (using screen to detach the console). The image was generated fine, but right after the VM was stopped I saw it migrate to another blade, and when the image generation completed it was not restarted. This is the command output I saved:


Code:
INFO: starting new backup job: vzdump 101 --dumpdir /backup --mode stop --maxfiles 1 
INFO: Starting Backup of VM 101 (qemu) 
INFO: status = running 
INFO: backup mode: stop 
INFO: ionice priority: 7 
INFO: stopping vm 
INFO: creating archive '/backup/vzdump-qemu-101-2012_10_20-16_36_25.tar' 
INFO: adding '/backup/vzdump-qemu-101-2012_10_20-16_36_25.tmp/qemu-server.conf' to archive ('qemu-server.conf') 
INFO: adding '/dev/fast/vm-101-disk-1' to archive ('vm-disk-virtio0.raw') 
INFO: Total bytes written: 450979957248 (85.67 MiB/s) 
INFO: archive file size: 420.01GB 
INFO: no such VM ('101') command 'qm unlock 101' failed: exit code 2 
INFO: restarting vm 
INFO: command 'clusvcadm -e pvevm:101 -m lama9' failed: exit code 255 
INFO: Executing HA start for VM 101 
INFO: Member lama9 trying to enable pvevm:101...Failure command 'qm start 101 --skiplock' failed: exit code 255 
INFO: Finished Backup of VM 101 (01:24:30) 
INFO: Backup job finished successfully

The problem seems to be that the VM was migrated after it was stopped, and so it could not be restarted on the blade running vzdump. But I do not understand why; I avoided any operation after launching the script. When I tried to restart the VM on the blade where it had migrated (with qm start 101) I got the same error, and then it migrated again to another blade. I managed to restart it (from the blade it ended up on) after removing the lock with qm unlock.

I still do not understand why the VM is migrated when it is stopped by vzdump. If this is the correct behaviour it seems quite strange to me, and it undermines my backup strategy. I suspect I did not launch vzdump the right way, but I could not find any hint in its manpage.

So is there a canonical way to dump an HA VM from a blade and make sure that the VM stays on it?

Regards
Simone
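
The per-blade wrapper described in the post above (mount the shared filesystem, dump, unmount) might look roughly like the sketch below. The original script was not posted, so everything here is an assumption: the function name, the `RUN` dry-run hook, and especially `BACKUP_DEV`, which is only a placeholder for the shared logical volume. The vzdump options are the ones visible in the log above.

```shell
#!/bin/sh
# Sketch only, not the poster's actual script.
# RUN is a hypothetical dry-run hook: RUN=echo prints commands instead of running them.
RUN=${RUN:-}
# Placeholder device path; the thread does not give the real one.
BACKUP_DEV=${BACKUP_DEV:-/dev/VG/backup}
BACKUP_DIR=${BACKUP_DIR:-/backup}

dump_to_shared_volume() {
    vmid=$1
    mode=$2
    # Mount the shared volume on this blade only for the duration of the dump.
    $RUN mount "$BACKUP_DEV" "$BACKUP_DIR" || return 1
    $RUN vzdump "$vmid" --dumpdir "$BACKUP_DIR" --mode "$mode" --maxfiles 1
    rc=$?
    # Unmount even if the dump failed, so the next blade can mount the volume.
    $RUN umount "$BACKUP_DIR" || rc=1
    return "$rc"
}
```

Calling `RUN=echo; dump_to_shared_volume 101 snapshot` prints the three commands without touching the system, which is a cheap way to check the sequence before putting it into cron.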
 
Please do not use vzdump 'stop' mode on HA managed VMs (I guess we should disable that mode for HA VMs). Use 'snapshot' or 'suspend' mode instead.
 
OK, I'll avoid it, but what must I do when I need to back up a virtual machine in a stopped state?

Is it wise to stop it, make the backup (with a valid mode), and then restart it? That would be fine for me, as long as it does not trigger a migration.

Simone
 
if you want stop mode, you cannot enable HA for this VM/CT.

the idea of HA is to make sure that the VM/CT is always running.
 
My idea of HA is to have the VM/CT always running when I want it active. But I'd like to be able to stop a machine when I want it to be inactive, without the HA system restarting it on another node. That's too much HA.

Does this mean that I have to remove the VM from HA, stop it, make my dumps, then restart it and put it back in HA?

I can do it this way, but which CLI command can I use for this?

Regards
Simone
 
in this case, just press "stop" via GUI. if the VM/CT is HA enabled, it will do a graceful shutdown and the service will be disabled for HA - no restart on another node.

in the case of backups, I do not understand why you want to stop. use snapshot or suspend or a third party online backup tool.

the concept of HA is never stopping VM/CT, so if you want this then you probably don't want what we call HA.
 
Hi Tom and piccardi

Piccardi, the best way by CLI is:

To pause the "HA" for a VM:
clusvcadm -Z pvevm:[ID VM Number]
- example:
clusvcadm -Z pvevm:150

To resume "HA" for a VM:
clusvcadm -U pvevm:[ID VM Number]
- example:
clusvcadm -U pvevm:150

To view VMs status including the status of "HA" - with pause or not:
clustat

But I suggest simulating node failures, i.e. you have to think about what happens if a node goes down while the backup is running. I believe that the command to resume "HA" is sufficient.
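
Put together, the pause/resume commands above could be wrapped around a backup roughly as follows. This is an untested sketch: the function name and the `RUN` dry-run hook are made up for illustration, and nothing in this thread confirms that vzdump's own stop and restart steps cooperate with a frozen service, so treat it as something to try on a test VM first.

```shell
#!/bin/sh
# Sketch only: pause HA for one VM, run the dump, then always resume HA.
# RUN is a hypothetical dry-run hook: RUN=echo prints commands instead of running them.
RUN=${RUN:-}

backup_with_ha_frozen() {
    vmid=$1
    dumpdir=$2
    # Freeze the service; if that fails, do not touch the VM at all.
    $RUN clusvcadm -Z "pvevm:$vmid" || return 1
    $RUN vzdump "$vmid" --dumpdir "$dumpdir" --mode stop --maxfiles 1
    rc=$?
    # Unfreeze whether or not the dump succeeded, so HA is not left paused.
    $RUN clusvcadm -U "pvevm:$vmid" || rc=1
    return "$rc"
}
```

The unfreeze step only runs if the script itself survives. If the node running it goes down in the middle of the dump, the service would presumably stay frozen until someone unfreezes it from another node, which is exactly the failure case worth simulating.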

Update:
Tom:
I think the best approach would be for the PVE GUI to offer this option to the user, with PVE running these commands as a frontend (I would like that too). Then, whether or not the backup completes successfully, vzdump resumes "HA" for each VM that was backed up. That would be better than running the commands by CLI on a single host, which can fail and then require human intervention to re-activate "HA".

Best regards
Cesar
 
I want to stop the VM (at night) because I am required to have a clean and consistent backup, with all database files synced, so that I can restart cleanly from a precise point in time without needing any kind of recovery. From what I understand, vzdump in snapshot mode takes an image of a running VM's filesystem, so the filesystem is mounted and all services are running. That is very far from a coherent state.

I want HA from 6:00 to 22:00; in that time window services must always be up, even on hardware failure. During the night services can be down for maintenance.

Simone
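
The night-only window described above could be enforced by a small guard at the top of the cron script, so that a manual run during the day does not take the VM down. This is a minimal sketch; the function name `in_maintenance_window` is hypothetical, and the 22:00 to 06:00 window is taken from the post.

```shell
#!/bin/sh
# Hypothetical helper: succeed only between 22:00 and 05:59,
# the window in which services are allowed to be down.
in_maintenance_window() {
    hour=${1:-$(date +%H)}   # hour 00..23, defaults to the current hour
    hour=${hour#0}           # strip one leading zero so "08" is read as 8
    hour=${hour:-0}          # a bare "0" becomes empty above; restore it
    [ "$hour" -ge 22 ] || [ "$hour" -lt 6 ]
}
```

A backup script could then start with `in_maintenance_window || exit 0` and do nothing outside the window.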
 
Thank you, I'll try them as soon as I get my cluster working again...

Simone
 
Hi piccardi

If you test a node failure while HA is paused for a VM and a backup is in progress, please let me know whether resuming HA for this VM works.

Best regards
Cesar
 
Hi Tom
please see this:

To pause the "HA" for a VM using CLI:
clusvcadm -Z pvevm:[ID VM Number]
- example:
clusvcadm -Z pvevm:150

To resume "HA" for a VM using CLI:
clusvcadm -U pvevm:[ID VM Number]
- example:
clusvcadm -U pvevm:150

To view VMs status including the status of "HA" - with pause or not using CLI:
clustat

But I suggest simulating node failures, i.e. you have to think about what happens if a node goes down while the backup is running. I believe that the command to resume "HA" is sufficient.

Tom, I want to make a suggestion and hear your opinion on it:

I think the best approach would be for the PVE GUI to offer this option (pause/resume "HA") to the user, with PVE running these commands as a frontend. I would like that too. Then, whether or not the backup completes successfully, PVE resumes HA for each VM that was backed up. That would be better than running the commands by CLI on a single host, which can fail with a backup in progress and then require human intervention to re-activate "HA".

Why this would be very useful:
1- To get a backup of a VM in a coherent state (stopped).
2- With this option I can be sure that there are no partially written files in my filesystem.
3- When I want to restore this VM, I will not have to wait for the automatic filesystem check. (With a snapshot backup of a very large disk, I have to wait a long time for the restore and then a long time again for the complete filesystem check.)
4- Usually when you have "HA" and need to fall back on a backup, you have to restore several backups before deciding which one is useful, and that takes a lot of time and close attention. It gets worse if you also have to wait for the filesystem check to finish.

Waiting for your reply I say goodbye

Best regards
Cesar
 