[SOLVED] HA option in Proxmox VE

ecirb

Hello everyone,
I'm a new user of Proxmox VE (6.2) and also a newbie in virtualisation, so I'm trying to learn by practising.
I just tried to set up a cluster with HA at home:
  • 2 Dell T110_II servers with Proxmox 6.2
  • One cluster with these nodes
  • One HA group with these nodes
  • One NFS shared space on an OMV NAS.
After installation, I created a VM on the NFS shared storage, assigned to the pve node.
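For reference, here is roughly how I built the setup from the CLI (the cluster, storage and group names are just examples; most of it can also be done in the web interface):
Code:
# On the first node (pve): create the cluster
pvecm create homecluster

# On the second node (pve2): join the existing cluster
pvecm add <IP-of-pve>

# Register the NFS export of the OMV NAS as shared storage
pvesm add nfs nfsshare --server <NAS-IP> --export <export-path> --content images

# Create the HA group with both nodes and put VM 100 under HA management
ha-manager groupadd hagroup --nodes pve,pve2
ha-manager add vm:100 --group hagroup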

My first test was to migrate the VM while it was running, from the pve node to the pve2 node, with the disk staying on the NFS share.
Since the VM sits on shared storage, I would have thought the migration would basically just start the VM on the second node, since the disk is already reachable from every cluster node. But the migration took almost 6 minutes, which makes me think it is not as simple as I thought.
Could someone explain why it takes so long? (It does work, though.)
Maybe I missed something in the configuration?

Second test:
While the VM was running on the pve node, still on the NFS shared storage and managed by the HA group (the group contains both nodes), I unplugged the network cable of that server to simulate a crash.
I would have thought the VM would automatically be migrated to the second node... but that was not the case. I simply lost the VM while the cable was unplugged.
Once again, is it possible that I missed something in the configuration, or is this normal?

Thanks a lot for your help.
 
Thanks a lot for your answer.
That's what I was beginning to understand (sorry, I did warn you that I'm a virtualization newbie). As far as I understand, with only two nodes there is no quorum left when one node fails, so the remaining node goes read-only and the failover cannot happen...
I'm currently trying (just for testing) to lower the number of expected votes to 1. I ran the pvecm expected 1 command, but when I run pvecm status right afterwards, the number of expected votes is still 2. For the moment I don't understand why.
Once I have tested that, I will try to add a "node" (probably not the correct word in this case) just for quorum, in order to get to 3 votes. As far as I understand, that should solve my problem.
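For reference, this is the kind of thing I am testing (just a sketch; if I understand correctly, corosync may refuse to lower the expected votes below the number of votes it currently sees, which could explain why the value stays at 2 while both nodes are up):
Code:
# Check the current quorum state (Expected votes / Total votes / Quorate)
pvecm status

# Temporarily lower the expected votes to 1 (testing only, not meant for production)
pvecm expected 1

# The same thing can apparently be done directly with corosync's quorum tool
corosync-quorumtool -e 1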
Well, to be honest, right now I have to fix another problem I created while trying a modification found in a tutorial. I modified the /etc/pve/corosync.conf file. Unfortunately the modification is not correct, and the corosync service no longer starts. The fun part is that the file is now read-only, and even as root I can't delete or modify it to revert... I'll have to dig :) :)
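From what I have read so far, something like this might give me write access to the file again (not tested yet, so please take it as a rough sketch):
Code:
# /etc/pve is the pmxcfs cluster filesystem and goes read-only without quorum
systemctl stop pve-cluster corosync
pmxcfs -l                 # start pmxcfs in local mode so /etc/pve becomes writable
# fix /etc/pve/corosync.conf here, then restart the services
killall pmxcfs
systemctl start corosync pve-cluster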

By the way, do you have any idea how long migrating a VM should take (almost 6 minutes in my case, while the VM is stored on shared storage)? That seems like a lot at first sight, doesn't it?
 
Thank you for the link!
The corosync file is much clearer to me now, and I was able to repair it thanks to you.
I'll try to figure out how to get you the right logs for the migration.
 
So here is an example of the logs (copied from syslog).
What I did was modify my HA group to set priorities: 1 for the pve node and 2 for the pve2 node.
VM 100 was assigned to pve.
Looking at the task list in the web interface, I started the VM at 09:08:15.
At 09:08:18 the migration was launched (which is what I expected, since pve2 has the higher priority), and it finished at 09:14:42.
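For reference, this is roughly how I set the priorities from the CLI (the group name is just an example; the node with the higher number is the preferred one):
Code:
ha-manager groupadd prio_group --nodes "pve:1,pve2:2"
ha-manager set vm:100 --group prio_group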

Code:
Jun 24 09:08:00 pve systemd[1]: Starting Proxmox VE replication runner...
Jun 24 09:08:01 pve systemd[1]: pvesr.service: Succeeded.
Jun 24 09:08:01 pve systemd[1]: Started Proxmox VE replication runner.
Jun 24 09:08:15 pve pvedaemon[1162]: <root@pam> starting task UPID:pve:000007C2:000051E9:5EF2FBDF:hastart:100:root@pam:
Jun 24 09:08:16 pve pvedaemon[1162]: <root@pam> end task UPID:pve:000007C2:000051E9:5EF2FBDF:hastart:100:root@pam: OK
Jun 24 09:08:18 pve pve-ha-lrm[1212]: successfully acquired lock 'ha_agent_pve_lock'
Jun 24 09:08:18 pve pve-ha-lrm[1212]: watchdog active
Jun 24 09:08:18 pve pve-ha-lrm[1212]: status change wait_for_agent_lock => active
Jun 24 09:08:18 pve pve-ha-lrm[1989]: VM isn't running. Doing offline migration instead.
Jun 24 09:08:18 pve pve-ha-lrm[1989]: <root@pam> starting task UPID:pve:000007C6:0000531F:5EF2FBE2:qmigrate:100:root@pam:
Jun 24 09:08:23 pve pve-ha-lrm[1989]: Task 'UPID:pve:000007C6:0000531F:5EF2FBE2:qmigrate:100:root@pam:' still active, waiting
(... same line repeated 6 times)
Jun 24 09:08:58 pve pve-ha-lrm[1989]: Task 'UPID:pve:000007C6:0000531F:5EF2FBE2:qmigrate:100:root@pam:' still active, waiting
Jun 24 09:09:00 pve systemd[1]: Starting Proxmox VE replication runner...
Jun 24 09:09:01 pve systemd[1]: pvesr.service: Succeeded.
Jun 24 09:09:01 pve systemd[1]: Started Proxmox VE replication runner.
Jun 24 09:09:03 pve pve-ha-lrm[1989]: Task 'UPID:pve:000007C6:0000531F:5EF2FBE2:qmigrate:100:root@pam:' still active, waiting
(... same line repeated 10 times)
Jun 24 09:09:58 pve pve-ha-lrm[1989]: Task 'UPID:pve:000007C6:0000531F:5EF2FBE2:qmigrate:100:root@pam:' still active, waiting
Jun 24 09:10:00 pve systemd[1]: Starting Proxmox VE replication runner...
Jun 24 09:10:01 pve systemd[1]: pvesr.service: Succeeded.
Jun 24 09:10:01 pve systemd[1]: Started Proxmox VE replication runner.
Jun 24 09:10:03 pve pve-ha-lrm[1989]: Task 'UPID:pve:000007C6:0000531F:5EF2FBE2:qmigrate:100:root@pam:' still active, waiting
(... same line repeated 8 times)
Jun 24 09:10:48 pve pve-ha-lrm[1989]: Task 'UPID:pve:000007C6:0000531F:5EF2FBE2:qmigrate:100:root@pam:' still active, waiting
Jun 24 09:10:49 pve pmxcfs[1106]: [status] notice: received log
Jun 24 09:10:51 pve systemd[1]: Created slice User Slice of UID 0.
Jun 24 09:10:51 pve systemd[1]: Starting User Runtime Directory /run/user/0...
Jun 24 09:10:51 pve systemd[1]: Started User Runtime Directory /run/user/0.
Jun 24 09:10:51 pve systemd[1]: Starting User Manager for UID 0...
Jun 24 09:10:51 pve systemd[2439]: Listening on GnuPG cryptographic agent and passphrase cache.
Jun 24 09:10:51 pve systemd[2439]: Listening on GnuPG cryptographic agent and passphrase cache (restricted).
Jun 24 09:10:51 pve systemd[2439]: Reached target Timers.
Jun 24 09:10:51 pve systemd[2439]: Listening on GnuPG network certificate management daemon.
Jun 24 09:10:51 pve systemd[2439]: Listening on GnuPG cryptographic agent (ssh-agent emulation).
Jun 24 09:10:51 pve systemd[2439]: Reached target Paths.
Jun 24 09:10:51 pve systemd[2439]: Listening on GnuPG cryptographic agent and passphrase cache (access for web browsers).
Jun 24 09:10:51 pve systemd[2439]: Reached target Sockets.
Jun 24 09:10:51 pve systemd[2439]: Reached target Basic System.
Jun 24 09:10:51 pve systemd[2439]: Reached target Default.
Jun 24 09:10:51 pve systemd[2439]: Startup finished in 259ms.
Jun 24 09:10:51 pve systemd[1]: Started User Manager for UID 0.
Jun 24 09:10:51 pve systemd[1]: Started Session 1 of user root.
Jun 24 09:10:53 pve pve-ha-lrm[1989]: Task 'UPID:pve:000007C6:0000531F:5EF2FBE2:qmigrate:100:root@pam:' still active, waiting
Jun 24 09:10:54 pve qm[2456]: VM 100 qmp command failed - VM 100 not running
Jun 24 09:10:54 pve systemd[1]: session-1.scope: Succeeded.
Jun 24 09:10:55 pve pmxcfs[1106]: [status] notice: received log
Jun 24 09:10:58 pve pve-ha-lrm[1989]: Task 'UPID:pve:000007C6:0000531F:5EF2FBE2:qmigrate:100:root@pam:' still active, waiting
Jun 24 09:11:00 pve systemd[1]: Starting Proxmox VE replication runner...
Jun 24 09:11:03 pve pve-ha-lrm[1989]: Task 'UPID:pve:000007C6:0000531F:5EF2FBE2:qmigrate:100:root@pam:' still active, waiting
Jun 24 09:11:04 pve systemd[1]: Stopping User Manager for UID 0...
Jun 24 09:11:04 pve systemd[2439]: Stopped target Default.
Jun 24 09:11:04 pve systemd[2439]: Stopped target Basic System.
Jun 24 09:11:04 pve systemd[2439]: Stopped target Sockets.
Jun 24 09:11:04 pve systemd[2439]: gpg-agent-extra.socket: Succeeded.
Jun 24 09:11:04 pve systemd[2439]: Closed GnuPG cryptographic agent and passphrase cache (restricted).
Jun 24 09:11:04 pve systemd[2439]: dirmngr.socket: Succeeded.
Jun 24 09:11:04 pve systemd[2439]: Closed GnuPG network certificate management daemon.
Jun 24 09:11:04 pve systemd[2439]: gpg-agent-browser.socket: Succeeded.
Jun 24 09:11:04 pve systemd[2439]: Closed GnuPG cryptographic agent and passphrase cache (access for web browsers).
Jun 24 09:11:04 pve systemd[2439]: Stopped target Timers.
Jun 24 09:11:04 pve systemd[2439]: Stopped target Paths.
Jun 24 09:11:04 pve systemd[2439]: gpg-agent-ssh.socket: Succeeded.
Jun 24 09:11:04 pve systemd[2439]: Closed GnuPG cryptographic agent (ssh-agent emulation).
Jun 24 09:11:04 pve systemd[2439]: gpg-agent.socket: Succeeded.
Jun 24 09:11:04 pve systemd[2439]: Closed GnuPG cryptographic agent and passphrase cache.
Jun 24 09:11:04 pve systemd[2439]: Reached target Shutdown.
Jun 24 09:11:04 pve systemd[2439]: systemd-exit.service: Succeeded.
Jun 24 09:11:04 pve systemd[2439]: Started Exit the Session.
Jun 24 09:11:04 pve systemd[2439]: Reached target Exit the Session.
Jun 24 09:11:04 pve systemd[1]: user@0.service: Succeeded.
Jun 24 09:11:04 pve systemd[1]: Stopped User Manager for UID 0.
Jun 24 09:11:04 pve systemd[1]: Stopping User Runtime Directory /run/user/0...
Jun 24 09:11:05 pve systemd[1]: run-user-0.mount: Succeeded.
Jun 24 09:11:05 pve systemd[1]: user-runtime-dir@0.service: Succeeded.
Jun 24 09:11:05 pve systemd[1]: Stopped User Runtime Directory /run/user/0.
Jun 24 09:11:05 pve systemd[1]: Removed slice User Slice of UID 0.
Jun 24 09:11:07 pve systemd[1]: pvesr.service: Succeeded.
Jun 24 09:11:07 pve systemd[1]: Started Proxmox VE replication runner.
Jun 24 09:11:08 pve pve-ha-lrm[1989]: Task 'UPID:pve:000007C6:0000531F:5EF2FBE2:qmigrate:100:root@pam:' still active, waiting
(... same line repeated 9 times)
Jun 24 09:11:58 pve pve-ha-lrm[1989]: Task 'UPID:pve:000007C6:0000531F:5EF2FBE2:qmigrate:100:root@pam:' still active, waiting
Jun 24 09:12:00 pve systemd[1]: Starting Proxmox VE replication runner...
Jun 24 09:12:03 pve pve-ha-lrm[1989]: Task 'UPID:pve:000007C6:0000531F:5EF2FBE2:qmigrate:100:root@pam:' still active, waiting
Jun 24 09:12:06 pve systemd[1]: pvesr.service: Succeeded.
Jun 24 09:12:06 pve systemd[1]: Started Proxmox VE replication runner.
Jun 24 09:12:08 pve pve-ha-lrm[1989]: Task 'UPID:pve:000007C6:0000531F:5EF2FBE2:qmigrate:100:root@pam:' still active, waiting
(... same line repeated 9 times)
Jun 24 09:12:58 pve pve-ha-lrm[1989]: Task 'UPID:pve:000007C6:0000531F:5EF2FBE2:qmigrate:100:root@pam:' still active, waiting
Jun 24 09:13:00 pve systemd[1]: Starting Proxmox VE replication runner...
Jun 24 09:13:03 pve pve-ha-lrm[1989]: Task 'UPID:pve:000007C6:0000531F:5EF2FBE2:qmigrate:100:root@pam:' still active, waiting
Jun 24 09:13:08 pve systemd[1]: pvesr.service: Succeeded.
Jun 24 09:13:08 pve systemd[1]: Started Proxmox VE replication runner.
Jun 24 09:13:08 pve pve-ha-lrm[1989]: Task 'UPID:pve:000007C6:0000531F:5EF2FBE2:qmigrate:100:root@pam:' still active, waiting
(... same line repeated 9 times)
Jun 24 09:13:58 pve pve-ha-lrm[1989]: Task 'UPID:pve:000007C6:0000531F:5EF2FBE2:qmigrate:100:root@pam:' still active, waiting
Jun 24 09:14:00 pve systemd[1]: Starting Proxmox VE replication runner...
Jun 24 09:14:02 pve systemd[1]: pvesr.service: Succeeded.
Jun 24 09:14:02 pve systemd[1]: Started Proxmox VE replication runner.
Jun 24 09:14:03 pve pve-ha-lrm[1989]: Task 'UPID:pve:000007C6:0000531F:5EF2FBE2:qmigrate:100:root@pam:' still active, waiting
(... same line repeated 6 times)
Jun 24 09:14:38 pve pve-ha-lrm[1989]: Task 'UPID:pve:000007C6:0000531F:5EF2FBE2:qmigrate:100:root@pam:' still active, waiting
Jun 24 09:14:42 pve pve-ha-lrm[1989]: <root@pam> end task UPID:pve:000007C6:0000531F:5EF2FBE2:qmigrate:100:root@pam: OK
Jun 24 09:14:54 pve pmxcfs[1106]: [status] notice: received log
Jun 24 09:15:00 pve systemd[1]: Starting Proxmox VE replication runner...
Jun 24 09:15:02 pve systemd[1]: pvesr.service: Succeeded.
Jun 24 09:15:02 pve systemd[1]: Started Proxmox VE replication runner.
Jun 24 09:15:10 pve pmxcfs[1106]: [status] notice: received log
Jun 24 09:16:00 pve systemd[1]: Starting Proxmox VE replication runner...
Jun 24 09:16:01 pve systemd[1]: pvesr.service: Succeeded.
Jun 24 09:16:01 pve systemd[1]: Started Proxmox VE replication runner.
Jun 24 09:17:00 pve systemd[1]: Starting Proxmox VE replication runner...
Jun 24 09:17:01 pve systemd[1]: pvesr.service: Succeeded.
Jun 24 09:17:01 pve systemd[1]: Started Proxmox VE replication runner.
 
Please post the config of that VM (/etc/pve/qemu-server/100.conf) as well as the log from the migration.
Just copy the task log, which you can find in the web interface.
 
Here is the VM config:
Code:
bootdisk: scsi0
cores: 1
ide2: none,media=cdrom
memory: 2048
name: xubuntu
net0: virtio=DA:64:43:D6:43:D0,bridge=vmbr0,firewall=1
numa: 0
ostype: l26
scsi0: nfsshare:100/vm-100-disk-0.qcow2,size=20G
scsihw: virtio-scsi-pci
smbios1: uuid=43ec0656-05b9-46db-89ad-ddfd22cfde2e
sockets: 1
unused0: local-lvm:vm-100-disk-0
vmgenid: 3e189437-c37d-440e-b016-9d1fc06db93a

and the task log:
(screenshot attached: Capture.PNG, overview of the task list)
 
I meant the task log of the migration task itself, not the overview; if you click on the task, it will display the log of the actual migration.
However, looking at your config explains the "slow" migration: there is a local disk as well, which has to be migrated too.

If you had used the web UI to start the migration, this would have been shown as a warning as well.
 
OK I understand, you're definitely right.
I made a mistake when creating the VM: I later moved the disk from local to shared storage, but I didn't think I had to remove the local one because it was tagged as "unused". That was my mistake!
I'm going to fix that and I'll let you know.
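In case it helps someone else, a sketch of how I plan to clean it up from the CLI (the IDs come from my 100.conf above):
Code:
# Remove the leftover entry from the VM configuration
qm set 100 --delete unused0
# and, if wanted, actually delete the underlying volume on local-lvm
pvesm free local-lvm:vm-100-disk-0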
 
Well,
I created a new VM without any local storage at all. The result is good!
Offline migration now takes only a couple of seconds!
Online migration seems to take much longer (several minutes) because, from what I understand reading the logs, the VM's memory state is transferred as well. I assume that memory lives on the source node and has to be copied to the other node during a live migration? That seems to make sense...
The next step for me will be to test the HA functionality (a 3rd node if I can, a separate quorum-only device on a Debian system, or lowering the quorum value to 1, just for testing purposes).
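For the quorum-only option, this is roughly what I plan to try, based on my reading of the docs (not tested yet, so just a sketch):
Code:
# On the separate Debian machine that will only provide a vote
apt install corosync-qnetd

# On the cluster nodes
apt install corosync-qdevice

# Then, from one cluster node, register the external vote
pvecm qdevice setup <IP-of-debian-machine>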

Thanks a lot, I guess this topic can be closed thanks to you?
 
You can mark your thread as solved if you don't have further questions about the topic here.
I am pleased to see that you are getting familiar with our product; keep on learning and reading the docs.
 
