Proxmox 6.2 cluster: HA VMs don't migrate if a node fails

pnc1001

New Member
Jul 20, 2020
Hello,


I have installed and configured a three-node cluster with the Ceph file system.

Everything seems to be working fine; we can migrate VMs from one node to another.

Now I have done another test: we abruptly unplugged both power cables from the server. The VMs did not migrate to another node.

When I plug the AC power cords back in, the VMs start after a few seconds on another node or on the same node. I have also configured a software-level watchdog. When I stop the service, the VM migrates to another node, but it is restarted.

Can you please help me migrate VMs without a reboot when a node in the cluster goes down?

Regards,

Paresh Chauhan
 
How long did you wait? The HA stack will wait for about 3 minutes before starting the guest on another node.

This is because, should a running node lose contact with the quorate part of the cluster (corosync), it will wait for about 2 minutes, hoping to reconnect, before it fences itself (hard reset) to make sure that none of the guests on it are still running.
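
If you want to watch that timeline during such a test, here is a minimal sketch (assuming the standard PVE service names pve-ha-crm, pve-ha-lrm and watchdog-mux; run the journalctl command on one of the surviving nodes):
Code:
# Follow the HA manager logs to see the grace period and the recovery
# of the services from the fenced node:
journalctl -u pve-ha-crm -u pve-ha-lrm -f

# The self-fencing is driven by the watchdog multiplexer on each node:
systemctl status watchdog-mux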
 
Hi,

I waited around 3 minutes and the VM moved from one node to another, but the VM was restarted when it moved.

Can you please suggest which services are required for this?

We are using a Dell PowerEdge C6220 server for this.

Regards,
Paresh Chauhan
 
Can you show the output of the following commands?
Code:
ha-manager status
pvecm status
 
Hi,

I am sharing the output of those commands below.


root@pve:~# ha-manager status
quorum OK
master pve (active, Mon Jul 20 18:45:34 2020)
lrm pve (active, Mon Jul 20 18:45:38 2020)
lrm pve1 (active, Mon Jul 20 18:45:36 2020)
lrm pve2 (idle, Mon Jul 20 18:45:36 2020)
service ct:106 (pve1, started)
service vm:100 (pve1, started)
service vm:101 (pve1, started)
service vm:102 (pve1, started)
service vm:103 (pve1, started)
service vm:104 (pve1, started)
service vm:105 (pve1, started)
root@pve:~# pvecm status
Cluster information
-------------------
Name: proxmoxcluster
Config Version: 3
Transport: knet
Secure auth: on

Quorum information
------------------
Date: Mon Jul 20 18:45:46 2020
Quorum provider: corosync_votequorum
Nodes: 3
Node ID: 0x00000003
Ring ID: 1.34c
Quorate: Yes

Votequorum information
----------------------
Expected votes: 3
Highest expected: 3
Total votes: 3
Quorum: 2
Flags: Quorate

Membership information
----------------------
Nodeid Votes Name
0x00000001 1 172.16.246.97
0x00000002 1 172.16.246.93
0x00000003 1 172.16.246.60 (local)
root@pve:~#

Regards,
Paresh Chauhan
 
I did miss the following yesterday, where you wrote: "I waited around 3 minutes and the VM moved from one node to another, but the VM was restarted when it moved."

So the VM is started after 3 minutes on one of the remaining nodes? Then everything works as expected. The VM will always be started anew.

There is no way to have a VM running on two nodes in parallel so that the fallback node will take over without a glitch should the main node fail.
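
Note that this only applies to unplanned failures. For planned maintenance you can live-migrate the guest away first, so it keeps running without a reboot. A rough sketch, assuming a HA-managed VM with ID 100 and a target node called pve2 (both placeholders for your setup):
Code:
# Ask the HA manager to migrate the HA-managed guest to another node
ha-manager migrate vm:100 pve2

# For a guest that is not under HA control, request an online migration directly
qm migrate 100 pve2 --online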
 
Hi Aaron,

Okay, thanks for the update, but can we move the VM without a reboot if the node or its hardware fails?

If there is any specific requirement or specific hardware for this, I can test with it.

Regards,
Paresh Chauhan
 
Okay, thanks for the update, but can we move the VM without a reboot if the node or its hardware fails?
No. If the node on which a VM is running dies (for whatever reason), that VM is dead as well and needs to be started on another node.
 
This doesn't matter anyway, at least regarding the "issue" the OP has. Any kind of shared storage is required to have HA in the first place. The guest runs on a PVE node, and if that node crashes, then so do all guests running on it, regardless of which storage the guest is stored on. Just remember that high availability != always on. You simply can't get that with a traditional PC/server setup, let alone with something as complex as a guest running an OS kernel and arbitrary applications on top of it.

Better to mentally translate "high availability" as "least downtime"…
 
Hi,

As per your request, I am sending the output of the requested commands below.

root@pve:~# ceph osd dump
epoch 887
fsid 21d92554-9a23-49a0-871d-69311ac11215
created 2020-07-11 12:18:03.851970
modified 2020-07-21 12:35:41.296222
flags sortbitwise,recovery_deletes,purged_snapdirs,pglog_hardlimit
crush_version 7
full_ratio 0.95
backfillfull_ratio 0.9
nearfull_ratio 0.85
require_min_compat_client jewel
min_compat_client jewel
require_osd_release nautilus
pool 1 'ceph' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 128 pgp_num 128 autoscale_mode warn last_change 17 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd
removed_snaps [1~3]
max_osd 3
osd.0 up in weight 1 up_from 774 up_thru 886 down_at 772 last_clean_interval [763,771) [v2:172.16.246.60:6800/2437,v1:172.16.246.60:6801/2437] [v2:172.16.246.60:6802/2437,v1:172.16.246.60:6803/2437] exists,up 14773685-e244-4481-9df5-e8b7eb402017
osd.1 up in weight 1 up_from 780 up_thru 882 down_at 778 last_clean_interval [705,777) [v2:172.16.246.97:6800/3328,v1:172.16.246.97:6801/3328] [v2:172.16.246.97:6802/3328,v1:172.16.246.97:6803/3328] exists,up 7f6e8185-4144-4779-8a8f-d5bdbb410dfb
osd.2 up in weight 1 up_from 882 up_thru 882 down_at 879 last_clean_interval [876,878) [v2:172.16.246.93:6800/3438,v1:172.16.246.93:6801/3438] [v2:172.16.246.93:6802/3438,v1:172.16.246.93:6803/3438] exists,up 5e0e0728-bb89-44d0-8b06-a0a793057f6b
pg_temp 1.59 [0,2,1]

root@pve:~# ceph df
RAW STORAGE:
CLASS SIZE AVAIL USED RAW USED %RAW USED
hdd 1.8 TiB 1.8 TiB 45 GiB 46 GiB 2.46
ssd 689 GiB 598 GiB 89 GiB 91 GiB 13.24
TOTAL 2.5 TiB 2.4 TiB 134 GiB 137 GiB 5.37

POOLS:
POOL ID STORED OBJECTS USED %USED MAX AVAIL
ceph 1 45 GiB 12.21k 134 GiB 6.58 635 GiB

root@pve:~# pveceph lspools
┌──────┬──────┬──────────┬────────┬───────────────────┬─────────────────┬────────────────────┬──────────────┐
│ Name │ Size │ Min Size │ PG Num │ PG Autoscale Mode │ Crush Rule Name │ %-Used │ Used │
╞══════╪══════╪══════════╪════════╪═══════════════════╪═════════════════╪════════════════════╪══════════════╡
│ ceph │ 3 │ 2 │ 128 │ warn │ replicated_rule │ 0.0657794624567032 │ 143962703009 │
└──────┴──────┴──────────┴────────┴───────────────────┴─────────────────┴────────────────────┴──────────────┘
root@pve:~#

Regards,
Paresh Chauhan
 
Paresh,
As per that output, you only have 3 OSDs in the Ceph pool, so if one server goes down, about 33% of the placement groups become degraded, which is causing the issue.
Once you take the server down, HA will try to migrate the VMs since you still have quorum, but Ceph is degraded and may have stale or inactive PGs, which can keep the VMs from moving.
To reproduce it, shut down the server and check ceph -w to see whether the pool is degraded or inactive (see the commands sketched below). VMs will migrate if it is only degraded, but not when PGs are inactive; you may have to wait around 10 minutes for Ceph to recover on its own.
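
The checks mentioned above, as a rough sketch (standard Ceph commands, run on any remaining node while the test node is down):
Code:
# Follow cluster health live while the node is down
ceph -w

# Show which PGs are behind the health warnings (degraded vs. inactive)
ceph health detail

# List PGs that are stuck inactive (these are the ones that block I/O)
ceph pg dump_stuck inactive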
 
