Problems in the HA

Melanxolik

Well-Known Member
Dec 18, 2013
Good afternoon.
Our company runs a Proxmox cluster with purchased licenses for 6 nodes, but we have always had problems with HA: periodically, virtual machines stop migrating between nodes. When we try to migrate by hand from the console, we get:



root@cluster-1-1:/var/log# qm migrate 185 cluster-1-3 --online
Executing HA migrate for VM 185 to node cluster-1-3
Trying to migrate pvevm:185 to cluster-1-3...Target node dead / nonexistent
command 'clusvcadm -M pvevm:185 -m cluster-1-3' failed: exit code 244
root@cluster-1-1:/var/log#

root@cluster-1-1:/var/log# clustat |more
Cluster Status for freecluster01 @ Thu Mar 13 11:34:30 2014
Member Status: Quorate

Member Name ID Status
------ ---- ---- ------
cluster-1-1 1 Online, Local, rgmanager
cluster-1-2 2 Online, rgmanager
cluster-1-3 3 Online
cluster-1-4 4 Online, rgmanager
cluster-1-5 5 Online, rgmanager
cluster-1-6 6 Online, rgmanager

root@cluster-1-2:~# clustat |more
Cluster Status for freecluster01 @ Thu Mar 13 11:34:45 2014
Member Status: Quorate

Member Name ID Status
------ ---- ---- ------
cluster-1-1 1 Online, rgmanager
cluster-1-2 2 Online, Local, rgmanager
cluster-1-3 3 Online, rgmanager
cluster-1-4 4 Online, rgmanager
cluster-1-5 5 Online, rgmanager
cluster-1-6 6 Online, rgmanager

Service Name Owner (Last) State

root@cluster-1-3:~# clustat |more
Cluster Status for freecluster01 @ Thu Mar 13 11:34:03 2014
Member Status: Quorate

Member Name ID Status
------ ---- ---- ------
cluster-1-1 1 Online, rgmanager
cluster-1-2 2 Online, rgmanager
cluster-1-3 3 Online, Local, rgmanager
cluster-1-4 4 Online, rgmanager
cluster-1-5 5 Online, rgmanager
cluster-1-6 6 Online, rgmanager

Service Name Owner (Last) State

As you can see here, for some reason node 1-1 does not see rgmanager on node 1-3, although all the other nodes do. Please advise how this can be diagnosed. We would very much like our HA problems to end and the cluster to start working properly, since we plan to purchase 6 more licenses.
Unfortunately, we could not find any error reports in the log files.
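
One way to make the disagreement between the nodes explicit is to compare the member lists from each node's clustat output. The following is only a minimal sketch in Python: it parses pasted clustat text (it does not run any cluster command), and the function names rgmanager_view and find_disagreements are made up for this illustration. It assumes member rows have the shape "name id flags", as in the output above.

```python
def rgmanager_view(clustat_output):
    """Map each member name to True/False: does this view list rgmanager for it?"""
    view = {}
    for line in clustat_output.splitlines():
        parts = line.split(None, 2)  # name, id, status flags
        # Member rows look like: "cluster-1-3 3 Online, Local, rgmanager".
        # Header and separator rows have no numeric second column, so they are skipped.
        if len(parts) == 3 and parts[1].isdigit():
            flags = [flag.strip() for flag in parts[2].split(",")]
            view[parts[0]] = "rgmanager" in flags
    return view


def find_disagreements(outputs_by_node):
    """Return (observer, member) pairs where the observer sees no rgmanager
    on a member that at least one other observer does see running it."""
    views = {node: rgmanager_view(text) for node, text in outputs_by_node.items()}
    problems = []
    for observer, view in views.items():
        for member, has_rgmanager in view.items():
            seen_elsewhere = any(
                other.get(member, False)
                for name, other in views.items()
                if name != observer
            )
            if not has_rgmanager and seen_elsewhere:
                problems.append((observer, member))
    return sorted(problems)


# Abbreviated member rows taken from the outputs above.
view_from_1_1 = """\
cluster-1-1 1 Online, Local, rgmanager
cluster-1-2 2 Online, rgmanager
cluster-1-3 3 Online
"""
view_from_1_2 = """\
cluster-1-1 1 Online, rgmanager
cluster-1-2 2 Online, Local, rgmanager
cluster-1-3 3 Online, rgmanager
"""

print(find_disagreements({"cluster-1-1": view_from_1_1,
                          "cluster-1-2": view_from_1_2}))
# → [('cluster-1-1', 'cluster-1-3')]
```

With the data from this thread, the only disagreement reported is cluster-1-1's view of cluster-1-3, which matches what the outputs show.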




root@cluster-1-1:/var/log# group_tool ls
fence domain
member count 6
victim count 0
victim now 0
master nodeid 1
wait state none
members 1 2 3 4 5 6

dlm lockspaces
name rgmanager
id 0x5231f3eb
flags 0x00000000
change member 6 joined 1 remove 0 failed 0 seq 13,13
members 1 2 3 4 5 6

name storage02
id 0x72bc4bef
flags 0x00000008 fs_reg
change member 6 joined 1 remove 0 failed 0 seq 5,5
members 1 2 3 4 5 6

name storage01
id 0x5991182c
flags 0x00000008 fs_reg
change member 6 joined 1 remove 0 failed 0 seq 5,5
members 1 2 3 4 5 6

name clvmd
id 0x4104eefa
flags 0x00000000
change member 6 joined 1 remove 0 failed 0 seq 9,9
members 1 2 3 4 5 6

gfs mountgroups
name storage02
id 0xa14c9488
flags 0x00000008 mounted
change member 6 joined 1 remove 0 failed 0 seq 5,5
members 1 2 3 4 5 6

name storage01
id 0x8a61c74b
flags 0x00000008 mounted
change member 6 joined 1 remove 0 failed 0 seq 5,5
members 1 2 3 4 5 6

root@cluster-1-1:/var/log#
 
Could node cluster-1-1 be the one with the problem? For some reason it seems to be the only node not "seeing" rgmanager on cluster-1-3.

Can you test a migration from cluster-1-2 to cluster-1-3, just to be sure?

But I'm not an expert here...

Marco
 
Migration from node cluster-1-1 to cluster-1-3
root@cluster-1-1:/scripts# qm migrate 161 cluster-1-3 --online
Executing HA migrate for VM 161 to node cluster-1-3
Trying to migrate pvevm:161 to cluster-1-3...Target node dead / nonexistent
command 'clusvcadm -M pvevm:161 -m cluster-1-3' failed: exit code 244

Migration from cluster-1-1 to cluster-1-2
root@cluster-1-1:/scripts# qm migrate 161 cluster-1-2 --online
Executing HA migrate for VM 161 to node cluster-1-2
Trying to migrate pvevm:161 to cluster-1-2...Success
root@cluster-1-1:/scripts#


Migration from cluster-1-2 to cluster-1-3

root@cluster-1-2:/etc/cluster# qm migrate 161 cluster-1-3 --online
Executing HA migrate for VM 161 to node cluster-1-3
Trying to migrate pvevm:161 to cluster-1-3...Success
root@cluster-1-2:/etc/cluster#


Migration from cluster-1-3 to cluster-1-1
root@cluster-1-3:~# qm migrate 161 cluster-1-1 --online
Executing HA migrate for VM 161 to node cluster-1-1
Trying to migrate pvevm:161 to cluster-1-1...Success
root@cluster-1-3:~#

1-1 to 1-3 error
1-1 to 1-2 good
1-2 to 1-3 good
1-3 to 1-1 good
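
The four results above can be read mechanically: if a target fails from one source but accepts a migration from another, the target itself is probably fine and the failing source's view of the cluster is the more likely suspect. Here is a rough Python sketch of that reasoning. The function name suspect_sources is invented for this example, and the rule is only a heuristic, not a diagnosis.

```python
def suspect_sources(results):
    """Given (source, target, succeeded) tuples, return the sources whose failed
    target was successfully reached from some other source."""
    suspects = set()
    for src, dst, ok in results:
        if ok:
            continue
        # Did any other node manage to migrate to the same target?
        reachable_from_elsewhere = any(
            other_ok and other_dst == dst and other_src != src
            for other_src, other_dst, other_ok in results
        )
        if reachable_from_elsewhere:
            suspects.add(src)
    return sorted(suspects)


# The migration tests reported in this thread.
results = [
    ("cluster-1-1", "cluster-1-3", False),
    ("cluster-1-1", "cluster-1-2", True),
    ("cluster-1-2", "cluster-1-3", True),
    ("cluster-1-3", "cluster-1-1", True),
]

print(suspect_sources(results))
# → ['cluster-1-1']
```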

This is no longer funny; we constantly have some kind of problem with migration.
 
Hi, I know this post is a bit old, but in case anyone needs a solution: check that rgmanager is running properly on all nodes. That solved the problem for me.
 
