High Availability - Migration problems - Proxmox 3

superwemba

New Member
Oct 14, 2013
Hello everybody,

[I'm French, sorry for my English]

Today I have a problem with my Proxmox servers.

Explanation:

My cluster is composed of 3 Proxmox servers. They all run the same PVE version [pve-manager/3.3-5/bfebec03 (running kernel: 2.6.32-32-pve)] and have exactly the same hardware.

My VMs are stored on an iSCSI disk provided by a storage server.

The fence devices [IPMI] are completely reliable.
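(For reference, IPMI fencing can be tested by hand with the stock fence agent; a sketch, with placeholder address and credentials like in the cluster.conf further down:)

fence_ipmilan -a 192.168.xxx.xxx -l xxx -p xxx -o status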

So, my High Availability cluster works perfectly when I simulate a node failure [for example, by shutting down one Proxmox node]: the VMs migrate to the other nodes according to the failover configuration written in the cluster.conf file.

It works well, no problem here.

My problem is that when the failed Proxmox node [the one I shut down] comes back, the VMs try to migrate back onto it, but most of the time I get migration errors, and I don't know why, because I am able to migrate the same VMs manually right afterwards.

From time to time some VMs migrate back when the node returns and others can't; it depends.

What is the problem? Has anyone already encountered it?

Regards,

superwemba.
 
Hello francisco,

Sure, I can do that for you.

I tested between two servers (proxmox-2 and proxmox-3).

I simply shut down proxmox-3.

So this is the corosync.log of proxmox-2:
Nov 28 09:04:37 corosync [CLM ] CLM CONFIGURATION CHANGE
Nov 28 09:04:37 corosync [CLM ] New Configuration:
Nov 28 09:04:37 corosync [CLM ] r(0) ip(192.168.150.1)
Nov 28 09:04:37 corosync [CLM ] r(0) ip(192.168.150.2)
Nov 28 09:04:37 corosync [CLM ] Members Left:
Nov 28 09:04:37 corosync [CLM ] r(0) ip(192.168.150.3)
Nov 28 09:04:37 corosync [CLM ] Members Joined:
Nov 28 09:04:37 corosync [QUORUM] Members[2]: 1 2
Nov 28 09:04:37 corosync [CLM ] CLM CONFIGURATION CHANGE
Nov 28 09:04:37 corosync [CLM ] New Configuration:
Nov 28 09:04:37 corosync [CLM ] r(0) ip(192.168.150.1)
Nov 28 09:04:37 corosync [CLM ] r(0) ip(192.168.150.2)
Nov 28 09:04:37 corosync [CLM ] Members Left:
Nov 28 09:04:37 corosync [CLM ] Members Joined:
Nov 28 09:04:37 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
Nov 28 09:04:37 corosync [CPG ] chosen downlist: sender r(0) ip(192.168.150.1) ; members(old:3 left:1)
Nov 28 09:04:37 corosync [MAIN ] Completed service synchronization, ready to provide service.
Nov 28 09:06:41 corosync [CLM ] CLM CONFIGURATION CHANGE
Nov 28 09:06:41 corosync [CLM ] New Configuration:
Nov 28 09:06:41 corosync [CLM ] r(0) ip(192.168.150.1)
Nov 28 09:06:41 corosync [CLM ] r(0) ip(192.168.150.2)
Nov 28 09:06:41 corosync [CLM ] Members Left:
Nov 28 09:06:41 corosync [CLM ] Members Joined:
Nov 28 09:06:41 corosync [CLM ] CLM CONFIGURATION CHANGE
Nov 28 09:06:41 corosync [CLM ] New Configuration:
Nov 28 09:06:41 corosync [CLM ] r(0) ip(192.168.150.1)
Nov 28 09:06:41 corosync [CLM ] r(0) ip(192.168.150.2)
Nov 28 09:06:41 corosync [CLM ] r(0) ip(192.168.150.3)
Nov 28 09:06:41 corosync [CLM ] Members Left:
Nov 28 09:06:41 corosync [CLM ] Members Joined:
Nov 28 09:06:41 corosync [CLM ] r(0) ip(192.168.150.3)
Nov 28 09:06:41 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
Nov 28 09:06:41 corosync [QUORUM] Members[3]: 1 2 3
Nov 28 09:06:41 corosync [QUORUM] Members[3]: 1 2 3
Nov 28 09:06:41 corosync [CPG ] chosen downlist: sender r(0) ip(192.168.150.1) ; members(old:2 left:0)
Nov 28 09:06:41 corosync [MAIN ] Completed service synchronization, ready to provide service.
Nov 28 09:18:43 corosync [QUORUM] Members[3]: 1 2 3
Nov 28 09:20:45 corosync [CLM ] CLM CONFIGURATION CHANGE
Nov 28 09:20:45 corosync [CLM ] New Configuration:
Nov 28 09:20:45 corosync [CLM ] r(0) ip(192.168.150.1)
Nov 28 09:20:45 corosync [CLM ] r(0) ip(192.168.150.2)
Nov 28 09:20:45 corosync [CLM ] Members Left:
Nov 28 09:20:45 corosync [CLM ] r(0) ip(192.168.150.3)
Nov 28 09:20:45 corosync [CLM ] Members Joined:
Nov 28 09:20:45 corosync [QUORUM] Members[2]: 1 2
Nov 28 09:20:45 corosync [CLM ] CLM CONFIGURATION CHANGE
Nov 28 09:20:45 corosync [CLM ] New Configuration:
Nov 28 09:20:45 corosync [CLM ] r(0) ip(192.168.150.1)
Nov 28 09:20:45 corosync [CLM ] r(0) ip(192.168.150.2)
Nov 28 09:20:45 corosync [CLM ] Members Left:
Nov 28 09:20:45 corosync [CLM ] Members Joined:
Nov 28 09:20:45 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
Nov 28 09:20:45 corosync [CPG ] chosen downlist: sender r(0) ip(192.168.150.1) ; members(old:3 left:1)
Nov 28 09:20:45 corosync [MAIN ] Completed service synchronization, ready to provide service.
Nov 28 09:22:49 corosync [CLM ] CLM CONFIGURATION CHANGE
Nov 28 09:22:49 corosync [CLM ] New Configuration:
Nov 28 09:22:49 corosync [CLM ] r(0) ip(192.168.150.1)
Nov 28 09:22:49 corosync [CLM ] r(0) ip(192.168.150.2)
Nov 28 09:22:49 corosync [CLM ] Members Left:
Nov 28 09:22:49 corosync [CLM ] Members Joined:
Nov 28 09:22:49 corosync [CLM ] CLM CONFIGURATION CHANGE
Nov 28 09:22:49 corosync [CLM ] New Configuration:
Nov 28 09:22:49 corosync [CLM ] r(0) ip(192.168.150.1)
Nov 28 09:22:49 corosync [CLM ] r(0) ip(192.168.150.2)
Nov 28 09:22:49 corosync [CLM ] r(0) ip(192.168.150.3)
Nov 28 09:22:49 corosync [CLM ] Members Left:
Nov 28 09:22:49 corosync [CLM ] Members Joined:
Nov 28 09:22:49 corosync [CLM ] r(0) ip(192.168.150.3)
Nov 28 09:22:49 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
Nov 28 09:22:49 corosync [QUORUM] Members[3]: 1 2 3
Nov 28 09:22:49 corosync [QUORUM] Members[3]: 1 2 3
Nov 28 09:22:49 corosync [CPG ] chosen downlist: sender r(0) ip(192.168.150.1) ; members(old:2 left:0)
Nov 28 09:22:49 corosync [MAIN ] Completed service synchronization, ready to provide service.
Nov 28 09:24:02 corosync [CLM ] CLM CONFIGURATION CHANGE
Nov 28 09:24:02 corosync [CLM ] New Configuration:
Nov 28 09:24:02 corosync [CLM ] r(0) ip(192.168.150.1)
Nov 28 09:24:02 corosync [CLM ] r(0) ip(192.168.150.2)
Nov 28 09:24:02 corosync [CLM ] Members Left:
Nov 28 09:24:02 corosync [CLM ] r(0) ip(192.168.150.3)
Nov 28 09:24:02 corosync [CLM ] Members Joined:
Nov 28 09:24:02 corosync [QUORUM] Members[2]: 1 2
Nov 28 09:24:02 corosync [CLM ] CLM CONFIGURATION CHANGE
Nov 28 09:24:02 corosync [CLM ] New Configuration:
Nov 28 09:24:02 corosync [CLM ] r(0) ip(192.168.150.1)
Nov 28 09:24:02 corosync [CLM ] r(0) ip(192.168.150.2)
Nov 28 09:24:02 corosync [CLM ] Members Left:
Nov 28 09:24:02 corosync [CLM ] Members Joined:
Nov 28 09:24:02 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
Nov 28 09:24:02 corosync [CPG ] chosen downlist: sender r(0) ip(192.168.150.1) ; members(old:3 left:1)
Nov 28 09:24:02 corosync [MAIN ] Completed service synchronization, ready to provide service.
Nov 28 09:26:06 corosync [CLM ] CLM CONFIGURATION CHANGE
Nov 28 09:26:06 corosync [CLM ] New Configuration:
Nov 28 09:26:06 corosync [CLM ] r(0) ip(192.168.150.1)
Nov 28 09:26:06 corosync [CLM ] r(0) ip(192.168.150.2)
Nov 28 09:26:06 corosync [CLM ] Members Left:
Nov 28 09:26:06 corosync [CLM ] Members Joined:
Nov 28 09:26:06 corosync [CLM ] CLM CONFIGURATION CHANGE
Nov 28 09:26:06 corosync [CLM ] New Configuration:
Nov 28 09:26:06 corosync [CLM ] r(0) ip(192.168.150.1)
Nov 28 09:26:06 corosync [CLM ] r(0) ip(192.168.150.2)
Nov 28 09:26:06 corosync [CLM ] r(0) ip(192.168.150.3)
Nov 28 09:26:06 corosync [CLM ] Members Left:
Nov 28 09:26:06 corosync [CLM ] Members Joined:
Nov 28 09:26:06 corosync [CLM ] r(0) ip(192.168.150.3)
Nov 28 09:26:06 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
Nov 28 09:26:06 corosync [QUORUM] Members[3]: 1 2 3
Nov 28 09:26:06 corosync [QUORUM] Members[3]: 1 2 3
Nov 28 09:26:06 corosync [CPG ] chosen downlist: sender r(0) ip(192.168.150.1) ; members(old:2 left:0)
Nov 28 09:26:06 corosync [MAIN ] Completed service synchronization, ready to provide service.
Nov 28 09:38:15 corosync [QUORUM] Members[3]: 1 2 3
Nov 28 09:45:07 corosync [CLM ] CLM CONFIGURATION CHANGE
Nov 28 09:45:07 corosync [CLM ] New Configuration:
Nov 28 09:45:07 corosync [CLM ] r(0) ip(192.168.150.1)
Nov 28 09:45:07 corosync [CLM ] r(0) ip(192.168.150.2)
Nov 28 09:45:07 corosync [CLM ] Members Left:
Nov 28 09:45:07 corosync [CLM ] r(0) ip(192.168.150.3)
Nov 28 09:45:07 corosync [CLM ] Members Joined:
Nov 28 09:45:07 corosync [QUORUM] Members[2]: 1 2
Nov 28 09:45:07 corosync [CLM ] CLM CONFIGURATION CHANGE
Nov 28 09:45:07 corosync [CLM ] New Configuration:
Nov 28 09:45:07 corosync [CLM ] r(0) ip(192.168.150.1)
Nov 28 09:45:07 corosync [CLM ] r(0) ip(192.168.150.2)
Nov 28 09:45:07 corosync [CLM ] Members Left:
Nov 28 09:45:07 corosync [CLM ] Members Joined:
Nov 28 09:45:07 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
Nov 28 09:45:07 corosync [CPG ] chosen downlist: sender r(0) ip(192.168.150.1) ; members(old:3 left:1)
Nov 28 09:45:07 corosync [MAIN ] Completed service synchronization, ready to provide service.
Nov 28 09:47:11 corosync [CLM ] CLM CONFIGURATION CHANGE
Nov 28 09:47:11 corosync [CLM ] New Configuration:
Nov 28 09:47:11 corosync [CLM ] r(0) ip(192.168.150.1)
Nov 28 09:47:11 corosync [CLM ] r(0) ip(192.168.150.2)
Nov 28 09:47:11 corosync [CLM ] Members Left:
Nov 28 09:47:11 corosync [CLM ] Members Joined:
Nov 28 09:47:11 corosync [CLM ] CLM CONFIGURATION CHANGE
Nov 28 09:47:11 corosync [CLM ] New Configuration:
Nov 28 09:47:11 corosync [CLM ] r(0) ip(192.168.150.1)
Nov 28 09:47:11 corosync [CLM ] r(0) ip(192.168.150.2)
Nov 28 09:47:11 corosync [CLM ] r(0) ip(192.168.150.3)
Nov 28 09:47:11 corosync [CLM ] Members Left:
Nov 28 09:47:11 corosync [CLM ] Members Joined:
Nov 28 09:47:11 corosync [CLM ] r(0) ip(192.168.150.3)
Nov 28 09:47:11 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
Nov 28 09:47:11 corosync [QUORUM] Members[3]: 1 2 3
Nov 28 09:47:11 corosync [QUORUM] Members[3]: 1 2 3
Nov 28 09:47:11 corosync [CPG ] chosen downlist: sender r(0) ip(192.168.150.1) ; members(old:2 left:0)
Nov 28 09:47:11 corosync [MAIN ] Completed service synchronization, ready to provide service.
Nov 28 10:21:38 corosync [CLM ] CLM CONFIGURATION CHANGE
Nov 28 10:21:38 corosync [CLM ] New Configuration:
Nov 28 10:21:38 corosync [CLM ] r(0) ip(192.168.150.1)
Nov 28 10:21:38 corosync [CLM ] r(0) ip(192.168.150.2)
Nov 28 10:21:38 corosync [CLM ] Members Left:
Nov 28 10:21:38 corosync [CLM ] r(0) ip(192.168.150.3)
Nov 28 10:21:38 corosync [CLM ] Members Joined:
Nov 28 10:21:38 corosync [QUORUM] Members[2]: 1 2
Nov 28 10:21:38 corosync [CLM ] CLM CONFIGURATION CHANGE
Nov 28 10:21:38 corosync [CLM ] New Configuration:
Nov 28 10:21:38 corosync [CLM ] r(0) ip(192.168.150.1)
Nov 28 10:21:38 corosync [CLM ] r(0) ip(192.168.150.2)
Nov 28 10:21:38 corosync [CLM ] Members Left:
Nov 28 10:21:38 corosync [CLM ] Members Joined:
Nov 28 10:21:38 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
Nov 28 10:21:38 corosync [CPG ] chosen downlist: sender r(0) ip(192.168.150.1) ; members(old:3 left:1)
Nov 28 10:21:38 corosync [MAIN ] Completed service synchronization, ready to provide service.
Nov 28 10:23:41 corosync [CLM ] CLM CONFIGURATION CHANGE
Nov 28 10:23:41 corosync [CLM ] New Configuration:
Nov 28 10:23:41 corosync [CLM ] r(0) ip(192.168.150.1)
Nov 28 10:23:41 corosync [CLM ] r(0) ip(192.168.150.2)
Nov 28 10:23:41 corosync [CLM ] Members Left:
Nov 28 10:23:41 corosync [CLM ] Members Joined:
Nov 28 10:23:41 corosync [CLM ] CLM CONFIGURATION CHANGE
Nov 28 10:23:41 corosync [CLM ] New Configuration:
Nov 28 10:23:41 corosync [CLM ] r(0) ip(192.168.150.1)
Nov 28 10:23:41 corosync [CLM ] r(0) ip(192.168.150.2)
Nov 28 10:23:41 corosync [CLM ] r(0) ip(192.168.150.3)
Nov 28 10:23:41 corosync [CLM ] Members Left:
Nov 28 10:23:41 corosync [CLM ] Members Joined:
Nov 28 10:23:41 corosync [CLM ] r(0) ip(192.168.150.3)
Nov 28 10:23:41 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
Nov 28 10:23:41 corosync [QUORUM] Members[3]: 1 2 3
Nov 28 10:23:41 corosync [QUORUM] Members[3]: 1 2 3
Nov 28 10:23:41 corosync [CPG ] chosen downlist: sender r(0) ip(192.168.150.1) ; members(old:2 left:0)
Nov 28 10:23:41 corosync [MAIN ] Completed service synchronization, ready to provide service.

And this is the fenced.log on proxmox-3:

Nov 28 09:06:45 fenced fenced 1364188437 started
Nov 28 09:22:53 fenced fenced 1364188437 started
Nov 28 09:26:09 fenced fenced 1364188437 started
Nov 28 09:47:15 fenced fenced 1364188437 started
Nov 28 10:23:44 fenced fenced 1364188437 started

I've spotted something: when only one or two VMs are defined it works fine, but whenever I configure more than three VMs, I get migration failures when the node comes back!

To be clear: all my VMs migrate fine when the node crashes. It's afterwards, when the crashed node (proxmox-3) comes back, that the VMs try to relocate onto proxmox-3, and at that point I get migration errors with more than three VMs...
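(Side note: these are HA-managed pvevm resources, so rgmanager has its own view of them; a sketch of the stock rgmanager checks, with 105 as an example VMID:)

# show where rgmanager currently thinks each pvevm: service is running
clustat

# ask rgmanager itself to live-migrate a single HA-managed VM
clusvcadm -M pvevm:105 -m proxmox-3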

Thanks,

superwemba
 
Hi,

I did some manual tests with the qm command:

When I execute this command:

for vm in '116' '205' '105' '210' '300'; do qm migrate $vm proxmox-2 --online & done

where the VMIDs are those of my VMs running on proxmox-3, I get the same problems described above! This command tries to migrate all the VMs at the same time.

But when I execute this command:

for vm in '116' '205' '105' '210' '300'; do qm migrate $vm proxmox-2 --online ; done

There is no problem! This command migrates the VMs one by one.
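(A possible middle ground, sketched here untested: throttle the loop so only a few migrations run at once; MAX_JOBS=2 is an arbitrary value for illustration.)

#!/bin/bash
# migrate at most MAX_JOBS VMs in parallel instead of all at once
MAX_JOBS=2
for vm in 116 205 105 210 300; do
    # block while MAX_JOBS migrations are still running
    while [ "$(jobs -rp | wc -l)" -ge "$MAX_JOBS" ]; do
        sleep 1
    done
    qm migrate "$vm" proxmox-2 --online &
done
wait    # let the last migrations finish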

Any ideas?

superwemba
 
Hello,

Has no one else encountered this problem?

Could you try the commands above on your dev infrastructure?

superwemba
 
Hi e100,

The node fails (ok)
It gets fenced (ok)
The HA VMs migrate to another node (as defined by the failover configuration in my cluster.conf) (ok)
It boots back up (ok)
The HA VMs are auto-migrated back to the failed node, and that migration fails! (yes, this is the problem)

So yes, I'm using failover domains, and this is my cluster.conf:


<?xml version="1.0" ?>
<cluster config_version="112" name="proxmox">
  <cman expected_votes="3" keyfile="/var/lib/pve-cluster/corosync.authkey"/>
  <clusternodes>
    <clusternode name="proxmox-1" nodeid="1" votes="1">
      <fence>
        <method name="1">
          <device name="ipmifence1"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="proxmox-2" nodeid="2" votes="1">
      <fence>
        <method name="1">
          <device name="ipmifence2"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="proxmox-3" nodeid="3" votes="1">
      <fence>
        <method name="1">
          <device name="ipmifence3"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice agent="fence_ipmilan" ipaddr="192.168.xxx.xxx" login="xxx" name="ipmifence1" passwd="xxx"/>
    <fencedevice agent="fence_ipmilan" ipaddr="192.168.xxx.xxx" login="xxx" name="ipmifence2" passwd="xxx"/>
    <fencedevice agent="fence_ipmilan" ipaddr="192.168.xxx.xxx" login="xxx" name="ipmifence3" passwd="xxx"/>
  </fencedevices>
  <rm>
    <failoverdomains>
      <failoverdomain name="faildomain-3" nofailback="0" ordered="1" restricted="0">
        <failoverdomainnode name="proxmox-3" priority="1"/>
        <failoverdomainnode name="proxmox-2" priority="2"/>
        <failoverdomainnode name="proxmox-1" priority="3"/>
      </failoverdomain>
      <failoverdomain name="faildomain-2" nofailback="0" ordered="1" restricted="0">
        <failoverdomainnode name="proxmox-3" priority="3"/>
        <failoverdomainnode name="proxmox-2" priority="1"/>
        <failoverdomainnode name="proxmox-1" priority="2"/>
      </failoverdomain>
      <failoverdomain name="faildomain-1" nofailback="0" ordered="1" restricted="0">
        <failoverdomainnode name="proxmox-3" priority="3"/>
        <failoverdomainnode name="proxmox-2" priority="2"/>
        <failoverdomainnode name="proxmox-1" priority="1"/>
      </failoverdomain>
    </failoverdomains>
    <pvevm autostart="1" domain="faildomain-1" vmid="200"/>
    <pvevm autostart="1" domain="faildomain-2" vmid="201"/>
    <pvevm autostart="1" domain="faildomain-3" vmid="202"/>
    <pvevm autostart="1" domain="faildomain-1" vmid="203"/>
    <pvevm autostart="1" domain="faildomain-2" vmid="204"/>
    <pvevm autostart="1" domain="faildomain-3" vmid="205"/>
    <pvevm autostart="1" domain="faildomain-2" vmid="206"/>
    <pvevm autostart="1" domain="faildomain-3" vmid="209"/>
    <pvevm autostart="1" domain="faildomain-3" vmid="210"/>
    <pvevm autostart="1" domain="faildomain-2" vmid="208"/>
    <pvevm autostart="1" domain="faildomain-1" vmid="100"/>
    <pvevm autostart="1" domain="faildomain-1" vmid="101"/>
    <pvevm autostart="1" domain="faildomain-1" vmid="102"/>
    <pvevm autostart="1" domain="faildomain-3" vmid="103"/>
    <pvevm autostart="1" domain="faildomain-2" vmid="104"/>
    <pvevm autostart="1" domain="faildomain-1" vmid="105"/>
    <pvevm autostart="1" domain="faildomain-2" vmid="106"/>
    <pvevm autostart="1" domain="faildomain-2" vmid="107"/>
    <pvevm autostart="1" domain="faildomain-1" vmid="108"/>
    <pvevm autostart="1" domain="faildomain-2" vmid="109"/>
    <pvevm autostart="1" domain="faildomain-2" vmid="110"/>
    <pvevm autostart="1" domain="faildomain-3" vmid="111"/>
    <pvevm autostart="1" domain="faildomain-1" vmid="112"/>
    <pvevm autostart="1" domain="faildomain-1" vmid="113"/>
    <pvevm autostart="1" domain="faildomain-2" vmid="114"/>
    <pvevm autostart="1" domain="faildomain-3" vmid="115"/>
    <pvevm autostart="1" domain="faildomain-3" vmid="116"/>
  </rm>
</cluster>
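(Could the nofailback attribute be related? It is 0 everywhere above, which is what makes rgmanager migrate the VMs back automatically when proxmox-3 rejoins. A sketch of the variant for one domain, untested on my side; it would stop the automatic failback, though whether it addresses the migration errors themselves is another question:)

<failoverdomain name="faildomain-3" nofailback="1" ordered="1" restricted="0">
  <failoverdomainnode name="proxmox-3" priority="1"/>
  <failoverdomainnode name="proxmox-2" priority="2"/>
  <failoverdomainnode name="proxmox-1" priority="3"/>
</failoverdomain>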




Thank you!

superwemba
 
