Problem with HA & DRBD

kgoytendia

New Member
Oct 28, 2014
Hello

I would like some help with my cluster and HA configuration. I set everything up and it is working, but sometimes there is a problem when I want to make changes: one of the cluster nodes appears to be down (shown in red), even though its VMs keep running normally, and I cannot destroy any VM. I checked the DRBD status and it shows this:

root@servpx1:/# service drbd status
drbd driver loaded OK; device status:
version: 8.3.13 (api:88/proto:86-96)
GIT-hash: 83ca112086600faacab2f157bc5a9324f7bd7f77 build by root@sighted, 2012-10-09 12:47:51
m:res  cs  ro  ds  p  mounted  fstype
0:r0  StandAlone  Primary/Unknown  UpToDate/DUnknown  r-----


root@servpx2:~# service drbd status
drbd driver loaded OK; device status:
version: 8.3.13 (api:88/proto:86-96)
GIT-hash: 83ca112086600faacab2f157bc5a9324f7bd7f77 build by root@sighted, 2012-10-09 12:47:51
m:res  cs  ro  ds  p  mounted  fstype
0:r0  WFConnection  Primary/Unknown  UpToDate/DUnknown  C

What do I have to do?

Another problem is that I can't take a snapshot; I suppose that is because the disk format is raw. Is it possible to take a snapshot with this disk format?

Best regards
Kenny Goytendia
 
hello,

A red icon on a cluster node usually indicates a cluster communication issue, but you should provide the contents of syslog for more info.

The DRBD status shows that you either have connectivity issues between the nodes or a split-brain situation, which you must resolve manually using the appropriate drbdadm commands.
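In case it helps, the usual recovery on DRBD 8.3 looks roughly like this (only a sketch; r0 is the resource name from your output, and you first have to decide which node's data gets discarded):

# on the node whose data will be thrown away (the split-brain "victim")
drbdadm secondary r0
drbdadm -- --discard-my-data connect r0

# on the node whose data you keep (the "survivor")
drbdadm connect r0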

As for live snapshots: no, they are not supported on LVM volumes. You can use the snapshot backup mode instead if you want.
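For example, something like this takes a snapshot-mode backup (the VM id 100 and the storage name "local" are only placeholders, adjust them to your setup):

vzdump 100 --mode snapshot --storage local --compress lzo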

If you need further help, provide more info: logs, how your nodes are connected, etc.
 
I had to reset the server and wait for it to start. I remember there was some problem with fencing when I checked the cman status, and I had to start that service. How do I check the logs for this service specifically? Which commands do I have to type to resolve this problem?

My nodes are connected through a router and both are on the same network. I have only two nodes.

Another question: when I reboot one of the servers and its VM restarts on the other node, the changes I made on the first one are lost. How long does it take for the nodes to synchronize their data? For example, I restart node 1 and its VM restarts on node 2, but it comes back with old data. How can I force the data to synchronize before restarting the node?

Thanks for your help.
 
I would not recommend HA with just 2 nodes. Try disabling HA to keep things simple.

If DRBD is correctly configured in dual-primary mode you should not lose information, as DRBD syncs data to both nodes in real time. So something is wrong there.
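For reference, a dual-primary resource usually carries settings along these lines in its startup/net sections (a rough sketch only; check the Proxmox DRBD wiki for the exact values recommended for your version):

resource r0 {
  startup {
    become-primary-on both;
  }
  net {
    allow-two-primaries;
    after-sb-0pri discard-zero-changes;
    after-sb-1pri discard-secondary;
    after-sb-2pri disconnect;
  }
  # the per-node "on servpx1 { ... }" / "on servpx2 { ... }" sections follow here
}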

Also, I would recommend using separate network cards dedicated to DRBD replication.

I have a similar setup to yours, but with HA disabled and dedicated NICs for DRBD replication, and everything works as expected.
 
So you would recommend just installing DRBD without HA? Because I need, for example: if server 1 goes down, the VM moves to server 2 without losing the changes. Is that possible without HA?
How can I check if the config is correct?
I'm new to Proxmox.
Help me please!
 
So you would recommend just installing DRBD without HA?

Yes

Because I need, for example: if server 1 goes down, the VM moves to server 2 without losing the changes. Is that possible without HA?

Yes, you can, but not automatically; you have to do it manually. When the 1st node is down, move the VM config file to the appropriate directory of node2 (e.g. "mv /etc/pve/nodes/proxmox1/qemu-server/100.conf /etc/pve/nodes/proxmox2/qemu-server/"). Then just start the VM on node2 with "qm start 100" from the CLI, or start it from the web GUI. Storage replication should not be a problem, as DRBD replicates the data in real time (primary/primary mode), so you won't lose any data.
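Put together, the manual failover on the surviving node would look roughly like this (the node names and VM id 100 are just the placeholders from the example above):

# run on node2 after node1 has gone down
cat /proc/drbd        # the local disk should show Primary and UpToDate
mv /etc/pve/nodes/proxmox1/qemu-server/100.conf /etc/pve/nodes/proxmox2/qemu-server/
qm start 100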

How can I check if the config is correct?
I'm new to Proxmox.
Help me please!

Post your cluster config: cat /etc/pve/cluster.conf

Mine looks like this:

<?xml version="1.0"?>
<cluster config_version="5" name="pvecluster">
<cman expected_votes="1" keyfile="/var/lib/pve-cluster/corosync.authkey" two_node="1"/>
<clusternodes>
<clusternode name="proxmox" nodeid="1" votes="1">
</clusternode>
<clusternode name="proxmox2" nodeid="2" votes="1">
</clusternode>
</clusternodes>
</cluster>

I do not have HA enabled on proxmox so I do not need to configure fencing.
To correctly configure DRBD follow this wiki:
http://pve.proxmox.com/wiki/DRBD

For proxmox cluster configuration follow this wiki:
https://pve.proxmox.com/wiki/Proxmox_VE_2.0_Cluster

...especially this: https://pve.proxmox.com/wiki/Proxmox_VE_2.0_Cluster#Two_nodes_cluster_and_quorum_issues

hope I helped enough :)
 
Hello.

This is my config of server 1:


<?xml version="1.0"?>
<cluster config_version="10" name="fibercluster">
  <cman expected_votes="1" keyfile="/var/lib/pve-cluster/corosync.authkey" two_node="1"/>
  <fencedevices>
    <fencedevice agent="fence_ilo" ipaddr="10.100.5.234" login="root" name="fenceA" passwd="s0p0rt3*13"/>
    <fencedevice agent="fence_ilo" ipaddr="10.100.5.235" login="root" name="fenceB" passwd="s0p0rt3*13"/>
  </fencedevices>
  <clusternodes>
    <clusternode name="servpx1" nodeid="1" votes="1">
      <fence>
        <method name="1">
          <device action="reboot" name="fenceA"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="servpx2" nodeid="2" votes="1">
      <fence>
        <method name="1">
          <device action="reboot" name="fenceB"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <rm>
    <pvevm autostart="1" vmid="100"/>
    <pvevm autostart="1" vmid="101"/>
    <pvevm autostart="1" vmid="102"/>
    <pvevm autostart="1" vmid="103"/>
    <pvevm autostart="1" vmid="104"/>
  </rm>
</cluster>

And this is my config of server 2:

<?xml version="1.0"?>
<cluster config_version="10" name="fibercluster">
  <cman expected_votes="1" keyfile="/var/lib/pve-cluster/corosync.authkey" two_node="1"/>
  <fencedevices>
    <fencedevice agent="fence_ilo" ipaddr="10.100.5.234" login="root" name="fenceA" passwd="s0p0rt3*13"/>
    <fencedevice agent="fence_ilo" ipaddr="10.100.5.235" login="root" name="fenceB" passwd="s0p0rt3*13"/>
  </fencedevices>
  <clusternodes>
    <clusternode name="servpx1" nodeid="1" votes="1">
      <fence>
        <method name="1">
          <device action="reboot" name="fenceA"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="servpx2" nodeid="2" votes="1">
      <fence>
        <method name="1">
          <device action="reboot" name="fenceB"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <rm>
    <pvevm autostart="1" vmid="100"/>
    <pvevm autostart="1" vmid="101"/>
    <pvevm autostart="1" vmid="102"/>
    <pvevm autostart="1" vmid="103"/>
    <pvevm autostart="1" vmid="104"/>
  </rm>
</cluster>


I hope you can help me. I need to migrate one of the VMs and I can't; this is the error:

Nov 03 16:45:17 starting migration of VM 106 to node 'servpx2' (10.100.5.235)
Nov 03 16:45:17 copying disk images
Nov 03 16:45:17 starting VM 106 on remote node 'servpx2'
Nov 03 16:45:18 can't activate LV '/dev/myvg/vm-106-disk-1': One or more specified logical volume(s) not found.
Nov 03 16:45:18 ERROR: online migrate failure - command '/usr/bin/ssh -o 'BatchMode=yes' root@10.100.5.235 qm start 106 --stateuri tcp --skiplock --migratedfrom servpx1 --machine pc-i440fx-2.1' failed: exit code 255
Nov 03 16:45:18 aborting phase 2 - cleanup resources
Nov 03 16:45:18 migrate_cancel
Nov 03 16:45:18 ERROR: migration finished with problems (duration 00:00:01)
TASK ERROR: migration problems



Thanks for all!
 
It looks like you are having issues with the backend storage of the VM.
Can you post:
cat /proc/drbd
cat /etc/drbd.d/r0.res
 
Hello!

SERVER 1:


root@servpx1:/# cat /proc/drbd
version: 8.3.13 (api:88/proto:86-96)
GIT-hash: 83ca112086600faacab2f157bc5a9324f7bd7f77 build by root@sighted, 2012-10-09 12:47:51
0: cs:WFConnection ro:primary/Unknown ds:UpToDate/DUnknown C r-----
ns:0 nr:0 dw:55541833 dr:115554852 al:27491 bm:4841 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:22975676

SERVER 2:

root@servpx2:~# cat /proc/drbd
version: 8.3.13 (api:88/proto:86-96)
GIT-hash: 83ca112086600faacab2f157bc5a9324f7bd7f77 build by root@sighted, 2012-10-09 12:47:51
0: cs:StandAlone ro:primary/Unknown ds:UpToDate/DUnknown r-----
ns:0 nr:0 dw:5602074 dr:6029967 al:1985 bm:531 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:5955208

SERVER 1:

root@servpx1:/# cat /etc/drbd.d/r0.res
resource r0 {
  on servpx1 {
    device /dev/drbd0;
    disk /dev/sdb1;
    address 10.100.5.234:7788;
    meta-disk internal;
  }
  on servpx2 {
    device /dev/drbd0;
    disk /dev/sdb1;
    address 10.100.5.235:7788;
    meta-disk internal;
  }
}

SERVER 2:

root@servpx2:~# cat /etc/drbd.d/r0.res
resource r0 {
  on servpx1 {
    device /dev/drbd0;
    disk /dev/sdb1;
    address 10.100.5.234:7788;
    meta-disk internal;
  }
  on servpx2 {
    device /dev/drbd0;
    disk /dev/sdb1;
    address 10.100.5.235:7788;
    meta-disk internal;
  }
}


Thanks for all!
 
root@servpx1:/# cat /proc/drbd
0: cs:WFConnection ro:primary/Unknown ds:UpToDate/DUnknown
root@servpx2:~# cat /proc/drbd
0: cs:StandAlone ro:primary/Unknown ds:UpToDate/DUnknown

The problem is here. You must resolve the DRBD split-brain situation. It should show ds:UpToDate/UpToDate to be able to live migrate.
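For comparison, a healthy dual-primary resource would report something roughly like this (an illustrative line, not taken from your system):

0: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate C r-----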

To resolve this follow this guide:

https://pve.proxmox.com/wiki/Two-Node_High_Availability_Cluster#DRBD_split-brain

Also, you must make sure that 10.100.5.234 and 10.100.5.235 can reach each other (can you ping from each side?).
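For instance (7788 is the DRBD port from your r0.res; nc is just one quick way to test the port, assuming it is installed):

root@servpx1:~# ping -c 3 10.100.5.235
root@servpx1:~# nc -zv 10.100.5.235 7788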
 
I will try that.

I read in this wiki that I will lose the data on one of the servers along with its VMs. I have important VMs on both servers. If I make a backup before the VMs are lost, can I restore them from that backup afterwards?
Thanks, I think this will solve the problem.
 
Yes, first make a backup of all VMs.
Then decide which server will overwrite the data of the other (actually it will only write the differences; a full sync will not be initiated).
Split brains are common with DRBD, which is why I told you not to use Proxmox HA in combination with DRBD.
When these situations occur you must intervene manually to resolve them.
HA is unaware of these split brains, so it may automatically start a VM on a node with outdated data and corrupt it.
 
