proxmox4 - Dealing with node failure and drbd9

uwonlineict

New Member
Dec 26, 2015
Hello,

We have a 3-node cluster setup with drbd9. Proxmox 4 in combination with drbd9 really is very nice!
One node crashed and we have to bring it back into the cluster, but we cannot get this done.
We reinstalled it in the same way as the others, following these setup guides:
https://pve.proxmox.com/wiki/DRBD9 and https://pve.proxmox.com/wiki/Proxmox_VE_4.x_Cluster. Now we want to remove the failed node from the cluster on the primary node. Unfortunately we cannot
delete an offline node from the cluster; it gives this message:
"drbd .drbdctrl: Auto-promote failed: Multiple primaries not allowed by config"

We think the easiest way would be to simply remove the node from the cluster and add it again, so that all configuration and existing volumes get synchronised automatically.
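Roughly the sequence we have in mind (only a sketch, we have not managed to get through it yet; the IP placeholders are just examples):

# on a healthy node: drop the dead node from the Proxmox cluster
pvecm delnode hyp20
# and drop it from the drbdmanage cluster
drbdmanage remove-node -f hyp20
# on the reinstalled hyp20: join the Proxmox cluster again
pvecm add <ip-of-an-existing-node>
# back on a healthy node: re-add it to drbdmanage so its volumes resync
drbdmanage add-node hyp20 <ip-of-hyp20>

The remove-node step is where we get stuck.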

Here are some details:
drbdmanage list-nodes
+----------------------------------------------------------------------+
| Name | Pool Size | Pool Free | Site | | State |
|----------------------------------------------------------------------|
| hyp10 | 16777216 | 0 | N/A | | ok |
| hyp20 | 16777216 | 16521161 | N/A | | OFFLINE |
| hyp30 | 16777216 | 9690519 | N/A | | ok |
+----------------------------------------------------------------------+

drbdmanage remove-node -f hyp20
You are going to remove the node 'hyp20' from the cluster. This will
remove all resources from the node.
Please confirm:
yes/no: yes
Jan 5 23:09:43 hyp30 kernel: [717721.879728] drbd .drbdctrl: Auto-promote failed: Multiple primaries not allowed by config
Jan 5 23:09:48 hyp30 kernel: [717726.880759] drbd .drbdctrl: Auto-promote failed: Multiple primaries not allowed by config
Jan 5 23:09:52 hyp30 kernel: [717731.278239] drbd .drbdctrl: Auto-promote failed: Multiple primaries not allowed by config
Jan 5 23:09:56 hyp30 kernel: [717735.163982] drbd .drbdctrl: Auto-promote failed: Multiple primaries not allowed by config
Jan 5 23:10:00 hyp30 kernel: [717738.849674] drbd .drbdctrl: Auto-promote failed: Multiple primaries not allowed by config
Jan 5 23:10:05 hyp30 kernel: [717743.614836] drbd .drbdctrl: Auto-promote failed: Multiple primaries not allowed by config
Traceback (most recent call last):
  File "/usr/bin/drbdmanage", line 30, in <module>
    drbdmanage_client.main()
  File "/usr/lib/python2.7/dist-packages/drbdmanage_client.py", line 3520, in main
    client.run()
  File "/usr/lib/python2.7/dist-packages/drbdmanage_client.py", line 1130, in run
    self.parse(sys.argv[1:])
  File "/usr/lib/python2.7/dist-packages/drbdmanage_client.py", line 991, in parse
    args.func(args)
  File "/usr/lib/python2.7/dist-packages/drbdmanage_client.py", line 1301, in cmd_remove_node
    dbus.String(node_name), dbus.Boolean(force)
  File "/usr/lib/python2.7/dist-packages/dbus/proxies.py", line 70, in __call__
    return self._proxy_method(*args, **keywords)
  File "/usr/lib/python2.7/dist-packages/dbus/proxies.py", line 145, in __call__
    **keywords)
  File "/usr/lib/python2.7/dist-packages/dbus/connection.py", line 651, in call_blocking
    message, timeout)
dbus.exceptions.DBusException: org.freedesktop.DBus.Error.NoReply: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.

We also followed this guide:
http://drbd.linbit.com/users-guide-9.0/s-node-failure.html#s-perm-node-failure,
but without any success. How can we connect the resource .drbdctrl?
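We assume it is something along these lines with plain drbdadm (only a sketch; we are not sure whether drbdmanage wants to handle this itself):

drbdadm status .drbdctrl     # shows the role and connection state per peer
drbdadm connect .drbdctrl    # try to (re)establish the configured connections
drbdadm adjust .drbdctrl     # make the running resource match the on-disk config

And since the kernel log complains about multiple primaries, maybe .drbdctrl is still Primary on one of the other nodes and has to be demoted there first with drbdadm secondary .drbdctrl?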

Does anybody have advice on how to restore this cluster?
It would be great if these steps were added to the Proxmox wiki; they will probably be needed quite a few times in the future.
 
Would you mind posting such a question on the drbd user list? I guess the drbdmanage developers are there...
 

I already did that 2 days ago, but unfortunately there has been no reply so far.

I think it is also very useful to have it in this forum, because a lot of users run Proxmox 4 in combination with drbd9. I would appreciate any advice very much :)
 
Because drbd9 with drbdmanage is so new, there is very little documentation available. It is still in development, so at the moment I advise anyone to choose a different storage solution. I switched to Ceph instead.
 
Did you verify that your fencing was set up properly? It looks like a split-brain issue. We run a 5-node cluster and it works fine.
I cannot see what this has to do with fencing; I have no fencing, but I don't allow multiple primary disks. Let me ask you one question: could you explain what .drbdctrl_0 and .drbdctrl_1 are for? In the beginning they were Secondary on all nodes, but at some point one of them became Primary. I don't understand what this is used for.
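The only thing I can tell so far (please correct me if this is wrong): .drbdctrl is the drbdmanage control volume that stores the cluster configuration, with its two volumes 0 and 1 backed by the logical volumes .drbdctrl_0 and .drbdctrl_1 in the drbdpool volume group (the default), and the node that currently acts as drbdmanage leader promotes it to Primary when it writes to it. Is that right? This is what I look at:

lvs | grep drbdctrl          # the two backing logical volumes
drbdadm status .drbdctrl     # which node currently holds it Primary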
 
Hi all,
I'm a newbie at Proxmox and Linux.
I ran into a similar situation.

root@pve1:~# drbdmanage list-nodes
+------------------------------------------------------------------------------------------------------------+
| Name | Pool Size | Pool Free | | State |
|------------------------------------------------------------------------------------------------------------|
| pve1 | 76800 | 76692 | | ok |
| pve2 | unknown | unknown | | OFFLINE, pending actions: adjust connections |
| pve3 | 76800 | 76462 | | ok |
+------------------------------------------------------------------------------------------------------------+
root@pve1:~#


and I figured out that I have 2 versions of drbdmanage; maybe that's the problem, but I'm not sure:
root@pve3:~# apt-cache showpkg drbdmanage
Package: drbdmanage
Versions:
0.91-1 (/var/lib/apt/lists/download.proxmox.com_debian_dists_jessie_pve-no-subscription_binary-amd64_Packages) (/var/lib/dpkg/status)
Description Language:
File: /var/lib/apt/lists/download.proxmox.com_debian_dists_jessie_pve-no-subscription_binary-amd64_Packages
MD5: c7bffaad25204bd2f70884f8679d0737

0.50-2 (/var/lib/apt/lists/download.proxmox.com_debian_dists_jessie_pve-no-subscription_binary-amd64_Packages)
Description Language:
File: /var/lib/apt/lists/download.proxmox.com_debian_dists_jessie_pve-no-subscription_binary-amd64_Packages
MD5: c7bffaad25204bd2f70884f8679d0737


Reverse Depends:
Dependencies:
0.91-1 - init-system-helpers (2 1.18~) python (0 (null)) python:any (3 2.8) python:any (2 2.7.5-5~) drbd-utils (2 8.9.4) python-gobject (0 (null)) python-dbus (0 (null)) lvm2 (0 (null)) thin-provisioning-tools (0 (null))
0.50-2 - init-system-helpers (2 1.18~) python (0 (null)) python:any (3 2.8) python:any (2 2.7.5-5~) drbd-utils (2 8.9.4) python-gobject (0 (null)) python-dbus (0 (null)) lvm2 (0 (null)) thin-provisioning-tools (0 (null))
Provides:
0.91-1 -
0.50-2 -
Reverse Provides:
root@pve3:~#

Which of these drbdmanage versions should we take?
Maybe it's a bad guess... I don't know.

regards
christine
 
Hi again,

now I know better :)
Sorry for my bad guess! I think it was Python.

I uninstalled drbdmanage and reinstalled it:

On all 3 nodes I did this:

root@pve3:~# apt-get remove drbdmanage
Reading state information... Done
The following packages were automatically installed and are no longer required:
drbd-utils libdbus-glib-1-2 python-dbus python-dbus-dev python-gobject
Use 'apt-get autoremove' to remove them.
The following packages will be REMOVED:
drbdmanage
0 upgraded, 0 newly installed, 1 to remove and 0 not upgraded.
After this operation, 1,341 kB disk space will be freed.
Do you want to continue? [Y/n] y
.....................................
.....................................
root@pve3:~# apt-get autoremove
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following packages will be REMOVED:
drbd-utils libdbus-glib-1-2 python-dbus python-dbus-dev python-gobject
0 upgraded, 0 newly installed, 5 to remove and 0 not upgraded.
..................................
..................................

root@pve3:~# apt-get install drbdmanage
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following extra packages will be installed:
drbd-utils libdbus-glib-1-2 python-dbus python-dbus-dev python-gobject
Suggested packages:
python-dbus-doc python-dbus-dbg
The following NEW packages will be installed:
drbd-utils drbdmanage libdbus-glib-1-2 python-dbus python-dbus-dev python-gobject
0 upgraded, 6 newly installed, 0 to remove and 0 not upgraded.
Need to get 0 B/1,240 kB of archives.
After this operation, 3,403 kB of additional disk space will be used.
Do you want to continue? [Y/n] y
..................................

Now I installed the suggested packages:
root@pve3:~# apt-get install python-dbus-doc python-dbus-dbg

Reading package lists... Done
..................................
Then I deleted the nodes on node 1:

root@pve1:~# drbdmanage remove-node pve2 10.10.10.10
You are going to remove the node 'pve2' from the cluster. This will remove all resources from the node.
Please confirm:
yes/no: yes
Removing node 'pve2':
Operation completed successfully

You are going to remove the node '10.10.10.10' from the cluster. This will remove all resources from the node.
Please confirm:
yes/no: yes
Removing node '10.10.10.10':
Error: Debug exception / internal error
Error: Debug exception / internal error
------------------------------------------------------------

Then it should look like this:
root@pve1:~# drbdmanage list-nodes
+------------------------------------------------------------------------------------------------------------+
| Name | Pool Size | Pool Free | | State |
|------------------------------------------------------------------------------------------------------------|
| pve1 | 76800 | 76692 | | ok |
+------------------------------------------------------------------------------------------------------------+

Then I removed the .drbdctrl's:
root@pve1:~# drbdmanage uninit

You are going to remove the drbdmanage server from this node.
CAUTION! Note that:
* All temporary configuration files for resources managed by drbdmanage
will be removed
* Any remaining resources managed by this drbdmanage installation
that still exist on this system will no longer be managed by drbdmanage

Confirm:

yes/no: yes
Logical volume ".drbdctrl_0" successfully removed
Logical volume ".drbdctrl_1" successfully removed

And I followed the steps from http://pve.proxmox.com/wiki/DRBD9 again:
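In short, the part I did again from that page (just a sketch from memory; the IPs below are only examples). The drbdpool volume group should still exist after uninit, since uninit only removes the .drbdctrl logical volumes, so only the drbdmanage part needs to be redone:

root@pve1:~# drbdmanage init 10.10.10.1
root@pve1:~# drbdmanage add-node pve2 10.10.10.2
root@pve1:~# drbdmanage add-node pve3 10.10.10.3

The add-node command executes the join on the other node via ssh, so ssh between the nodes must work.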

After that it works fine:
root@pve1:~# drbdmanage list-nodes
+------------------------------------------------------------------------------------------------------------+
| Name | Pool Size | Pool Free | | State |
|------------------------------------------------------------------------------------------------------------|
| pve1 | 76800 | 76692 | | ok |
| pve2 | 76800 | 76692 | | ok |
| pve3 | 76800 | 76462 | | ok |
+------------------------------------------------------------------------------------------------------------+

root@pve1:~# drbdmanage server-version
server_version=0.91
server_git_hash=GIT-hash: UNKNOWN
drbd_kernel_version=version: 9.0.0 (api:2/proto:86-110)
drbd_kernel_git_hash=GIT-hash: 360c65a035fc2dec2b93e839b5c7fae1201fa7d9 build by root@elsa, 2016-02-03 16:38:17
drbd_utils_version=<unknown>
drbd_utils_git_hash=<unknown>
Operation completed successfully
root@pve1:~#

regards
christine
 
Hi,
I tried this solution, but when I add a node (drbdmanage add-node -q dmz-pve2 192.168.2.20)
I get:
Operation completed successfully
Operation completed successfully

Executing join command using ssh.
IMPORTANT: The output you see comes from dmz-pve2
IMPORTANT: Your input is executed on dmz-pve2
Error: Operation not allowed on satellite node

drbdmanage list-nodes gives:
+------------------------------------------------------------------------------------------------------------+
| Name | Pool Size | Pool Free | | State |
|------------------------------------------------------------------------------------------------------------|
| dmz-pve2 | unknown | unknown | | pending actions: adjust connections |
| dmz-pve3 | 4033728 | 4033728 | | ok |
+------------------------------------------------------------------------------------------------------------+

and drbd-overview gives:
0:.drbdctrl/0 Connected(2*) Secondary(2*) UpToDa/UpToDa
1:.drbdctrl/1 Connected(2*) Secondary(2*) UpToDa/UpToDa
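One idea I have not tried yet (only a sketch; I assume dmz-pve3 is the node where drbdmanage was initialised): let drbdmanage print the join command and run it on dmz-pve2 by hand instead of relying on the automatic ssh join:

root@dmz-pve3:~# drbdmanage howto-join dmz-pve2
# copy the printed 'drbdmanage join ...' line and execute it as root on dmz-pve2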

Any idea?

Regards, Jean-Daniel
 
Hi again,
Finally it works for me, but it was not as easy as the primary documentation explains.
Thanks
 
Hi JD, did you perform any steps in addition to what Christine noted? I have followed these steps multiple times and still receive the "Operation not allowed on satellite node" error.
Thanks,
Dan
 
Hi Dan, hi Christine,
I removed the drbd stuff on the master, reinstalled drbdmanage with the suggested packages, redid the steps, and it worked.
But today I rebooted a node, and now on this node I can't find my vm .res file in /var/lib/drbd.d
drbdadm status gives:
volume:0 disk:UpToDate
volume:1 disk:UpToDate
dmz-pve1 role:Secondary
volume:0 peer-disk:UpToDate
volume:1 peer-disk:UpToDate
dmz-pve2 role:Secondary
volume:0 peer-disk:UpToDate
volume:1 peer-disk:UpToDate

vm-100-disk-2 role:Secondary
disk:UpToDate
dmz-pve1 role: Primary
peer-disk:UpToDate
dmz-pve2 role:Secondary
peer-disk:UpToDate

drbd-overview gives:
0:.drbdctrl/0 Connected(3*) Secondary(3*) UpTo(dmz-pve3)/UpTo(dmz-pve2,dmz-pve1)
1:.drbdctrl/1 Connected(3*) Secondary(3*) UpTo(dmz-pve3)/UpTo(dmz-pve2,dmz-pve1)
101:vm-100-disk-2/0 Connected(3*) Seco(dmz-pve2,dmz-pve3)/Prim(dmz-pve1) UpTo(dmz-pve3)/UpTo(dmz-pve2,dmz-pve1)

and ha-manager status gives:
quorum OK
master dmz-pve1 (active, Fri Sep 2 16:56:21 2016)
lrm dmz-pve1 (active, Fri Sep 2 16:56:25 2016)
lrm dmz-pve2 (active, Fri Sep 2 16:56:25 2016)
lrm dmz-pve3 (idle, Fri Sep 2 16:56:25 2016)

For two nodes I get this in /var/lib/drbd.d:
root@dmz-pve1:~ # ll /var/lib/drbd.d/
total 8
-rw-r--r-- 1 root root 29 août 29 17:17 drbdmanage_global_common.conf
-rw-r--r-- 1 root root 1086 août 29 17:17 drbdmanage_vm-100-disk-2.res

For the third, which was freshly rebooted:
root@dmz-pve3:~ # ll /var/lib/drbd.d/
total 4
-rw-r--r-- 1 root root 29 sept. 2 14:49 drbdmanage_global_common.conf

I get an error if I try to migrate the VM to this node:
Could not open '/dev/drbd/by-res/vm-100-disk-2/0
lvdisplay gives correct information, but there is nothing in /dev/drbd except .drbdctrl_0 and .drbdctrl_1.

It is quite difficult to get an HA cluster working with that.
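What I plan to check next (only a sketch, nothing verified yet, and I am not sure about the exact drbdmanage syntax): whether the assignment for the rebooted node is still known to drbdmanage, and if not, assign the resource again so the .res file gets regenerated:

root@dmz-pve1:~# drbdmanage list-assignments                 # is vm-100-disk-2 still assigned to dmz-pve3?
root@dmz-pve1:~# drbdmanage assign vm-100-disk-2 dmz-pve3    # only if the assignment is missing
root@dmz-pve3:~# drbdadm adjust vm-100-disk-2                # once the .res file is back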
 
Hi JD,

I suppose you mixed your self-written drbd .res and config files with "drbdmanage"?
If so, that won't work, sorry.
drbdmanage creates these files for you.

All the best
Christine
 
Hi Christine,
I didn't write any *.res files. I noticed that drbdmanage_vm-100-disk-2.res was created automatically by Proxmox in /var/lib/drbd.d/ on all three nodes when I created a VM with the Proxmox web interface. But when I rebooted the third node, this file disappeared from /var/lib/drbd.d/, so migration to this node is no longer possible.

Best regards
Jean-Daniel
 
Hi JD again, I hope this helps.
Yesterday evening I built a mini PVE 3-node cluster in my VirtualBox:

What you need:
to build the cluster: http://pve.proxmox.com/wiki/Proxmox_VE_4.x_Cluster
to get drbdmanage: http://pve.proxmox.com/wiki/Install_Proxmox_VE_on_Debian_Jessie
to prepare your drbd: http://pve.proxmox.com/wiki/DRBD9

When you create your first VM, it looks similar to this:

root@pve3:~# drbdadm status
.drbdctrl role:Secondary
volume:0 disk:UpToDate
volume:1 disk:UpToDate

pve1 role:Secondary
volume:0 peer-disk:UpToDate
volume:1 peer-disk:UpToDate

pve2 role:Secondary
volume:0 peer-disk:UpToDate
volume:1 peer-disk:UpToDate

vm-100-disk-1 role:Secondary
disk:UpToDate
pve1 role:primary
peer-disk:UpToDate
pve2 role:Secondary
peer-disk:UpToDate

pve1 is now Primary.

NOTE: Before I migrated the VM, I disabled "KVM hardware virtualization" in the VM's "Options", because the cluster runs inside VirtualBox.

When you migrate vm-100-disk-1 to pve2, it looks similar to this:

root@pve3:~# drbdadm status
.drbdctrl role:Secondary
volume:0 disk:UpToDate
volume:1 disk:UpToDate
pve1 role:Secondary

volume:0 peer-disk:UpToDate
volume:1 peer-disk:UpToDate

pve2 role:Secondary
volume:0 peer-disk:UpToDate
volume:1 peer-disk:UpToDate

vm-100-disk-1 role:Secondary
disk:UpToDate
pve1 role:Secondary
peer-disk:UpToDate
pve2 role:primary
peer-disk:UpToDate

Now pve2 is Primary.

Your resources look similar to this:

root@pve3:~# drbdmanage list-resources

+------------------------------------------------------------------------------------------------------------+
| Name | | State |
| vm-100-disk-1 | | ok |
+------------------------------------------------------------------------------------------------------------+

drbdmanage list-assignments shows your resource on all nodes:

root@pve3:~# drbdmanage list-assignments
+------------------------------------------------------------------------------------------------------------+
| Node | Resource | Vol ID | | State |
|-------------------------------------------------------------------------------------------------------------|
| pve1 | vm-100-disk-1 | * | | ok |
| pve2 | vm-100-disk-1 | * | | ok |
| pve3 | vm-100-disk-1 | * | | ok |
+------------------------------------------------------------------------------------------------------------+

I hope this helps.
For deeper studies I recommend these books:
https://forum.proxmox.com/threads/books-on-proxmox-ve.28122/
To get support via subscription:
http://www.proxmox.com/en/proxmox-ve/pricing

All the best
christine
 
Hi JD,

When you create your VMs in the web GUI and have a look at /var/lib/drbd.d/,
it should look similar to this:

root@pve1:~# ls -al /var/lib/drbd.d/
total 20
drwxr-xr-x 2 root root 4096 Aug 12 11:32 .
drwxr-xr-x 45 root root 4096 Feb 16 2016 ..
-rw-r--r-- 1 root root 29 Aug 12 11:50 drbdmanage_global_common.conf
-rw-r--r-- 1 root root 1127 Aug 12 11:32 drbdmanage_vm-205-disk-1.res
-rw-r--r-- 1 root root 1166 Jul 12 16:05 drbdmanage_vm-310-disk-1.res
root@pve1:~#


To activate your "HA":
In the web GUI, choose "Datacenter", go to the "HA" tab, create a "group" and add the nodes which should do the HA stuff :)
Then you can add your resources (VMs). When these are all done, you can do online migration with your VMs; a CLI sketch follows below.
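If you prefer the CLI, the same can be done roughly like this (a sketch; the group name is made up, vm:205 is one of my test VMs):

root@pve1:~# ha-manager groupadd mygroup -nodes "pve1,pve2,pve3"
root@pve1:~# ha-manager add vm:205 -group mygroup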

Your ha-manager shows you something like this at the CLI:

root@pve1:~# ha-manager status
quorum OK
master pve3 (active, Mon Sep 5 11:57:02 2016)
lrm pve1 (active, Mon Sep 5 11:56:58 2016)
lrm pve2 (active, Mon Sep 5 11:56:58 2016)
lrm pve3 (active, Mon Sep 5 11:57:02 2016)
service ct:310 (pve2, started)
service vm:205 (pve1, started)


all the best
christine
 
Hi,
Some devices disappeared from /dev/drbd/by-res (my VM devices, so the VMs cannot start on another node). After recreating these device files, I can migrate, and it seems to survive a reboot.
Another question is about how to choose which node becomes the master of the cluster. I don't know how to do that.
I would like to give priority to the most powerful node, so it is the default in case one node crashes.
 
Hi,
Which node is master depends on the timestamp of the last heartbeat that is checked.

yesterday pve3 was master.

root@pve1:~# ha-manager status
quorum OK
master pve3 (active, Mon Sep 5 11:57:02 2016)
lrm pve1 (active, Mon Sep 5 11:56:58 2016)
lrm pve2 (active, Mon Sep 5 11:56:58 2016)
lrm pve3 (active, Mon Sep 5 11:57:02 2016)
service ct:310 (pve2, started)
service vm:205 (pve1, started)


Today I did some updates and further tests, and now pve1 is master:
root@pve2:~# ha-manager status
quorum OK
master pve1 (active, Tue Sep 6 12:42:02 2016)
lrm pve1 (active, Tue Sep 6 12:42:02 2016)
lrm pve2 (idle, Tue Sep 6 12:42:07 2016)
lrm pve3 (old timestamp - dead?, Tue Sep 6 12:39:15 2016) -----> it's dead because of reboot
service ct:310 (pve1, started)
service vm:205 (pve1, started)
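About your wish to prefer the most powerful node: as far as I know you cannot choose which node becomes the CRM master, but you can give the nodes of an HA group different priorities so the VMs prefer one node (a sketch; group name and priorities are only examples, a higher number means higher priority):

root@pve1:~# ha-manager groupadd prefer-pve1 -nodes "pve1:2,pve2:1,pve3:1"
root@pve1:~# ha-manager set vm:205 -group prefer-pve1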


I hope it helps
all the best.
christine
 
