I need help with DRBD - "Failure: (114) Lower device is already claimed."

atinazzi

Hi,
I have been using Proxmox on a standalone server for a couple of years without any trouble. I love Proxmox and I would like to start recommending it to my customers. Lately I have been trying to implement a cluster with DRBD, but I have run into a number of problems, explained below.

I have 2 servers, each with 1 logical volume subdivided into 2 partitions (by the Proxmox installer). I am running 1.8 with kernel 2.6.32... more specifically:

pve-manager: 1.8-18 (pve-manager/1.8/6070)
running kernel: 2.6.32-4-pve
proxmox-ve-2.6.32: 1.8-33
pve-kernel-2.6.32-4-pve: 2.6.32-33
pve-kernel-2.6.24-8-pve: 2.6.24-16
qemu-server: 1.1-30
pve-firmware: 1.0-11
libpve-storage-perl: 1.0-17
vncterm: 0.9-2
vzctl: 3.0.28-1pve1
vzdump: 1.2-14
vzprocps: 2.0.11-2
vzquota: 3.0.11-1
pve-qemu-kvm: 0.14.1-1
ksm-control-daemon: 1.0-6

I have a number of VMs running and I managed to set up these two servers as a cluster using pveca. I would now like to enable DRBD for the second partition, /dev/sda2. To do so I have followed the instructions found at http://pve.proxmox.com/wiki/DRBD

However, when I try to start the drbd service I get the following:

proxmox-01:~# /etc/init.d/drbd start
Starting DRBD resources:[ d(r0) 0: Failure: (114) Lower device is already claimed. This usually means it is mounted.

[r0] cmd /sbin/drbdsetup 0 disk /dev/sda2 /dev/sda2 internal --set-defaults --create-device failed - continuing!

n(r0) ]..........
***************************************************************
DRBD's startup script waits for the peer node(s) to appear.
- In case this node was already a degraded cluster before the
reboot the timeout is 60 seconds. [degr-wfc-timeout]
- If the peer was available before the reboot the timeout will
expire after 15 seconds. [wfc-timeout]
(These values are for resource 'r0'; 0 sec -> wait forever)
To abort waiting enter 'yes' [ 14]:
0: State change failed: (-2) Refusing to be Primary without at least one UpToDate disk
Command '/sbin/drbdsetup 0 primary' terminated with exit code 17
0: State change failed: (-2) Refusing to be Primary without at least one UpToDate disk
Command '/sbin/drbdsetup 0 primary' terminated with exit code 17
0: State change failed: (-2) Refusing to be Primary without at least one UpToDate disk
Command '/sbin/drbdsetup 0 primary' terminated with exit code 17
0: State change failed: (-2) Refusing to be Primary without at least one UpToDate disk
Command '/sbin/drbdsetup 0 primary' terminated with exit code 17
0: State change failed: (-2) Refusing to be Primary without at least one UpToDate disk
Command '/sbin/drbdsetup 0 primary' terminated with exit code 17

Someone told me that it may be because the service is already running, so I moved on to the next step, which was not successful either:

proxmox-01:~# drbdadm create-md r0
md_offset 999114268672
al_offset 999114235904
bm_offset 999083745280

Found LVM2 physical volume signature
975695872 kB data area apparently used
975667720 kB left usable by current configuration

Device size would be truncated, which
would corrupt data and result in
'access beyond end of device' errors.
You need to either
* use external meta data (recommended)
* shrink that filesystem first
* zero out the device (destroy the filesystem)
Operation refused.

Command 'drbdmeta 0 v08 /dev/sda2 internal create-md' terminated with exit code 40
drbdadm create-md r0: exited with code 40

I am not sure what I am doing wrong... could someone please help me?

Thanks
 
Hi,
are you sure that sda2 isn't being used by something else (mounted, or as an LVM device)?
On a normal Proxmox installation sda2 is the physical volume for the pve volume group!

See the output of "mount" and "pvdisplay".

Udo
 
Yes this is correct.

Should I stop all running VMs and umount the partition?

Below are the outputs as requested

proxmox-01:~# mount
/dev/mapper/pve-root on / type ext3 (rw,errors=remount-ro)
tmpfs on /lib/init/rw type tmpfs (rw,nosuid,mode=0755)
proc on /proc type proc (rw,noexec,nosuid,nodev)
sysfs on /sys type sysfs (rw,noexec,nosuid,nodev)
procbususb on /proc/bus/usb type usbfs (rw)
udev on /dev type tmpfs (rw,mode=0755)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
devpts on /dev/pts type devpts (rw,noexec,nosuid,gid=5,mode=620)
/dev/mapper/pve-data on /var/lib/vz type ext3 (rw)
/dev/sda1 on /boot type ext3 (rw)
proxmox-01:~# pvdisplay
--- Physical volume ---
PV Name /dev/sda2
VG Name pve
PV Size 930.50 GB / not usable 1.62 MB
Allocatable yes
PE Size (KByte) 4096
Total PE 238207
Free PE 1023
Allocated PE 237184
PV UUID vgHPLe-93vH-KPSy-oK2n-sg2N-UlyJ-iSA8bz
 
Hi,
you can't unmount sda2 because it's used by the volume group pve.
You need a free partition/disk for DRBD (you could use a logical volume, but I wouldn't recommend that).
The best way for testing is a free disk on both nodes, each disk with two partitions (same size on both hosts) and two DRBD devices (one per node as the primary device for its VM disks).
On top of the DRBD devices you then create a volume group each and use them in the PVE storage section.
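For example, the first resource could look something like this (only a rough sketch - the hostnames, IPs and /dev/sdb1 are placeholders you have to adjust, and the "on" names must match the nodes' hostnames):

resource r0 {
        protocol C;
        startup {
                wfc-timeout 15;          # wait 15s for the peer on a normal start
                degr-wfc-timeout 60;     # wait 60s if the cluster was degraded before the reboot
                become-primary-on both;  # dual-primary, needed for live migration
        }
        net {
                allow-two-primaries;
                cram-hmac-alg sha1;
                shared-secret "my-secret";  # placeholder - pick your own
        }
        on proxmox-01 {
                device /dev/drbd0;
                disk /dev/sdb1;             # placeholder: first partition of the spare disk
                address 10.0.7.1:7788;      # placeholder: IP on the dedicated replication link
                meta-disk internal;
        }
        on proxmox-02 {                     # placeholder name of the second node
                device /dev/drbd0;
                disk /dev/sdb1;
                address 10.0.7.2:7788;
                meta-disk internal;
        }
}

The second resource (r1) would look the same, just with /dev/drbd1, the second partition (e.g. /dev/sdb2) and another port (e.g. 7789). On top of each /dev/drbdX you then create the volume group (for example pvcreate /dev/drbd0; vgcreate drbd0vg /dev/drbd0) and add it as LVM storage in the web interface.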

Udo
 
Hi Udo, and thanks for helping me with this. I do not have a good understanding of the Proxmox architecture yet.
Unfortunately I cannot add any additional disks. The 2 servers are Dell R210s with 2 x 1 TB HDs in hardware RAID. The controller allows only 1 volume.

So... can I boot each server with a GParted Live CD and resize /dev/sda2? If possible, how big should each partition be and how should I mount them?

Thanks again... you have been very helpful
 
Hi,
AFAIK GParted live CDs can't shrink LVM partitions.
Couldn't you use an external disk for testing?

If you finish your tests and DRBD fits your needs, you can do the repartitioning by hand (in short: extend the VG with a second disk/partition, move the content off sda2, repartition sda, then move the content back - with a reduced pve-data). A rough sketch of those steps follows below.
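Only as a sketch (the temporary disk /dev/sdb1 and the sizes are placeholders, and you should have working backups before touching any of this):

# 1. add the temporary disk/partition to the pve volume group
pvcreate /dev/sdb1
vgextend pve /dev/sdb1
# 2. move all extents off sda2 and remove it from the VG
pvmove /dev/sda2 /dev/sdb1
vgreduce pve /dev/sda2
pvremove /dev/sda2
# 3. repartition sda with fdisk: a smaller sda2 (type 8e, Linux LVM) and a new sda3 for DRBD
# 4. shrink pve-data so everything fits on the smaller sda2 (filesystem first!)
umount /var/lib/vz
e2fsck -f /dev/pve/data
resize2fs /dev/pve/data <new-size>
lvreduce -L <new-size> /dev/pve/data
mount /var/lib/vz
# 5. move the data back and remove the temporary disk
pvcreate /dev/sda2
vgextend pve /dev/sda2
pvmove /dev/sdb1 /dev/sda2
vgreduce pve /dev/sdb1
pvremove /dev/sdb1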

Udo
 
So... you are suggesting I should back up sda2, then delete the partition, then recreate a smaller sda2 for pve-data and create an sda3 for DRBD.
Having 1 TB in total, how big should I make sda2 (pve-data) and sda3 (DRBD)? What data are these partitions holding? I thought pve-data was storing the VMs and their virtual disks, so if that is the case, what would sda3 (DRBD) be storing? Sorry about the silly question, but I am trying to figure out the data flow and architecture.

What commands should I use to recreate sda2 (pve-data) once I have deleted it? Does fdisk handle LVM?

I have found this http://whattheit.wordpress.com/2010/10/07/proxmox-with-drbd-on-a-single-raid-volume/ Would this work for me? Is this what I should be doing?

Once again.... I really appreciate your help.

Thanks
 
I wrote the article you're linking to. I think you are trying to do the same thing I was when I wrote it, but that setup never went into production and, as is, should not be attempted on production hardware. (Hence the warning at the top and my reluctance to post the second half)

The biggest issue is that Proxmox uses LVM for a lot of the 'magic' that is there in the backups, the locking for DRBD, maybe part of the migration stuff, etc. It plays a big part. So the problem with what I was attempting is that /dev/mapper/pve-data goes away and its data ends up being inside /dev/mapper/pve-root. I do not know whether it is sufficient for that data to be on an LV like pve-root or whether Proxmox actually still expects it to be on pve-data. There may be other concerns too that I'm still ignorant of.

I can say that my cluster still booted when I made the changes, but I didn't try to put any VMs on it or back them up or migrate. I believe the method for shrinking the LVM is technically sound, but getting rid of that LV entirely was a mistake. Were someone to put together a more comprehensive (safe) guide on it, what I wrote up might represent roughly half of the stuff that needed to be done.

I really, really would keep this idea away from production hardware. Actually, I'm going to go put a bigger disclaimer on the article and video. :)
 
Thanks for replying... So, from my understanding, the possible options for a reliable and cost-effective solution are:
1) No RAID: 1 HD for pve-data and 1 HD for DRBD. My understanding is that DRBD is the equivalent of a network-based RAID 1.
2) RAID 1 + external storage (iSCSI, FC, NFS, etc.): hardware RAID controller with 2 HDs (mirrored) for pve-data, plus external storage of any kind.
At the end of the day, the objective is to ensure that VM images are available to any server in the cluster at any time, isn't it?
I think I will be going with option 2... more expensive but also more reliable.
 
Hi,
point 2 is not really more reliable than a setup with DRBD (but it is not more expensive, either).
The reason: if you have your data on a SAN (iSCSI, FC), the SAN itself is a SPOF (single point of failure) - you can do a lot (multiple power supplies, redundant controllers), but not everything.
With DRBD you get two independent servers holding the storage - but this also gets expensive if you want good speed (RAID controller, fast network).

I use a mix of both; which one I prefer depends on the use case.

Try playing with DRBD without breaking your existing PVE installation. In my eyes DRBD makes sense if the storage is well connected to the hosts (enough drive bays, a good RAID controller).

Udo
 
I have just purchased 2 x 1 TB eSATA drives. As soon as they arrive I will start testing and post about my experience. I am sure other new users may find this useful.
 
UPDATE...

I connected 2 x 1 TB eSATA drives to the 2 Dell R210 servers of my Proxmox cluster, following the step-by-step instructions from http://pve.proxmox.com/wiki/DRBD

Everything went quite smoothly and from what I can see DRBD is working very well and it seems quite fast as well.

I have connected the two servers via a dedicated VLAN using a Gb switch rather than using a crossover cable so that I will be able to add more nodes in the future.

I have installed Ubuntu 10.04 amd64 on a new VM and run a few tests to assess its performance, and I am seriously impressed by the outcome. I also tested a live migration, which completed in just a few seconds.

I have a few questions which I would like to throw to this forum:

1) When I started the synchronization of the new disks (drbdadm -- --overwrite-data-of-peer primary r0), the system reported a transfer speed of about 30 Mb/s, which does not seem like much to me... but then again, system performance and live migration are excellent. Any thoughts on this?

2) When I created the new VM, I noticed that if I select the newly added storage (which I called 'shared') the only image format available is RAW, while if I select the local storage I can also choose qcow2 and vmdk. Is this normal, or have I done something wrong?

3) I have also noticed that the new LVM group only supports virtual disks (no ISO images and no templates). When I live-migrated the VM, I forgot to remove the ISO image from the virtual CD-ROM drive. The VM was transferred from node 1 to node 2 as intended, but it did not resume operations. This was caused by the missing ISO image, which the VM was unable to find on the new node. Again... is this how it is supposed to be, or have I done something wrong? Is there a way to replicate ISOs & templates?

4) I have a number of existing VMs currently running locally on each node. These VMs use VMDK and qcow2 images. Can they be migrated to the new shared storage? If so, how?

Thanks
 
ad 1: read the DRBD user's guide on how to do performance tuning (a rough sketch of the relevant setting is below).

ad 2: if you use block devices there is no disk image format, as you use the block device directly. (The GUI is a bit misleading here.)

ad 3: use an NFS server for storing ISO images, so you have access from all nodes. We will also improve this behavior in future versions.

ad 4: if you have raw disk images you can do a simple backup/restore. So you can convert the qcow2/vmdk images to raw files, then back up and finally restore with qmrestore, using the --storage flag pointing to the new storage (see the second sketch below). Alternatively, you can just use Clonezilla live CDs to copy the disks.
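Regarding ad 1: the initial full sync is capped by the "rate" setting in the syncer section of the resource config (if the wiki's example config sets something like rate 30M, that alone could explain the speed you saw); normal replication of running VMs is not limited by it. A rough sketch, with an example value only - the user's guide rule of thumb is roughly 30% of the available replication bandwidth:

resource r0 {
        syncer {
                rate 33M;   # example value: roughly 30% of a dedicated 1 Gbit/s link
        }
        # ... rest of the resource definition unchanged ...
}

After changing it, 'drbdadm adjust r0' applies the new value to the running resource.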
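And for ad 4, a rough sketch of what the convert/backup/restore could look like (the VMID 101, the file names and the path are examples only; 'shared' is the LVM-on-DRBD storage name used earlier in this thread):

# convert the existing images to raw (run with the VM stopped)
qemu-img convert -f qcow2 -O raw vm-101-disk-1.qcow2 vm-101-disk-1.raw
qemu-img convert -f vmdk -O raw vm-102-disk-1.vmdk vm-102-disk-1.raw
# point the VM's config (e.g. /etc/qemu-server/101.conf) at the new raw file, then back it up
vzdump 101
# restore the backup onto the shared LVM storage
qmrestore /path/to/vzdump-qemu-101.tgz 101 --storage shared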