Hi all
I'm running a Proxmox 1.8 cluster with DRBD, as described here: http://pve.proxmox.com/wiki/DRBD (Versions below)
I think I've found a problem with deleting virtual machines and their volumes. I've been able to reproduce it on a test cluster, so it's not just my system! My master Proxmox host is called coquet and my second host is called loddont. Steps to reproduce:
1. Create a new VM on coquet with a 10 GB hard drive on the DRBD LVM volume group
2. Start the VM and live-migrate it to loddont and back to coquet again, so the LV gets marked as active on both hosts
3. Stop the VM
4. Run this on each host to find out which device-mapper node the LV appears as: lvdisplay | awk '/LV Name/{n=$3} /Block device/{d=$3; sub(".*:","dm-",d); print d,n}'
In my setup, it happens to appear as dm-8.
5. grep for that device-mapper node in /proc/partitions on each host: grep -w dm-8 /proc/partitions (-w avoids also matching dm-80 and the like)
6. Delete the VM
7. Look in /proc/partitions again. You'll find that on coquet the dm-8 entry has gone, but it's still there on loddont.
To see how this is a problem:
8. Create a new VM on loddont, but this time give it a 15 GB hard drive
9. Boot the VM from a Linux live CD or similar.
10. Look at dmesg or fdisk, and notice that the guest detects a 10 GB drive instead of the expected 15 GB one
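For reference, the awk one-liner in step 4 just pairs each "LV Name" line with the following "Block device" line and rewrites the major:minor pair as the kernel's dm- node name. A minimal sketch against sample lvdisplay output (the LV path here is hypothetical; real lvdisplay output has more fields):

```shell
# Sample `lvdisplay` output (hypothetical LV, trimmed to the two
# lines the one-liner actually looks at)
lvdisplay_sample='  LV Name                /dev/drbdvg/vm-106-disk-1
  Block device           253:8'

# Remember each "LV Name", then on the "Block device" line rewrite
# the major:minor pair (253:8) as the dm- node and print both.
echo "$lvdisplay_sample" | awk '
  /LV Name/      { n = $3 }
  /Block device/ { d = $3; sub(".*:", "dm-", d); print d, n }'
# -> dm-8 /dev/drbdvg/vm-106-disk-1
```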
I can fix this by either:
1) Rebooting the host with the stale mapping, or
2) Stopping the VM and running 'dmsetup remove /dev/mapper/drbdvg-vm--106--disk--1', then either re-activating the LV if I've created a new VM, or just removing the LV if I haven't. I don't know whether these actions cause problems elsewhere, though...
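In case anyone tries workaround 2 with a different VG or LV name: device-mapper escapes hyphens inside VG and LV names by doubling them, which is why vm-106-disk-1 shows up as vm--106--disk--1 under /dev/mapper. A small sketch of that naming rule (dm_path is my own helper, not an LVM or Proxmox command):

```shell
# dm_path: build the /dev/mapper name for an LV. Device-mapper doubles
# any hyphens inside the VG and LV names, then joins the two parts
# with a single '-'. (Helper name is mine, not part of LVM.)
dm_path() {
  vg=$(printf '%s' "$1" | sed 's/-/--/g')
  lv=$(printf '%s' "$2" | sed 's/-/--/g')
  printf '/dev/mapper/%s-%s\n' "$vg" "$lv"
}

dm_path drbdvg vm-106-disk-1
# -> /dev/mapper/drbdvg-vm--106--disk--1

# On the host with the stale mapping, with the VM stopped, that gives:
#   dmsetup remove "$(dm_path drbdvg vm-106-disk-1)"
# followed by `lvchange -ay drbdvg/vm-106-disk-1` if the LV still exists.
```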
Can someone take a look at it please?
Thanks!
Phil
Versions:
running kernel: 2.6.35-1-pve
proxmox-ve-2.6.35: 1.8-11
pve-kernel-2.6.32-4-pve: 2.6.32-33
pve-kernel-2.6.35-1-pve: 2.6.35-11
qemu-server: 1.1-30
pve-firmware: 1.0-11
libpve-storage-perl: 1.0-17
vncterm: 0.9-2
vzctl: 3.0.28-1pve1
vzdump: 1.2-14
vzprocps: 2.0.11-2
vzquota: 3.0.11-1
pve-qemu-kvm: 0.14.1-1
ksm-control-daemon: 1.0-6