Strange lvm issue on proxmox cluster

lozair

Hi,
we have two nodes running Proxmox 1.7.

We use shared storage between the Proxmox servers on an FC SAN, accessed via multipath.
This shared storage is configured in Proxmox as an LVM storage named vg_guests.
On this storage we create a raw virtual disk as an LV for each virtual machine.
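
For reference, the VG sits directly on the multipath device; the setup was roughly like this (the /dev/mapper device name here is just an example, ours may differ):

# create a physical volume on the multipath device and the shared VG on top of it
pvcreate /dev/mapper/mpath0
vgcreate vg_guests /dev/mapper/mpath0

vg_guests is then registered in Proxmox as an LVM storage marked as shared.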

We are migrating our Xen VMs from the old cluster to this new Proxmox cluster.

Note that we have scripted the migration of the Xen machines.
The script uses the "qm create" command to create a raw disk, and we then use kpartx/fdisk/tar/grub to populate and configure the VM disk from the first Proxmox node.
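
In outline, the script does something like the following per VM (the vmid, size, paths and exact options here are only illustrative, and the real script has more error handling):

# create the VM and its empty raw disk as an LV on the shared storage
qm create 202 --name vm202 --memory 1024 --ide0 vg_guests:10

# a partition table is first written to /dev/vg_guests/vm-202-disk-1 with fdisk (interactive step),
# then the partitions are mapped, the Xen filesystem copied in, and the bootloader reinstalled
kpartx -av /dev/vg_guests/vm-202-disk-1                       # prints the /dev/mapper/... partition names
mount /dev/mapper/vg_guests-vm--202--disk--1p1 /mnt/vm202     # use the name printed by kpartx
tar -xpf /backup/xen-vm202.tar -C /mnt/vm202
grub-install --root-directory=/mnt/vm202 /dev/vg_guests/vm-202-disk-1   # bootloader step depends on the guest
umount /mnt/vm202
kpartx -d /dev/vg_guests/vm-202-disk-1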

The "transfer" from the Xen cluster to the Proxmox cluster seems to be OK.

The problem appears when we migrate a VM from the first Proxmox node to the second node:
the filesystem of the VM appears to be corrupted after the live migration.

Looking at the LVM status on the second node, everything seems OK,
but there is in fact a serious LVM problem.

For the VM with vmid=202, the LV /dev/vg_guests/vm-202-disk-1 shows the same LVM metadata on both nodes (lvdisplay gives the same result for this disk on both).
But "kpartx -l" shows that the partition table is not the same as on the first node.

In fact, on the first node /dev/vg_guests/vm-202-disk-1 points to a Linux disk, while on the second node it points to a Windows disk.
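
To illustrate, the comparison on each node was along these lines (the "file -s" line is only an extra sanity check, not part of the original procedure):

# compare what each node believes about the same LV
lvdisplay /dev/vg_guests/vm-202-disk-1    # LVM metadata: identical UUID and size on both nodes
kpartx -l /dev/vg_guests/vm-202-disk-1    # partition table as seen through device-mapper: differs
file -s /dev/vg_guests/vm-202-disk-1      # what the first sectors actually contain (extra check)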

We solved the problem by rebooting the second node; after the reboot all the LVs were OK on that node.

We plan to migrate all 70 of our Xen VMs to Proxmox, and I want to understand this problem before going ahead.


Perhaps our procedure for migrating from Xen to KVM can cause problems on Proxmox.

Any advice would be greatly appreciated.

Regards
 
Has anyone seen this problem before?

Today we transferred two Xen VMs to Proxmox and everything was fine.

We have added a third node and that was fine too.

All migrations between Proxmox nodes went OK.

Can someone confirm that we can use shared storage (not a shared filesystem) with Proxmox without risking data corruption?

Reading the docs, it seems that it should be OK.

Thanks for your help
 
Hi,
I have been using shared storage (first with an FC SAN only, now also with DRBD) since this option became available (about 1.5 years ago), without trouble in production.

Udo
 
OK, thanks for your response.
Another question: we use only one LVM volume group here, and to extend the space we use vgextend to add more LUNs to the VG, roughly as sketched below.
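
Concretely it looks roughly like this (the multipath device name of the new LUN is just an example):

# the new LUN is presented through multipath, then added to the shared VG
pvcreate /dev/mapper/mpath1
vgextend vg_guests /dev/mapper/mpath1
vgdisplay vg_guests        # check the additional free space
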
Do you use the same method?

Thanks
 
Hi,
it depends... if the LUN is on another RAID, I normally use it as a new LVM storage.
With DRBD setups I always use separate DRBD devices for each server (to avoid trouble in case of a split-brain situation).

Udo
 
Hi,

I'm back with my strange LVM problem, which has appeared again on my cluster.
We use scripts with the qm command to create new machines.

It seems I hit the problem when I create a VM (id=202), destroy it, and then recreate a VM reusing the same vmid (202).
On one node I get a strange LVM issue with the dm devices pointing at the wrong data: when I migrate the VM to that node the migration itself completes, but as soon as something is written to the disk everything fails.
Is there any restriction on reusing a freed Proxmox vmid in a cluster?

I am probably making a mistake somewhere, but I can't identify it...

Thanks for your help.
 
I have the following for vmid 202 on the node:

#lvscan
ACTIVE '/dev/vg_guests/vm-202-disk-1' [10,00 GB] inherit

#dmsetup info
Name: vg_guests-vm--202--disk--1
State: ACTIVE
Read Ahead: 256
Tables present: LIVE
Open count: 0
Event number: 0
Major, minor: 254, 9
Number of targets: 1
UUID: LVM-WAnsDzU7yk5qdqZ2tYhbpxZzmnF4Zy5htR2As2foul5nJK0o34IoXxDTFPU5zTeo

Everything looks OK, but when I migrate VM 202, it fails to run correctly.
This VM hosts a Redmine instance, and the application fails.
If I stop the VM and try to restart it on the same node, it can't boot: GRUB doesn't recognize the partitions.
If I migrate it to another node, everything works fine: the VM starts and all is OK.

It seems the device identified by vg_guests-vm--202--disk--1 does not point to the same data as on the other node...
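
A quick way to see this on each node is something like the following (the checksum is only an illustrative spot check, not what we originally ran):

# show which underlying device and offset the mapping really uses on this node
dmsetup table vg_guests-vm--202--disk--1

# checksum the first MB of the LV; if the mapping is stale this differs between the nodes
dd if=/dev/vg_guests/vm-202-disk-1 bs=1M count=1 2>/dev/null | md5sum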

Regards
 
I can second this issue. We made a backup of VM 112 and restored it (from the old SAN, iSCSI, to the new SAN, also iSCSI) to node 2.
We then live-migrated VM 112 (a Debian VM installed fresh within Proxmox a year ago) from node 2 to node 1.
That worked fine, but after a couple of days the disk became read-only; we rebooted the VM and it didn't boot anymore: no GRUB menu, error 15.
Eventually we migrated it (offline) back to node 2 and the VM started again with no problems.

This is on a 1.6 cluster; since we are trying to upgrade to 1.8 now, it would be nice to figure out what is wrong here.
 
OK, so my understanding is that with Proxmox we can use plain LVM on shared storage, and Proxmox assumes each LV is accessed by only one VM and keeps the LVM state consistent across the cluster using vgscan/lvscan/etc...

We use multipathing here to access our SAN LUNs; do you have the same setup?

It seems Udo uses Proxmox and LVM without CLVM and that works fine...

I read about CLVM a long time ago.

Do you have any docs/howto about setting up CLVM with Proxmox?

Regards
 

Short tutorial (a rough command sketch for steps 1, 3 and 4 follows the list):

1. install cman and clvm
2. configure cman in /etc/cluster/cluster.conf (see man cluster.conf) on all nodes
3. start cman and clvm on all nodes
4. enable clvm for the given VG (vgchange -cy <your vg>)
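
As a minimal sketch, assuming the Debian Lenny package and init script names and the vg_guests VG from this thread (adjust to your setup):

# 1. install the cluster manager and the clustered LVM daemon
apt-get install cman clvm

# 3. start both on every node (after cluster.conf is in place everywhere)
/etc/init.d/cman start
/etc/init.d/clvm start

# clvmd also needs clustered locking enabled in /etc/lvm/lvm.conf on every node:
#   locking_type = 3

# 4. mark the shared VG as clustered
vgchange -cy vg_guests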

example cluster.conf:
<?xml version="1.0"?>
<cluster name="clvm_cluster" config_version="10">

  <clusternodes>

    <clusternode name="pve186" votes="1" nodeid="1">
      <fence>
        <method name="single">
          <device name="manual" ipaddr="192.168.1.186"/>
        </method>
      </fence>
    </clusternode>

    <clusternode name="pve185" votes="1" nodeid="2">
      <fence>
        <method name="single">
          <device name="manual" ipaddr="192.168.1.185"/>
        </method>
      </fence>
    </clusternode>

    <clusternode name="pve184" votes="1" nodeid="3">
      <fence>
        <method name="single">
          <device name="manual" ipaddr="192.168.1.184"/>
        </method>
      </fence>
    </clusternode>

    <clusternode name="pve183" votes="1" nodeid="4">
      <fence>
        <method name="single">
          <device name="manual" ipaddr="192.168.1.183"/>
        </method>
      </fence>
    </clusternode>

    <clusternode name="pve182" votes="1" nodeid="5">
      <fence>
        <method name="single">
          <device name="manual" ipaddr="192.168.1.182"/>
        </method>
      </fence>
    </clusternode>

    <clusternode name="pve180" votes="1" nodeid="6">
      <fence>
        <method name="single">
          <device name="manual" ipaddr="192.168.1.180"/>
        </method>
      </fence>
    </clusternode>

  </clusternodes>

  <fence_daemon clean_start="1" post_fail_delay="3" post_join_delay="3"/>
  <fencedevices>
    <fencedevice name="manual" agent="fence_manual"/>
  </fencedevices>

</cluster>

Notes:
I did not enable any fencing, although that is recommended.
CLVM means no LVM snapshots!! So no online backups with vzdump.
The cman version in Debian Lenny is old and buggy; qdisk on top of multipath devices will not work (qdisk is an optional special partition or whole LUN that cman uses to check whether it is properly connected to the shared storage - very useful).
 
I've had the identical problem on two clusters in two separate data centers. One was using FC storage and multipath, the other iSCSI, at first without multipath; I then added multipath to protect myself from iSCSI reconnects (one reconnect changed the iSCSI disk name from sda to sdb, and my LVM stopped working because the device mapper was still pointing all I/O at sda). Everything was fine until one day I migrated several KVM guests to another node and a few of them did not work; I checked the second cluster and it had the same problem with migration. I can't risk such data loss, so just to be on the safe side I've added CLVM. Some people may be fine without it, and maybe it's something I did that broke it, but I just don't want a power outage to leave half of my KVM guests corrupted.
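
One related precaution (not something I can say fixed anything here, just common practice): make LVM scan only the multipath devices and not the underlying /dev/sd* paths, with a filter in /etc/lvm/lvm.conf along these lines. The patterns depend on how your multipath devices are named, and /dev/sda as the local system disk is just an assumption:

# /etc/lvm/lvm.conf
devices {
    # accept multipath devices and the local system disk, reject other block devices
    filter = [ "a|^/dev/mapper/mpath.*|", "a|^/dev/sda.*|", "r|.*|" ]
}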
 
Thanks for your explanation.

Did you use scripts to create VMs on the Proxmox cluster, like we do here?

I think that creating VMs outside the Proxmox web interface can lead to "corrupted" LVM data on the cluster...

Can anyone point me to the right way to keep the LVM data synchronized across the cluster? To be more precise: at what point must I run vgscan/lvscan during VM creation if I use scripts to create/migrate VMs?
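
What I have in mind is something along these lines on the other nodes after an LV has been created or changed elsewhere (just a sketch of the idea, not a procedure I know to be correct):

# re-read the VG/LV metadata from the shared storage
vgscan
lvscan

# if a stale device-mapper mapping is suspected for a given LV, deactivate and
# reactivate it on this node while the VM is not running here
lvchange -an /dev/vg_guests/vm-202-disk-1
lvchange -ay /dev/vg_guests/vm-202-disk-1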

Regards
 

I did everything using the Proxmox tools and still had problems. Before that I had one cluster with LVM on top of FC, but I broke it with lvextend, so this time I was very careful not to do any manual "tweaking" and left everything to Proxmox.
 
When you say "I broke it with lvextend", can you describe the problem you encountered?
We use vgextend here to extend our vg_guests volume group, which hosts all of our VM disks...

Thanks
 
OK, thanks for the link.

I have rebooted the "failed" node.
Everything was good again; all worked fine.
I'm still looking for a solution to this issue.
Do any other users run CLVM on their Proxmox cluster?

I'm considering this solution, but is there any impact on Proxmox from using CLVM (apart from losing snapshots)?
We have transferred 20 VMs onto our four-node cluster, but I must secure the data in order to migrate safely between nodes.

Thanks for your help
 
