Strange lvm issue on proxmox cluster

lozair

Hi,
we have two nodes running Proxmox 1.7.

We use shared storage between the Proxmox servers on an FC SAN, accessed via multipath.
This shared storage is configured in Proxmox as an LVM storage named vg_guests.
On this storage we create a raw virtual disk as an LV for each virtual machine.
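
For reference, the VG sits directly on the multipath device; the setup was roughly like this (the /dev/mapper device name here is just an example, ours may differ):

# create a physical volume on the multipath device and the shared VG on top of it
pvcreate /dev/mapper/mpath0
vgcreate vg_guests /dev/mapper/mpath0

vg_guests is then registered in Proxmox as an LVM storage marked as shared.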

We are migrating our Xen VMs from the old cluster to this new Proxmox cluster.

Note that we have scripted the migration of the Xen machines.
The script uses the "qm create" command to create a raw disk, and we then use kpartx/fdisk/tar/grub to populate and configure the VM disk from the first Proxmox node.
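
In outline, the script does something like the following per VM (the vmid, size, paths and exact options here are only illustrative, and the real script has more error handling):

# create the VM and its empty raw disk as an LV on the shared storage
qm create 202 --name vm202 --memory 1024 --ide0 vg_guests:10

# a partition table is first written to /dev/vg_guests/vm-202-disk-1 with fdisk (interactive step),
# then the partitions are mapped, the Xen filesystem copied in, and the bootloader reinstalled
kpartx -av /dev/vg_guests/vm-202-disk-1                       # prints the /dev/mapper/... partition names
mount /dev/mapper/vg_guests-vm--202--disk--1p1 /mnt/vm202     # use the name printed by kpartx
tar -xpf /backup/xen-vm202.tar -C /mnt/vm202
grub-install --root-directory=/mnt/vm202 /dev/vg_guests/vm-202-disk-1   # bootloader step depends on the guest
umount /mnt/vm202
kpartx -d /dev/vg_guests/vm-202-disk-1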

The "transfer" from the Xen cluster to the Proxmox cluster seems to be OK.

The problem appears when we migrate a VM from the first Proxmox node to the second node:
the filesystem of the VM appears to be corrupted after the live migration.

Looking at the LVM status on the second node, everything seems OK,
but there is in fact a serious LVM problem.

For the VM with vmid=202, the LV /dev/vg_guests/vm-202-disk-1 shows the same LVM metadata on both nodes (lvdisplay gives the same result for this disk on both).
But "kpartx -l" shows that the partition table is not the same as on the first node.

In fact, on the first node /dev/vg_guests/vm-202-disk-1 points to a Linux disk, while on the second node it points to a Windows disk.
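
To illustrate, the comparison on each node was along these lines (the "file -s" line is only an extra sanity check, not part of the original procedure):

# compare what each node believes about the same LV
lvdisplay /dev/vg_guests/vm-202-disk-1    # LVM metadata: identical UUID and size on both nodes
kpartx -l /dev/vg_guests/vm-202-disk-1    # partition table as seen through device-mapper: differs
file -s /dev/vg_guests/vm-202-disk-1      # what the first sectors actually contain (extra check)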

We solved the problem by rebooting the second node; after the reboot all the LVs were OK on that node.

We plan to migrate all 70 of our Xen VMs to Proxmox, and I want to understand this problem before going ahead.


Perhaps our procedure for migrating from Xen to KVM can cause problems on Proxmox.

Any advice would be greatly appreciated.

Regards
 
Has anyone seen this problem before?

Today we transferred two Xen VMs to Proxmox and everything was fine.

We have added a third node and that was fine too.

All migrations between Proxmox nodes went OK.

Can someone confirm that we can use shared storage (not a shared filesystem) with Proxmox without risking data corruption?

Reading the docs, it seems that it should be OK.

Thanks for your help
 
Hi,
I have been using shared storage (first with an FC SAN only, now also with DRBD) since this option became available (about 1.5 years ago), without trouble in production.

Udo
 
OK, thanks for your response.
Another question: we use only one LVM volume group here, and to extend the space we use vgextend to add more LUNs to the VG, roughly as sketched below.
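
Concretely it looks roughly like this (the multipath device name of the new LUN is just an example):

# the new LUN is presented through multipath, then added to the shared VG
pvcreate /dev/mapper/mpath1
vgextend vg_guests /dev/mapper/mpath1
vgdisplay vg_guests        # check the additional free space
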
Do you use the same method?

Thanks
 
Hi,
it depends... if the LUN is on another RAID, I normally use it as a new LVM storage.
With DRBD setups I always use separate DRBD devices for each server (to avoid trouble in case of a split-brain situation).

Udo
 
Hi,

I'm back with my strange LVM problem, which has appeared again on my cluster.
We use scripts with the qm command to create new machines.

It seems I hit the problem when I create a VM (id=202), destroy it, and then recreate a VM reusing the same vmid (202).
On one node I get a strange LVM issue with the dm devices pointing at the wrong data: when I migrate the VM to that node the migration itself completes, but as soon as something is written to the disk everything fails.
Is there any restriction on reusing a freed Proxmox vmid in a cluster?

I am probably making a mistake somewhere, but I can't identify it...

Thanks for your help.
 
I have the following for vmid 202 on the node:

#lvscan
ACTIVE '/dev/vg_guests/vm-202-disk-1' [10,00 GB] inherit

#dmsetup info
Name: vg_guests-vm--202--disk--1
State: ACTIVE
Read Ahead: 256
Tables present: LIVE
Open count: 0
Event number: 0
Major, minor: 254, 9
Number of targets: 1
UUID: LVM-WAnsDzU7yk5qdqZ2tYhbpxZzmnF4Zy5htR2As2foul5nJK0o34IoXxDTFPU5zTeo

Everything looks OK, but when I migrate VM 202, it fails to run correctly.
This VM hosts a Redmine instance, and the application fails.
If I stop the VM and try to restart it on the same node, it can't boot: GRUB doesn't recognize the partitions.
If I migrate it to another node, everything works fine: the VM starts and all is OK.

It seems the device identified by vg_guests-vm--202--disk--1 does not point to the same data as on the other node...
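
A quick way to see this on each node is something like the following (the checksum is only an illustrative spot check, not what we originally ran):

# show which underlying device and offset the mapping really uses on this node
dmsetup table vg_guests-vm--202--disk--1

# checksum the first MB of the LV; if the mapping is stale this differs between the nodes
dd if=/dev/vg_guests/vm-202-disk-1 bs=1M count=1 2>/dev/null | md5sum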

Regards
 
I can second this issue. We made a backup of VM 112 and restored it (from the old SAN, iSCSI, to the new SAN, also iSCSI) to node 2.
We then live-migrated VM 112 (a Debian VM installed fresh within Proxmox a year ago) from node 2 to node 1.
That worked fine, but after a couple of days the disk became read-only; we rebooted the VM and it didn't boot anymore: no GRUB menu, error 15.
Eventually we migrated it (offline) back to node 2 and the VM started again with no problems.

This is on a 1.6 cluster; since we are trying to upgrade to 1.8 now, it would be nice to figure out what is wrong here.
 
OK, so my understanding is that with Proxmox we can use plain LVM on shared storage, and Proxmox assumes each LV is accessed by only one VM and keeps the LVM state consistent across the cluster using vgscan/lvscan/etc...

We use multipathing here to access our SAN LUNs; do you have the same setup?

It seems Udo uses Proxmox and LVM without CLVM and that works fine...

I read about CLVM a long time ago.

Do you have any docs/howto about setting up CLVM with Proxmox?

Regards
 

Short tutorial (a rough command sketch for steps 1, 3 and 4 follows the list):

1. install cman and clvm
2. configure cman in /etc/cluster/cluster.conf (see man cluster.conf) on all nodes
3. start cman and clvm on all nodes
4. enable clvm for the given VG (vgchange -cy <your vg>)
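
As a minimal sketch, assuming the Debian Lenny package and init script names and the vg_guests VG from this thread (adjust to your setup):

# 1. install the cluster manager and the clustered LVM daemon
apt-get install cman clvm

# 3. start both on every node (after cluster.conf is in place everywhere)
/etc/init.d/cman start
/etc/init.d/clvm start

# clvmd also needs clustered locking enabled in /etc/lvm/lvm.conf on every node:
#   locking_type = 3

# 4. mark the shared VG as clustered
vgchange -cy vg_guests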

example cluster.conf:
<?xml version="1.0"?>
<cluster name="clvm_cluster" config_version="10">

  <clusternodes>

    <clusternode name="pve186" votes="1" nodeid="1">
      <fence>
        <method name="single">
          <device name="manual" ipaddr="192.168.1.186"/>
        </method>
      </fence>
    </clusternode>

    <clusternode name="pve185" votes="1" nodeid="2">
      <fence>
        <method name="single">
          <device name="manual" ipaddr="192.168.1.185"/>
        </method>
      </fence>
    </clusternode>

    <clusternode name="pve184" votes="1" nodeid="3">
      <fence>
        <method name="single">
          <device name="manual" ipaddr="192.168.1.184"/>
        </method>
      </fence>
    </clusternode>

    <clusternode name="pve183" votes="1" nodeid="4">
      <fence>
        <method name="single">
          <device name="manual" ipaddr="192.168.1.183"/>
        </method>
      </fence>
    </clusternode>

    <clusternode name="pve182" votes="1" nodeid="5">
      <fence>
        <method name="single">
          <device name="manual" ipaddr="192.168.1.182"/>
        </method>
      </fence>
    </clusternode>

    <clusternode name="pve180" votes="1" nodeid="6">
      <fence>
        <method name="single">
          <device name="manual" ipaddr="192.168.1.180"/>
        </method>
      </fence>
    </clusternode>

  </clusternodes>

  <fence_daemon clean_start="1" post_fail_delay="3" post_join_delay="3"/>
  <fencedevices>
    <fencedevice name="manual" agent="fence_manual"/>
  </fencedevices>

</cluster>

Notes:
I did not enable any fencing, although that is recommended.
CLVM means no LVM snapshots!! So no online backups with vzdump.
The cman version in Debian Lenny is old and buggy; qdisk on top of multipath devices will not work (qdisk is an optional special partition or whole LUN that cman uses to check whether it is properly connected to the shared storage - very useful).
 
I've had the identical problem on two clusters in two separate data centers. One was using FC storage and multipath, the other iSCSI, at first without multipath; I then added multipath to protect myself from iSCSI reconnects (one reconnect changed the iSCSI disk name from sda to sdb, and my LVM stopped working because the device mapper was still pointing all I/O at sda). Everything was fine until one day I migrated several KVM guests to another node and a few of them did not work; I checked the second cluster and it had the same problem with migration. I can't risk such data loss, so just to be on the safe side I've added CLVM. Some people may be fine without it, and maybe it's something I did that broke it, but I just don't want a power outage to leave half of my KVM guests corrupted.
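
One related precaution (not something I can say fixed anything here, just common practice): make LVM scan only the multipath devices and not the underlying /dev/sd* paths, with a filter in /etc/lvm/lvm.conf along these lines. The patterns depend on how your multipath devices are named, and /dev/sda as the local system disk is just an assumption:

# /etc/lvm/lvm.conf
devices {
    # accept multipath devices and the local system disk, reject other block devices
    filter = [ "a|^/dev/mapper/mpath.*|", "a|^/dev/sda.*|", "r|.*|" ]
}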
 
Thanks for your explanation.

Did you use scripts to create VMs on the Proxmox cluster, like we do here?

I think that creating VMs outside the Proxmox web interface can lead to "corrupted" LVM data on the cluster...

Can anyone point me to the right way to keep the LVM data synchronized across the cluster? To be more precise: at what point must I run vgscan/lvscan during VM creation if I use scripts to create/migrate VMs?
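
What I have in mind is something along these lines on the other nodes after an LV has been created or changed elsewhere (just a sketch of the idea, not a procedure I know to be correct):

# re-read the VG/LV metadata from the shared storage
vgscan
lvscan

# if a stale device-mapper mapping is suspected for a given LV, deactivate and
# reactivate it on this node while the VM is not running here
lvchange -an /dev/vg_guests/vm-202-disk-1
lvchange -ay /dev/vg_guests/vm-202-disk-1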

Regards
 

I did everything using the Proxmox tools and still had problems. Before that I had one cluster with LVM on top of FC, but I broke it with lvextend, so this time I was very careful not to do any manual "tweaking" and left everything to Proxmox.
 
When you say "I broke it with lvextend", can you describe the problem you encountered?
We use vgextend here to extend our vg_guests volume group, which hosts all of our VM disks...

Thanks
 
OK, thanks for the link.

I have rebooted the "failed" node.
Everything was good again; all worked fine.
I'm still looking for a solution to this issue.
Do any other users run CLVM on their Proxmox cluster?

I'm considering this solution, but is there any impact on Proxmox from using CLVM (apart from losing snapshots)?
We have transferred 20 VMs onto our four-node cluster, but I must secure the data in order to migrate safely between nodes.

Thanks for your help
 
