[SOLVED] Proxmox 5.0 - iSCSI + LVM unpredictable

Underphil · New Member · May 22, 2015
Hey there,

So I've set up Proxmox 5.0 on a test cluster using LVM over iSCSI on a Nimble array. I created a LUN, added an LVM layer on top via the GUI, and created a VM.

Problem 1: On migrating that VM to another node in the cluster, the migration errored out saying that the volume group wasn't available on the other node. I restarted open-iscsi on that node, after which the volume group appeared and I could migrate the VM.
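
For reference, the workaround on that node was roughly the following (service and command names as shipped by Debian's open-iscsi and lvm2 packages; the LVM rescan lines may not even be needed, the restart alone was enough here):

    # restart the iSCSI initiator so its sessions and block devices come back
    systemctl restart open-iscsi
    # ask LVM to rescan, in case the VG still isn't visible
    pvscan
    vgscan
    # activate any VGs that are present but inactive
    vgchange -ay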

Problem 2: I then resized that virtual disk from 32GB to 64GB on the second node. Running 'lsblk' on that node showed that the change *had* been applied. However, when I migrated the VM back to the first node, on that node it showed up as a 32GB LVM volume (in both the guest OS and in 'lsblk' on the host).
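
For anyone reproducing this, the resize step is nothing special; it amounts to roughly the following on the node running the VM (VM ID 100 and disk virtio0 are just examples):

    # grow the virtual disk from 32G to 64G
    qm resize 100 virtio0 +32G
    # then compare what each node reports for the backing LV
    lsblk
    lvs -o lv_name,vg_name,lv_size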

Any idea what's happening here?

pve-manager/5.0-23/af4267bf (running kernel: 4.10.15-1-pve)

(If it's a simple case of killing all the configuration and starting again, that's fine, but I can't re-install since iDRAC isn't working and the servers are miles away :D)
 
Problem 2: I then resized that virtual disk from 32GB to 64GB on the second node. Running 'lsblk' on that node showed that the change *had* been applied. However, when I migrated the VM back to the first node, on that node it showed up as a 32GB LVM volume (in both the guest OS and in 'lsblk' on the host).
Did you forget to resize the file system inside the VM?
 
Did you forget to resize the file system inside the VM?

No, that's not the issue. The actual LVM volume that the VM sits on is reported with two different sizes depending on which host the VM is migrated to.

The only way I can see to fix that is to stop the VM and restart open-iscsi on all hosts, which is obviously not a solution.

Any other thoughts on this issue?
 
Nothing? This works fine on PVE 4.4, VMware and everything else that uses that Nimble as an iSCSI target.
 
I'm having the same problem on 5.0. It's my first experience with Proxmox, and anything that involves migration or resizing has unpredictable results. A single LVM storage, 6 hosts. I've pretty much resorted to not moving anything around until I know what's happening.
 
Just for information: on 4.4 everything works fine with the same hardware (HP ProLiant DL380 Gen9).
The same servers on 5 have the LVM problem.
 
Thanks for confirming that it's not just me, all :)

So I guess the problem lies with either LVM or open-iscsi rather than being Proxmox-specific. I'll try to find support for that elsewhere to see if there's any kind of known issue and will report back.

Goes without saying that I'd recommend avoiding PVE 5.x if you have a Nimble device.
 
I upgraded my Proxmox 4 HP + Nimble test cluster to Proxmox 5 today. I'm quite confident I've pinned down the issue you are all seeing.

/etc/lvm/lvm.conf on Proxmox 4 has the following:

use_lvmetad = 0

/etc/lvm/lvm.conf on Proxmox 5 has the following:

use_lvmetad = 1

Change it to "0" and reboot your front ends. Re-test!
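
If you'd rather script it than edit by hand, something roughly like this should do it (double-check the file before rebooting):

    # show the current value
    grep -n 'use_lvmetad' /etc/lvm/lvm.conf
    # flip it from 1 to 0
    sed -i 's/use_lvmetad[[:space:]]*=[[:space:]]*1/use_lvmetad = 0/' /etc/lvm/lvm.conf
    # confirm the change, then reboot the node
    grep -n 'use_lvmetad' /etc/lvm/lvm.conf
    reboot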
 
Thanks for that info dude, I'll try that on my test hosts and see if it solves it. Will report back.
 
Thanks for that info dude, I'll try that on my test hosts and see if it solves it. Will report back.

Yeah, confirmed that on my side at least the issue has gone away. New machines can be migrated, and resized machines have their sizes accurately reflected.
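
For the record, "accurately reflected" just means that the output of roughly these two commands now agrees on every node the VM is migrated to:

    # sizes as LVM sees them on this node
    lvs -o lv_name,vg_name,lv_size
    # sizes as the kernel sees them
    lsblk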

Many thanks for the work.
 
I'm coming here after losing several containers to this issue: LVM on host1 did not know about the LVs created on host2, and the same space on my iSCSI LUNs got used twice. I even resized one LV into the space of several other LVs, effectively overwriting their filesystems.

These issues are very easily reproducible:
  • Create a container on host1,
  • migrate it to host2 (which works), then
  • start it on host2: doesn't work because host2 cannot find the LV
With respect, I cannot believe that this went unnoticed into Proxmox 5. This is not an exotic configuration but the standard setup for shared LVM users, and it almost guarantees data loss. There should be a big red warning telling shared LVM users to set use_lvmetad = 0 (or better, Proxmox should set it automatically, as it already does with global_filter).
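
For anyone landing here, the relevant part of /etc/lvm/lvm.conf ends up looking roughly like this once fixed (the global_filter entry is only a placeholder for whatever Proxmox already ships there, not a value to copy):

    global {
        # lvmetad caches LVM metadata per host; with a VG shared over iSCSI
        # each host ends up with a stale view, so keep it switched off
        use_lvmetad = 0
    }
    devices {
        # global_filter = [ ... ]   (already managed by Proxmox)
    }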

Maybe this is a dumb question, but wouldn't it be smarter to reuse a widely adopted solution such as Clustered LVM (clvm) for shared LVM instead of a custom implementation?

Luckily, my containers were easily restored thanks to Proxmox's handy backup feature. Also, a big "thank you" to adamb for the fix to this issue.
 
With respect, I cannot believe that this went unnoticed into Proxmox 5. This is not an exotic configuration but the standard setup for shared LVM users, and it almost guarantees data loss.

I agree, I would love to see some input from the devs on how something like this was overlooked. IMO this should have been clear as day in testing.
 
/etc/lvm/lvm.conf on Proxmox 4 has use_lvmetad = 0, while on Proxmox 5 it has use_lvmetad = 1.

This looks like a distro difference between Debian 9 and Debian 8.
 
I've now run for 5 days with the changes made and it's stable. I haven't added any shared LVMs to the infrastructure though. Kind of afraid of blowing up what I have. :)
 
I agree, I would love to see some input from the devs on how something like this was overlooked. IMO this should have been clear as day in testing.

Yeah, it's a little concerning. I'd have thought there would be some response to this thread from the developers by now.
 
