[SOLVED] Proxmox 5.0 - iSCSI + LVM unpredictable

Underphil · New Member · May 22, 2015
Hey there,

So I've set up Proxmox 5.0 on a test cluster using LVM over iSCSI on a Nimble array. I created a LUN, added an LVM layer on top via the GUI, and created a VM.

Problem 1: On migrating that VM to another node in the cluster, the migration errored out saying that the volume group wasn't available on the other node. I restarted open-iscsi on that node, after which the volume group appeared and I could migrate the VM.
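
For reference, the workaround on that node was roughly the following (service and command names as shipped by Debian's open-iscsi and lvm2 packages; the LVM rescan lines may not even be needed, the restart alone was enough here):

    # restart the iSCSI initiator so its sessions and block devices come back
    systemctl restart open-iscsi
    # ask LVM to rescan, in case the VG still isn't visible
    pvscan
    vgscan
    # activate any VGs that are present but inactive
    vgchange -ay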

Problem 2: I then resized that virtual disk from 32GB to 64GB on the second node. Running 'lsblk' on that node showed that the change *had* been applied. However, when I migrated the VM back to the first node, on that node it showed up as a 32GB LVM volume (in both the guest OS and in 'lsblk' on the host).
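
For anyone reproducing this, the resize step is nothing special; it amounts to roughly the following on the node running the VM (VM ID 100 and disk virtio0 are just examples):

    # grow the virtual disk from 32G to 64G
    qm resize 100 virtio0 +32G
    # then compare what each node reports for the backing LV
    lsblk
    lvs -o lv_name,vg_name,lv_size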

Any idea what's happening here?

pve-manager/5.0-23/af4267bf (running kernel: 4.10.15-1-pve)

(If it's a simple case of killing all the configuration and starting again, that's fine, but I can't re-install since iDRAC isn't working and the servers are miles away :D)
 
Problem 2: I then resized that virtual disk from 32GB to 64GB on the second node. Running 'lsblk' on that node showed that the change *had* been applied. However, when I migrated the VM back to the first node, on that node it showed up as a 32GB LVM volume (in both the guest OS and in 'lsblk' on the host).
Did you forget to resize the file system inside the VM?
 
Did you forget to resize the file system inside the VM?

No, that's not the issue. The actual LVM volume that the VM sits on is reported with two different sizes depending on which host the VM is migrated to.

The only way I can see to fix that is to stop the VM and restart open-iscsi on all hosts, which is obviously not a solution.

Any other thoughts on this issue?
 
Nothing? This works fine on PVE 4.4, VMware and everything else that uses that Nimble as an iSCSI target.
 
I'm having the same problem on 5.0. It's my first experience with Proxmox, and anything that involves migration or resizing has unpredictable results. A single LVM storage, 6 hosts. I've pretty much resorted to not moving anything around until I know what's happening.
 
Just for information: on 4.4 everything works fine with the same hardware (HP ProLiant DL380 Gen9).
The same servers on 5 have the LVM problem.
 
Thanks for confirming that it's not just me, all :)

So I guess the problem lies with either LVM or open-iscsi rather than being Proxmox-specific. I'll try to find support for that elsewhere to see if there's any kind of known issue and will report back.

Goes without saying that I'd recommend avoiding PVE 5.x if you have a Nimble device.
 
I upgraded my Proxmox 4 HP + Nimble test cluster to Proxmox 5 today. I'm quite confident I've pinned down the issue you are all seeing.

/etc/lvm/lvm.conf on Proxmox 4 has the following:

use_lvmetad = 0

/etc/lvm/lvm.conf on Proxmox 5 has the following:

use_lvmetad = 1

Change it to "0" and reboot your front ends. Re-test!
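
If you'd rather script it than edit by hand, something roughly like this should do it (double-check the file before rebooting):

    # show the current value
    grep -n 'use_lvmetad' /etc/lvm/lvm.conf
    # flip it from 1 to 0
    sed -i 's/use_lvmetad[[:space:]]*=[[:space:]]*1/use_lvmetad = 0/' /etc/lvm/lvm.conf
    # confirm the change, then reboot the node
    grep -n 'use_lvmetad' /etc/lvm/lvm.conf
    reboot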
 
Thanks for that info dude, I'll try that on my test hosts and see if it solves it. Will report back.
 
Thanks for that info dude, I'll try that on my test hosts and see if it solves it. Will report back.

Yeah, confirmed that on my side at least the issue has gone away. New machines can be migrated, and resized machines have their sizes accurately reflected.
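
For the record, "accurately reflected" just means that the output of roughly these two commands now agrees on every node the VM is migrated to:

    # sizes as LVM sees them on this node
    lvs -o lv_name,vg_name,lv_size
    # sizes as the kernel sees them
    lsblk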

Many thanks for the work.
 
I'm coming here after losing several containers to this issue: LVM on host1 did not know about the LVs created on host2, and the same space on my iSCSI LUNs got used twice. I even resized one LV into the space of several other LVs, effectively overwriting their filesystems.

These issues are very easily reproducible:
  • Create a container on host1,
  • migrate it to host2 (which works), then
  • start it on host2: doesn't work because host2 cannot find the LV
With respect, I cannot believe that this went unnoticed into Proxmox 5. This is not an exotic configuration but the standard setup for shared LVM users, and it almost guarantees data loss. There should be a big red warning telling shared LVM users to set use_lvmetad = 0 (or better, Proxmox should set it automatically, as it already does with global_filter).
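
For anyone landing here, the relevant part of /etc/lvm/lvm.conf ends up looking roughly like this once fixed (the global_filter entry is only a placeholder for whatever Proxmox already ships there, not a value to copy):

    global {
        # lvmetad caches LVM metadata per host; with a VG shared over iSCSI
        # each host ends up with a stale view, so keep it switched off
        use_lvmetad = 0
    }
    devices {
        # global_filter = [ ... ]   (already managed by Proxmox)
    }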

Maybe this is a dumb question, but wouldn't it be smarter to reuse a widely adopted solution such as Clustered LVM (clvm) for shared LVM instead of a custom implementation?

Luckily, my containers were easily restored thanks to Proxmox's handy backup feature. Also, a big "thank you" to adamb for the fix to this issue.
 
With respect, I cannot believe that this went unnoticed into Proxmox 5. This is not an exotic configuration but the standard setup for shared LVM users, and it almost guarantees data loss.

I agree, I would love to see some input from the devs on how something like this was overlooked. IMO this should have been clear as day in testing.
 
/etc/lvm/lvm.conf on Proxmox 4 has use_lvmetad = 0, while on Proxmox 5 it has use_lvmetad = 1.

This looks like a distro difference between Debian 9 and Debian 8.
 
I've now run for 5 days with the changes made and it's stable. I haven't added any shared LVMs to the infrastructure though. Kind of afraid of blowing up what I have. :)
 
I agree, I would love to see some input from the devs on how something like this was overlooked. IMO this should have been clear as day in testing.

Yeah, it's a little concerning. I'd have thought there would be some response to this thread from the developers by now.
 
