Proxmox 5.0, iSCSI and LVM clarifications

dfenwick · Jul 16, 2017

I've been testing Proxmox 5.0 for the last few days, experimenting with a few of the machines I have.

I have two Dell EqualLogic PS-M4110 arrays in a Dell m1000e chassis with 9 physical blades also in the blade chassis. The PS-M4110 only supports iSCSI. Each of the blades has an onboard nic and at least one mezzanine card. All nics are 10Gbps, but the nics in the B and C fabric are forced to 1Gbe due to some switch limitations I have in the chassis. The A fabric is full 10Gbe.

I read most of the Proxmox documentation before doing anything with it other than the basics of installing it on blades.

I have the nics split between regular traffic and san traffic. They're a mix of Broadcom NetXtreme II and Intel X520 nics. I have 6 nodes in my cluster and all are configured properly for connectivity. Their enumerations come out to eno1 and eno2 being on the 10Gbe fabric, and the others (the p nics - I'm starting to detest the whole kernel 3.x and 4.x nomenclature they've done on nics) are the 1Gbe traffic nics. The 10Gbe nics are not redundant right now. Only eno1 or eno2 is configured on the san bridge (vmbr1) depending on the blade. The quarter height blades alternate their fabric connectivity due to mezzanine design, but let's just say everything's up properly (it is). The 10Gbe nics are also configured for MTUs of 9k, which was more challenging to figure out how to do the Proxmox way than I had predicted.

I added 5 2TB volumes on my storage array and then, per the storage documentation, created iscsi storage containers for each. I then mapped each of those iscsi storage containers to individual shared lvm storage containers. My first go-round, which caused many issues, was because I didn't quite understand how the lvm storage was created and I ended up with luns in my lvm containers. So I tore all of that down and then created 5 iscsi containers with 5 separate lvm containers, one for each iscsi lun.

This is where my first issue cropped up. After provisioning all of this, it was only available on the PVE manager I was logged into through the web interface. None of the volume groups activated on any of the other servers. After restarting each of the other servers in the cluster, one-at-a-time due to not wanting to lose corosync quarum, they each picked up those lvm volumes, but only after restart. I didn't worry too much about that at the time because I was just poking around. Those logical volumes were called lvm001 through lvm005. Later testing (after everything below I ran into transpired) showed that I had to clear the volume cache on each of the other servers in the cluster (other than the one I was connected to the web server on) with:

Code:

vgscan -v --cache
vgchange -ay vg001

I then created a virtual machine on lvm001 on host A and gave it 50GB of disk space. Installed Centos 6.9 on that virtual machine. Then I did a live clone of that virtual machine to a different host and different logical volume. I installed another 5 virtual machines over a couple days. But what I observed was strange, and I'm hoping someone might be able to enlighten me on the problem.

Over the course of a day, the volume contents started falling out of sync between hosts in the cluster. For example, if I installed a virtual machine on host B, host A would not see the disk if the disk was placed on lvm002 instead of lvm001. It got to the point where I had duplicate disks across my 5 logical volumes due to creating a tearing down virtual machines throughout the process. Eventually VMs became unbootable if I migrated them between my hosts.

Now I'm definitely not a Proxmox expert. I've been evaluating it in my local lab as a test. But I've read most of the documentation and I've been doing this type of stuff for an awful long time (30+ years - I'm actually a software engineer by trade - I'm a Samba team alumni, and have done some kernel work, but not for a long while now.) I'm trying to figure out if I've taken an incorrect path in my storage decision making process or if that's the expected behavior when adding an iscsi storage in a Proxmox cluster. In addition, what can cause the shared logical volumes to fall out of sync across hosts as I've seen? To be honest, I can readily reproduce this problem in my environment. Not that I want to, but if it's necessary I can certainly do so.

I've since moved to a very large single lun (10TB) but I haven't tested it yet. The same problem with the volume group not being added happened with that LUN. In addition, tearing down all of the previous iscsi storage volumes led to a spamming bug on the servers. When iscsi volumes get torn down, you definitely need to tear down the initiator they're using, particularly if those volumes are removed from the server. My san controllers logged 5000 connection errors before I had to reboot my cluster because the iscsi initiators were not torn down from the previous volume connections.

Anyway, Proxmox is pretty awesome. As with most of these types of environments, it has a few warts here and there, but overall I'm impressed by what's been produced. I've already started looking through the source code, because I can think of a few things I'd like to add. Like bulk VM creation in the GUI, some various error checking (should we really be able to do a non-live clone of a running VM?) and some other things. I have quite a lot of experience with ExtJS as well, so that learning curve will be pretty shallow.

Has anyone run into these situations? Are there alternatives to

The obligatory pveversion info:

Code:

pveversion --verbose

proxmox-ve: 5.0-15 (running kernel: 4.10.15-1-pve)
pve-manager: 5.0-23 (running version: 5.0-23/af4267bf)
pve-kernel-4.10.15-1-pve: 4.10.15-15
libpve-http-server-perl: 2.0-5
lvm2: 2.02.168-pve2
corosync: 2.4.2-pve3
libqb0: 1.0.1-1
pve-cluster: 5.0-10
qemu-server: 5.0-12
pve-firmware: 2.0-2
libpve-common-perl: 5.0-16
libpve-guest-common-perl: 2.0-11
libpve-access-control: 5.0-5
libpve-storage-perl: 5.0-12
pve-libspice-server1: 0.12.8-3
vncterm: 1.5-2
pve-docs: 5.0-6
pve-qemu-kvm: 2.9.0-2
pve-container: 2.0-14
pve-firewall: 3.0-1
pve-ha-manager: 2.0-2
ksm-control-daemon: 1.2-2
glusterfs-client: 3.8.8-1
lxc-pve: 2.0.8-3
lxcfs: 2.0.7-pve2
criu: 2.11.1-1~bpo90
novnc-pve: 0.6-4
smartmontools: 6.5+svn4324-1
zfsutils-linux: 0.6.5.9-pve16~bpo90

martin_hd · Jul 22, 2017

I'm sorry, your text is very long. I can confirm, however, that changes in LVM vgroups (such as creation of logical volumes) seem to be not properly synchronized/"seen" across a LVM shared via iSCSI. I can reproduce this with my iSCSI SAN over multipath.

Running vgscan --cache fixes the issue. However, Proxmox VE perl scripts don't seem to do that but make an assumption about LVM2; see Storage/LVMPlugin.pm:

# In LVM2, vgscans take place automatically;
# this is just to be sure
if ($cache->{vgs} && !$cache->{vgscaned} &&
!$cache->{vgs}->{$scfg->{vgname}}) {
$cache->{vgscaned} = 1;
my $cmd = ['/sbin/vgscan', '--ignorelockingfailure', '--mknodes'];
eval { run_command($cmd, outfunc => sub {}); };
warn $@ if $@;
}

Maybe that assumption from the above comment is no longer true under Debian Stretch. Also, as you can see, --cache is missing from the command. This might be a bug in Proxmox (or a misconfiguration on our parts...)

martin_hd · Jul 22, 2017

See https://forum.proxmox.com/threads/proxmox-5-0-iscsi-lvm-unpredictable.35636

dfenwick · Jul 22, 2017

Thanks martin. I replied in that thread. As you said there, I find it a bit disappointing that the Proxmox team didn't seem to notice either of these omissions in PVE 5. This is my first evaluation of Proxmox. While I love the UI and think it *can* be great, I'm leery of doing anything beyond just minor testing and lab work with it.

Search

Search

Proxmox 5.0, iSCSI and LVM clarifications

dfenwick

Active Member

martin_hd

New Member

martin_hd

New Member

dfenwick

Active Member