problems with shared LVM on top of iSCSI after dropping multipath

linkstat · Nov 20, 2022

Hello.

We have a cluster of 7 nodes. VMs and LXCs are stored on external storage, on a shared LVM, via iSCSI.
We recently had to move from a 2 x 1 Gbps iSCSI multipath connection to a 1 x 10 Gbps FO Ethernet link (single-link iSCSI connection, no multipathing).
The iSCSI storage has 4 Ethernet ports (2 x 1 Gbps + 2 x 10 Gbps), so, in addition to the existing portals (192.168.103.252 and 192.168.104.252), we enabled one more on a 10 Gbps port (192.168.101.23).

The plan is then to modify the iSCSI-related configuration, node by node.

So, we start with the first node, uninstalling multipath: # apt remove multipath-tools
Then modifying /etc/iscsi/iscsid.conf by changing the parameter node.session.nr_sessions = 1
And removing the old targets from /etc/iscsi/nodes/ and /etc/iscsi/send_targets/
Then, in /etc/pve/storage.cfg, we remove the two iSCSI targets for that node and add the new iSCSI target in their place.
We restart the node and... neither the LXCs nor the VMs start.
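
The iSCSI part of the steps above can also be expressed through iscsiadm instead of deleting files under /etc/iscsi/ by hand. What follows is only a dry-run sketch that prints the commands rather than running them. The function name plan_portal_switch is made up for illustration, and port 3260 on the old portals is an assumption (the thread only shows it for the new portal).

```shell
#!/bin/sh
# Dry run: print the iscsiadm commands for a portal switch, do not execute them.
# First argument: new portal IP. Remaining arguments: old portal IPs.
# ASSUMPTION: every portal listens on the default iSCSI port 3260.
plan_portal_switch() {
    new_portal=$1
    shift
    # Discover the targets behind the new portal and log in first...
    echo "iscsiadm -m discovery -t sendtargets -p ${new_portal}:3260"
    echo "iscsiadm -m node -p ${new_portal}:3260 --login"
    # ...then log out of each old portal and delete its node records.
    for old_portal in "$@"; do
        echo "iscsiadm -m node -p ${old_portal}:3260 --logout"
        echo "iscsiadm -m node -p ${old_portal}:3260 -o delete"
    done
}

# Example with the addresses from this thread:
#   plan_portal_switch 192.168.101.23 192.168.103.252 192.168.104.252
```

Printing the commands first makes it easier to review the order (new path in, old paths out) before touching a production node.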

We check the iSCSI connectivity:

# iscsiadm -m session
tcp: [1] 192.168.101.23:3260,3 iqn.1992-04.com.emc:cx.ckm00190901062.a0 (non-flash)
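
To confirm at a glance that only the 10 Gbps portal is still logged in, the session list can be reduced to its portal addresses. This is a small sketch that assumes IPv4 portals and lines shaped like the output above; session_portals is a hypothetical helper name.

```shell
#!/bin/sh
# Read `iscsiadm -m session` style lines on stdin and print one portal IP per session.
# ASSUMPTION: IPv4 portals, third field formatted as "IP:PORT,TPGT".
session_portals() {
    awk '$1 == "tcp:" { split($3, parts, ":"); print parts[1] }'
}

# Usage on a node (needs root):
#   iscsiadm -m session | session_portals
```

If either of the old portals (192.168.103.252 or 192.168.104.252) still showed up here, the node would still hold a session over the 1 Gbps links.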

And list the logical volumes:

# lvs
  LV            VG     Attr       LSize    Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  vm-101-disk-0 khalis -wi-a-----   40.00g
  ...
  (a bunch of logical volumes)
  ...
  vm-430-disk-0 khalis -wi-ao----   10.00g
  vm-432-disk-0 khalis -wi-ao----   10.00g
  data          pve    twi-a-tz-- <810.27g             0.00   10.42
  root          pve    -wi-ao----   96.00g
  swap          pve    -wi-ao----    8.00g
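
In the Attr column, the fifth character is the LV state ("a" means active) and the sixth shows whether the device is open ("o"). Every LV visible in the excerpt above has "a" in the fifth position. As a rough sketch, the rows that are not active can be filtered out like this; inactive_lvs is a hypothetical helper, and the field order is an assumption that matches the lvs options shown in the usage comment.

```shell
#!/bin/sh
# Read "LV VG Attr" rows on stdin and print VG/LV for every row whose
# 5th attribute character is not "a" (that is, the LV is not active).
# Rows whose third field is not a 10-character attribute string are skipped.
inactive_lvs() {
    awk 'NF >= 3 && length($3) == 10 && substr($3, 5, 1) != "a" { print $2 "/" $1 }'
}

# Usage on a node:
#   lvs --noheadings -o lv_name,vg_name,lv_attr | inactive_lvs
```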

mmm... I'm in big trouble (I think). So I'll try something else.

From the Proxmox web interface:

Go to the Hardware / Resources configuration of the LXC that does not start.
Select the Root Disk, and then: Volume Action -> Move Storage.
In Target Storage, I select local (with the Delete source option checked).
The task is completed successfully. Now when I try to start the LXC it starts without problems. (Good!)
So, I repeat the whole Move Storage thing, but this time, from the local storage, to the shared LVM.
Again, the task is completed successfully. Now when I try to start the LXC again, it starts without problems. (Very Good!)

Ultimately I think I'll have to do this for every VM and LXC on every node. It's a nightmare, especially since while some VMs weigh a few gigabytes, others weigh quite a few terabytes. But at least I have a procedure to help me carry out the whole plan.

However, I try to start another one of the LXCs and VMs that were not starting before, and now suddenly, all the VMs and LXCs on that node can start.
What happened then?
Now I think I'll only have to do this once per Proxmox node (not such a nightmare anymore, just a pain in the ass).

And I have one big question: what causes this behavior? Is this the correct way to do things when you want to ditch multipath and just use plain iSCSI?

Regards! and thanks for your time.

bbgeek17 · Nov 20, 2022

linkstat said:
So, we start with the first node, uninstalling multipath: # apt remove multipath-tools

It's hard to visualize your current end-to-end state. That said, this is where you went wrong ^^^.
You should not have uninstalled multipath; you should simply have added the new path, then removed the two old ones.

You had a critical dependency on the device name. Specifically, your LVM was placed on /dev/mpathX, which no longer exists because you uninstalled multipath. Now you have naked devices (/dev/sdX) and you need to track down every place that relied on /dev/mpath.

You probably re-activated your LVM indirectly on the new /dev/sdX disks somehow. Make sure it still works after a reboot.
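
One way to check which block device LVM currently uses for the shared volume group is to filter the pvs output by VG name. This is a minimal sketch: pv_for_vg is a hypothetical helper, and the device names that appear on a real node will differ.

```shell
#!/bin/sh
# Read "PV VG" rows on stdin and print the PV device(s) that belong to the given VG.
pv_for_vg() {
    awk -v vg="$1" '$2 == vg { print $1 }'
}

# Usage on a node:
#   pvs --noheadings -o pv_name,vg_name | pv_for_vg khalis
```

A result under /dev/mapper/ would point at a multipath map, while a plain /dev/sdX name would mean the VG now sits directly on the iSCSI disk. Running the same check again after a reboot shows whether the device is picked up consistently.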

Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox

linkstat · Nov 21, 2022

Thanks for your answer.

Yes, we had thought about that (leaving multipath in place). But since we are about to add a second storage, built on TrueNAS, we plan to use the ZFS over iSCSI storage type with the TrueNAS ZFS over iSCSI interface from THEGRANDWAZOO, which, at least for the moment, does not support multipath.

For now, I have already restarted my first node several times (just for testing), and the VMs have always autostarted correctly.

On all nodes, the devices are in /dev/mapper/:

Code:

# ls -lh /dev/mapper/
total 0
crw------- 1 root root 10, 236 Nov 20 17:21 control
lrwxrwxrwx 1 root root       7 Nov 20 17:21 khalis-vm--101--disk--0 -> ../dm-8
lrwxrwxrwx 1 root root       8 Nov 20 17:21 khalis-vm--102--disk--0 -> ../dm-36
lrwxrwxrwx 1 root root       7 Nov 20 17:21 khalis-vm--103--disk--0 -> ../dm-5
lrwxrwxrwx 1 root root       7 Nov 20 17:21 khalis-vm--103--disk--1 -> ../dm-6
...
(a bunch of disks)
...
lrwxrwxrwx 1 root root       8 Nov 20 17:21 khalis-vm--888--disk--0 -> ../dm-37
lrwxrwxrwx 1 root root       7 Nov 20 17:21 pve-data -> ../dm-4
lrwxrwxrwx 1 root root       7 Nov 20 17:21 pve-data_tdata -> ../dm-3
lrwxrwxrwx 1 root root       7 Nov 20 17:21 pve-data_tmeta -> ../dm-2
lrwxrwxrwx 1 root root       7 Nov 20 17:21 pve-root -> ../dm-1
lrwxrwxrwx 1 root root       7 Nov 20 17:21 pve-swap -> ../dm-0

Although on neither node is there a /dev/mapper/ directory.
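
The names in the listing above are device-mapper names built by LVM: VG and LV are joined with a single hyphen, and any hyphen inside either name is doubled. On a real system, dmsetup splitname does this decoding; the sketch below does the same with awk for illustration only. dm_to_vg_lv is a hypothetical helper, and it assumes LVM-style names (the control node is not one).

```shell
#!/bin/sh
# Convert an LVM device-mapper name such as "khalis-vm--101--disk--0"
# into "VG/LV" form ("khalis/vm-101-disk-0").
dm_to_vg_lv() {
    printf '%s\n' "$1" | awk '{
        gsub(/--/, "|")              # protect escaped hyphens ("|" is not valid in LVM names)
        sep = index($0, "-")         # the remaining single hyphen separates VG and LV
        vg = substr($0, 1, sep - 1)
        lv = substr($0, sep + 1)
        gsub(/[|]/, "-", vg)         # restore the protected hyphens
        gsub(/[|]/, "-", lv)
        print vg "/" lv
    }'
}

# Example:
#   dm_to_vg_lv khalis-vm--101--disk--0
```

Read this way, every khalis-* entry in the listing is a logical volume of the shared VG, not a multipath map.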

Well. For now, I'll just cross my fingers and go node by node, hoping I don't mess up the entire storage infrastructure.

bbgeek17 · Nov 21, 2022

Hopefully you will have more success than this guy: https://forum.proxmox.com/threads/z...script-with-truenas-scale.116240/#post-507405

