Validation and questions about iSCSI Config

jsterr

Hello Members :-)

I'm not very experienced in integrating iSCSI into PVE, so this was my first configuration and there were some unexpected behaviours. In the end everything worked, including the HA tests, but some questions came up regarding the integration process, which might also help improve the Multipath guide if needed: https://pve.proxmox.com/wiki/Multipath#iSCSI

That's my multipath.conf:

Code:
root@pve1:~# cat /etc/multipath.conf
defaults {
        polling_interval 5
        checker_timeout 15
        find_multipaths strict
        user_friendly_names yes
        enable_foreign nvme
}
devices {
        device {
                vendor DellEMC
                product PowerStore
                detect_prio "yes"
                path_selector "queue-length 0"
                path_grouping_policy "group_by_prio"
                path_checker tur
                failback immediate
                fast_io_fail_tmo 5
                no_path_retry 3
                rr_min_io_rq 1
                max_sectors_kb 1024
                dev_loss_tmo 10
                hardware_handler "1 alua"
        }
}
blacklist {
    devnode "sd[a-h]$"

}

The blacklist is there because Ceph is also used on this system (on the local sd[a-h] disks).
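One way to double-check that only the local Ceph disks are excluded (a quick sketch; the exact -v3 wording can differ between multipath-tools versions):

Code:
# confirm which local disks are in use before blacklisting them
lsblk -o NAME,SIZE,TYPE,MOUNTPOINT

# multipath prints its blacklist decisions at verbosity level 3,
# e.g. "sda: device node name blacklisted"
multipath -v3 2>&1 | grep -i blacklist | head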

These are the WWIDs from the /etc/multipath/wwids file:

Code:
root@pve1:~# cat /etc/multipath/wwids
# Multipath wwids, Version : 1.0
# NOTE: This file is automatically maintained by multipath and multipathd.
# You should not need to edit this file in normal circumstances.
#
# Valid WWIDs:
#/3500a075150392adf/
#/35000cca04fb46f94/
#/35000cca04fb4a36c/
#/35002538b097b6bb0/
#/35002538b097b6a30/
#/35000cca04fb4daac/
#/35000cca04fb4bd7c/
/368ccf09800fc69d0987f2076fa063e6b/
#/350025380a4c61db0/
/368ccf09800042f77bfddd1b0bb7bf57b/

I tested with 2 LUNs from a Dell storage. I followed all the steps from the Multipath wiki post and got this:
Code:
root@pve1:~# multipath -ll
mpathi (368ccf09800fc69d0987f2076fa063e6b) dm-6 DellEMC,PowerStore
size=5.0T features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
|-+- policy='queue-length 0' prio=50 status=active
| |- 15:0:0:1 sdk 8:160 active ready running
| `- 16:0:0:1 sdl 8:176 active ready running
`-+- policy='queue-length 0' prio=10 status=enabled
  |- 13:0:0:1 sdi 8:128 active ready running
  `- 14:0:0:1 sdj 8:144 active ready running
mpathk (368ccf09800042f77bfddd1b0bb7bf57b) dm-7 DellEMC,PowerStore
size=5.0T features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
|-+- policy='queue-length 0' prio=50 status=active
| |- 13:0:0:2 sdo 8:224 active ready running
| `- 14:0:0:2 sdn 8:208 active ready running
`-+- policy='queue-length 0' prio=10 status=enabled
  |- 15:0:0:2 sdm 8:192 active ready running
  `- 16:0:0:2 sdp 8:240 active ready running

Looks fine. Somehow only the first LUN is called /dev/mapper/mpathi on all 3 nodes; the second LUN, which was created today, has a different name on each node (mpathk on node1, mpathl on node2 and mpathj on node3). Can anyone explain whether this is correct or wrong, or if it matters at all?

I also wanted to create an LVM on top of iSCSI and tried it via both the web UI and the CLI; both brought up the following error:

Code:
create storage failed: vgcreate iscsi-500t-vg /dev/disk/by-id/scsi-368ccf09800fc69d0987f2076fa063e6b error: Cannot use device /dev/mapper/mpathi with duplicates. (500)

I found a solution for this by adding an LVM filter config. Is this needed? And if so, could it be included in the wiki post?
Code:
cp /etc/lvm/lvm.conf /etc/lvm/lvm.conf.backup.$(date +%Y%m%d-%H%M%S)
nano /etc/lvm/lvm.conf

add the following inside the devices {} block:

# only /dev/mapper/mpath* and /dev/sd[c-h] are allowed for the LVM config
    filter = [ "a|^/dev/mapper/mpath.*|", "a|^/dev/sd[c-h]$|", "r|.*|" ]
    global_filter = [ "a|^/dev/mapper/mpath.*|", "a|^/dev/sd[c-h]$|", "r|.*|" ]

pvs --config 'devices { filter = [ "a|^/dev/mapper/mpath.*|", "a|^/dev/sd[c-h]$|", "r|.*|" ] }'
pvscan --cache
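
To verify that the filter was actually picked up, something like this can be used (a sketch; lvmconfig output depends on the LVM version):

Code:
# show the filter settings LVM actually loaded from lvm.conf
lvmconfig devices/filter devices/global_filter

# with the filter active, pvs should only list the mpath devices
# and no longer print duplicate-PV warnings
pvs -o pv_name,vg_name,dev_size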

After that, adding an LVM via the web UI worked for LUN1, which has /dev/mapper/mpathi on all 3 nodes, but ALSO for LUN2, which has a different /dev/mapper/mpath device on each node (k, l, j). This is what the storage.cfg looks like; I added each portal to the datacenter storage config:
Code:
iscsi: iscsi
        portal 10.10.1.71
        target iqn.2015-10.com.dell:dellemc-powerstore-crk00230109862-a-0c343331
        content none
        nodes pve2,pve3,pve1

iscsi: iscsi-2
        portal 10.10.1.71
        target iqn.2015-10.com.dell:dellemc-powerstore-crk00230109862-a-7b5a2c77
        content none

iscsi: iscsi-3
        portal 10.10.1.71
        target iqn.2015-10.com.dell:dellemc-powerstore-crk00230109862-b-3dd609dd
        content none

iscsi: iscsi-4
        portal 10.10.1.71
        target iqn.2015-10.com.dell:dellemc-powerstore-crk00230109862-b-5ef4a37e
        content none

lvm: iscsi-pve-storage-1
        vgname iscsi-500t-vg1
        base iscsi:0.0.1.scsi-368ccf09800fc69d0987f2076fa063e6b
        content images,rootdir
        saferemove 0
        shared 1
        snapshot-as-volume-chain 1

lvm: iscsi-pve-storage-2
        vgname iscsi-500t-vg2
        base iscsi:0.0.2.scsi-368ccf09800042f77bfddd1b0bb7bf57b
        content rootdir,images
        saferemove 0
        shared 1
        snapshot-as-volume-chain 1


Both pools work with regard to high availability, storage migration and link loss (tested with ifdown on the iSCSI connections), although the second pool has a different /dev/mapper device on each node, which is the thing that confuses me. Can anyone explain? Is the setup correct, and are there things you would do differently?
 
Mr iSCSI @bbgeek17 or @LnxBil (I saw that you know about the LVM filter thing), can you explain and check the post? It could be an improvement for the docs. Any help is really appreciated, thanks!
 
Looks fine. Somehow only the first LUN is called /dev/mapper/mpathi on all 3 nodes; the second LUN, which was created today, has a different name on each node (mpathk on node1, mpathl on node2 and mpathj on node3). Can anyone explain whether this is correct or wrong, or if it matters at all?
If you have tried multiple experiments over time, the letter-based device name may already have been used by something. The kernel will allocate a new device name on each node independently. The disk signature (WWID) will be identical across the nodes, so the letter is not important, unless you are creating, for example, an /etc/fstab entry that uses the letter-based name and the entry is the same across all nodes. In that case, of course, some nodes will not find /dev/mapper/mpathM because it does not exist there.
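For example, to confirm that the differently named maps on each node refer to the same LUN, a quick check could look like this (a sketch using the WWID from the output above; the dm-uuid symlink comes from the standard udev dm rules):

Code:
# the WWID in parentheses is the stable identifier; the mpathX name is per-node
multipath -ll | grep PowerStore

# the dm-uuid symlink is stable across nodes and reboots
ls -l /dev/disk/by-id/dm-uuid-mpath-368ccf09800fc69d0987f2076fa063e6b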

I also wanted to create an LVM on top of iSCSI and tried it via both the web UI and the CLI; both brought up the following error:

create storage failed: vgcreate iscsi-500t-vg /dev/disk/by-id/scsi-368ccf09800fc69d0987f2076fa063e6b error: Cannot use device /dev/mapper/mpathi with duplicates. (500)
I think it means that LVM/the kernel sees devices in addition to those contained in the multipath device (*e6b, mpathi). Essentially there is a duplicate conflict.
The fact that it started working when you restricted LVM to look only at certain devices potentially confirms this.

It is also possible that the LVM cache is incorrect, and a rescan or reboot would have fixed the duplicates error.
I found a solution for this by adding an LVM filter config. Is this needed? And if so, could it be included in the wiki post?
It should not be needed.
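To rule out a stale cache before adding any filter, a rescan along these lines could be tried first (a minimal sketch):

Code:
# rebuild LVM's device cache and rescan, then re-check for the duplicate warning
pvscan --cache
vgscan
pvs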


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
Thanks for the reply!
  • I did add the WWID via /etc/multipath/wwids (multipath -a WWID) on every node for the LUN I wanted to add
  • the mpath devices were created automatically; they appeared after running multipath -a, multipath -r and multipath -ll
  • we used the /etc/multipath.conf from the Dell docs; the only thing we changed was find_multipaths no to find_multipaths strict -> might this be the reason it's acting so strangely?
  • the vgcreate uses the by-id path, which also matches the WWID (368ccf09800fc69d0987f2076fa063e6b); the command was issued automatically by the LVM wizard in the web UI. Unfortunately there's no more output than what I posted.
I also did tests by turning off some paths via ifdown, and everything stayed OK and responsive :-/

Edit:
  • The Dell guide also says:
Create a Physical Volume (PV) and Volume Group (VG) on top of the multipath device by running the following commands:
Code:
root@chost-02:~# pvcreate /dev/mapper/mpathb
Physical volume "/dev/mapper/mpathb" successfully created.
But I did not do this, as we used the web UI, which ran the vgcreate from above.
 
From the man page:

wwids_file The full pathname of the wwids file, which is used by multipath to keep track of the wwids for LUNs it has created multipath devices on in the past. Defaults to /etc/multipath/wwids

-a add the wwid for the specified device to the wwids file

Please read the above carefully. The file and command refer to the WWID of the LUN, which is a SCSI device presented by the SAN.

I'm not sure what you are referring to. The file is called /etc/multipath/wwids, and you need to add the WWID of the SCSI disk (the volume on the Dell storage) presented by the array, which I did by checking the disks from lsblk (the Dell ones) and using /lib/udev/scsi_id -g -u -d /dev/sdX.
After that I added the WWID (which is the same on all 4 path devices presented by the Dell EMC) via multipath -a WWID (only once), as it is also mentioned here: https://pve.proxmox.com/wiki/Multipath#Add_WWIDs_to_the_WWIDs_file
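For completeness, the sequence per node looked roughly like this (sdX is a placeholder for one of the Dell paths from lsblk):

Code:
# read the WWID from one of the Dell paths shown by lsblk (sdX is a placeholder)
/lib/udev/scsi_id -g -u -d /dev/sdX

# whitelist that WWID, reload the maps and check the result
multipath -a 368ccf09800fc69d0987f2076fa063e6b
multipath -r
multipath -ll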

The *e6b is for the physical LUN, not the mpath device.

Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
OK. Maybe I should have created it via the CLI instead of the web UI. But isn't *e6b mentioned here as the ID of the multipath device?

Code:
root@pve1:~# multipath -ll
mpathi (368ccf09800fc69d0987f2076fa063e6b) dm-6 DellEMC,PowerStore
size=5.0T features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
|-+- policy='queue-length 0' prio=50 status=active
| |- 15:0:0:1 sdk 8:160 active ready running
| `- 16:0:0:1 sdl 8:176 active ready running
`-+- policy='queue-length 0' prio=10 status=enabled
  |- 13:0:0:1 sdi 8:128 active ready running
  `- 14:0:0:1 sdj 8:144 active ready running

Thanks for the guide! I will check it out (not today) and will report back. iSCSI feels way more complex than Ceph to me :)
 
Does someone have an idea how to troubleshoot this further? The LVM filter should not be needed; the steps I took match the guide from Proxmox. Is my storage.cfg correct with the 4 iSCSI entries?

Code:
iscsi: iscsi
        portal 10.10.1.71
        target iqn.2015-10.com.dell:dellemc-powerstore-crk00230109862-a-0c343331
        content none
        nodes pve2,pve3,pve1

iscsi: iscsi-2
        portal 10.10.1.71
        target iqn.2015-10.com.dell:dellemc-powerstore-crk00230109862-a-7b5a2c77
        content none

iscsi: iscsi-3
        portal 10.10.1.71
        target iqn.2015-10.com.dell:dellemc-powerstore-crk00230109862-b-3dd609dd
        content none

iscsi: iscsi-4
        portal 10.10.1.71
        target iqn.2015-10.com.dell:dellemc-powerstore-crk00230109862-b-5ef4a37e
        content none

Thanks for the help. If I can't fix this, how can I reset the iSCSI configuration completely on the Proxmox side, so it is like a new system that never had iSCSI configured before?
 
If I can't fix this, how can I reset the iSCSI configuration completely on the Proxmox side
"wipefs -a" one of the devices in the group, if you still cant access mpath device. Remove the iSCSI storage pools, remove any nodes/sessions with iscsiadm, reboot the node, optionally remove/re-init the LUNs on SAN side.
Does someone have an idea how to troubleshoot this further?
run "vgcreate" with --debug and --verbose
-d|--debug ...
Set debug level. Repeat from 1 to 6 times to increase the detail of messages sent to the log file and/or syslog (if configured).

-v|--verbose ...
Set verbose level. Repeat from 1 to 4 times to increase the detail of messages sent to stdout and stderr.

Is my storage.cfg correct with the 4 iSCSI entries?
You have each LUN exposed via two targets on the same IP Portal. It seems strange to me.

I recommend taking PVE out of the mix and using iscsiadm for node/session configuration. If you are still having issues, contact Dell/EMC for help.


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
Multipath is set up correctly if the devices appear in multipath -ll; that layer is fine.

It seems to be the volume enumeration layer that is causing issues. If you check under /dev/mapper, there should be links using the WWIDs of each of those volumes, pointing to a dm device.

/dev/mapper/WWID -> ../dm-7 (or some other number)

That would be the physical link you need to use when creating volumes. After volume creation you need to get the UUID from the volume and then use it in the /etc/fstab entry for persistent mounts. I've found the Proxmox web UI management for iSCSI-based volumes to be very hit and miss, and I always suggest people manage it from the command prompt and ensure each host has the same fstab entry.
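If one were going the filesystem/fstab route described here (which is not what PVE's shared LVM setup does), a rough sketch would be (UUID, filesystem and mount point are placeholders):

Code:
# read the filesystem UUID from the multipath device
# (only applies if a filesystem was created on it)
blkid /dev/mapper/mpathi

# example /etc/fstab entry; UUID and mount point are placeholders
# UUID=00000000-0000-0000-0000-000000000000  /mnt/lun1  xfs  _netdev,nofail  0 0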
 
It is unlikely that @jsterr is or was planning to use fstab/filesystems with his PVE cluster in a shared storage configuration.

It doesn't matter; the OS is going to do that anyway. Proxmox is just managing a suite of technologies that exist on Linux, and storage is managed by the kernel plus various kernel drivers and OS API hooks. ZFS, NFS, iSCSI, OCFS2, LVM: all of it boils down to a mount that the hypervisor or abstraction layer references. Most of this is hidden under the hood, but it's visible if you poke around. Because of how unreliable the Proxmox web UI can be with more complicated storage setups, I always recommend people manage it directly and not use the web UI.

https://kb.blockbridge.com/technote/proxmox-lvm-shared-storage/

LVM is not a cluster-aware volume manager, nor are any of the common file systems cluster-aware file systems. LVM on shared storage tries to get around this by having only one host access a volume at a time and then mapping that volume directly into a VM (or container) as raw storage. When this happens you can see the device on the host OS. If using shared storage, it's really better to use a cluster-aware file system and have all the nodes access it directly, but that requires a bit of extra configuration. Otherwise NFSv4 or Ceph are more compatible with Linux-native technologies.
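For instance, to see whether a given guest volume is active on the node you are on, a quick check could be (a sketch using the VG name from the storage.cfg above; lv_active is a standard lvs report field):

Code:
# on each node, show which LVs of the shared VG are currently active there
lvs -o lv_name,vg_name,lv_active iscsi-500t-vg1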
 
Hey @palladin9479, I am happy to see that our article on PVE/LVM and shared storage was helpful in your understanding!


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox

Yes, it's much easier to follow from start to finish. I think the bigger picture is that none of these tools come from Proxmox; they are just Debian/Linux tools that Proxmox attempts to automate. Learning how those underlying tools work assists greatly in building and troubleshooting configurations.
 
Yes, it's much easier to follow from start to finish. I think the bigger picture is that none of these tools come from Proxmox; they are just Debian/Linux tools that Proxmox attempts to automate. Learning how those underlying tools work assists greatly in building and troubleshooting configurations.
You made some valid statements, but I don't see their relevance to @jsterr's thread.

He has 2 LUNs that appear to be seen properly by multipath, despite a somewhat suspicious base iSCSI configuration. He ran a direct LVM tool on the LUN, which complained about a duplicate being present.

There is no indication of duplicates that we've seen so far. Running the same LVM command with the debug/verbose options may be helpful in identifying why the system is not happy.


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
Looks fine. Somehow only the first LUN is called /dev/mapper/mpathi on all 3 nodes; the second LUN, which was created today, has a different name on each node (mpathk on node1, mpathl on node2 and mpathj on node3). Can anyone explain whether this is correct or wrong, or if it matters at all?


/dev/mapper/mpathX names are just dynamic aliases created by multipathd based on the order of device discovery; they are not guaranteed to be consistent across hosts or even across reboots. Adding or deleting iSCSI targets will also cause them to be reordered. They really only exist for simple non-clustered setups, as an easier way to refer to devices instead of using the WWIDs.

https://linux.die.net/man/5/multipath.conf

My advice to people when dealing with mpxio/multipathd is to just use the WWIDs, as those are intrinsically part of a volume and will not change. If you really want a unique name for each volume, you need to use the alias directive in a multipath section to create one:

Code:
multipaths {
    multipath {
        wwid 360000000000000000e00000000030001
        alias yellow
    }
    multipath {
        wwid 360000000000000000e00000000020001
        alias blue
    }
    multipath {
        wwid 360000000000000000e00000000010001
        alias red
    }
}

Ensure the exact same file is placed on all the nodes in the cluster and reload the multipathd configuration with multipath -r.

Now every node can associate the same /dev/mapper name with the same device.
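For example (a sketch; pve2 and pve3 are the node names from the storage.cfg earlier in the thread):

Code:
# copy the identical multipath.conf to the other nodes and reload the maps
scp /etc/multipath.conf pve2:/etc/multipath.conf
scp /etc/multipath.conf pve3:/etc/multipath.conf
ssh pve2 multipath -r
ssh pve3 multipath -r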

These commands are super helpful for ensuring that each node can properly see all required paths to each of the iSCSI targets:

Code:
lsblk
lsblk -o NAME,LABEL,UUID
lsscsi -i -s
 