iSCSI VST vs GST Support (Nimble Target to LUN mapping)

_--James--_

Member
May 17, 2023
HPE Nimble defaults to GST as of NOS 5.x+; we ran into this recently on re-init installs while migrating from VMware. GST uses a single target with many LUN mappings. (https://support.hpe.com/hpesc/publi...C-4A77-A096-C3004AD1DB7B.html&docLocale=en_US)

In short...

- PVE 8.2.7 connected to HPE Nimble arrays won't see any LUN ID above 1 as a shared LVM. If the LVM is not shared, the LUN will work up to ID 2.
- Was able to replicate the same behavior on a Synology iSCSI setup using a single target with multiple LUNs (exported as LUN0, LUN1, LUN2, LUN3, etc.): any LUN above ID 1 would fail to bring up the VG if shared, while LUN2 does bring up the VG if it is not shared.
- In all cases iscsiadm connects to the target, and pvesm can see the LUN IDs all the way up to the test range (ID0-ID20), but only LUN0 or LUN1 can be grabbed for a shared LVM.
- In cases where Host1 grabs LUN2 and it is shared, it shows up as ? on all other hosts in the cluster; the LVM is visible under vgscan but not vgs (see the quick check sketched after this list).
- This shows up in the syslog as soon as the LVM goes ? on any additional host for LUN ID 2+: "pvestatd[1600]: command '/sbin/vgscan --ignorelockingfailure --mknodes' failed: exit code 5"
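For reference, this is roughly how we check it from a second host (target and device details below are placeholders, not our real values):

# confirm the session is logged in and which LUNs the kernel attached
iscsiadm -m session -P 3
lsscsi

# the VG backed by LUN ID 2+ is found by a scan...
vgscan
# ...but never shows up here on the other cluster nodes
vgs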

After iscsiadm maps LUN1 and LUN2, we get this in the syslog every time we replicate it:

kernel: sd 1:0:0:1: LUN assignments on this target have changed. The Linux SCSI layer does not automatically remap LUN assignments.
kernel: scsi 1:0:0:2: Direct-Access SYNOLOGY Storage 4.0 PQ: 0 ANSI: 5
kernel: sd 1:0:0:2: Attached scsi generic sg2 type 0
kernel: sd 1:0:0:2: [sdc] 1073741824 512-byte logical blocks: (550 GB/512 GiB)
kernel: sd 1:0:0:2: [sdc] Write Protect is off
kernel: sd 1:0:0:2: [sdc] Mode Sense: 43 00 10 08
kernel: sd 1:0:0:2: [sdc] Write cache: enabled, read cache: enabled, supports DPO and FUA
kernel: sd 1:0:0:2: [sdc] Preferred minimum I/O size 512 bytes
kernel: sd 1:0:0:2: [sdc] Optimal transfer size 16384 logical blocks > dev_max (8192 logical blocks)
kernel: sd 1:0:0:2: [sdc] Attached SCSI disk
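Worth noting, per that first kernel line the SCSI layer will not remap LUNs on its own, so whenever the LUN layout on the target changes we re-scan the logged-in sessions by hand before testing again:

# re-scan all active iSCSI sessions so new/changed LUNs are picked up
iscsiadm -m session --rescan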

So, does the iSCSI stack used by PVE not support GST?

Why do LUN0 and LUN1 work together in this model but not LUN2+?

Sorry for the really dumb questions. Most iSCSI deployments we have done always used target:LUN0 mappings, and this GST thing does not seem to affect VMware the way it does PVE/iscsiadm, so it might be something that is not used much outside of VMware shops. All of our Nimble arrays were older OS versions upgraded through the years, with volumes then migrated to new shelves. This is the first time we have re-initialized these units wholesale like this in a very, very long time; all of our migrated volumes were always VST.

FYI for anyone with this issue: on Nimble you can put the config back to the VST model (from SSH: "group --edit --default_iscsi_target_scope volume").
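After flipping it back, a sendtargets discovery from a PVE node should list one IQN per volume again instead of the single group-scoped target (the portal IP here is a placeholder):

# discovery against the array's iSCSI discovery portal
iscsiadm -m discovery -t sendtargets -p 192.0.2.10:3260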
 
Hi @_--James--_ ,
GST and VST are HPE-specific terms; they are not part of the iSCSI RFC/protocol. As such, there is no specific "support" for either one in the Linux iSCSI implementation (which is what PVE uses).

Nevertheless, it's clear what HPE means by those terms: either one target per disk or many disks per target. Simple enough.

The Blockbridge PVE plugin won't run into this situation, but I am curious about it. Can you provide step-by-step instructions for how you get to the LVM failures? I imagine you are using an iSCSI storage pool configuration, but what is your disk organization? I.e., before you run "vgs" you must have created the volume groups? Can you elaborate on that, preferably with command examples?
Best


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
Just your standard iSCSI entry in Datacenter > Storage, followed by the LVM entry pointing to the iSCSI object; it's all standard configs on the PVE side (the CLI equivalent is sketched below).
If you have access to a Synology, just create one target in SAN Manager, then create a few LUNs and point them at the same target. The behavior is exactly the same as with iSCSI coming from Nimble instead of Synology. Pretty trivial to replicate.
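For completeness, roughly the CLI equivalent of those two storage entries (the storage IDs, portal, IQN, and volume ID are placeholders, not our real values):

# the iSCSI entry (Datacenter > Storage > Add: iSCSI)
pvesm add iscsi syn-iscsi --portal 192.0.2.20 --target iqn.2000-01.com.synology:target-1

# a shared LVM entry on top of one of the exported LUNs (Datacenter > Storage > Add: LVM)
pvesm add lvm lvm2 --base syn-iscsi:0.0.2.scsi-<serial> --shared 1 --vgname vg2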
 
No, I don't have access to a Synology.

Note, I don't know why HPE switched from disk-per-target to many-disks-per-target. Perhaps they ran into iSCSI session limits, as reducing that number seems to be the first benefit they mention in the article. There are possible drawbacks to reducing the number of sessions, since that may now become your performance bottleneck for the number of tasks that can be processed. But that is between HPE and their customers.

The PVE/Blockbridge interaction is completely automated, so one does not need to manually manage connections, authorizations, or target manipulation.
At the same time, Blockbridge is very flexible, so it's easy to fall back to the manual method.

I've created a single iSCSI target/connection with 16 disks:
root@pve-1:~# lsscsi
[12:0:0:0] disk B*BRIDGE SECURE DRIVE 6.1 /dev/sdd
[12:0:0:1] disk B*BRIDGE SECURE DRIVE 6.1 /dev/sdf
[12:0:0:2] disk B*BRIDGE SECURE DRIVE 6.1 /dev/sdh
[12:0:0:3] disk B*BRIDGE SECURE DRIVE 6.1 /dev/sdj
[12:0:0:4] disk B*BRIDGE SECURE DRIVE 6.1 /dev/sdl
[12:0:0:5] disk B*BRIDGE SECURE DRIVE 6.1 /dev/sdm
[12:0:0:6] disk B*BRIDGE SECURE DRIVE 6.1 /dev/sdq
[12:0:0:7] disk B*BRIDGE SECURE DRIVE 6.1 /dev/sds
[12:0:0:8] disk B*BRIDGE SECURE DRIVE 6.1 /dev/sdu
[12:0:0:9] disk B*BRIDGE SECURE DRIVE 6.1 /dev/sdw
[12:0:0:10] disk B*BRIDGE SECURE DRIVE 6.1 /dev/sdx
[12:0:0:11] disk B*BRIDGE SECURE DRIVE 6.1 /dev/sdz
[12:0:0:12] disk B*BRIDGE SECURE DRIVE 6.1 /dev/sdab
[12:0:0:13] disk B*BRIDGE SECURE DRIVE 6.1 /dev/sdad
[12:0:0:14] disk B*BRIDGE SECURE DRIVE 6.1 /dev/sdag
[12:0:0:15] disk B*BRIDGE SECURE DRIVE 6.1 /dev/sdai

Here is the corresponding listing from the PVE iSCSI storage pool:
root@pve-1:~# pvesm list blockbridge-direct
Volid Format Type Size VMID
blockbridge-direct:0.0.0.scsi-SB_BRIDGE_SECURE_DRIVE_287e5552-98dd-4c06-b990-4b1636c712e6 raw images 107374182400
blockbridge-direct:0.0.1.scsi-SB_BRIDGE_SECURE_DRIVE_79c62c18-38c1-4647-ade3-6ee0041f06b3 raw images 11811160064
blockbridge-direct:0.0.10.scsi-SB_BRIDGE_SECURE_DRIVE_97b6b917-08e2-45c3-86b5-56229d5aab75 raw images 15032385536
blockbridge-direct:0.0.11.scsi-SB_BRIDGE_SECURE_DRIVE_c44d2dee-3188-4f9b-8083-af7f6ba15208 raw images 16106127360
blockbridge-direct:0.0.12.scsi-SB_BRIDGE_SECURE_DRIVE_d14b3d09-836e-4f4d-8410-a53bfd1f2f67 raw images 17179869184
blockbridge-direct:0.0.13.scsi-SB_BRIDGE_SECURE_DRIVE_eb49f510-33b7-4d96-99ac-c2c555e109b5 raw images 18253611008
blockbridge-direct:0.0.14.scsi-SB_BRIDGE_SECURE_DRIVE_d7ef320f-8b12-4bca-b388-95a5a56cb0f9 raw images 19327352832
blockbridge-direct:0.0.15.scsi-SB_BRIDGE_SECURE_DRIVE_00b25ef4-31b6-4498-97aa-628e7e3985f3 raw images 20401094656
blockbridge-direct:0.0.2.scsi-SB_BRIDGE_SECURE_DRIVE_41b09104-988d-44cf-be7a-1cacf3872075 raw images 118111600640
blockbridge-direct:0.0.3.scsi-SB_BRIDGE_SECURE_DRIVE_3bd28a1e-3e3b-48a8-95ff-4d7f3b1113b5 raw images 119185342464
blockbridge-direct:0.0.4.scsi-SB_BRIDGE_SECURE_DRIVE_5f996c89-4e6f-4aaf-aa02-35f2cd07c272 raw images 120259084288
blockbridge-direct:0.0.5.scsi-SB_BRIDGE_SECURE_DRIVE_f0f57ca9-76c5-42fe-9af6-6ba188df2e93 raw images 121332826112
blockbridge-direct:0.0.6.scsi-SB_BRIDGE_SECURE_DRIVE_a1b90b91-2ae4-4f6b-b3d4-85a009a24b5f raw images 122406567936
blockbridge-direct:0.0.7.scsi-SB_BRIDGE_SECURE_DRIVE_cad6c274-0a76-4701-98ef-a9ba4f2be22c raw images 123480309760
blockbridge-direct:0.0.8.scsi-SB_BRIDGE_SECURE_DRIVE_657397cc-da19-4066-9335-d00b80262798 raw images 12884901888
blockbridge-direct:0.0.9.scsi-SB_BRIDGE_SECURE_DRIVE_917799e0-ebae-43b4-9731-5a29857cc52e raw images 13958643712

Added LVM pools for each LUN (one-liner; the expanded commands and a cross-node check follow below):
for line in $(pvesm list blockbridge-direct | awk '{print $1}' | grep -vi volid); do lun=$(echo $line | cut -d '.' -f 3); pvesm add lvm lvm$lun --base $line --shared 1 --vgname vg$lun; done
pvesm add lvm lvm0 --base blockbridge-direct:0.0.0.scsi-SB_BRIDGE_SECURE_DRIVE_287e5552-98dd-4c06-b990-4b1636c712e6 --shared 1 --vgname vg0
pvesm add lvm lvm1 --base blockbridge-direct:0.0.1.scsi-SB_BRIDGE_SECURE_DRIVE_79c62c18-38c1-4647-ade3-6ee0041f06b3 --shared 1 --vgname vg1
pvesm add lvm lvm10 --base blockbridge-direct:0.0.10.scsi-SB_BRIDGE_SECURE_DRIVE_97b6b917-08e2-45c3-86b5-56229d5aab75 --shared 1 --vgname vg10
pvesm add lvm lvm11 --base blockbridge-direct:0.0.11.scsi-SB_BRIDGE_SECURE_DRIVE_c44d2dee-3188-4f9b-8083-af7f6ba15208 --shared 1 --vgname vg11
pvesm add lvm lvm12 --base blockbridge-direct:0.0.12.scsi-SB_BRIDGE_SECURE_DRIVE_d14b3d09-836e-4f4d-8410-a53bfd1f2f67 --shared 1 --vgname vg12
pvesm add lvm lvm13 --base blockbridge-direct:0.0.13.scsi-SB_BRIDGE_SECURE_DRIVE_eb49f510-33b7-4d96-99ac-c2c555e109b5 --shared 1 --vgname vg13
pvesm add lvm lvm14 --base blockbridge-direct:0.0.14.scsi-SB_BRIDGE_SECURE_DRIVE_d7ef320f-8b12-4bca-b388-95a5a56cb0f9 --shared 1 --vgname vg14
pvesm add lvm lvm15 --base blockbridge-direct:0.0.15.scsi-SB_BRIDGE_SECURE_DRIVE_00b25ef4-31b6-4498-97aa-628e7e3985f3 --shared 1 --vgname vg15
pvesm add lvm lvm2 --base blockbridge-direct:0.0.2.scsi-SB_BRIDGE_SECURE_DRIVE_41b09104-988d-44cf-be7a-1cacf3872075 --shared 1 --vgname vg2
pvesm add lvm lvm3 --base blockbridge-direct:0.0.3.scsi-SB_BRIDGE_SECURE_DRIVE_3bd28a1e-3e3b-48a8-95ff-4d7f3b1113b5 --shared 1 --vgname vg3
pvesm add lvm lvm4 --base blockbridge-direct:0.0.4.scsi-SB_BRIDGE_SECURE_DRIVE_5f996c89-4e6f-4aaf-aa02-35f2cd07c272 --shared 1 --vgname vg4
pvesm add lvm lvm5 --base blockbridge-direct:0.0.5.scsi-SB_BRIDGE_SECURE_DRIVE_f0f57ca9-76c5-42fe-9af6-6ba188df2e93 --shared 1 --vgname vg5
pvesm add lvm lvm6 --base blockbridge-direct:0.0.6.scsi-SB_BRIDGE_SECURE_DRIVE_a1b90b91-2ae4-4f6b-b3d4-85a009a24b5f --shared 1 --vgname vg6
pvesm add lvm lvm7 --base blockbridge-direct:0.0.7.scsi-SB_BRIDGE_SECURE_DRIVE_cad6c274-0a76-4701-98ef-a9ba4f2be22c --shared 1 --vgname vg7
pvesm add lvm lvm8 --base blockbridge-direct:0.0.8.scsi-SB_BRIDGE_SECURE_DRIVE_657397cc-da19-4066-9335-d00b80262798 --shared 1 --vgname vg8
pvesm add lvm lvm9 --base blockbridge-direct:0.0.9.scsi-SB_BRIDGE_SECURE_DRIVE_917799e0-ebae-43b4-9731-5a29857cc52e --shared 1 --vgname vg9
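With the pools defined, a quick cross-node sanity check looks something like this (the node names are placeholders for my lab hosts):

# every clustered node should report every shared VG; a vgN missing here matches the "?" state described above
for node in pve-1 pve-2 pve-3; do echo "== $node =="; ssh root@$node vgs --noheadings -o vg_name; done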

At this point I may have gotten closer to the state you were in:
/sbin/vgscan --ignorelockingfailure --mknodes
Found volume group "vg13" using metadata type lvm2
Found volume group "vg5" using metadata type lvm2
Found volume group "vg12" using metadata type lvm2
Found volume group "vg4" using metadata type lvm2
Found volume group "vg11" using metadata type lvm2
Found volume group "vg3" using metadata type lvm2
Found volume group "vg10" using metadata type lvm2
Found volume group "lvm2" using metadata type lvm2
Found volume group "vg9" using metadata type lvm2
Found volume group "vg1" using metadata type lvm2
Found volume group "vg8" using metadata type lvm2
Found volume group "vg0" using metadata type lvm2
Found volume group "vg15" using metadata type lvm2
Found volume group "vg7" using metadata type lvm2
Found volume group "pve" using metadata type lvm2
Found volume group "vg14" using metadata type lvm2
Found volume group "vg6" using metadata type lvm2
Command failed with status code 5.

Note, your situation may be different.
Interestingly, the failure is on the root disk.
for i in $(pvscan | awk '{print $2}'); do echo -n "$i "; vgscan --mknodes -d -v --devices $i; done
/dev/sdp Found volume group "vg8" using metadata type lvm2
/dev/sdn Found volume group "vg7" using metadata type lvm2
/dev/sdm Found volume group "vg6" using metadata type lvm2
/dev/sdk Found volume group "vg5" using metadata type lvm2
/dev/sdy Found volume group "vg15" using metadata type lvm2
/dev/sdi Found volume group "vg4" using metadata type lvm2
/dev/sdx Found volume group "vg14" using metadata type lvm2
/dev/sdw Found volume group "vg13" using metadata type lvm2
/dev/sdg Found volume group "vg3" using metadata type lvm2
/dev/sdv Found volume group "vg12" using metadata type lvm2
/dev/sde Found volume group "lvm2" using metadata type lvm2
/dev/sdt Found volume group "vg11" using metadata type lvm2
/dev/sds Found volume group "vg10" using metadata type lvm2
/dev/sdc Found volume group "vg1" using metadata type lvm2
/dev/nvme0n1p3 Found volume group "pve" using metadata type lvm2
Command failed with status code 5.
/dev/sdq Found volume group "vg9" using metadata type lvm2
/dev/sda Found volume group "vg0" using metadata type lvm2

So a spurious failure on the root disk blocks further interaction here. This article, https://bugzilla.redhat.com/show_bug.cgi?id=1828617, tells us that --mknodes is not meant for systems with udev, and Proxmox uses udev. In fact, a minor code modification of the LVM plugin bypasses the problem and unblocks VM migrations and other operations.
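A quick way to see the difference for yourself (just a check, not the plugin change itself):

# the exact command pvestatd runs via the LVM plugin; exits with code 5 here
/sbin/vgscan --ignorelockingfailure --mknodes; echo "exit: $?"

# the same scan without --mknodes; if --mknodes is indeed the culprit, this completes cleanly
/sbin/vgscan --ignorelockingfailure; echo "exit: $?"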

At this point, since this situation does not affect the Blockbridge integration, you are welcome to use my research to pursue a solution via your support entitlement with PVE and/or HPE.

PS: the presence of multipath greatly affects all of these steps. Do you not use multipath with your Nimble?
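(For reference, a quick way to check whether multipath is in play on a node:)

# lists multipath maps and their component paths, if multipath-tools is installed
multipath -ll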


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 