Removing an iSCSI multipath LUN

tvtue

New Member
Nov 22, 2024
Hello all,

Whether you use the multipath LUN directly in a VM or as the basis for a shared LVM, what is the recommended (or correct) way to remove the LUN?

What I am assuming as a prerequisite is a setup following the Multipath guide from the Proxmox wiki. For using the LUN directly in a VM I deviate from the wiki guide right before setting up the LVM with pvcreate and vgcreate: instead, I go to the web GUI and add an iSCSI storage. It is not possible to add both portals in the web GUI to get all paths to the SAN, but eventually the multipath device is being used.

root@sm01a:~# multipath -ll
3600d023100090e2f4bc30e5e6638deab dm-10 IFT,DS 4000 Series
size=1.0T features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
|-+- policy='service-time 0' prio=50 status=active
| `- 15:0:0:1 sdc 8:32 active ready running
`-+- policy='service-time 0' prio=10 status=enabled
`- 16:0:0:1 sdd 8:48 active ready running
3600d023100090e2f778a2c9d58c14d2c dm-9 IFT,DS 4000 Series
size=4.0T features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
|-+- policy='service-time 0' prio=50 status=active
| `- 16:0:0:0 sda 8:0 active ready running
`-+- policy='service-time 0' prio=10 status=enabled
`- 15:0:0:0 sdb 8:16 active ready running

root@sm01a:~# pvs
PV VG Fmt Attr PSize PFree
/dev/mapper/3600d023100090e2f778a2c9d58c14d2c tbedeonssd lvm2 a-- <4,00t <3,97t
/dev/nvme0n1 ceph-0b4719ad-0a69-4a81-9ac0-fb82465f3103 lvm2 a-- 894,25g 0
/dev/nvme1n1 ceph-900cc3ec-f36f-4457-bf8f-adde756b83ab lvm2 a-- 894,25g 0
/dev/nvme2n1 ceph-e9d46a51-4722-46f1-b130-535f82fcaf2d lvm2 a-- 894,25g 0
/dev/nvme3n1p3 pve lvm2 a-- 893,25g 16,00g

root@sm01a:~# vgs
VG #PV #LV #SN Attr VSize VFree
ceph-0b4719ad-0a69-4a81-9ac0-fb82465f3103 1 1 0 wz--n- 894,25g 0
ceph-900cc3ec-f36f-4457-bf8f-adde756b83ab 1 1 0 wz--n- 894,25g 0
ceph-e9d46a51-4722-46f1-b130-535f82fcaf2d 1 1 0 wz--n- 894,25g 0
pve 1 3 0 wz--n- 893,25g 16,00g
tbedeonssd 1 1 0 wz--n- <4,00t <3,97t

dir: local
path /var/lib/vz
content iso,vztmpl,backup,import
shared 0

lvmthin: local-lvm
thinpool data
vgname pve
content rootdir,images

lvm: tbedeonssd
vgname tbedeonssd
content rootdir,images
saferemove 0
shared 1
snapshot-as-volume-chain 1

iscsi: tbedeonhdd
portal 10.27.34.201
target iqn.2002-10.com.infortrend:raid.uid593455.201
content images

boot: order=scsi0;ide2
cores: 1
cpu: x86-64-v3
ide2: none,media=cdrom
memory: 2048
meta: creation-qemu=10.0.2,ctime=1758265574
name: test
net0: virtio=BC:24:11:18:C6:CC,bridge=vmbr0,firewall=1
numa: 0
ostype: l26
scsi0: tbedeonssd:vm-100-disk-0.qcow2,iothread=1,size=32G
scsi1: tbedeonhdd:0.0.1.scsi-3600d023100090e2f4bc30e5e6638deab,iothread=1,size=1T
scsihw: virtio-scsi-single
smbios1: uuid=a6b19da2-bfdb-4a1a-802f-3af3c547d1bf
sockets: 2
vmgenid: be12f9ac-b31b-4084-9f9f-41324373903b

root@sm01a:~# ps auwwx | grep 3600d023100090e2f4bc30e5e6638deab
root 148885 8.0 0.0 2942712 36708 ? Sl 13:22 0:04 /usr/bin/kvm -id 100 -name test,debug-threads=on -no-shutdown -chardev socket,id=qmp,path=/var/run/qemu-server/100.qmp,server=on,wait=off -mon chardev=qmp,mode=control -chardev socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect-ms=5000 -mon chardev=qmp-event,mode=control -pidfile /var/run/qemu-server/100.pid -daemonize -smbios type=1,uuid=a6b19da2-bfdb-4a1a-802f-3af3c547d1bf -smp 2,sockets=2,cores=1,maxcpus=2 -nodefaults -boot menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg -vnc unix:/var/run/qemu-server/100.vnc,password=on -cpu qemu64,+abm,+aes,+avx,+avx2,+bmi1,+bmi2,enforce,+f16c,+fma,+kvm_pv_eoi,+kvm_pv_unhalt,+movbe,+pni,+popcnt,+sse4.1,+sse4.2,+ssse3,+xsave -m 2048 -object iothread,id=iothread-virtioscsi0 -object {"id":"throttle-drive-scsi0","limits":{},"qom-type":"throttle-group"} -object iothread,id=iothread-virtioscsi1 -object {"id":"throttle-drive-scsi1","limits":{},"qom-type":"throttle-group"} -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -device pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e -device pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f -device pci-bridge,id=pci.3,chassis_nr=3,bus=pci.0,addr=0x5 -device vmgenid,guid=be12f9ac-b31b-4084-9f9f-41324373903b -device piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2 -device usb-tablet,id=tablet,bus=uhci.0,port=1 -device VGA,id=vga,bus=pci.0,addr=0x2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3,free-page-reporting=on -iscsi initiator-name=iqn.1993-08.org.debian:01:8b598711fd8b -device ide-cd,bus=ide.1,unit=0,id=ide2,bootindex=101 -device virtio-scsi-pci,id=virtioscsi0,bus=pci.3,addr=0x1,iothread=iothread-virtioscsi0 -blockdev 
{"detect-zeroes":"on","discard":"ignore","driver":"throttle","file":{"cache":{"direct":true,"no-flush":false},"detect-zeroes":"on","discard":"ignore","discard-no-unref":true,"driver":"qcow2","file":{"aio":"native","cache":{"direct":true,"no-flush":false},"detect-zeroes":"on","discard":"ignore","driver":"host_device","filename":"/dev/tbedeonssd/vm-100-disk-0.qcow2","node-name":"ec133714301aa66bd97367546b2f287","read-only":false},"node-name":"fc133714301aa66bd97367546b2f287","read-only":false},"node-name":"drive-scsi0","read-only":false,"throttle-group":"throttle-drive-scsi0"} -device scsi-hd,bus=virtioscsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,device_id=drive-scsi0,bootindex=100,write-cache=on -device virtio-scsi-pci,id=virtioscsi1,bus=pci.3,addr=0x2,iothread=iothread-virtioscsi1 -blockdev {"detect-zeroes":"on","discard":"ignore","driver":"throttle","file":{"cache":{"direct":true,"no-flush":false},"detect-zeroes":"on","discard":"ignore","driver":"raw","file":{"aio":"io_uring","cache":{"direct":true,"no-flush":false},"detect-zeroes":"on","discard":"ignore","driver":"host_device","filename":"/dev/disk/by-id/scsi-3600d023100090e2f4bc30e5e6638deab","node-name":"e030ce5621bf2cd16b5d09df445e0fc","read-only":false},"node-name":"f030ce5621bf2cd16b5d09df445e0fc","read-only":false},"node-name":"drive-scsi1","read-only":false,"throttle-group":"throttle-drive-scsi1"} -device scsi-hd,bus=virtioscsi1.0,channel=0,scsi-id=0,lun=1,drive=drive-scsi1,id=scsi1,device_id=drive-scsi1,write-cache=on -netdev type=tap,id=net0,ifname=tap100i0,script=/usr/libexec/qemu-server/pve-bridge,downscript=/usr/libexec/qemu-server/pve-bridgedown,vhost=on -device virtio-net-pci,mac=BC:24:11:18:C6:CC,netdev=net0,bus=pci.0,addr=0x12,id=net0,rx_queue_size=1024,tx_queue_size=256,host_mtu=1500 -machine type=pc+pve1

root@sm01a:~# ls -l /dev/disk/by-id/scsi-3600d023100090e2f4bc30e5e6638deab
lrwxrwxrwx 1 root root 11 26. Okt 13:08 /dev/disk/by-id/scsi-3600d023100090e2f4bc30e5e6638deab -> ../../dm-10

root@sm01a:~# ls -l /dev/dm-10
brw-rw---- 1 root disk 252, 10 26. Okt 13:08 /dev/dm-10

root@sm01a:~# ls -l /dev/mapper/3600d023100090e2f4bc30e5e6638deab
lrwxrwxrwx 1 root root 8 26. Okt 13:08 /dev/mapper/3600d023100090e2f4bc30e5e6638deab -> ../dm-10

root@sm01a:~# multipath -ll 3600d023100090e2f4bc30e5e6638deab
3600d023100090e2f4bc30e5e6638deab dm-10 IFT,DS 4000 Series
size=1.0T features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
|-+- policy='service-time 0' prio=50 status=active
| `- 15:0:0:1 sdc 8:32 active ready running
`-+- policy='service-time 0' prio=10 status=enabled
`- 16:0:0:1 sdd 8:48 active ready running

First, you would of course make sure your data has been moved away from the LUN, i.e. no VM should use it in any way. If you were using the multipath device as a PV for a shared LVM, you would also remove that storage from the PVE storage configuration.

As the next step you would remove the multipath device with
Code:
multipath -ll           # confirm the WWID and path states first
multipath -f <WWID>     # flush (remove) the multipath map
multipath -w <WWID>     # remove the WWID from the wwids file
multipath -r            # reload the remaining maps
on all PVE cluster nodes.
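For the step after this one, the sd<X> path devices belonging to the map are needed, and they should be captured before flushing the map, since multipath -ll no longer lists it afterwards. A minimal sketch that extracts them from multipath -ll output (assuming the output format shown above; the function name is illustrative and newer multipath-tools versions may format differently):

```shell
# Extract the sdX slave devices from `multipath -ll <WWID>` output on stdin.
# A sketch only: it assumes the path lines contain a bare sdX token,
# as in the multipath -ll output shown earlier in this thread.
mpath_slaves() {
    awk '{ for (i = 1; i <= NF; i++) if ($i ~ /^sd[a-z]+$/) print $i }'
}
```

For the 1 TB LUN above, `multipath -ll 3600d023100090e2f4bc30e5e6638deab | mpath_slaves` would print sdc and sdd.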

Then one could do
Code:
blockdev --flushbufs /dev/sd<X>            # flush buffered writes for each path device
blockdev --flushbufs /dev/sd<Y>
echo 1 > /sys/block/sd<X>/device/delete    # remove the SCSI device from the kernel
echo 1 > /sys/block/sd<Y>/device/delete
with sd<X> and sd<Y> being the two iSCSI devices belonging to that multipath device.
But shortly after deleting the SCSI devices from the kernel with echo 1 > /sys/block/sdX/device/delete, they reappear because the iSCSI target still presents them to the initiator in the iSCSI session. So in fact that step is useless, isn't it?

You would then proceed to your iSCSI SAN and unmap the LUN so it is no longer visible to the PVE hosts.

Maybe an iSCSI session rescan on all PVE nodes should follow as a last step, but I am not sure whether that is necessary.

I think it is better NOT to use the multipath setting find_multipaths yes, because otherwise the multipath device would be reconstructed right after the echo 1 > /sys/block/sdX/device/delete command, once the SCSI devices are presented again from within the iSCSI session. You would need to unmap the LUN on the SAN without being able to remove it from the PVE nodes first. -> Race condition.
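For reference, a sketch of the corresponding /etc/multipath.conf fragment (the option is spelled find_multipaths; see multipath.conf(5) for its exact semantics and accepted values):

```
defaults {
    find_multipaths "no"
}
```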

I was testing with find_multipaths yes, but I encountered "status unknown" for the storage in the PVE web GUI, and I also got "map or partition in use" when I issued multipath -f <STALE_LUN_ID>. So I had to use dmsetup remove -f <STALE_LUN_ID> to get rid of it.

What do you think about this? Am I missing something? How do you do that workflow?

Thank you and regards,
Timo
 
But shortly after deleting the SCSI devices from the kernel with echo 1 > /sys/block/sdX/device/delete, they reappear because the iSCSI target still presents them to the initiator in the iSCSI session. So in fact that step is useless, isn't it?
Once you have removed all higher-level dependencies on the disk, i.e. there is no I/O (as checked with iostat), you can disconnect/remove the iSCSI sessions.
Then you can work through your steps of removing the DM device and so on, including dmsetup.
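The "no I/O as checked by iostat" test can be scripted; a minimal sketch, assuming iostat -d style output where the first column is the device name and the second column is tps (the function name is illustrative):

```shell
# Return success (0) if the given device shows I/O activity (tps > 0)
# in `iostat -d` style output read from stdin. A sketch only: column
# positions assume the classic "Device tps kB_read/s ..." layout.
device_has_io() {
    awk -v dev="$1" '$1 == dev && $2 + 0 > 0 { found = 1 } END { exit found ? 0 : 1 }'
}
```

One would pipe a current iostat report into it before tearing anything down, e.g. `iostat -d | device_has_io sdc && echo "sdc still busy"`.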


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
Hello bbgeek17,

thank you for your reply!
I forgot to mention that I cannot disconnect the iSCSI session by logging out of the iSCSI node, because other LUNs from there are still in use.

Regards,
Timo
 
I have exactly the same issue, and the multipath SCSI documentation does not elaborate on this at all either. Since Debian/Linux is the OS underlying the PVE GUI, it seems we need to dig into Debian forums to find a solution.
Just disconnecting the target is not feasible, as the author mentioned, because one SAN can present multiple LUNs to PVE.
 
Every situation is unique, sure. You can, maybe, unmap the LUN from the target on the SAN side.

If your steps work, go with them.


I would not agree that this situation is unique. This is the most common and classic model in enterprise modular deployment (non-HCI virtualization), where your hypervisor hosts are connected to shared storage via two dedicated network fabrics and have two/four/six/eight separate paths to the same LUN, and PVE is not really there yet to help you with this job.
The deployment manual vaguely mentions how to add storage but says nothing about removing volumes. It asks you to go to each host's CLI and manually install the iSCSI and multipath packages from the Debian repository, then manually write a suitable multipath.conf, then manually find your WWID and manually add it in two different places (a blacklist exception rule inside multipath.conf, then multipath -a <WWID> to register the multipath device), then restart the multipathd service and cross your fingers that the mpathX device will be recognized and assembled correctly from a bunch of /dev/sdX devices.

Then, to utilize this iSCSI LUN, you have to return to the GUI and select, surprise, "LVM" instead of "iSCSI" (which makes no sense at first, but later you realize that adding "iSCSI" means adding the iSCSI portal/connection, not a specific iSCSI LUN volume).

If Proxmox hired me, I would implement networking and storage management first, but it seems their leadership has other priorities, as even in v9.x those two basic things are still not implemented. Other competitors try to make the basics like networking, storage and clustering work first, and only then play with SDN, Ceph and other high-complexity crap. Don't believe me? Just try to configure the host management interface on a specific VLAN tag and make it HA on top of a bond, and you will see how your whole cluster setup and replication falls apart immediately. In ESXi you can do this at the setup stage, right from the text installer.
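The whitelisting step described above (blacklist everything, then except the wanted WWID) looks roughly like this in /etc/multipath.conf. A sketch following the Proxmox wiki pattern, using the 1 TB LUN's WWID from this thread as a placeholder:

```
blacklist {
    wwid .*
}
blacklist_exceptions {
    wwid "3600d023100090e2f4bc30e5e6638deab"
}
```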
 
Hi @ogrimia, welcome to the forum.

You raised some strong points, and several of them certainly have merit. From what you’ve written, it sounds like you have the skills to make a real impact - why not consider applying for a position?
https://www.proxmox.com/en/about/about-us/careers

Since the project is open source and free to use, you can also contribute ideas, time, or code without any formal commitment:
https://www.proxmox.com/en/about/open-source/developers

In my experience, however limited it may be, SMBs and smaller environments using legacy SAN devices often default to a single RAID group and a single large LUN, and rarely revisit the configuration afterward. Larger enterprises naturally have more extensive arrays, dedicated storage teams, and the expertise to safely remove storage from Linux hosts.

That said, I’m sure the PVE team is continuously working to improve overall usability.

Cheers

