Removing iSCSI disk

NdK73

Hello all.

Seems the procedure is still mostly manual and the same since Proxmox 1 (five years and two major releases ago):
http://forum.proxmox.com/threads/3578-Removal-of-iSCSI-disk-causes-system-to-hang (last message contains the details).

Currently, after you've removed both the LVM and its iSCSI backing from the GUI, you still cannot disconnect the iSCSI server, or your nodes will hang! Actually, it seems "just" pvscan (pvs) hangs, because it still tries to access the iSCSI storage. But that means the GUI marks the node as "dead" and VMs cannot be online-migrated to other hosts.

Would it be possible to handle the logout & delete steps together with the removal from the GUI?
Code:
iscsiadm -m node -T $IQN --logout
iscsiadm -m node -T $IQN --op=delete
Those seem to be the only two ops really needed (well, the GUI should also check that there are no active virtual disks and deactivate the VG...).

That could save big headaches for many users who only need to add iSCSI temporarily...
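For clarity, here is a sketch of what I have in mind, with an example VG name (my_iscsi_vg is a placeholder); deactivating the VG first is my own suggestion, not something the GUI currently does:
Code:
vgchange -an my_iscsi_vg              # deactivate the VG living on the iSCSI LUN (example name)
iscsiadm -m node -T $IQN --logout     # log out of the target
iscsiadm -m node -T $IQN --op=delete  # remove its node record so it is not logged in again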
 
Urgh... Seems those two commands are not enough: I just had to "resurrect" my 3-node cluster :(

A GUI(ded) procedure is highly desirable! Removing iSCSI storage is (currently) not for the faint of heart.
 
Writing down the procedure I used, for future reference. Hoping the devs consider fixing the scripts.
Maybe the multipath case is not (yet) handled correctly (well, it already needs quite a bit of manual fiddling with config files just for setup!).


  1. Be really sure no VM is using the storage you're going to remove (not even as an "inactive disk"!)
  2. Remove the affected LVM (VG) and iSCSI entries from Datacenter -> Storage
  3. Connect via SSH (important: if something goes wrong, the GUI could freeze!) to every node and:
    1. Run multipath -l to identify the multipath device (in my case it was mp_MD3800i)
    2. Run multipath -f mp_MD3800i to flush it
    3. Run multipath -l again to check that the multipathed device actually disappeared; if it's still there, you are doing something wrong, and if you proceed you'll probably damage your cluster! Repeat from the start!
    4. Run iscsiadm -m session to determine the IQN to use (mine was iqn.1984-05.com.dell:...)
    5. iscsiadm -m node -T iqn.1984-05.com.dell:... --logout
    6. iscsiadm -m node -T iqn.1984-05.com.dell:... --op=delete
  4. Now you can safely unmap the iSCSI target (do not delete it!) -- I assigned the LUN to another group via MDSM.

After step 4, try running pvs from a node: if it hangs, add the mapping back and check again! If you don't, your cluster will hang soon! (A consolidated sketch of the per-node commands follows below.)
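For convenience, here is the per-node part of the procedure collected into one shell session. It is only a sketch of the steps above: the multipath alias and the IQN are the examples from my setup, so substitute your own values and re-check the output of each command before moving on.
Code:
# on every node, over SSH (not via the GUI shell)
multipath -l                   # identify the multipath device to remove
multipath -f mp_MD3800i        # flush it (example alias from my setup)
multipath -l                   # the device must no longer be listed
iscsiadm -m session            # note the IQN of the session to drop
iscsiadm -m node -T iqn.1984-05.com.dell:... --logout
iscsiadm -m node -T iqn.1984-05.com.dell:... --op=delete
pvs                            # must return promptly; a hang means something is still wrong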
 
Hi,

It seems we must install multipath? It's not installed on my v3.3 nodes.
Is this procedure still the right one?

Thanks,

Antoine
 
I have two nodes in a cluster with multipath to the iSCSI device and I had to do steps 5 and 6 on both of them. Thank you for these instructions.

 
Hello,

I'm currently on the latest release and might have to do this. I have a CentOS node with 3 targets, and I removed one of them, but Proxmox still keeps trying to connect to it. It's very strange. I'm wondering: if I run those commands, will they cause issues with the other iSCSI targets?
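For what it's worth, the -T option scopes iscsiadm's logout/delete to a single target, so the other targets should not be touched; a sketch with a placeholder IQN for the stale target:
Code:
iscsiadm -m session                                         # list all active sessions
iscsiadm -m node -T iqn.2001-04.com.example:stale-target --logout     # affects only this target
iscsiadm -m node -T iqn.2001-04.com.example:stale-target --op=delete  # remove only its node record
iscsiadm -m session                                         # the other targets should still be listed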
 
Thank you for documenting your process. I'm having no end of trouble getting multipath set up and working with a Synology UC3200 as the SAN. Thankfully the cluster isn't deployed yet, so this is still in the testing phase, but I can't understand why this is so challenging when the ProxMox multipath wiki page is about as brief as can be.

I've been searching the forums and collating what seems to be a common theme: if you can get it working it is simple; if not, it hangs and causes timeouts. I got it working once inside a test PVE VM, then applied what I thought was the working config, only to kill the system when the MP sessions start or the node reboots.

I've attached a screenshot of the errors, but it seems I do not have enough detailed knowledge to resolve this. If anyone can shed any light I would be extremely grateful.

Tom.
 

Attachments

  • ProxMox_iscsi_multipath_issue.png (60.5 KB)
There are a few important things you have to keep in mind for your situation:

1) PVE is a hypervisor suite based on Debian.
2) Adding storage to PVE is, in most cases, the same as adding it to Debian Linux, or most other Linux flavors.
3) Your SAN vendor is your best resource for information on iSCSI and multipath.
4) The wiki/guide is basic because there is such a variety of configurations and vendors that only high-level advice can safely be given.

The best approach is to:
a) Find the directions from your vendor on configuring an iSCSI client with multipath
b) Tell PVE about the available DM device (see the sketch after this list)
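As an illustration of (b), a common pattern is to create LVM on top of the multipath (DM) device and then register that VG in PVE; a minimal sketch, assuming a multipath alias, VG name and storage ID of your choosing (all placeholders here):
Code:
pvcreate /dev/mapper/my_mpath_alias           # physical volume on the DM device
vgcreate san_vg /dev/mapper/my_mpath_alias    # volume group on top of it
# register the VG as shared LVM storage (equivalent to Datacenter -> Storage -> Add -> LVM)
pvesm add lvm san_lvm --vgname san_vg --shared 1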

As to your errors, they indicate a network connectivity problem and do not appear to have anything to do with the original topic of this thread. I recommend opening a new thread and providing more information about your configuration to get help.


Blockbridge: Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
Thank you for your reply. All taken on board, and I will do just that. I appreciate this is mildly off-topic, so again, thanks for responding with some constructive feedback.
 
Sorry for being late.
I'd add: check your /etc/lvm/lvm.conf file, especially the global_filter line -- if you get warnings about it being ignored, you have to correct the file.

In my case, I had to comment out the last 3 lines (added by pve-manager) and include their content in the "master" global_filter line, which is now:
Code:
global_filter = [ "a|/dev/disk/by-path/pci-0000\:00\:1f.2-ata-.*-part[0-9]*|", "r|/dev/[shz]d.*|", "r|/dev/mapper/pve-.*|", "r|/dev/mapper/.*-vm--[0-9]+--disk--[0-9]+|" ]
(and remember to update initramfs!)
If you don't do that, LVM often picks up the direct block devices instead of the multipath-managed ones and you won't have functioning multipath.
It's also important to "blacklist" VM disks from LVM to avoid FS corruption.
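To make the "remember to update initramfs" step concrete, this is roughly how one might apply and verify the change (the pvs check is just a sanity test, not a required step):
Code:
update-initramfs -u -k all      # rebuild the initramfs so the new lvm.conf is used at boot
# PVs should now be reported on the /dev/mapper/<mpath> devices, not the underlying /dev/sdX paths
pvs -o pv_name,vg_name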
 
Thanks for your reply. I'm going to have to read through it a few times and look at some documentation to completely understand what is required for my host(s) and configure accordingly.
 
Oh excellent, that sounds interesting, thank you for linking. Sorry to see that you're experiencing headaches similar to mine. My iSCSI target, for reference, is a Synology UC3200; I believe it uses Open-iSCSI under the hood, and I have a ticket open with them as well to try and figure out whether it is a host or target issue. I assume it is the host, but ruling it out is never a bad thing.

EDIT:

I've been looking through my syslog and came across the following information, which shows the ProxMox host initiating the connection to my iSCSI target. All seems fine until it defaults to IPv6, which I don't have an issue with in principle, but it appears it can't communicate that way. I've not done further testing as I am signing off from work for the day, but it is something I will look at more tomorrow.

Excerpt of the syslog below, FYI.

Code:
Apr 19 14:24:53 prx01 kernel: scsi host16: iSCSI Initiator over TCP/IP
Apr 19 14:24:53 prx01 kernel: scsi host17: iSCSI Initiator over TCP/IP
Apr 19 14:24:53 prx01 kernel: scsi 16:0:0:1: Direct-Access     SYNOLOGY Storage          4.0  PQ: 0 ANSI: 5
Apr 19 14:24:53 prx01 kernel: scsi 16:0:0:1: alua: supports implicit TPGS
Apr 19 14:24:53 prx01 kernel: scsi 16:0:0:1: alua: device naa.600140566386187de4c8d413ddbea1de port group 2 rel port 25
Apr 19 14:24:53 prx01 kernel: sd 16:0:0:1: Attached scsi generic sg6 type 0
Apr 19 14:24:53 prx01 kernel: sd 16:0:0:1: [sdf] 1073741824 512-byte logical blocks: (550 GB/512 GiB)
Apr 19 14:24:53 prx01 kernel: sd 16:0:0:1: [sdf] Write Protect is off
Apr 19 14:24:53 prx01 kernel: sd 16:0:0:1: [sdf] Mode Sense: 43 00 10 08
Apr 19 14:24:53 prx01 kernel: sd 16:0:0:1: [sdf] Write cache: enabled, read cache: enabled, supports DPO and FUA
Apr 19 14:24:53 prx01 kernel: sd 16:0:0:1: [sdf] Optimal transfer size 1048576 bytes
Apr 19 14:24:53 prx01 kernel: scsi 17:0:0:1: Direct-Access     SYNOLOGY Storage          4.0  PQ: 0 ANSI: 5
Apr 19 14:24:53 prx01 kernel: scsi 17:0:0:1: alua: supports implicit TPGS
Apr 19 14:24:53 prx01 kernel: scsi 17:0:0:1: alua: device naa.600140566386187de4c8d413ddbea1de port group 1 rel port 25
Apr 19 14:24:53 prx01 kernel: sd 17:0:0:1: Attached scsi generic sg7 type 0
Apr 19 14:24:53 prx01 kernel: sd 17:0:0:1: [sdg] 1073741824 512-byte logical blocks: (550 GB/512 GiB)
Apr 19 14:24:53 prx01 kernel: sd 17:0:0:1: [sdg] Write Protect is off
Apr 19 14:24:53 prx01 kernel: sd 17:0:0:1: [sdg] Mode Sense: 43 00 10 08
Apr 19 14:24:53 prx01 kernel: sd 17:0:0:1: [sdg] Write cache: enabled, read cache: enabled, supports DPO and FUA
Apr 19 14:24:53 prx01 kernel: sd 17:0:0:1: [sdg] Optimal transfer size 1048576 bytes
Apr 19 14:24:53 prx01 kernel: sd 16:0:0:1: alua: transition timeout set to 60 seconds
Apr 19 14:24:53 prx01 kernel: sd 16:0:0:1: alua: port group 02 state N non-preferred supports TOlUSNA
Apr 19 14:24:53 prx01 kernel: sd 17:0:0:1: alua: transition timeout set to 60 seconds
Apr 19 14:24:53 prx01 kernel: sd 17:0:0:1: alua: port group 01 state A non-preferred supports TOlUSNA
Apr 19 14:24:53 prx01 kernel: sd 17:0:0:1: [sdg] Attached SCSI disk
Apr 19 14:24:53 prx01 kernel: sd 16:0:0:1: [sdf] Attached SCSI disk
Apr 19 14:24:53 prx01 multipathd[1331050]: ssd01: addmap [0 1073741824 multipath 0 1 alua 1 1 service-time 0 1 1 8:96 1]
Apr 19 14:24:53 prx01 multipathd[1331050]: sdg [8:96]: path added to devmap ssd01
Apr 19 14:24:53 prx01 multipath[2841668]: ssd01: adding new path sdg
Apr 19 14:24:53 prx01 systemd[1]: Starting LVM event activation on device 253:5...
Apr 19 14:24:54 prx01 iscsid[1329047]: cannot make a connection to fe80::211:32ff:feb0:a48a:3260 (-1,22)
Apr 19 14:24:54 prx01 iscsid[1329047]: Connection5:0 to [target: iqn.2000-01.com.synology:ses-uc3200.ssd02.bfbeaf9c4b, portal: 10.10.50.221,3260] through [iface: default] is operational now
Apr 19 14:24:54 prx01 iscsid[1329047]: cannot make a connection to fe80::211:32ff:feb0:a610:3260 (-1,22)
Apr 19 14:24:54 prx01 iscsid[1329047]: Connection6:0 to [target: iqn.2000-01.com.synology:ses-uc3200.ssd02.bfbeaf9c4b, portal: 10.10.50.220,3260] through [iface: default] is operational now
Apr 19 14:24:54 prx01 multipathd[1331050]: ssd01: performing delayed actions
Apr 19 14:24:55 prx01 multipathd[1331050]: ssd01: reload [0 1073741824 multipath 0 1 alua 2 1 service-time 0 1 1 8:96 1 service-time 0 1 1 8:80 1]
Apr 19 14:24:59 prx01 iscsid[1329047]: cannot make a connection to fe80::211:32ff:feb0:a48a:3260 (-1,22)
Apr 19 14:24:59 prx01 iscsid[1329047]: cannot make a connection to fe80::211:32ff:feb0:a610:3260 (-1,22)
Apr 19 14:25:02 prx01 iscsid[1329047]: cannot make a connection to fe80::211:32ff:feb0:a48a:3260 (-1,22)
Apr 19 14:25:02 prx01 iscsid[1329047]: cannot make a connection to fe80::211:32ff:feb0:a610:3260 (-1,22)
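If the goal is just to stop iscsid from retrying the unreachable IPv6 portals, one option (an assumption on my part, not something confirmed in this thread) might be to delete only those portal records from the open-iscsi node database, leaving the working IPv4 portals alone; the addresses below are the link-local ones from the log, with the default port 3260 assumed:
Code:
iscsiadm -m node        # list recorded targets and portals; the fe80:: entries are the failing ones
# remove only the IPv6 portal records for this target; the IPv4 portals stay untouched
iscsiadm -m node -T iqn.2000-01.com.synology:ses-uc3200.ssd02.bfbeaf9c4b -p fe80::211:32ff:feb0:a48a -o delete
iscsiadm -m node -T iqn.2000-01.com.synology:ses-uc3200.ssd02.bfbeaf9c4b -p fe80::211:32ff:feb0:a610 -o delete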
 