Removing iSCSI disk

NdK73

Hello all.

Seems the procedure is still mostly manual and the same since Proxmox 1 (five years and two major releases ago):
http://forum.proxmox.com/threads/3578-Removal-of-iSCSI-disk-causes-system-to-hang (last message contains the details).

Currently, after you've removed both the LVM and its iSCSI backing from the GUI, you still cannot disconnect the iSCSI server, or your nodes will hang! Actually, it seems "just" pvscan (pvs) hangs, because it still tries to access the iSCSI storage. But that means the GUI marks the node as "dead" and VMs cannot be online-migrated to other hosts.

Would it be possible to handle the logout & delete steps together with the removal from the GUI?
Code:
iscsiadm -m node -T $IQN --logout
iscsiadm -m node -T $IQN --op=delete
Those seem to be the only two ops really needed (well, the GUI should also check that there are no active virtual disks and deactivate the VG...).

That could save big headaches for many users who only need to add iSCSI temporarily...
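For clarity, here is a sketch of what I have in mind, with an example VG name (my_iscsi_vg is a placeholder); deactivating the VG first is my own suggestion, not something the GUI currently does:
Code:
vgchange -an my_iscsi_vg              # deactivate the VG living on the iSCSI LUN (example name)
iscsiadm -m node -T $IQN --logout     # log out of the target
iscsiadm -m node -T $IQN --op=delete  # remove its node record so it is not logged in again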
 
Urgh... Seems those two commands are not enough: I just had to "resurrect" my 3-node cluster :(

A GUI(ded) procedure is highly desirable! Removing iSCSI storage is (currently) not for the faint of heart.
 
Writing down the procedure I used, for future reference. Hoping the devs consider fixing the scripts.
Maybe the multipath case is not (yet) handled correctly (well, it already needs quite a bit of manual fiddling with config files just for setup!).


  1. Be really sure no VM is using the storage you're going to remove (not even as an "inactive disk"!)
  2. Remove the affected LVM (VG) and iSCSI entries from Datacenter -> Storage
  3. Connect via SSH (important: if something goes wrong, the GUI could freeze!) to every node and:
    1. Run multipath -l to identify the multipath device (in my case it was mp_MD3800i)
    2. Run multipath -f mp_MD3800i to flush it
    3. Run multipath -l again to check that the multipathed device actually disappeared; if it's still there, you are doing something wrong, and if you proceed you'll probably damage your cluster! Repeat from the start!
    4. Run iscsiadm -m session to determine the IQN to use (mine was iqn.1984-05.com.dell:...)
    5. iscsiadm -m node -T iqn.1984-05.com.dell:... --logout
    6. iscsiadm -m node -T iqn.1984-05.com.dell:... --op=delete
  4. Now you can safely unmap the iSCSI target (do not delete it!) -- I assigned the LUN to another group via MDSM.

After step 4, try running pvs from a node: if it hangs, add the mapping back and check again! If you don't, your cluster will hang soon! (A consolidated sketch of the per-node commands follows below.)
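For convenience, here is the per-node part of the procedure collected into one shell session. It is only a sketch of the steps above: the multipath alias and the IQN are the examples from my setup, so substitute your own values and re-check the output of each command before moving on.
Code:
# on every node, over SSH (not via the GUI shell)
multipath -l                   # identify the multipath device to remove
multipath -f mp_MD3800i        # flush it (example alias from my setup)
multipath -l                   # the device must no longer be listed
iscsiadm -m session            # note the IQN of the session to drop
iscsiadm -m node -T iqn.1984-05.com.dell:... --logout
iscsiadm -m node -T iqn.1984-05.com.dell:... --op=delete
pvs                            # must return promptly; a hang means something is still wrong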
 
Hi,

It seems we must install multipath? It's not installed on my v3.3 nodes.
Is this procedure still the right one?

Thanks,

Antoine
 
I have two nodes in a cluster with multipath to the iSCSI device and I had to do steps 5 and 6 on both of them. Thank you for these instructions.

 
Hello,

I'm currently on the latest release and might have to do this. I have a CentOS node with 3 targets, and I removed one of them, but Proxmox still keeps trying to connect to it. It's very strange. I'm wondering: if I run those commands, will they cause issues with the other iSCSI targets?
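For what it's worth, the -T option scopes iscsiadm's logout/delete to a single target, so the other targets should not be touched; a sketch with a placeholder IQN for the stale target:
Code:
iscsiadm -m session                                         # list all active sessions
iscsiadm -m node -T iqn.2001-04.com.example:stale-target --logout     # affects only this target
iscsiadm -m node -T iqn.2001-04.com.example:stale-target --op=delete  # remove only its node record
iscsiadm -m session                                         # the other targets should still be listed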
 
Thank you for documenting your process. I'm having no end of trouble getting multipath set up and working with a Synology UC3200 as the SAN. Thankfully the cluster isn't deployed yet, so this is still in the testing phase, but I can't understand why this is so challenging when the ProxMox multipath wiki page is about as brief as can be.

I've been searching the forums and collating what seems to be a common theme: if you can get it working it is simple; if not, it hangs and causes timeouts. I got it working once inside a test PVE VM, then applied what I thought was the working config, only to kill the system when the MP sessions start or the node reboots.

I've attached a screenshot of the errors, but it seems I do not have enough detailed knowledge to resolve this. If anyone can shed any light I would be extremely grateful.

Tom.
 

Attachments

  • ProxMox_iscsi_multipath_issue.png (60.5 KB)
There are a few important things you have to keep in mind for your situation:

1) PVE is a hypervisor suite based on Debian.
2) Adding storage to PVE is, in most cases, the same as adding it to Debian Linux, or most other Linux flavors.
3) Your SAN vendor is your best resource for information on iSCSI and multipath.
4) The wiki/guide is basic because there is such a variety of configurations and vendors that only high-level advice can safely be given.

The best approach is to:
a) Find the directions from your vendor on configuring an iSCSI client with multipath
b) Tell PVE about the available DM device (see the sketch after this list)
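As an illustration of (b), a common pattern is to create LVM on top of the multipath (DM) device and then register that VG in PVE; a minimal sketch, assuming a multipath alias, VG name and storage ID of your choosing (all placeholders here):
Code:
pvcreate /dev/mapper/my_mpath_alias           # physical volume on the DM device
vgcreate san_vg /dev/mapper/my_mpath_alias    # volume group on top of it
# register the VG as shared LVM storage (equivalent to Datacenter -> Storage -> Add -> LVM)
pvesm add lvm san_lvm --vgname san_vg --shared 1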

As to your errors, they indicate a network connectivity problem and do not appear to have anything to do with the original topic of this thread. I recommend opening a new thread and providing more information about your configuration to get help.


Blockbridge: Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
Thank you for your reply. All taken on board, and I will do just that. I appreciate this is mildly off-topic, so again, thanks for responding with some constructive feedback.
 
Sorry for being late.
I'd add: check your /etc/lvm/lvm.conf file, especially the global_filter line -- if you get warnings about it being ignored, you have to correct the file.

In my case, I had to comment out the last 3 lines (added by pve-manager) and include their content in the "master" global_filter line, which is now:
Code:
global_filter = [ "a|/dev/disk/by-path/pci-0000\:00\:1f.2-ata-.*-part[0-9]*|", "r|/dev/[shz]d.*|", "r|/dev/mapper/pve-.*|", "r|/dev/mapper/.*-vm--[0-9]+--disk--[0-9]+|" ]
(and remember to update initramfs!)
If you don't do that, LVM often picks up the direct block devices instead of the multipath-managed ones and you won't have functioning multipath.
It's also important to "blacklist" VM disks from LVM to avoid FS corruption.
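To make the "remember to update initramfs" step concrete, this is roughly how one might apply and verify the change (the pvs check is just a sanity test, not a required step):
Code:
update-initramfs -u -k all      # rebuild the initramfs so the new lvm.conf is used at boot
# PVs should now be reported on the /dev/mapper/<mpath> devices, not the underlying /dev/sdX paths
pvs -o pv_name,vg_name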
 
Thanks for your reply. I'm going to have to read through it a few times and look at some documentation to completely understand what is required for my host(s) and configure accordingly.
 
Oh excellent, that sounds interesting, thank you for linking. Sorry to see that you're experiencing headaches similar to mine. My iSCSI target, for reference, is a Synology UC3200; I believe it uses Open-iSCSI under the hood, and I have a ticket open with them as well to try and figure out whether it is a host or target issue. I assume it is the host, but ruling it out is never a bad thing.

EDIT:

I've been looking through my syslog and came across the following information, which shows the ProxMox host initiating the connection to my iSCSI target. All seems fine until it defaults to IPv6, which I don't have an issue with in principle, but it appears it can't communicate that way. I've not done further testing as I am signing off from work for the day, but it is something I will look at more tomorrow.

Excerpt of the syslog below, FYI.

Code:
Apr 19 14:24:53 prx01 kernel: scsi host16: iSCSI Initiator over TCP/IP
Apr 19 14:24:53 prx01 kernel: scsi host17: iSCSI Initiator over TCP/IP
Apr 19 14:24:53 prx01 kernel: scsi 16:0:0:1: Direct-Access     SYNOLOGY Storage          4.0  PQ: 0 ANSI: 5
Apr 19 14:24:53 prx01 kernel: scsi 16:0:0:1: alua: supports implicit TPGS
Apr 19 14:24:53 prx01 kernel: scsi 16:0:0:1: alua: device naa.600140566386187de4c8d413ddbea1de port group 2 rel port 25
Apr 19 14:24:53 prx01 kernel: sd 16:0:0:1: Attached scsi generic sg6 type 0
Apr 19 14:24:53 prx01 kernel: sd 16:0:0:1: [sdf] 1073741824 512-byte logical blocks: (550 GB/512 GiB)
Apr 19 14:24:53 prx01 kernel: sd 16:0:0:1: [sdf] Write Protect is off
Apr 19 14:24:53 prx01 kernel: sd 16:0:0:1: [sdf] Mode Sense: 43 00 10 08
Apr 19 14:24:53 prx01 kernel: sd 16:0:0:1: [sdf] Write cache: enabled, read cache: enabled, supports DPO and FUA
Apr 19 14:24:53 prx01 kernel: sd 16:0:0:1: [sdf] Optimal transfer size 1048576 bytes
Apr 19 14:24:53 prx01 kernel: scsi 17:0:0:1: Direct-Access     SYNOLOGY Storage          4.0  PQ: 0 ANSI: 5
Apr 19 14:24:53 prx01 kernel: scsi 17:0:0:1: alua: supports implicit TPGS
Apr 19 14:24:53 prx01 kernel: scsi 17:0:0:1: alua: device naa.600140566386187de4c8d413ddbea1de port group 1 rel port 25
Apr 19 14:24:53 prx01 kernel: sd 17:0:0:1: Attached scsi generic sg7 type 0
Apr 19 14:24:53 prx01 kernel: sd 17:0:0:1: [sdg] 1073741824 512-byte logical blocks: (550 GB/512 GiB)
Apr 19 14:24:53 prx01 kernel: sd 17:0:0:1: [sdg] Write Protect is off
Apr 19 14:24:53 prx01 kernel: sd 17:0:0:1: [sdg] Mode Sense: 43 00 10 08
Apr 19 14:24:53 prx01 kernel: sd 17:0:0:1: [sdg] Write cache: enabled, read cache: enabled, supports DPO and FUA
Apr 19 14:24:53 prx01 kernel: sd 17:0:0:1: [sdg] Optimal transfer size 1048576 bytes
Apr 19 14:24:53 prx01 kernel: sd 16:0:0:1: alua: transition timeout set to 60 seconds
Apr 19 14:24:53 prx01 kernel: sd 16:0:0:1: alua: port group 02 state N non-preferred supports TOlUSNA
Apr 19 14:24:53 prx01 kernel: sd 17:0:0:1: alua: transition timeout set to 60 seconds
Apr 19 14:24:53 prx01 kernel: sd 17:0:0:1: alua: port group 01 state A non-preferred supports TOlUSNA
Apr 19 14:24:53 prx01 kernel: sd 17:0:0:1: [sdg] Attached SCSI disk
Apr 19 14:24:53 prx01 kernel: sd 16:0:0:1: [sdf] Attached SCSI disk
Apr 19 14:24:53 prx01 multipathd[1331050]: ssd01: addmap [0 1073741824 multipath 0 1 alua 1 1 service-time 0 1 1 8:96 1]
Apr 19 14:24:53 prx01 multipathd[1331050]: sdg [8:96]: path added to devmap ssd01
Apr 19 14:24:53 prx01 multipath[2841668]: ssd01: adding new path sdg
Apr 19 14:24:53 prx01 systemd[1]: Starting LVM event activation on device 253:5...
Apr 19 14:24:54 prx01 iscsid[1329047]: cannot make a connection to fe80::211:32ff:feb0:a48a:3260 (-1,22)
Apr 19 14:24:54 prx01 iscsid[1329047]: Connection5:0 to [target: iqn.2000-01.com.synology:ses-uc3200.ssd02.bfbeaf9c4b, portal: 10.10.50.221,3260] through [iface: default] is operational now
Apr 19 14:24:54 prx01 iscsid[1329047]: cannot make a connection to fe80::211:32ff:feb0:a610:3260 (-1,22)
Apr 19 14:24:54 prx01 iscsid[1329047]: Connection6:0 to [target: iqn.2000-01.com.synology:ses-uc3200.ssd02.bfbeaf9c4b, portal: 10.10.50.220,3260] through [iface: default] is operational now
Apr 19 14:24:54 prx01 multipathd[1331050]: ssd01: performing delayed actions
Apr 19 14:24:55 prx01 multipathd[1331050]: ssd01: reload [0 1073741824 multipath 0 1 alua 2 1 service-time 0 1 1 8:96 1 service-time 0 1 1 8:80 1]
Apr 19 14:24:59 prx01 iscsid[1329047]: cannot make a connection to fe80::211:32ff:feb0:a48a:3260 (-1,22)
Apr 19 14:24:59 prx01 iscsid[1329047]: cannot make a connection to fe80::211:32ff:feb0:a610:3260 (-1,22)
Apr 19 14:25:02 prx01 iscsid[1329047]: cannot make a connection to fe80::211:32ff:feb0:a48a:3260 (-1,22)
Apr 19 14:25:02 prx01 iscsid[1329047]: cannot make a connection to fe80::211:32ff:feb0:a610:3260 (-1,22)
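If the goal is just to stop iscsid from retrying the unreachable IPv6 portals, one option (an assumption on my part, not something confirmed in this thread) might be to delete only those portal records from the open-iscsi node database, leaving the working IPv4 portals alone; the addresses below are the link-local ones from the log, with the default port 3260 assumed:
Code:
iscsiadm -m node        # list recorded targets and portals; the fe80:: entries are the failing ones
# remove only the IPv6 portal records for this target; the IPv4 portals stay untouched
iscsiadm -m node -T iqn.2000-01.com.synology:ses-uc3200.ssd02.bfbeaf9c4b -p fe80::211:32ff:feb0:a48a -o delete
iscsiadm -m node -T iqn.2000-01.com.synology:ses-uc3200.ssd02.bfbeaf9c4b -p fe80::211:32ff:feb0:a610 -o delete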
 