Multipath and iSCSI problems

Limoni dara

Hello,
And thank you for your help.

I have installed a PVE cluster with 3 nodes and a Dell array.
The 2 controllers on the array have 4 ports each; I connected them to 2 switches in 2 non-routed VLANs dedicated to iSCSI (70 and 77, as you can see in the attachment).

I have 2 big LUNs on the array. I want to do LVM over iSCSI, so I created my first volume on the first LUN and mapped it to the 3 PVE hosts in the array GUI.
But I can't get multipath to work.

I can see everything just fine from the 3 PVE hosts:

Code:
root@pve2:~# iscsiadm -m session -P 1
Target: iqn.1988-11.com.dell:01.array.bc305b5e705f (non-flash)
    Current Portal: 192.168.70.3:3260,2
    Persistent Portal: 192.168.70.3:3260,2
        **********
        Interface:
        **********
        Iface Name: default
        Iface Transport: tcp
        Iface Initiatorname: iqn.1993-08.org.debian:01:e258a2db6cc
        Iface IPaddress: 192.168.70.101
        Iface HWaddress: default
        Iface Netdev: default
        SID: 1
        iSCSI Connection State: LOGGED IN
        iSCSI Session State: LOGGED_IN
        Internal iscsid Session State: NO CHANGE
    Current Portal: 192.168.70.2:3260,5
    Persistent Portal: 192.168.70.2:3260,5
        **********
        Interface:
        **********
        Iface Name: default
        Iface Transport: tcp
        Iface Initiatorname: iqn.1993-08.org.debian:01:e258a2db6cc
        Iface IPaddress: 192.168.70.101
        Iface HWaddress: default
        Iface Netdev: default
        SID: 2
        iSCSI Connection State: LOGGED IN
        iSCSI Session State: LOGGED_IN
        Internal iscsid Session State: NO CHANGE
    Current Portal: 192.168.77.4:3260,8
    Persistent Portal: 192.168.77.4:3260,8
        **********
        Interface:
        **********
        Iface Name: default
        Iface Transport: tcp
        Iface Initiatorname: iqn.1993-08.org.debian:01:e258a2db6cc
        Iface IPaddress: 192.168.77.101
        Iface HWaddress: default
        Iface Netdev: default
        SID: 3
        iSCSI Connection State: LOGGED IN
        iSCSI Session State: LOGGED_IN
        Internal iscsid Session State: NO CHANGE
    Current Portal: 192.168.77.3:3260,4
    Persistent Portal: 192.168.77.3:3260,4
        **********
        Interface:
        **********
        Iface Name: default
        Iface Transport: tcp
        Iface Initiatorname: iqn.1993-08.org.debian:01:e258a2db6cc
        Iface IPaddress: 192.168.77.101
        Iface HWaddress: default
        Iface Netdev: default
        SID: 4
        iSCSI Connection State: LOGGED IN
        iSCSI Session State: LOGGED_IN
        Internal iscsid Session State: NO CHANGE
    Current Portal: 192.168.77.2:3260,7
    Persistent Portal: 192.168.77.2:3260,7
        **********
        Interface:
        **********
        Iface Name: default
        Iface Transport: tcp
        Iface Initiatorname: iqn.1993-08.org.debian:01:e258a2db6cc
        Iface IPaddress: 192.168.77.101
        Iface HWaddress: default
        Iface Netdev: default
        SID: 5
        iSCSI Connection State: LOGGED IN
        iSCSI Session State: LOGGED_IN
        Internal iscsid Session State: NO CHANGE
    Current Portal: 192.168.70.1:3260,1
    Persistent Portal: 192.168.70.1:3260,1
        **********
        Interface:
        **********
        Iface Name: default
        Iface Transport: tcp
        Iface Initiatorname: iqn.1993-08.org.debian:01:e258a2db6cc
        Iface IPaddress: [default]
        Iface HWaddress: default
        Iface Netdev: default
        SID: 6
        iSCSI Connection State: TRANSPORT WAIT
        iSCSI Session State: FAILED
        Internal iscsid Session State: REOPEN
    Current Portal: 192.168.77.1:3260,3
    Persistent Portal: 192.168.77.1:3260,3
        **********
        Interface:
        **********
        Iface Name: default
        Iface Transport: tcp
        Iface Initiatorname: iqn.1993-08.org.debian:01:e258a2db6cc
        Iface IPaddress: 192.168.77.101
        Iface HWaddress: default
        Iface Netdev: default
        SID: 7
        iSCSI Connection State: LOGGED IN
        iSCSI Session State: LOGGED_IN
        Internal iscsid Session State: NO CHANGE
    Current Portal: 192.168.70.4:3260,6
    Persistent Portal: 192.168.70.4:3260,6
        **********
        Interface:
        **********
        Iface Name: default
        Iface Transport: tcp
        Iface Initiatorname: iqn.1993-08.org.debian:01:e258a2db6cc
        Iface IPaddress: 192.168.70.101
        Iface HWaddress: default
        Iface Netdev: default
        SID: 8
        iSCSI Connection State: LOGGED IN
        iSCSI Session State: LOGGED_IN
        Internal iscsid Session State: NO CHANGE

But each time I map the volume, without even having configured multipath, my PVE hosts go into a hanging state; they come back after a while, but it's not stable.

I get these kinds of errors:

Code:
Mar  1 15:23:43 pve2 kernel: [ 5701.374972] sd 11:0:0:1: [sdb] tag#614 FAILED Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK cmd_age=212s
Mar  1 15:23:43 pve2 kernel: [ 5701.374974] sd 11:0:0:1: [sdb] tag#614 CDB: Read(10) 28 00 00 00 00 48 00 00 30 00
Mar  1 15:23:43 pve2 kernel: [ 5701.374974] blk_update_request: I/O error, dev sdb, sector 72 op 0x0:(READ) flags 0x80700 phys_seg 6 prio class 0
Mar  1 15:23:43 pve2 multipathd[13212]: sdb: failed to get udev uid: No data available
Mar  1 15:23:43 pve2 multipathd[13212]: sdb: spurious uevent, path not found
Mar  1 15:23:43 pve2 iscsid: Kernel reported iSCSI connection 1:0 error (1022 - ISCSI_ERR_NOP_TIMEDOUT: A NOP has timed out) state (3)
Mar  1 15:23:47 pve2 iscsid: connection1:0 is operational after recovery (1 attempts)
Mar  1 15:23:49 pve2 pmxcfs[1609]: [status] notice: received log
Mar  1 15:23:51 pve2 kernel: [ 5709.560567]  connection3:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4296316928, last ping 4296318208, now 4296319488
Mar  1 15:23:51 pve2 kernel: [ 5709.562445]  connection3:0: detected conn error (1022)
Mar  1 15:23:52 pve2 iscsid: Kernel reported iSCSI connection 3:0 error (1022 - ISCSI_ERR_NOP_TIMEDOUT: A NOP has timed out) state (3)
Mar  1 15:23:55 pve2 iscsid: connection3:0 is operational after recovery (1 attempts)
Mar  1 15:23:56 pve2 systemd-udevd[787]: sdc: Worker [47717] processing SEQNUM=8722 is taking a long time
Mar  1 15:24:05 pve2 kernel: [ 5722.872553]  connection6:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4296320256, last ping 4296321536, now 4296322816
Mar  1 15:24:05 pve2 kernel: [ 5722.875011]  connection6:0: detected conn error (1022)
Mar  1 15:24:05 pve2 iscsid: Kernel reported iSCSI connection 6:0 error (1022 - ISCSI_ERR_NOP_TIMEDOUT: A NOP has timed out) state (3)
Mar  1 15:24:08 pve2 iscsid: connection6:0 is operational after recovery (1 attempts)
Mar  1 15:24:12 pve2 kernel: [ 5729.784549]  connection4:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4296321984, last ping 4296323264, now 4296324544
Mar  1 15:24:12 pve2 kernel: [ 5729.787074]  connection4:0: detected conn error (1022)
Mar  1 15:24:12 pve2 iscsid: Kernel reported iSCSI connection 4:0 error (1022 - ISCSI_ERR_NOP_TIMEDOUT: A NOP has timed out) state (3)
Mar  1 15:24:16 pve2 iscsid: connection4:0 is operational after recovery (1 attempts)


So then I try multipath:

Code:
"/etc/multipath.conf" 44L, 891B
## Default System Values
defaults {
        path_selector "round-robin 0"
        uid_attribute ID_SERIAL
        fast_io_fail_tmo 5
        dev_loss_tmo 10
        user_friendly_names yes
        no_path_retry fail
        find_multipaths no
        max_fds 8192
        polling_interval 5
}

blacklist {
        wwid .*
}

blacklist_exceptions {
        wwid "600C0FF00065EB293359FF6301000000"
        # wwid 36f4ee0801eff310029e21c965f327cc4
}

devices {
        device {
                vendor "DellEMC"
                product "ME4"
                path_grouping_policy "group_by_prio"
                path_checker "tur"
                hardware_handler "1 alua"
                prio "alua"
                failback immediate
                rr_weight "uniform"
                path_selector "service-time 0"
        }
}

multipaths {
        multipath {
                wwid "600C0FF00065EB293359FF6301000000"
                alias phototeque
        }
}

But it's not working: it says my wwid is blacklisted, even though it's in the exceptions!

Code:
Mar 01 15:26:23 | sdd: path_checker = tur (setting: storage device autodetected)
Mar 01 15:26:23 | sdd: checker timeout = 30 s (setting: kernel sysfs)
Mar 01 15:26:23 | sdd: tur state = up
Mar 01 15:26:23 | sdd: uid_attribute = ID_SERIAL (setting: multipath.conf defaults/devices section)
Mar 01 15:26:23 | sdd: uid = 3600c0ff00065eb293359ff6301000000 (udev)
Mar 01 15:26:23 | sdd: wwid 3600c0ff00065eb293359ff6301000000 blacklisted

So I have 2 questions:

- What can be wrong? The multipath setup?
- My array has only one IQN, so I only put that one, with one of the 8 controller port IP addresses, in storage.cfg, like this:

Code:
iscsi: ME4024
        portal 192.168.70.1
        target iqn.1988-11.com.dell:01.array.bc305b5e705f
        content none

Should I put each of the 8 controller ports in storage.cfg even though the array has only one IQN?


Thanks for your insight.
 

Attachments

  • Capture d’écran 2023-02-28 à 09.39.21.png (60.5 KB)
The errors indicate that most likely you have a network issue. Start with one path and ensure that you can access the disk.
In addition to read I/O errors, you are getting NOP timeouts.
A NOP is a "no operation"; it's an iSCSI ping. No data is read from or written to the actual disk. If that times out, then it's a network issue (routing, cable, NIC, switch, etc.). It could also be a firmware bug on the Dell side, i.e. the array is hung and not responding to iSCSI requests.



Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
Thanks for your answer.

When I only add one controller in storage.cfg, it's working.
Then I go to the ME4024 array GUI, create a volume (the one where I will put the LVM for VMs) and map it to all the PVE initiators.
Back on the PVE hosts, the logs start to fill up right after that.

When I look at the iSCSI switches, I see they are communicating:


Code:
<165>1 2023-03-01T22:53:30.629906+00:00 SWD147 dn_alm 798 - - Node.1-Unit.1:PRI [event], Dell EMC (OS10) %ISCSI_OPT_NEW_TCP_CONN: ISCSI_OPT: New iSCSI Connection Discovered. Src ip 192.168.70.102 port:54426 Dest ip: 192.168.70.1 port: 3260
<165>1 2023-03-01T22:53:34.574607+00:00 SWD147 dn_alm 798 - - Node.1-Unit.1:PRI [event], Dell EMC (OS10) %ISCSI_OPT_NEW_TCP_CONN: ISCSI_OPT: New iSCSI Connection Discovered. Src ip 192.168.70.100 port:36466 Dest ip: 192.168.70.2 port: 3260
<165>1 2023-03-01T22:53:41.455707+00:00 SWD147 dn_alm 798 - - Node.1-Unit.1:PRI [event], Dell EMC (OS10) %ISCSI_OPT_NEW_TCP_CONN: ISCSI_OPT: New iSCSI Connection Discovered. Src ip 192.168.70.101 port:55184 Dest ip: 192.168.70.1 port: 3260
<165>1 2023-03-01T22:53:41.482086+00:00 SWD147 dn_alm 798 - - Node.1-Unit.1:PRI [event], Dell EMC (OS10) %ISCSI_OPT_NEW_TCP_CONN: ISCSI_OPT: New iSCSI Connection Discovered. Src ip 192.168.70.100 port:42678 Dest ip: 192.168.70.1 port: 3260

Maybe I need to add the other 7 iSCSI controller ports in storage.cfg?

Can you suggest some logs to look at?

I'm really lost here; I've been stuck on this problem for 3 days.

Thank you
 
One thing to keep in mind is that PVE is a Debian-based product. You want to make sure that you use Dell best practices for connecting the SAN to a Debian Linux host.
I noticed that the multipath config appears to be missing a leading digit for your device's wwid. But I doubt that will solve all your issues.
Once you have established iSCSI sessions, before multipath, you should be able to "dd if=/dev/path/to/device of=/dev/null bs=32k" to read the entire disk.
I would suggest simplifying the environment - reduce it to one port/IP and make sure it works.
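
For example, to narrow things down to a single portal and then read the whole disk, something along these lines should work (a sketch; the portal address and /dev/sdb are taken from the outputs above, adjust them to your host):

Code:
# log out of one extra portal (repeat for each portal you want to drop for the test)
iscsiadm -m node -T iqn.1988-11.com.dell:01.array.bc305b5e705f -p 192.168.77.3:3260 -u
# sequentially read the remaining path end-to-end
dd if=/dev/sdb of=/dev/null bs=32k status=progress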


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
Your blacklist exception entry is wrong. In Linux multipath, each WWID must be prefixed with a 3, so your correct blacklist exception is:

Code:
blacklist_exceptions {
        wwid "3600c0ff00065eb293359ff6301000000"
}
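
After fixing the exception, something like this should reload multipathd and show whether the map gets created (a sketch, using the alias from the multipath.conf posted earlier):

Code:
multipath -r                          # reload the maps after editing /etc/multipath.conf
multipath -ll                         # the LUN should now appear, e.g. under the alias "phototeque"
multipath -v3 2>&1 | grep -i black    # confirm the wwid is no longer reported as blacklisted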
 
Thank you, I didn't know that! But it did not solve my problem.
 
I just read the Dell guide for the ME4024 and realised that there is one pool per controller: I have two pools on my array but will only use one for the time being... I think this is why I have this problem; I activated all the controller IPs while I'm only using 4. Can my problem come from this? I'll look into it tomorrow and keep you updated. Thank you for your help.
 
Hello everyone

I'm coming back with an answer: my problem is with the MTU 9000 configured between the array and the dedicated iSCSI switches. Both were configured with MTU 9000, and when I deactivate this setting on the PVE hosts' interfaces, everything comes back to normal... I still haven't found the solution to make MTU 9000 work, but I'm looking at it.
 
I still haven't found the solution to make MTU 9000 work, but I'm looking at it.
I had MTU in my first reply as a possible culprit but decided to remove it, since an iSCSI NOP is a very small data packet and is certain to fit into a very small frame. Just goes to show that MTU mismatch behavior is very unpredictable.

For jumbo MTU to work you must set all ports in the path to the same MTU size, i.e. including the network ports of the switch. You can test your changes with: ping -M do -s 8192 <ip>
Note that the payload size has to be somewhat less than 9000 (or whatever MTU you set), since it does not include the 28 bytes of IP and ICMP headers.
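
For instance, with MTU 9000 the largest ping payload that fits is 8972 bytes (9000 minus 20 bytes of IP header and 8 bytes of ICMP header), so a test towards one of the array portals could look like this (portal IP taken from the storage.cfg above):

Code:
# -M do forbids fragmentation, so the ping only succeeds if every hop accepts the frame
ping -M do -s 8972 192.168.70.1
# drop -s to 1472 to confirm a standard 1500-byte path still works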


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
I set all ports on the switches to 9000 and also the 2 interfaces of each PVE host connected to them, but the logs are a mess again...
Why do you say it must be less than 9000? Should I try 8192?
 
Some proprietary systems with a custom TCP/IP stack don't handle jumbo MTU well, or require a reboot for it to take effect. Although I would expect more from Dell.
Troubleshoot it methodically by eliminating pieces of the network: can you ping with jumbo size between clients? If not, can you connect two clients directly? Does that work? What if you connect a laptop directly to a storage controller port - does the ping work?
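
Another way to spot where the MTU drops is tracepath, which reports the path MTU hop by hop (portal IP again taken from the earlier storage.cfg; adjust as needed):

Code:
tracepath -n 192.168.70.1
# the "pmtu" value shows the largest MTU the path supports; anything below 9000
# points at the device where jumbo frames stop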


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
I just ran some tests: if I put MTU 9000 on both iSCSI switches and MTU 8600 on the servers, everything works smoothly.
If I put 9000 on both switches and hosts, it starts generating NOP timeouts and I/O errors... But I looked at the specifications of the Broadcom cards and they are supposed to handle 9000. Should I leave it at 8600 on the hosts?
 
In general MTU should be the same across all devices.
If I put 9000 on both switches and hosts, it starts generating NOP timeouts and I/O errors
Take storage out of it. Test just between the client servers.
If you are using VLAN tagging, that can affect what you should set the MTU size to.

Good luck

Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
You must always configure a bigger MTU on your switches. Please set a minimum of 9216.
Then you can use MTU 9000 on the storage and on your PVE hosts.
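
For reference, on the PVE side the MTU is set per interface in /etc/network/interfaces; a minimal sketch, assuming dedicated iSCSI NICs and the addresses seen earlier in this thread (the interface names are hypothetical, adjust them to your hosts):

Code:
# /etc/network/interfaces (excerpt) - hypothetical NIC names
auto ens2f0
iface ens2f0 inet static
        address 192.168.70.101/24
        mtu 9000

auto ens2f1
iface ens2f1 inet static
        address 192.168.77.101/24
        mtu 9000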
 
And thank you also for your help; you really helped me understand the whole problem better.
 
