Problem with iSCSI

canikipthis

New Member
Jan 20, 2024
I have a two-node cluster; both nodes are running Proxmox VE 8.1.4. The nodes are Server09 and Server10.


Code:
pveversion
pve-manager/8.1.4/ec5affc9e41f1d79 (running kernel: 6.5.11-7-pve)

I am running a Pure Storage FlashArray that presents storage to the Proxmox cluster over iSCSI. Each host has two 10 GbE NICs, so I am using multipath. I configured the LUN (did NOT use it for direct use) and then put LVM on top of it. I followed Pure Storage's recommendations on how to configure multipath:

Pure - Linux Settings
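
For reference, creating the LVM layer on top of the multipath device would look roughly like this. This is a sketch, not the exact commands I ran: the WWID and VG name come from the outputs below, and the /dev/mapper path is an assumption.

Code:
# run once, on one node only, against the multipath device (never the raw sdX paths)
pvcreate /dev/mapper/3624a93706f94cc82e5524e2000012648
vgcreate VG-PROX-SRV01 /dev/mapper/3624a93706f94cc82e5524e2000012648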

Everything worked: I was able to create a VM on it, migrate it back and forth, etc. The problem came when one of the hosts, Server10, rebooted. The iSCSI storage never came back on that node. I can confirm that I have connectivity to the back-end storage ports on the dedicated iSCSI network, but no sessions come up.
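
For context, manually re-establishing the sessions would normally be something like the following (a sketch only; the portal address is taken from the discovery output further down):

Code:
# rediscover targets behind one portal, log in to all recorded targets, then re-check
iscsiadm -m discovery -t sendtargets -p 192.168.0.96:3260
iscsiadm -m node --login
iscsiadm -m session
multipath -ll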

These are the outputs from the server that is not working (Server10):

multipath -ll - this is empty

Code:
# iscsiadm --mode session
iscsiadm: No active sessions

cat /etc/iscsi/iscsid.conf

Code:
iscsid.startup = /bin/systemctl start iscsid.socket
node.startup = automatic
node.session.timeo.replacement_timeout = 15
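
Since node.startup is automatic here, a quick way to confirm the same setting made it into the per-target node records would be something like this (the grep is just illustrative):

Code:
# list recorded targets and the startup setting stored in each node record
iscsiadm -m node
iscsiadm -m node -o show | grep node.startup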

cat /etc/multipath.conf

Code:
defaults {
        polling_interval       10
}


blacklist {
        wwid .*
}

blacklist_exceptions {
        wwid "3624a93706f94cc82e5524e2000012648"
}

devices {
    device {
        vendor                      "NVME"
        product                     "Pure Storage FlashArray"
        path_selector               "queue-length 0"
        path_grouping_policy        group_by_prio
        prio                        ana
        failback                    immediate
        fast_io_fail_tmo            10
        user_friendly_names         no
        no_path_retry               0
        features                    0
        dev_loss_tmo                60
    }
    device {
        vendor                   "PURE"
        product                  "FlashArray"
        path_selector            "service-time 0"
        hardware_handler         "1 alua"
        path_grouping_policy     group_by_prio
        prio                     alua
        failback                 immediate
        path_checker             tur
        fast_io_fail_tmo         10
        user_friendly_names      no
        no_path_retry            0
        features                 0
        dev_loss_tmo             600
    }
}
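
To double-check that multipathd has merged this file and can see the LUN, something like the following should do (the grep patterns are only examples; the WWID is the one from blacklist_exceptions):

Code:
# show the running (merged) multipath configuration for the PURE device entry
multipathd show config | grep -A 12 '"PURE"'
# verbose path discovery; look for the WWID from blacklist_exceptions
multipath -v3 2>&1 | grep -i 3624a93706f94cc82e5524e2000012648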

cat /etc/pve/storage.cfg

Code:
dir: local
        path /var/lib/vz
        content iso,backup,vztmpl

lvmthin: local-lvm
        thinpool data
        vgname pve
        content images,rootdir

iscsi: z-PROX-SRV01
        portal 192.168.0.96
        target iqn.2010-06.com.purestorage:flasharray.4f9d18607a7d3d3e
        content none
        nodes Server10,Server09

lvm: PROX-SRV01
        vgname VG-PROX-SRV01
        content images,rootdir
        nodes Server10,Server09
        shared 1
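
As a sanity check of this layout, Proxmox itself can probe the portal and report storage state from each node, e.g. (portal address taken from the config above):

Code:
# discover targets behind the portal as Proxmox sees them, then check storage activation
pvesm scan iscsi 192.168.0.96
pvesm status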

iscsiadm -m discovery

Code:
192.168.0.97:3260 via sendtargets
192.168.0.96:3260 via sendtargets

ping to the storage controller port

Code:
ping 192.168.0.96
PING 192.168.0.96 (192.168.0.96) 56(84) bytes of data.
64 bytes from 192.168.0.96: icmp_seq=1 ttl=64 time=0.099 ms
64 bytes from 192.168.0.96: icmp_seq=2 ttl=64 time=0.122 ms
64 bytes from 192.168.0.96: icmp_seq=3 ttl=64 time=0.093 ms

cat /etc/iscsi/initiatorname.iscsi

Code:
InitiatorName=iqn.1991-05.com.microsoft:Server10.domain.corp


systemctl status iscsid

Code:
iscsid.service - iSCSI initiator daemon (iscsid)
     Loaded: loaded (/lib/systemd/system/iscsid.service; disabled; preset: enabled)
     Active: active (running) since Fri 2024-01-19 17:11:23 CST; 3h 20min ago
TriggeredBy: ● iscsid.socket
       Docs: man:iscsid(8)
    Process: 15449 ExecStartPre=/lib/open-iscsi/startup-checks.sh (code=exited, status=0/SUCCESS)
    Process: 15452 ExecStart=/sbin/iscsid (code=exited, status=0/SUCCESS)
   Main PID: 15454 (iscsid)
      Tasks: 2 (limit: 618962)
     Memory: 2.4M
        CPU: 154ms
     CGroup: /system.slice/iscsid.service
             ├─15453 /sbin/iscsid
             └─15454 /sbin/iscsid

Jan 19 17:11:23 Server10 systemd[1]: Starting iscsid.service - iSCSI initiator daemon (iscsid)...
Jan 19 17:11:23 Server10 iscsid[15452]: iSCSI logger with pid=15453 started!
Jan 19 17:11:23 Server10 systemd[1]: Started iscsid.service - iSCSI initiator daemon (iscsid).
Jan 19 17:11:24 Server10 iscsid[15453]: iSCSI daemon with pid=15454 started!

Reboots of the server don't fix it.

Server09 is working fine. I haven't rebooted it, partly because I'm afraid there will be an issue.

Here are the outputs from Server09, which is working:

Code:
multipath -ll
3624a93706f94cc82e5524e2000012648 dm-5 PURE,FlashArray
size=50T features='0' hwhandler='1 alua' wp=rw
`-+- policy='service-time 0' prio=50 status=active
  |- 12:0:0:1 sdc 8:32 active ready running
  `- 11:0:0:1 sdb 8:16 active ready running

Code:
iscsiadm --mode session
tcp: [1] 192.168.0.96:3260,1 iqn.2010-06.com.purestorage:flasharray.4f9d18607a7d3d3e (non-flash)
tcp: [2] 192.168.0.97:3260,1 iqn.2010-06.com.purestorage:flasharray.4f9d18607a7d3d3e (non-flash)

Code:
iscsid.startup = /bin/systemctl start iscsid.socket
node.startup = automatic
node.session.timeo.replacement_timeout = 15

Code:
defaults {
        polling_interval       10
}


blacklist {
        wwid .*
}

blacklist_exceptions {
        wwid "3624a93706f94cc82e5524e2000012648"
}

devices {
    device {
        vendor                      "NVME"
        product                     "Pure Storage FlashArray"
        path_selector               "queue-length 0"
        path_grouping_policy        group_by_prio
        prio                        ana
        failback                    immediate
        fast_io_fail_tmo            10
        user_friendly_names         no
        no_path_retry               0
        features                    0
        dev_loss_tmo                60
    }
    device {
        vendor                   "PURE"
        product                  "FlashArray"
        path_selector            "service-time 0"
        hardware_handler         "1 alua"
        path_grouping_policy     group_by_prio
        prio                     alua
        failback                 immediate
        path_checker             tur
        fast_io_fail_tmo         10
        user_friendly_names      no
        no_path_retry            0
        features                 0
        dev_loss_tmo             600
    }
}

Code:
dir: local
        path /var/lib/vz
        content iso,backup,vztmpl

lvmthin: local-lvm
        thinpool data
        vgname pve
        content images,rootdir

iscsi: z-PROX-SRV01
        portal 192.168.0.96
        target iqn.2010-06.com.purestorage:flasharray.4f9d18607a7d3d3e
        content none
        nodes Server09,Server10

lvm: PROX-SRV01
        vgname VG-PROX-SRV01
        content rootdir,images
        nodes Server09,Server10
        shared 1

Code:
iscsiadm -m discovery
192.168.0.96:3260 via sendtargets
192.168.0.97:3260 via sendtargets

Code:
ping 192.168.0.96
PING 192.168.0.96 (192.168.0.96) 56(84) bytes of data.
64 bytes from 192.168.0.96: icmp_seq=1 ttl=64 time=0.113 ms
64 bytes from 192.168.0.96: icmp_seq=2 ttl=64 time=0.114 ms

Code:
cat /etc/iscsi/initiatorname.iscsi
InitiatorName=iqn.1991-05.com.microsoft:Server09.domain.corp

Code:
systemctl status iscsid
● iscsid.service - iSCSI initiator daemon (iscsid)
     Loaded: loaded (/lib/systemd/system/iscsid.service; disabled; preset: enabled)
     Active: active (running) since Thu 2024-01-18 16:26:54 CST; 1 day 4h ago
TriggeredBy: ● iscsid.socket
       Docs: man:iscsid(8)
    Process: 1443 ExecStartPre=/lib/open-iscsi/startup-checks.sh (code=exited, status=0/SUCCESS)
    Process: 1448 ExecStart=/sbin/iscsid (code=exited, status=0/SUCCESS)
   Main PID: 1465 (iscsid)
      Tasks: 2 (limit: 618962)
     Memory: 6.2M
        CPU: 7.958s
     CGroup: /system.slice/iscsid.service
             ├─1464 /sbin/iscsid
             └─1465 /sbin/iscsid

Jan 19 20:44:19 Server09 iscsid[1464]: Connection-1:0 to [target: iqn.2010-06.com.purestorage:flasharray.4f9d18607a7d3d3e, portal: 192.168.0.99,3260] through [iface: default] is s>
Jan 19 20:44:19 Server09 iscsid[1464]: Connection-1:0 to [target: iqn.2010-06.com.purestorage:flasharray.4f9d18607a7d3d3e, portal: 192.168.0.98,3260] through [iface: default] is s>
Jan 19 20:44:24 Server09 iscsid[1464]: connect to 192.168.0.99:3260 failed (No route to host)
Jan 19 20:44:24 Server09 iscsid[1464]: connect to 192.168.0.98:3260 failed (No route to host)
Jan 19 20:44:33 Server09 iscsid[1464]: connect to 192.168.0.99:3260 failed (No route to host)
Jan 19 20:44:33 Server09 iscsid[1464]: connect to 192.168.0.98:3260 failed (No route to host)
Jan 19 20:44:40 Server09 iscsid[1464]: connect to 192.168.0.99:3260 failed (No route to host)
Jan 19 20:44:40 Server09 iscsid[1464]: connect to 192.168.0.98:3260 failed (No route to host)
Jan 19 20:44:47 Server09 iscsid[1464]: connect to 192.168.0.99:3260 failed (No route to host)
Jan 19 20:44:47 Server09 iscsid[1464]: connect to 192.168.0.98:3260 failed (No route to host)

Any help would be greatly appreciated!
 
cat /etc/iscsi/initiatorname.iscsi

Code:
InitiatorName=iqn.1991-05.com.microsoft:Server10.domain.corp
You changed the initiator name of a Debian host to have a Microsoft designation?

What is the output on each node of: pvesm status

Create a test VM with storage on the LVM pool. It doesn't need to have an OS. Start it if you created it on Server09, or migrate it live from 10 to 9.
Does that activate the storage?
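
Something along these lines would do as a throw-away test (VMID 999, the VM name, and the 1 GiB size are arbitrary):

Code:
# minimal test VM with a small disk on the shared LVM pool
qm create 999 --name lvm-test --memory 512 --scsi0 PROX-SRV01:1
qm start 999
# or check activation on the other node via live migration
qm migrate 999 <target-node> --online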

Also, these IPs don't seem to match your prior output, where .96 and .97 are used:

Code:
Jan 19 20:44:19 Server09 iscsid[1464]: Connection-1:0 to [target: iqn.2010-06.com.purestorage:flasharray.4f9d18607a7d3d3e, portal: 192.168.0.99,3260] through [iface: default] is s>
Jan 19 20:44:19 Server09 iscsid[1464]: Connection-1:0 to [target: iqn.2010-06.com.purestorage:flasharray.4f9d18607a7d3d3e, portal: 192.168.0.98,3260] through [iface: default] is s>
Jan 19 20:44:24 Server09 iscsid[1464]: connect to 192.168.0.99:3260 failed (No route to host)
Jan 19 20:44:24 Server09 iscsid[1464]: connect to 192.168.0.98:3260 failed (No route to host)
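
You can see which portals the initiator database still has on record with something like:

Code:
# recorded targets/portals as open-iscsi sees them
iscsiadm -m node
ls -R /etc/iscsi/nodes/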


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
Knew the Microsoft IQN change would get a comment!

So I actually JUST solved this. I ran the command below and saw that there were errors:

Code:
iscsiadm -m discovery -t st -p 192.168.0.96:3260
iscsiadm: Config file line 26 too long.
(the line above is repeated 20 times)
iscsiadm: Could not stat /etc/iscsi/nodes//,3260,-1/default to delete node: No such file or directory
iscsiadm: Could not add/update [tcp:[hw=,ip=,net_if=,iscsi_if=default] 192.168.0.96,3260,1 iqn.2010-06.com.purestorage:flasharray.4f9d18607a7d3d3e]
iscsiadm: Could not stat /etc/iscsi/nodes//,3260,-1/default to delete node: No such file or directory
iscsiadm: Could not add/update [tcp:[hw=,ip=,net_if=,iscsi_if=default] 192.168.0.98,3260,1 iqn.2010-06.com.purestorage:flasharray.4f9d18607a7d3d3e]
iscsiadm: Could not stat /etc/iscsi/nodes//,3260,-1/default to delete node: No such file or directory
iscsiadm: Could not add/update [tcp:[hw=,ip=,net_if=,iscsi_if=default] 192.168.0.97,3260,1 iqn.2010-06.com.purestorage:flasharray.4f9d18607a7d3d3e]
iscsiadm: Could not stat /etc/iscsi/nodes//,3260,-1/default to delete node: No such file or directory
iscsiadm: Could not add/update [tcp:[hw=,ip=,net_if=,iscsi_if=default] 192.168.0.99,3260,1 iqn.2010-06.com.purestorage:flasharray.4f9d18607a7d3d3e]
192.168.0.96:3260,1 iqn.2010-06.com.purestorage:flasharray.4f9d18607a7d3d3e
192.168.0.98:3260,1 iqn.2010-06.com.purestorage:flasharray.4f9d18607a7d3d3e
192.168.0.97:3260,1 iqn.2010-06.com.purestorage:flasharray.4f9d18607a7d3d3e
192.168.0.99:3260,1 iqn.2010-06.com.purestorage:flasharray.4f9d18607a7d3d3e

I'm mostly referring to the "No such file or directory" and "Could not add/update" errors.

I removed the top-level folder located under /etc/iscsi/nodes with

Code:
rm -r iqn.2010-06.com.purestorage\:flasharray.4f9d18607a7d3d3e/

Then I was able to rescan with

Code:
iscsiadm -m discovery -t st -p 192.168.0.96:3260

And it came back up. Any ideas what would cause that?
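
For anyone hitting the same thing, the equivalent cleanup without touching the filesystem directly would be something along these lines (target IQN and portal as above):

Code:
# drop the stale node records via iscsiadm instead of rm, then rediscover and log back in
iscsiadm -m node -T iqn.2010-06.com.purestorage:flasharray.4f9d18607a7d3d3e -o delete
iscsiadm -m discovery -t st -p 192.168.0.96:3260
iscsiadm -m node --login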

Side question: any idea where that "Config file line 26 too long" error is coming from?
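
One thing worth checking is whether any of the files iscsiadm parses (iscsid.conf and the records under /etc/iscsi/nodes) has an over-long line, e.g. (the 200-character threshold below is arbitrary):

Code:
# print line 26 of the daemon config, then flag unusually long lines anywhere under /etc/iscsi
sed -n '26p' /etc/iscsi/iscsid.conf
find /etc/iscsi -type f -exec awk 'length > 200 {print FILENAME ": line " FNR " (" length " chars)"}' {} +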
 
