I have a two-node cluster; both nodes are running Proxmox VE 8.1.4. The nodes are Server09 and Server10.
Code:
pveversion
pve-manager/8.1.4/ec5affc9e41f1d79 (running kernel: 6.5.11-7-pve)
I am running a Pure Storage FlashArray and presenting storage to the Proxmox cluster over iSCSI. Each host has two 10 GbE NICs, so I am using multipath. I configured the LUN (I did NOT enable "use LUNs directly") and then put LVM on top of it. I followed Pure Storage's recommendations for configuring multipath:
Pure - Linux Settings
Everything worked: I could create a VM on the storage, migrate it back and forth, etc. The problem came when one of the hosts, Server10, rebooted. The iSCSI storage never came back on that node. I can confirm that I have connectivity to the back-end storage ports on the dedicated iSCSI network, but no sessions come up.
These are the outputs from the server that is not working (Server10).
multipath -ll (this returns nothing)
Code:
# iscsiadm --mode session
iscsiadm: No active sessions
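For what it's worth, I believe I should be able to force the sessions back manually with something like the following (IQN taken from my config below; I have not confirmed this actually fixes anything):

```shell
# Log in to the stored node records for the Pure target
iscsiadm -m node -T iqn.2010-06.com.purestorage:flasharray.4f9d18607a7d3d3e --login
# Or log in to every stored node record at once
iscsiadm -m node --loginall=all
# Then verify sessions exist again
iscsiadm -m session
```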
cat /etc/iscsi/iscsid.conf
Code:
iscsid.startup = /bin/systemctl start iscsid.socket
node.startup = automatic
node.session.timeo.replacement_timeout = 15
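Since node.startup is set to automatic, my understanding is that the on-disk node records should trigger a login at boot, so listing them would show whether they survived the reboot (a check I haven't captured above; portal/IQN are from my config):

```shell
# List the node records open-iscsi has on disk (portal,tpgt target per line)
iscsiadm -m node
# Dump one record and confirm its startup setting
iscsiadm -m node -T iqn.2010-06.com.purestorage:flasharray.4f9d18607a7d3d3e -p 192.168.0.96:3260 | grep node.startup
```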
cat /etc/multipath.conf
Code:
defaults {
    polling_interval 10
}

blacklist {
    wwid .*
}

blacklist_exceptions {
    wwid "3624a93706f94cc82e5524e2000012648"
}

devices {
    device {
        vendor "NVME"
        product "Pure Storage FlashArray"
        path_selector "queue-length 0"
        path_grouping_policy group_by_prio
        prio ana
        failback immediate
        fast_io_fail_tmo 10
        user_friendly_names no
        no_path_retry 0
        features 0
        dev_loss_tmo 60
    }
    device {
        vendor "PURE"
        product "FlashArray"
        path_selector "service-time 0"
        hardware_handler "1 alua"
        path_grouping_policy group_by_prio
        prio alua
        failback immediate
        path_checker tur
        fast_io_fail_tmo 10
        user_friendly_names no
        no_path_retry 0
        features 0
        dev_loss_tmo 600
    }
}
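Given the blacklist-everything-plus-exception approach above, one thing I plan to double-check is that the LUN's WWID on Server10 actually matches the exception once a path device appears (sketch; /dev/sdb is a guess for whichever SCSI disk shows up after a session logs in):

```shell
# Print the WWID multipath would use for a given path device
/lib/udev/scsi_id -g -u -d /dev/sdb
# Re-run path discovery verbosely and see why paths are (not) being assembled
multipath -v3 2>&1 | grep -i -e wwid -e blacklist | head
```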
cat /etc/pve/storage.cfg
Code:
dir: local
	path /var/lib/vz
	content iso,backup,vztmpl

lvmthin: local-lvm
	thinpool data
	vgname pve
	content images,rootdir

iscsi: z-PROX-SRV01
	portal 192.168.0.96
	target iqn.2010-06.com.purestorage:flasharray.4f9d18607a7d3d3e
	content none
	nodes Server10,Server09

lvm: PROX-SRV01
	vgname VG-PROX-SRV01
	content images,rootdir
	nodes Server10,Server09
	shared 1
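With shared 1 set, both nodes expect to see the same volume group, so on the broken node I'd expect checks like these to fail until the iSCSI sessions return (commands I intend to run; output not captured here):

```shell
# Show storage status as Proxmox sees it (active/inactive per storage)
pvesm status
# Check whether LVM can see the shared VG at all
vgs VG-PROX-SRV01
```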
iscsiadm -m discovery
Code:
192.168.0.97:3260 via sendtargets
192.168.0.96:3260 via sendtargets
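As I understand it, iscsiadm -m discovery with no other arguments only lists the stored discovery records; a fresh sendtargets query against a portal should return every portal/target the array currently advertises (sketch, using the portal from my config):

```shell
# Actively query the array rather than reading cached records
iscsiadm -m discovery -t sendtargets -p 192.168.0.96:3260
```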
ping to the storage controller port
Code:
ping 192.168.0.96
PING 192.168.0.96 (192.168.0.96) 56(84) bytes of data.
64 bytes from 192.168.0.96: icmp_seq=1 ttl=64 time=0.099 ms
64 bytes from 192.168.0.96: icmp_seq=2 ttl=64 time=0.122 ms
64 bytes from 192.168.0.96: icmp_seq=3 ttl=64 time=0.093 ms
cat /etc/iscsi/initiatorname.iscsi
Code:
InitiatorName=iqn.1991-05.com.microsoft:Server10.domain.corp
systemctl status iscsid
Code:
iscsid.service - iSCSI initiator daemon (iscsid)
Loaded: loaded (/lib/systemd/system/iscsid.service; disabled; preset: enabled)
Active: active (running) since Fri 2024-01-19 17:11:23 CST; 3h 20min ago
TriggeredBy: ● iscsid.socket
Docs: man:iscsid(8)
Process: 15449 ExecStartPre=/lib/open-iscsi/startup-checks.sh (code=exited, status=0/SUCCESS)
Process: 15452 ExecStart=/sbin/iscsid (code=exited, status=0/SUCCESS)
Main PID: 15454 (iscsid)
Tasks: 2 (limit: 618962)
Memory: 2.4M
CPU: 154ms
CGroup: /system.slice/iscsid.service
├─15453 /sbin/iscsid
└─15454 /sbin/iscsid
Jan 19 17:11:23 Server10 systemd[1]: Starting iscsid.service - iSCSI initiator daemon (iscsid)...
Jan 19 17:11:23 Server10 iscsid[15452]: iSCSI logger with pid=15453 started!
Jan 19 17:11:23 Server10 systemd[1]: Started iscsid.service - iSCSI initiator daemon (iscsid).
Jan 19 17:11:24 Server10 iscsid[15453]: iSCSI daemon with pid=15454 started!
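One thing that stands out to me in the status output is "disabled; preset: enabled", plus the fact that iscsid only started hours after boot (socket-activated). If the open-iscsi login service isn't running at boot, records with node.startup = automatic would never be logged in; I assume the check/fix is roughly:

```shell
# See whether the units that perform boot-time logins are enabled
systemctl status open-iscsi.service iscsid.socket
# Enable them so automatic node records are logged in at boot
systemctl enable --now iscsid.service open-iscsi.service
```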
Rebooting the server doesn't fix it.
Server09 is working fine. I haven't rebooted it, partly because I'm afraid it will hit the same issue.
Here are the outputs from Server09, the working node.
Code:
multipath -ll
3624a93706f94cc82e5524e2000012648 dm-5 PURE,FlashArray
size=50T features='0' hwhandler='1 alua' wp=rw
`-+- policy='service-time 0' prio=50 status=active
|- 12:0:0:1 sdc 8:32 active ready running
`- 11:0:0:1 sdb 8:16 active ready running
Code:
iscsiadm --mode session
tcp: [1] 192.168.0.96:3260,1 iqn.2010-06.com.purestorage:flasharray.4f9d18607a7d3d3e (non-flash)
tcp: [2] 192.168.0.97:3260,1 iqn.2010-06.com.purestorage:flasharray.4f9d18607a7d3d3e (non-flash)
Code:
iscsid.startup = /bin/systemctl start iscsid.socket
node.startup = automatic
node.session.timeo.replacement_timeout = 15
Code:
defaults {
    polling_interval 10
}

blacklist {
    wwid .*
}

blacklist_exceptions {
    wwid "3624a93706f94cc82e5524e2000012648"
}

devices {
    device {
        vendor "NVME"
        product "Pure Storage FlashArray"
        path_selector "queue-length 0"
        path_grouping_policy group_by_prio
        prio ana
        failback immediate
        fast_io_fail_tmo 10
        user_friendly_names no
        no_path_retry 0
        features 0
        dev_loss_tmo 60
    }
    device {
        vendor "PURE"
        product "FlashArray"
        path_selector "service-time 0"
        hardware_handler "1 alua"
        path_grouping_policy group_by_prio
        prio alua
        failback immediate
        path_checker tur
        fast_io_fail_tmo 10
        user_friendly_names no
        no_path_retry 0
        features 0
        dev_loss_tmo 600
    }
}
Code:
dir: local
	path /var/lib/vz
	content iso,backup,vztmpl

lvmthin: local-lvm
	thinpool data
	vgname pve
	content images,rootdir

iscsi: z-PROX-SRV01
	portal 192.168.0.96
	target iqn.2010-06.com.purestorage:flasharray.4f9d18607a7d3d3e
	content none
	nodes Server09,Server10

lvm: PROX-SRV01
	vgname VG-PROX-SRV01
	content rootdir,images
	nodes Server09,Server10
	shared 1
Code:
iscsiadm -m discovery
192.168.0.96:3260 via sendtargets
192.168.0.97:3260 via sendtargets
Code:
ping 192.168.0.96
PING 192.168.0.96 (192.168.0.96) 56(84) bytes of data.
64 bytes from 192.168.0.96: icmp_seq=1 ttl=64 time=0.113 ms
64 bytes from 192.168.0.96: icmp_seq=2 ttl=64 time=0.114 ms
Code:
cat /etc/iscsi/initiatorname.iscsi
InitiatorName=iqn.1991-05.com.microsoft:Server09.domain.corp
Code:
systemctl status iscsid
● iscsid.service - iSCSI initiator daemon (iscsid)
Loaded: loaded (/lib/systemd/system/iscsid.service; disabled; preset: enabled)
Active: active (running) since Thu 2024-01-18 16:26:54 CST; 1 day 4h ago
TriggeredBy: ● iscsid.socket
Docs: man:iscsid(8)
Process: 1443 ExecStartPre=/lib/open-iscsi/startup-checks.sh (code=exited, status=0/SUCCESS)
Process: 1448 ExecStart=/sbin/iscsid (code=exited, status=0/SUCCESS)
Main PID: 1465 (iscsid)
Tasks: 2 (limit: 618962)
Memory: 6.2M
CPU: 7.958s
CGroup: /system.slice/iscsid.service
├─1464 /sbin/iscsid
└─1465 /sbin/iscsid
Jan 19 20:44:19 Server09 iscsid[1464]: Connection-1:0 to [target: iqn.2010-06.com.purestorage:flasharray.4f9d18607a7d3d3e, portal: 192.168.0.99,3260] through [iface: default] is s>
Jan 19 20:44:19 Server09 iscsid[1464]: Connection-1:0 to [target: iqn.2010-06.com.purestorage:flasharray.4f9d18607a7d3d3e, portal: 192.168.0.98,3260] through [iface: default] is s>
Jan 19 20:44:24 Server09 iscsid[1464]: connect to 192.168.0.99:3260 failed (No route to host)
Jan 19 20:44:24 Server09 iscsid[1464]: connect to 192.168.0.98:3260 failed (No route to host)
Jan 19 20:44:33 Server09 iscsid[1464]: connect to 192.168.0.99:3260 failed (No route to host)
Jan 19 20:44:33 Server09 iscsid[1464]: connect to 192.168.0.98:3260 failed (No route to host)
Jan 19 20:44:40 Server09 iscsid[1464]: connect to 192.168.0.99:3260 failed (No route to host)
Jan 19 20:44:40 Server09 iscsid[1464]: connect to 192.168.0.98:3260 failed (No route to host)
Jan 19 20:44:47 Server09 iscsid[1464]: connect to 192.168.0.99:3260 failed (No route to host)
Jan 19 20:44:47 Server09 iscsid[1464]: connect to 192.168.0.98:3260 failed (No route to host)
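Even on the working node, iscsid keeps retrying 192.168.0.98/.99, which it can no longer reach. My understanding is that these are stale node records left over from an earlier portal layout, and they can be deleted so they stop being retried (sketch using the portals from the log; I'd verify they're really unused first):

```shell
# Remove stale node records for portals that no longer exist
iscsiadm -m node -p 192.168.0.98:3260 -o delete
iscsiadm -m node -p 192.168.0.99:3260 -o delete
```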
Any help would be greatly appreciated!