Multipath iSCSI problems with 8.1

mweigelt

Renowned Member
Jul 26, 2014
Hi,

in 8.1 there was a fix with iSCSI "improvements": PVE now tries to log in to all portals delivered by sendtargets.

Problem: Some specific iSCSI servers (e.g. Open-E) send you all locally configured IP addresses, even those not in use, and there is no way to change this behavior. So you get a big list of targets, but you are not able to use all of them. The regular Linux iSCSI stack tries to connect to all advertised targets and simply ignores the paths that fail. Only successfully established paths are used for multipath and redundancy.
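For reference, each line of a sendtargets response has the form `address:port,tpgt target-iqn`. A minimal Python sketch of parsing such a response into portal records (my own illustration of the format, not PVE code):

```python
from dataclasses import dataclass

@dataclass
class Portal:
    ip: str
    port: int
    tpgt: int   # target portal group tag
    iqn: str

def parse_sendtargets(text: str) -> list[Portal]:
    """Parse 'iscsiadm -m discovery -t sendtargets' output lines of the
    form '172.20.235.2:3260,1 iqn.2019-03:stor1.vg00'."""
    portals = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        addr, iqn = line.split(None, 1)
        # Split from the right: the IQN itself contains colons,
        # but the address part is always 'ip:port,tpgt'.
        hostport, tpgt = addr.rsplit(",", 1)
        ip, port = hostport.rsplit(":", 1)
        portals.append(Portal(ip, int(port), int(tpgt), iqn))
    return portals
```

Every portal in that list is a candidate path; the question in this thread is what should happen with the ones that can never be reached.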

With the new "PVE stack over Linux stack", PVE tries to connect to the sendtargets portals again and again. And because some paths are not reachable (e.g. the addresses of local interfaces in an HA cluster), those connections fail in a loop and the PVE node does not start correctly.

The syslog is flooded with:
Code:
2023-12-11T13:31:55.678771+01:00 host-003 pvestatd[1999]: command '/usr/bin/iscsiadm --mode node --targetname iqn.2019-03:stor1.vg00 --login' failed: exit code 15
2023-12-11T13:31:55.832249+01:00 host-003 kernel: [ 550.553383] scsi host15: iSCSI Initiator over TCP/IP
2023-12-11T13:31:55.838261+01:00 host-003 kernel: [ 550.560050] connection19:0: detected conn error (1020)
2023-12-11T13:31:55.912233+01:00 host-003 kernel: [ 550.632201] scsi host15: iSCSI Initiator over TCP/IP
2023-12-11T13:31:55.916231+01:00 host-003 kernel: [ 550.636848] connection20:0: detected conn error (1020)
2023-12-11T13:31:55.920232+01:00 host-003 kernel: [ 550.639074] scsi host16: iSCSI Initiator over TCP/IP
2023-12-11T13:31:55.921482+01:00 host-003 kernel: [ 550.643286] connection21:0: detected conn error (1020)
2023-12-11T13:31:56.049591+01:00 host-003 iscsid: Connection-1:0 to [target: iqn.2019-03:stor1.vg00, portal: 172.20.235.2,3260] through [iface: default] is shutdown.
2023-12-11T13:31:56.049629+01:00 host-003 iscsid: Connection-1:0 to [target: iqn.2019-03:stor1.vg00, portal: 172.20.232.2,3260] through [iface: default] is shutdown.
2023-12-11T13:31:56.049647+01:00 host-003 iscsid: Connection-1:0 to [target: iqn.2019-03:stor1.vg00, portal: 172.20.233.2,3260] through [iface: default] is shutdown.
2023-12-11T13:31:56.049664+01:00 host-003 iscsid: Connection-1:0 to [target: iqn.2019-03:stor1.vg00, portal: 172.20.237.1,3260] through [iface: default] is shutdown.
2023-12-11T13:31:56.049679+01:00 host-003 iscsid: Connection-1:0 to [target: iqn.2019-03:stor1.vg00, portal: 192.168.225.2,3260] through [iface: default] is shutdown.
2023-12-11T13:31:56.049697+01:00 host-003 iscsid: connection19:0 login rejected: initiator error - target not found (02/03)
2023-12-11T13:31:56.049715+01:00 host-003 iscsid: Connection19:0 to [target: iqn.2023-07:stor1.vg02, portal: 10.20.4.102,3260] through [iface: default] is shutdown.
2023-12-11T13:31:56.049735+01:00 host-003 iscsid: connection20:0 login rejected: initiator error - target not found (02/03)
2023-12-11T13:31:56.049752+01:00 host-003 iscsid: Connection20:0 to [target: iqn.2023-07:stor1.vg02, portal: 10.20.4.101,3260] through [iface: default] is shutdown.
2023-12-11T13:31:56.049768+01:00 host-003 iscsid: connection21:0 login rejected: initiator error - target not found (02/03)
2023-12-11T13:31:56.049785+01:00 host-003 iscsid: Connection21:0 to [target: iqn.2023-07:stor1.vg02, portal: 10.20.2.101,3260] through [iface: default] is shutdown.
2023-12-11T13:31:56.049801+01:00 host-003 iscsid: connect to 192.168.225.2:3260 failed (No route to host)
2023-12-11T13:32:02.050026+01:00 host-003 iscsid: connect to 192.168.225.2:3260 failed (No route to host)
2023-12-11T13:32:05.050199+01:00 host-003 iscsid: connect to 192.168.225.2:3260 failed (No route to host)
2023-12-11T13:32:08.050441+01:00 host-003 iscsid: connect to 192.168.225.2:3260 failed (No route to host)
2023-12-11T13:32:11.050658+01:00 host-003 iscsid: connect to 192.168.225.2:3260 failed (No route to host)
2023-12-11T13:32:14.050822+01:00 host-003 iscsid: connect to 192.168.225.2:3260 failed (No route to host)
2023-12-11T13:32:17.051057+01:00 host-003 iscsid: connect to 192.168.225.2:3260 failed (No route to host)
2023-12-11T13:32:20.051213+01:00 host-003 iscsid: connect to 192.168.225.2:3260 failed (No route to host)
 
Furthermore, the problem is: iSCSI multipath and volumes are up as they should be.
But e.g. "pvesm status" does not work and shows the same errors:
Code:
iscsiadm: No portals found
iscsiadm: No portals found
iscsiadm: No portals found
iscsiadm: default: 1 session requested, but 1 already present.
iscsiadm: could not read session targetname: 5
iscsiadm: could not find session info for session40
iscsiadm: could not read session targetname: 5
iscsiadm: could not find session info for session40
iscsiadm: default: 1 session requested, but 1 already present.
iscsiadm: Could not login to [iface: default, target: iqn.2023-07:stor1.vg02, portal: 10.20.4.102,3260].
iscsiadm: initiator reported error (19 - encountered non-retryable iSCSI login failure)
 
Problem: If you use some specific iSCSI servers (i.e. Open-E) they send you all locally configured IP addresses - there is no way to change this behavior.
This seems to be the main problem, in my opinion. If the storage array provides a set of available IPs in the "discover target" response which are known to be unusable under any circumstances, I'd think it's the array's responsibility?
When you said "local IP", I thought maybe 127.x, but then you showed a list of all private network ranges. If the vendor is unwilling to adjust their handling, then one option for you is to clone the plugin responsible and implement your own custom filtering.

Good luck


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
I thought maybe 127.x, but then you showed a list of all private network ranges. If the vendor is unwilling to adjust their handling, then one option for you is to clone the plugin responsible and implement your own custom filtering.
I don't think this is the case. Best practices for iSCSI generally suggest a different subnet per port (or port pair in a dual-controller setup), the idea being that the host should have an address on each VLAN and the controller will announce all available paths. @mweigelt please post your /etc/network/interfaces, and every IP for every port on your storage.
 
I don't think this is the case. Best practices for iSCSI generally suggest a different subnet per port (or port pair in a dual-controller setup), the idea being that the host should have an address on each VLAN and the controller will announce all available paths
Yes, we are on the same page. As long as the client has a mirror config, it can easily decide on traffic flow.
However, if, as the OP said, the server advertises 10.10.10.40 in the iSCSI target and this IP is only used for inter-cluster communication, perhaps on a private VLAN or even direct-connected, then the client will have no way to get to it. IMHO, that IP should not be in the iSCSI target in the first place.
As is, PVE will try to connect to it and status-check it every minute or so.


 
I have two subnets/physical interfaces for iSCSI communication. 192.168.255.0/24 is the internal HA/DRBD network for the redundant storage pair, so it's not reachable from the Proxmox hosts.

Maybe it's an iscsid behavior on the storage which binds to all available IP addresses, including management, non-virtual and DRBD interfaces, and sends them all as targets. Until now I excluded all unwanted paths/addresses with an iSCSI ACL, so the storage denies all unwanted paths; those paths were not used by the initiator after the first connection failed. This still works. The problem is the one path where no TCP connection can be established at all, so the initiator never gets a deny from the iSCSI stack on the storage.
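The difference between the two cases can be seen with a plain TCP probe: an ACL-denied portal still completes the TCP handshake (the iSCSI login is then rejected promptly), while an unreachable one never connects and only times out. A small diagnostic sketch of that check (my own helper, not part of any PVE tooling), assuming the default iSCSI port 3260:

```python
import socket

def portal_reachable(ip: str, port: int = 3260, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the portal can be established
    within the timeout -- i.e. the storage stack can at least answer,
    even if it later rejects the iSCSI login via its ACL."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        # Covers 'No route to host', connection refused, and timeouts.
        return False
```

A portal that fails this probe is exactly the kind of path the syslog loop above keeps retrying.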

As I said, until now there was no problem. No log entries. Working multipath. No abnormalities. Correct behavior via the wanted paths.
 
As I said, until now there was no problem. No log entries. Working multipath. No abnormalities. Correct behavior via the wanted paths.
Yes, the behavior has changed and exposed a deficiency in your storage product. You have a somewhat valid claim that the change broke a long-standing behavior. However, that claim should be made at https://bugzilla.proxmox.com/.

On the other hand, a quick look into RFC https://datatracker.ietf.org/doc/html/rfc3720 shows:
Code:
A system that contains targets MUST support discovery sessions on
   each of its iSCSI IP address-port pairs, and MUST support the
   SendTargets command on the discovery session.  In a discovery
   session, a target MUST return all path information (target name and
   IP address-port pairs and portal group tags) for the targets on the
   target network entity which the requesting initiator is authorized to
   access.
Note, I did not do an extensive (re)-read of the RFC, just a quick glance.

Bring it up to PVE devs to make a call.
Good luck


 
So if a path to one of the targets is down when the node boots up, the node will never come up? I guess that's not the right idea.
 
To be clear: the Linux iSCSI initiator causes no problems and creates correct multipath sessions. It's the Proxmox layer around it that keeps restarting things all the time when a target is not reachable. THIS should not be the correct behavior in case of a failed path...
 
To be clear: the Linux iSCSI initiator causes no problems and creates correct multipath sessions. It's the Proxmox layer around it that keeps restarting things all the time when a target is not reachable. THIS should not be the correct behavior in case of a failed path...
I wouldn't say I disagree with you. However, if you had two IPs reported at initial setup and only one was available, I can see where you might expect PVE to automatically establish a connection to the second one when it becomes available.
As with anything, there are always edge and special cases that need to be accommodated. I am not sure that working around incorrectly reported information from one storage vendor is such a case, but perhaps it is.
Likely the easiest way to deal with it is to provide a fallback to the prior behavior. As I said, make a detailed report in bugzilla and the PVE devs can then properly track this.

good luck


 
I agree with you that PVE should try to reconnect to the second one, but it should not set the node on- and offline the whole time.
 
Is there any update on this? Various iSCSI clusters still won't work with Proxmox.
 
Is there any update on this? Various iSCSI clusters still won't work with Proxmox.
Please take a look at https://bugzilla.proxmox.com/show_bug.cgi?id=5173#c13

Specifically:
Code:
if someone has a correct config that does not work (or would not work with a shorter, probably configurable timeout) please do tell

Note that the person who opened the bug found a way to solve the problem within their environment. Since there were no specific and concrete examples provided to developers, the bug is currently marked as "invalid".


 
So what does this tell me? I can't find a way to fix this. My next try would be to replace the Perl modules with the ones from the previous commit, but this is not an update-proof solution, and I can't say anything about other side effects.
 
So what does this tell me? I can't find a way to fix this. My next try would be to replace the Perl modules with the ones from the previous commit, but this is not an update-proof solution, and I can't say anything about other side effects.
I think the developers are asking for a technical and thorough description of your environment and problem.
You can post it here, but you should also add it to the bug.

Right now we only know that something is not working for you.

Good luck


 
Okay,

Basically it's exactly what's happening in the bugzilla report:

I have a customer with a Proxmox cluster who uses a shared Open-E iSCSI storage cluster.

These storage clusters are somewhat strange and have some design issues, but until the latest patch there was a workaround for this. The issue is that Open-E iSCSI in cluster mode announces not only the VIP addresses for multipathing but also the physical ones of each device. This needs to be avoided at all cost, because it will eventually lead to split-brain situations when the cluster fails over. A long-known problem; Open-E never fixed it.
The workaround was to permit iSCSI traffic on all hosts only to the VIPs. This has worked well for most hypervisors, but in Proxmox it has been failing since the latest patch. The problem is that for some reason Proxmox keeps trying to connect to the unavailable IPs (even though it should only connect to the VIPs), and subsequently pvestatd fails, no graphs are shown, and all drive symbols stay grey.

It is the same effect as described in the bugzilla report.
 
Is there any update on this? I have a similar issue of receiving invalid IPs from sendtargets, but with TrueNAS Scale 24.04.1.1.

Maybe an option in Proxmox to filter the list of targets (received from sendtargets) could solve both my issue and @Lephisto's issue with minimal disturbance? Then anyone with an iSCSI host that returns known invalid IPs could just filter those out. This would make Proxmox play nicer with iSCSI hosts that lack the proper configuration options to not send invalid IPs, IMO. What do you think? :)

Here are the details of my issue:

The iSCSI host (mentioned TrueNAS) replies to the sendtargets query with the following:
Code:
root@pve:~# iscsiadm -m discovery -t sendtargets -p truenas.lan
192.168.12.89:3260,1 iqn.2005-10.org.freenas.ctl:proxmoxc1-test3
172.16.0.1:3260,1 iqn.2005-10.org.freenas.ctl:proxmoxc1-test3
172.17.0.10:3260,1 iqn.2005-10.org.freenas.ctl:proxmoxc1-test3
172.17.37.60:3260,1 iqn.2005-10.org.freenas.ctl:proxmoxc1-test3
172.17.0.1:3260,1 iqn.2005-10.org.freenas.ctl:proxmoxc1-test3

From the received targets above, only the first line with IP 192.168.12.89 is valid (and it indeed works - I can use the iSCSI storage as a VM disk, so no problem with that). All the other lines starting with 172... are invalid - they are actually internal networks that the k8s installation uses on the TrueNAS host. If I could simply tell the initiator on Proxmox to only consider IPs on the 192.168.12.0/24 network and ignore any others, that would solve the problem, I believe. I don't know how feasible it is to implement this though; this is pretty much my first time using iSCSI :)
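To illustrate the suggested option: such a filter could boil down to an allowed-subnet check on the discovery output. A Python sketch of the idea (a hypothetical helper to show the concept, not an existing Proxmox or iscsiadm feature):

```python
import ipaddress

def filter_portals(lines: list[str], allowed: str) -> list[str]:
    """Keep only sendtargets lines whose portal IP falls inside the
    allowed subnet, e.g. '192.168.12.0/24'. Each line looks like
    '192.168.12.89:3260,1 iqn.2005-10.org.freenas.ctl:proxmoxc1-test3'."""
    net = ipaddress.ip_network(allowed)
    kept = []
    for line in lines:
        # The portal IP is everything before the first ':' of the line.
        ip = line.split(":", 1)[0]
        if ipaddress.ip_address(ip) in net:
            kept.append(line)
    return kept
```

Applied to the discovery output above with `allowed="192.168.12.0/24"`, only the 192.168.12.89 portal would survive; all the 172.x k8s-internal portals would be dropped before any login attempt.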



Code:
root@pve:~# pveversion -v
proxmox-ve: 8.2.0 (running kernel: 6.8.4-3-pve)
pve-manager: 8.2.2 (running version: 8.2.2/9355359cd7afbae4)
proxmox-kernel-helper: 8.1.0
pve-kernel-5.15: 7.4-13
pve-kernel-5.13: 7.1-9
proxmox-kernel-6.8: 6.8.4-3
proxmox-kernel-6.8.4-3-pve-signed: 6.8.4-3
pve-kernel-5.15.152-1-pve: 5.15.152-1
pve-kernel-5.15.143-1-pve: 5.15.143-1
pve-kernel-5.15.39-3-pve: 5.15.39-3
pve-kernel-5.13.19-6-pve: 5.13.19-15
pve-kernel-5.13.19-1-pve: 5.13.19-3
ceph-fuse: 17.2.7-pve3
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx8
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-4
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.5.1
libproxmox-backup-qemu0: 1.4.1
libproxmox-rs-perl: 0.3.3
libpve-access-control: 8.1.4
libpve-apiclient-perl: 3.3.2
libpve-cluster-api-perl: 8.0.6
libpve-cluster-perl: 8.0.6
libpve-common-perl: 8.2.1
libpve-guest-common-perl: 5.1.2
libpve-http-server-perl: 5.1.0
libpve-network-perl: 0.9.8
libpve-rs-perl: 0.8.8
libpve-storage-perl: 8.2.1
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 6.0.0-1
lxcfs: 6.0.0-pve2
novnc-pve: 1.4.0-3
proxmox-backup-client: 3.2.2-1
proxmox-backup-file-restore: 3.2.2-1
proxmox-kernel-helper: 8.1.0
proxmox-mail-forward: 0.2.3
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.6
proxmox-widget-toolkit: 4.2.3
pve-cluster: 8.0.6
pve-container: 5.1.10
pve-docs: 8.2.2
pve-edk2-firmware: 4.2023.08-4
pve-esxi-import-tools: 0.7.0
pve-firewall: 5.0.7
pve-firmware: 3.11-1
pve-ha-manager: 4.0.4
pve-i18n: 3.2.2
pve-qemu-kvm: 8.1.5-6
pve-xtermjs: 5.3.0-3
qemu-server: 8.2.1
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.3-pve2
 
This urgently needs to get some attention from the Proxmox devs. Multipath is basically broken.

Yes, I know that iSCSI is legacy tech, and I avoid it where I can, but many of my customers, especially those coming from VMware, often bring iSCSI clusters.
 
