iSCSI Login negotiation failed

coder

Hi,

We are using a ReadyNAS storage device from Netgear, which is running a Linux Kernel using the LIO iSCSI target.

We are seeing the following messages on the Netgear storage device, logged continuously at 10-second intervals:

Code:
Jan 28 19:43:42 Storage01 kernel: rx_data returned 0, expecting 48.
Jan 28 19:43:42 Storage01 kernel: iSCSI Login negotiation failed.
Jan 28 19:43:52 Storage01 kernel: rx_data returned 0, expecting 48.
Jan 28 19:43:52 Storage01 kernel: iSCSI Login negotiation failed.
Jan 28 19:44:02 Storage01 kernel: rx_data returned 0, expecting 48.
Jan 28 19:44:02 Storage01 kernel: iSCSI Login negotiation failed.
Jan 28 19:44:12 Storage01 kernel: rx_data returned 0, expecting 48.
Jan 28 19:44:12 Storage01 kernel: iSCSI Login negotiation failed.
Jan 28 19:44:22 Storage01 kernel: rx_data returned 0, expecting 48.
Jan 28 19:44:22 Storage01 kernel: iSCSI Login negotiation failed.

The iSCSI LUNs are working and can be used from within the Proxmox 4.x environment. We have now migrated to Proxmox 5 and are seeing the following on our Proxmox systems:

Code:
[Sun Jan 28 19:44:21 2018] sd 5:0:0:0: alua: supports implicit and explicit TPGS
[Sun Jan 28 19:44:21 2018] sd 5:0:0:0: alua: device naa.60014057075af940f7e4d9e97f1a2569 port group 0 rel port 1
[Sun Jan 28 19:44:21 2018] sd 5:0:0:0: alua: port group 00 state A non-preferred supports TOlUSNA
[Sun Jan 28 19:44:21 2018] sd 5:0:0:1: alua: supports implicit and explicit TPGS
[Sun Jan 28 19:44:21 2018] sd 5:0:0:1: alua: device naa.600140563b743d68c0d47288d772f587 port group 0 rel port 1
[Sun Jan 28 19:44:21 2018] sd 5:0:0:1: alua: port group 00 state A non-preferred supports TOlUSNA
[Sun Jan 28 19:44:21 2018] sd 6:0:0:0: alua: supports implicit and explicit TPGS
[Sun Jan 28 19:44:21 2018] sd 6:0:0:0: alua: device naa.60014057075af940f7e4d9e97f1a2569 port group 0 rel port 1
[Sun Jan 28 19:44:21 2018] sd 6:0:0:0: alua: port group 00 state A non-preferred supports TOlUSNA
[Sun Jan 28 19:44:21 2018] sd 6:0:0:1: alua: supports implicit and explicit TPGS

These messages are logged constantly, just like the ones on the storage device.
I tried to pull some iSCSI traces to figure it out, but so far I have not been able to find the cause.

Are there any suggestions or ideas on this?

Thanks and Best Regards,
Volker
 
Hi,

Replying to my own post...

While debugging, I found that these messages are triggered by pvestatd doing its automated health checks and SCSI scans. In particular, the "iSCSI Login negotiation failed" messages are the result of a simple TCP connect/disconnect against the portal port of the storage array.

I was able to track it down to the following function, which is part of the ISCSIPlugin module used by pvestatd:

Code:
sub iscsi_test_portal {
    my ($portal) = @_;

    my ($server, $port) = PVE::Tools::parse_host_and_port($portal);
    return 0 if !$server;
    return PVE::Network::tcp_ping($server, $port || 3260, 2);
}
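
For reference, the check above is nothing more than a plain TCP connect with a 2-second timeout followed by an immediate disconnect; nothing iSCSI-specific is sent. You can reproduce roughly the same effect by hand (the portal address below is a placeholder):

Code:
# open a TCP connection to the portal port and close it again immediately;
# on the LIO side this produces the same "iSCSI Login negotiation failed" entry
nc -z -w 2 192.0.2.10 3260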

So even though the messages are annoying and fill up the logs, they can be ignored.

When working with LVM on iSCSI, one way to suppress this behavior is to deactivate the iSCSI storage target in Proxmox and configure open-iscsi for automatic login and target discovery. In this scenario, an LVM storage can be configured directly in Proxmox without the iSCSI target as a backend (see the sketch below). However, I haven't fully tested this yet.
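
Something like the following should do it - a minimal, untested sketch, assuming the target portal is 192.0.2.10:3260, the IQN is iqn.2003-01.org.example:target1, and the PVE storage object is named "iscsi-storage" (all placeholders):

Code:
# disable the iSCSI storage object in PVE so pvestatd stops probing the portal
pvesm set iscsi-storage --disable 1

# let open-iscsi discover the target and log in directly
iscsiadm -m discovery -t sendtargets -p 192.0.2.10:3260
iscsiadm -m node -T iqn.2003-01.org.example:target1 -p 192.0.2.10:3260 --login

# log in automatically at boot from now on
iscsiadm -m node -T iqn.2003-01.org.example:target1 -p 192.0.2.10:3260 \
    --op update -n node.startup -v automatic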

Best Regards,
Volker
 
Hi, have you done any more research on this? I'm seeing something similar. I'm not sure which was the deciding factor, but after upgrading Proxmox from 5.1.41 to 5.1.46 and also upgrading my SAN software (StarWind SAN) to the latest version, I started getting these errors. I was originally running an older version of StarWind (v8.0.7354) and have now upgraded to v8.0.11818. I had initially downgraded again because the new version was crashing the SAN service; it turned out that VAAI doesn't play well with iSCSI, and once I turned it off everything works fine, apart from the log messages. On the Proxmox node:


Code:
Mar 04 18:56:16 Proxmox kernel: sd 11:0:0:0: alua: supports implicit TPGS
Mar 04 18:56:16 Proxmox kernel: sd 11:0:0:0: alua: device eui.dae4a70d707241f1 port group 1 rel port 1
Mar 04 18:56:16 Proxmox kernel: sd 11:0:0:0: alua: port group 01 state A non-preferred supports TolUsNA
Mar 04 18:56:16 Proxmox kernel: sd 12:0:0:0: alua: supports implicit TPGS
Mar 04 18:56:16 Proxmox kernel: sd 12:0:0:0: alua: device eui.dae4a70d707241f1 port group 1 rel port 1
Mar 04 18:56:16 Proxmox kernel: sd 12:0:0:0: alua: port group 01 state A non-preferred supports TolUsNA
Mar 04 18:56:36 Proxmox kernel: sd 9:0:0:0: alua: supports implicit TPGS
Mar 04 18:56:36 Proxmox kernel: sd 9:0:0:0: alua: device eui.0191d2c4c8226ff0 port group 1 rel port 1
Mar 04 18:56:36 Proxmox kernel: sd 9:0:0:0: alua: port group 01 state A non-preferred supports TolUsNA
Mar 04 18:56:36 Proxmox kernel: sd 10:0:0:0: alua: supports implicit TPGS
Mar 04 18:56:36 Proxmox kernel: sd 10:0:0:0: alua: device eui.0191d2c4c8226ff0 port group 1 rel port 1
Mar 04 18:56:36 Proxmox kernel: sd 10:0:0:0: alua: port group 01 state A non-preferred supports TolUsNA
Mar 04 18:56:46 Proxmox kernel: sd 7:0:0:0: alua: supports implicit TPGS
Mar 04 18:56:46 Proxmox kernel: sd 7:0:0:0: alua: device eui.518a6e00cbfcdcf2 port group 1 rel port 1
Mar 04 18:56:46 Proxmox kernel: sd 7:0:0:0: alua: port group 01 state A non-preferred supports TolUsNA
Mar 04 18:56:46 Proxmox kernel: sd 8:0:0:0: alua: supports implicit TPGS
Mar 04 18:56:46 Proxmox kernel: sd 8:0:0:0: alua: device eui.518a6e00cbfcdcf2 port group 1 rel port 1
Mar 04 18:56:46 Proxmox kernel: sd 8:0:0:0: alua: port group 01 state A non-preferred supports TolUsNA
Mar 04 18:57:05 Proxmox kernel: sd 7:0:0:0: alua: supports implicit TPGS
Mar 04 18:57:05 Proxmox kernel: sd 7:0:0:0: alua: device eui.518a6e00cbfcdcf2 port group 1 rel port 1
Mar 04 18:57:05 Proxmox kernel: sd 7:0:0:0: alua: port group 01 state A non-preferred supports TolUsNA
Mar 04 18:57:05 Proxmox kernel: sd 8:0:0:0: alua: supports implicit TPGS
Mar 04 18:57:05 Proxmox kernel: sd 8:0:0:0: alua: device eui.518a6e00cbfcdcf2 port group 1 rel port 1
Mar 04 18:57:05 Proxmox kernel: sd 8:0:0:0: alua: port group 01 state A non-preferred supports TolUsNA

And on the SAN side (StarWind SAN software on Windows Server 2012 R2), here is a snippet of the event log:

Code:
3/4 19:00:00.310 6548 Srv: Accepted iSCSI connection from 172.16.1.2:34588 to 172.16.1.1:3260. (Id = 0x142b7)
3/4 19:00:00.310 6548 S[142b7]: Session (000000E00BF35200)
3/4 19:00:00.310 6548 C[142b7], FREE: Event - CONNECTED.
3/4 19:00:00.310 6548 C[142b7], XPT_UP: T3.
3/4 19:00:00.310 1850 C[142b7], XPT_UP: recvData returned 10058 (0x274a)!
3/4 19:00:00.310 1850 C[142b7], XPT_UP: *** 'recv' thread: recv failed 10058.
3/4 19:00:00.326 6548 Srv: Accepted iSCSI connection from 172.16.0.2:40632 to 172.16.0.1:3260. (Id = 0x142b8)
3/4 19:00:00.326 6548 S[142b8]: Session (000000E00BF32F40)
3/4 19:00:00.326 6548 C[142b8], FREE: Event - CONNECTED.
3/4 19:00:00.326 6548 C[142b8], XPT_UP: T3.
3/4 19:00:00.326 3e88 C[142b8], XPT_UP: recvData returned 10058 (0x274a)!
3/4 19:00:00.326 3e88 C[142b8], XPT_UP: *** 'recv' thread: recv failed 10058.
3/4 19:00:00.342 6548 Srv: Accepted iSCSI connection from 172.16.0.2:40634 to 172.16.0.1:3260. (Id = 0x142b9)
3/4 19:00:00.342 6548 S[142b9]: Session (000000E00BF34C40)
3/4 19:00:00.342 6548 C[142b9], FREE: Event - CONNECTED.
3/4 19:00:00.342 6548 C[142b9], XPT_UP: T3.
3/4 19:00:00.342 1bb4 C[142b9], XPT_UP: recvData returned 10058 (0x274a)!
3/4 19:00:00.342 1bb4 C[142b9], XPT_UP: *** 'recv' thread: recv failed 10058.
3/4 19:00:00.357 6548 Srv: Accepted iSCSI connection from 172.16.1.2:34594 to 172.16.1.1:3260. (Id = 0x142ba)
3/4 19:00:00.357 6548 S[142ba]: Session (000000E00BF34680)
3/4 19:00:00.357 6548 C[142ba], FREE: Event - CONNECTED.
3/4 19:00:00.357 6548 C[142ba], XPT_UP: T3.
3/4 19:00:00.357 41c8 C[142ba], XPT_UP: recvData returned 10058 (0x274a)!
3/4 19:00:00.357 41c8 C[142ba], XPT_UP: *** 'recv' thread: recv failed 10058.
3/4 19:00:00.357 6548 Srv: Accepted iSCSI connection from 172.16.1.2:34596 to 172.16.1.1:3260. (Id = 0x142bb)

Other than that, I don't seem to have any issues. Based on a cursory glance at the StarWind forums, the "recv failed 10058" message seems to be more informational than an actual error. Also, my multipath setup runs over two 1 Gbit links and shows that it's working properly:

Code:
root@Proxmox:~# multipath -ll
mpathc (2dae4a70d707241f1) dm-6 STARWIND,STARWIND
size=250G features='2 queue_if_no_path retain_attached_hw_handler' hwhandler='1 alua' wp=rw
`-+- policy='round-robin 0' prio=50 status=active
  |- 12:0:0:0 sdq 65:0  active ready running
  `- 11:0:0:0 sdp 8:240 active ready running
mpathb (2518a6e00cbfcdcf2) dm-5 STARWIND,STARWIND
size=250G features='2 queue_if_no_path retain_attached_hw_handler' hwhandler='1 alua' wp=rw
`-+- policy='round-robin 0' prio=50 status=active
  |- 8:0:0:0  sdl 8:176 active ready running
  `- 7:0:0:0  sdm 8:192 active ready running
mpatha (20191d2c4c8226ff0) dm-4 STARWIND,STARWIND
size=1.0T features='2 queue_if_no_path retain_attached_hw_handler' hwhandler='1 alua' wp=rw
`-+- policy='round-robin 0' prio=50 status=active
  |- 9:0:0:0  sdn 8:208 active ready running
  `- 10:0:0:0 sdo 8:224 active ready running

multipath.conf
Code:
defaults {
        polling_interval        4
        path_selector           "round-robin 0"
        path_grouping_policy    multibus
        uid_attribute           ID_SERIAL
        rr_min_io               100
        failback                immediate
        no_path_retry           queue
        user_friendly_names     yes
}
blacklist {
        wwid .*
}
blacklist_exceptions {
        wwid "2dae4a70d707241f1"
        wwid "2518a6e00cbfcdcf2"
        wwid "20191d2c4c8226ff0"
}

/etc/iscsi/nodes/examplenodeconfig
Code:
# BEGIN RECORD 2.0-873
node.name = iqn.xxxxxxxxxxxxxxxxxxxxxxxxxxx
node.tpgt = -1
node.startup = automatic
node.leading_login = No
iface.iscsi_ifacename = default
iface.transport_name = tcp
iface.vlan_id = 0
iface.vlan_priority = 0
iface.iface_num = 0
iface.mtu = 0
iface.port = 0
node.discovery_address = 172.16.0.1
node.discovery_port = 3260
node.discovery_type = send_targets
node.session.initial_cmdsn = 0
node.session.initial_login_retry_max = 8
node.session.xmit_thread_priority = -20
node.session.cmds_max = 128
node.session.queue_depth = 32
node.session.nr_sessions = 1
node.session.auth.authmethod = None
node.session.timeo.replacement_timeout = 15
node.session.err_timeo.abort_timeout = 15
node.session.err_timeo.lu_reset_timeout = 30
node.session.err_timeo.tgt_reset_timeout = 30
node.session.err_timeo.host_reset_timeout = 60
node.session.iscsi.FastAbort = Yes
node.session.iscsi.InitialR2T = No
node.session.iscsi.ImmediateData = Yes
node.session.iscsi.FirstBurstLength = 262144
node.session.iscsi.MaxBurstLength = 16776192
node.session.iscsi.DefaultTime2Retain = 0
node.session.iscsi.DefaultTime2Wait = 2
node.session.iscsi.MaxConnections = 1
node.session.iscsi.MaxOutstandingR2T = 1
node.session.iscsi.ERL = 0
node.conn[0].address = 172.16.0.1
node.conn[0].port = 3260
node.conn[0].startup = manual
node.conn[0].tcp.window_size = 524288
node.conn[0].tcp.type_of_service = 0
node.conn[0].timeo.logout_timeout = 15
node.conn[0].timeo.login_timeout = 15
node.conn[0].timeo.auth_timeout = 45
node.conn[0].timeo.noop_out_interval = 0
node.conn[0].timeo.noop_out_timeout = 0
node.conn[0].iscsi.MaxXmitDataSegmentLength = 0
node.conn[0].iscsi.MaxRecvDataSegmentLength = 262144
node.conn[0].iscsi.HeaderDigest = None
node.conn[0].iscsi.DataDigest = None
node.conn[0].iscsi.IFMarker = No
node.conn[0].iscsi.OFMarker = No
# END RECORD

pveversion -v
Code:
root@Proxmox:~# pveversion -v
proxmox-ve: 5.1-41 (running kernel: 4.13.13-6-pve)
pve-manager: 5.1-46 (running version: 5.1-46/ae8241d4)
pve-kernel-4.13.13-6-pve: 4.13.13-41
pve-kernel-4.13.13-4-pve: 4.13.13-35
pve-kernel-4.4.98-3-pve: 4.4.98-103
pve-kernel-4.4.40-1-pve: 4.4.40-82
pve-kernel-4.4.35-1-pve: 4.4.35-77
pve-kernel-4.4.16-1-pve: 4.4.16-64
corosync: 2.4.2-pve3
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.0-8
libpve-common-perl: 5.0-28
libpve-guest-common-perl: 2.0-14
libpve-http-server-perl: 2.0-8
libpve-storage-perl: 5.0-17
libqb0: 1.0.1-1
lvm2: 2.02.168-pve6
lxc-pve: 2.1.1-2
lxcfs: 2.0.8-2
novnc-pve: 0.6-4
proxmox-widget-toolkit: 1.0-11
pve-cluster: 5.0-20
pve-container: 2.0-19
pve-docs: 5.1-16
pve-firewall: 3.0-5
pve-firmware: 2.0-3
pve-ha-manager: 2.0-5
pve-i18n: 1.0-4
pve-libspice-server1: 0.12.8-3
pve-qemu-kvm: 2.9.1-9
pve-xtermjs: 1.0-2
qemu-server: 5.0-22
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3

open-iscsi version
Code:
root@Proxmox:~# iscsiadm --version
iscsiadm version 2.0-874

multipath-tools version
Code:
root@Proxmox:~# apt search multipath-tools
Sorting... Done
Full Text Search... Done
kpartx/stable,now 0.6.4-5 amd64 [installed,automatic]
  create device mappings for partitions

multipath-tools/stable,now 0.6.4-5 amd64 [installed]
  maintain multipath block device access

multipath-tools-boot/stable 0.6.4-5 all
  Support booting from multipath devices

Like I said before, everything seems to be working normally except for the log flooding on both the Proxmox node and the SAN node.

 
In my case I have the target (via the targetcli-fb package) and the initiators running on the same cluster: the target is configured on one node, and initiators on each node.
I can confirm that configuring the iSCSI initiators via iscsiadm, instead of via the PVE web interface, fixed the issue, because the "TCP pings" that were being interpreted as connection failures are no longer sent to the target.
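
To double-check that the sessions are handled by open-iscsi itself after switching over, something like this should suffice (a small sketch, assuming a default open-iscsi layout):

Code:
# list the iSCSI sessions open-iscsi has established on its own
iscsiadm -m session

# check that the node records are set to log in automatically at boot
grep -r "node.startup" /etc/iscsi/nodes/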
 
Thanks, this solved my problem.
 
Based on the information in this thread, the following is happening:
- An iSCSI target is configured as a PVE storage object. This means that PVE uses its own health check to ensure the remote target is available.
- In all PVE versions as of today, this check is a "tcp_ping" to port 3260.
- Some storage vendors report this connection attempt in their logs, and since the check runs frequently, the resulting log noise annoys some users.

Here is a description of what tcp_ping does:
https://perldoc.perl.org/Net::Ping
Code:
With the "tcp" protocol the ping() method attempts to establish a connection to the remote host's echo port. 
If the connection is successfully established, the remote host is considered reachable.
No data is actually echoed. This protocol does not require any special privileges but has higher overhead than the "udp" and "icmp" protocols.

The underlying code is not iSCSI specific and can be used for any service. In addition, the fact that a connection can be opened on a specific port is not an indication that a specific service is running there or is healthy.
On the storage side, it's really up to the vendor how to interpret these SYN connections. There is no right or wrong way, but some are more curious than others. In my opinion, logging a "login failure" in response to a simple SYN connection is very misleading. Additionally, extensive tracking and logging of these connection attempts can lead to a DoS if the storage vendor is not careful.

In conclusion, there are two sides involved and both can improve. The storage vendors can be more judicious about error reporting, and PVE could implement a more targeted (pun intended) iSCSI health check.

Having said that, only 5 people have reported this as an issue since 2008 - probably not enough velocity for Proxmox developers to rototill old, established code that works for most.
You will probably get a better result if you submit a bug via https://bugzilla.proxmox.com/ and propose improved health check code; a rough sketch of one possible direction follows below.
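
For illustration only (this is not PVE code, and the portal address is a placeholder), a check along these lines would exercise the actual iSCSI protocol instead of just opening and dropping a TCP connection:

Code:
#!/bin/sh
# Hypothetical portal health check: perform a real iSCSI SendTargets discovery,
# so the target sees a proper login instead of a bare connect/disconnect.
PORTAL="192.0.2.10:3260"   # placeholder portal address

if timeout 5 iscsiadm -m discovery -t sendtargets -p "$PORTAL" >/dev/null 2>&1; then
    echo "portal $PORTAL is healthy"
else
    echo "portal $PORTAL is unreachable or not speaking iSCSI" >&2
    exit 1
fi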

Good luck


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
The iSCSI target is just a Proxmox node based on a hardware RAID6 and LIO. Is it possible to change the behavior there?
So technically you are running LIO on Debian, which also happens to host Proxmox, and your backend disks are in a RAID. Neither Proxmox nor the RAID type has any bearing on the LIO configuration, which is not a standard part of Proxmox.
You have not actually described what you are seeing with your specific iSCSI target. Is it "iSCSI Login negotiation failed.", as it was for the thread opener? Once the behavior is described, the follow-up question is better suited for a LIO mailing list.


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
Yes, we are observing the same messages:
Code:
[ 3067.125460] iSCSI/iqn.1993-08.org.debian:01:6ef5e311ea: Unsupported SCSI Opcode 0x4d, sending CHECK_CONDITION.
[ 3067.127336] iSCSI/iqn.1993-08.org.debian:01:f62b97ad7382: Unsupported SCSI Opcode 0x4d, sending CHECK_CONDITION.
[ 3067.129547] iSCSI/iqn.1993-08.org.debian:01:f62b97ad7382: Unsupported SCSI Opcode 0x4d, sending CHECK_CONDITION.
[ 3067.131573] iSCSI/iqn.1993-08.org.debian:01:f62b97ad7382: Unsupported SCSI Opcode 0xb7, sending CHECK_CONDITION.
[ 3067.133524] iSCSI/iqn.1993-08.org.debian:01:f62b97ad7382: Unsupported SCSI Opcode 0x37, sending CHECK_CONDITION.
[ 3067.157657] iSCSI/iqn.1993-08.org.debian:01:6ef5e311ea: Unsupported SCSI Opcode 0x4d, sending CHECK_CONDITION.
[ 3067.213002] iSCSI/iqn.1993-08.org.debian:01:6ef5e311ea: Unsupported SCSI Opcode 0xb7, sending CHECK_CONDITION.
[ 3067.214790] iSCSI/iqn.1993-08.org.debian:01:6ef5e311ea: Unsupported SCSI Opcode 0x37, sending CHECK_CONDITION.
[ 3071.328830] rx_data returned 0, expecting 48.
[ 3071.330938] iSCSI Login negotiation failed.
[ 3075.327659] rx_data returned 0, expecting 48.
[ 3075.329304] iSCSI Login negotiation failed.
[ 3081.618558] rx_data returned 0, expecting 48.
[ 3081.620210] iSCSI Login negotiation failed.
[ 3085.644023] rx_data returned 0, expecting 48.
[ 3085.646415] iSCSI Login negotiation failed.
[ 3091.862692] rx_data returned 0, expecting 48.
[ 3091.864353] iSCSI Login negotiation failed.
[ 3095.875869] rx_data returned 0, expecting 48.
[ 3095.877525] iSCSI Login negotiation failed.
[ 3102.149106] rx_data returned 0, expecting 48.
[ 3102.151202] iSCSI Login negotiation failed.
[ 3106.201944] rx_data returned 0, expecting 48.
[ 3106.203598] iSCSI Login negotiation failed.
 
I am not 100% sure, but I suspect you can't without modifying the Linux kernel. If you can show that a flood of such connections causes an issue for targetcli/LIO, that's a good topic for the kernel mailing list.
Otherwise, your four options are: report a bug/enhancement to Proxmox via the proper channel, use log management tools to filter the messages out (see the example below), ignore them, or use the direct iSCSI configuration method described earlier in the thread.
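
As an example of the log-filtering option - a sketch that assumes rsyslog is handling kernel messages on the Debian-based target node (adjust to your logging setup):

Code:
# drop the two noisy kernel messages on the target node with an rsyslog rule
cat > /etc/rsyslog.d/30-iscsi-login-noise.conf <<'EOF'
:msg, contains, "iSCSI Login negotiation failed" stop
:msg, contains, "rx_data returned 0, expecting 48" stop
EOF
systemctl restart rsyslog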


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
