iSCSI Reconnecting every 10 seconds to FreeNAS solution

Is there an Solution meanwhile? With Proxmox 5.1-41 and FreeNAS 11.1-U2 the Problem is still available. This Message is spamming my console.
 
I know this is a really old thread, but I'm getting this same message on my console screen and in my /var/log/messages:

Jun 7 03:21:35 freenas ctld[4677]: 192.168.3.99: read: connection lost
Jun 7 03:21:35 freenas ctld[2362]: child process 4677 terminated with exit status 1

Over and over again. If I shut down the Proxmox Server, the messages stop. Is there a way to suppress this message, or maybe I don't have the iSCSI setup correctly...not sure. Everything seems to be working okay. I can create VMs, move VMs to and from the iSCSI device to local and back again. Any guidance would be helpful.

Proxmox Virtual Environment 5.2-1
FreeNAS-11.1-U5
 
I still have the same experience ... I cannot see/notice it is slowing iscsi-performance down ... but on the other hand , this message would not show up if setup was ok ... or !?
 
Can confirm this is a issue with proxmox. For now I disabled syslog-ng on freenas but this game of pointing figures at each other is getting rather old, can the devs just fix the issue please.
 
  • Like
Reactions: rmundel
I have the same issue ..... 4 proxmox nodes and two freenas 11.1 systems for back-end storage connected over ISCSI multipath. I can't see anything else within my freenas console because all I see are these timeout/exit notifications.

It is from every node on both ISCSI paths
 
I stop using FreeNAS and started using proxmox with ZFS with CT with samba or a VM with open media vault because of this issue. It's annoying. lol
 
HAHA that is funny .....
I would like to keep a central backend storage for SAN/NAS function and nothing else.

Would really like to know how to fix this as I am pretty sure it is not a freenas issue....... I will confirm by testing with Xen and ESXi
 
I'm having the same problem (connection lost logs every 10 seconds, rendering the console unusable), with a plain FreeBSD 11.2 host as iscsi target.
Not sure if my observations are of any use, but these logs can be triggered instantly with
pvesh get nodes/pve/storage/connect/status

(or from browser https://<pveSrv>:8006/api2/json/nodes/pve/storage/connect/status)

So it appears that those logs are not the result of a timeout, but instead of an unclean connection tear down?
 
Looking closer I found out that proxmox simply tests for a connection to port 3260 without sending any data, ie. kind of nc -z <iscsi-target> 3260.
Doing so causes FreeBSD to log that nasty 'connection lost' message to the console (although that message doesn't seem to be iscsi related, but more generic in the network stack).

That leads to the question if a valid datastream could be sent instead?
Looking at the iscsi protocol (https://tools.ietf.org/html/rfc7143#section-11.18) there is indeed a ping feature ('NOP-Out').
I've tried to send the following 48 bytes (with help of iscsi_proto.h):
Code:
echo -n -e \\x40\\x80\\x00\\x00 > iscsiPing.bin
echo -n -e \\x00\\x00\\x00\\x00 >> iscsiPing.bin
echo -n -e \\x00\\x00\\x00\\x00 >> iscsiPing.bin
echo -n -e \\x00\\x00\\x00\\x00 >> iscsiPing.bin
echo -n -e \\xff\\xff\\xff\\xff >> iscsiPing.bin
echo -n -e \\xff\\xff\\xff\\xff >> iscsiPing.bin
echo -n -e \\x00\\x00\\x00\\x00 >> iscsiPing.bin
echo -n -e \\x00\\x00\\x00\\x00 >> iscsiPing.bin
echo -n -e \\x00\\x00\\x00\\x00 >> iscsiPing.bin
echo -n -e \\x00\\x00\\x00\\x00 >> iscsiPing.bin
echo -n -e \\x00\\x00\\x00\\x00 >> iscsiPing.bin
echo -n -e \\x00\\x00\\x00\\x00 >> iscsiPing.bin

nc <iscsi-target> 3260 < iscsiPing.bin

The iscsi server confirms that with a protocol error. Little surprising considering the missing LUN and CmdSN information...
Nevertheless, it leads to the assumption that iscsi shall not be pinged 'out-of-band' by any standalone tool, instead by the 'real' iscsi client (initiator). Looking into that direction, chapter 8.1.1 of https://github.com/open-iscsi/open-iscsi comes in handy:
It points out that iscsi connections are pinged anyway (even in both directions)...

So instead of checking for the iscsi port being listening on the server, it would probably be more correct to tune the Nop-Out settings as desired (defaults seem fine) and ask the iscsi initator (iscsiadm?) for current status.

ps. sorry for the urls being not links (as a new member I'm not allowed to post external links)
 
Last edited:
  • Like
Reactions: sumsum
Checked a bit further (I'm new to iscsi, so please forgive if the following should be/is well known).

Actual settings of an iscsi connection can be found with
iscsiadm -m session -r <ID>
(where ID is the number in square brackets (2nd column) of the output of iscsiadm -m session)

To change a value (eg. ping interval):
iscsiadm -m session -r <ID> -o update -n node.conn[0].timeo.noop_out_interval -v 5

The (imho correct) way to check the status of an iscsi connection:
iscsiadm -m session -r <ID> -P 1 | grep "iSCSI Connection State"
must show "LOGGED IN".

Observations to changes of "iSCSI Connection State" value (starting from "LOGGED IN" state):
- start/stop of iscsi target: immediate change (guess: target informs connected initiators)
- take network interface of target down: recognition takes a few seconds (guess: nop_out timeout)
- bring network interface up again: (almost?) instant recognition

Also performance wise iscsiadm ought be superior to running a network connection test.
 
  • Like
Reactions: HybridSarcasm
@nowoe already touched on this: it's not a bug, it's expected, if unfortunate behavior. Fortunately, there's an easy fix on the FreeNAS side that doesn't involve crippling either FreeNAS or Proxmox. Taken from an ixsystems thread (can't link since I'm a new user):

I added the following at the end of the filter section of /etc/local/syslog-ng.conf on FreeNAS:

Code:
#
# Proxmox Filters
#
filter f_cut_ctld01 {
        program("ctld") and
        message("192.168.1.2: read: connection lost");
};

filter f_cut_ctld02 {
        program("ctld") and
        message("child process") and
        message("terminated with exit status 1");
};

log { source(src); filter(f_cut_ctld01); flags(final); };
log { source(src); filter(f_cut_ctld02); flags(final); };

Save, exit, and run
Code:
service syslog-ng reload
Anything matching those parameters will be blocked. Note that I don't know what will happen if proxmox has a legitimate failure and drops connections, and you may lose those log messages. But, that may be worth the sanity of having clean logs.
 
  • Like
Reactions: Itay1778
I tried adding the code above into the syslog-ng.conf on my freenas, but am still seeing the errors.....

For me all of the errors are a real problem since I have 3 proxmox servers with two ISCSI links each. That is an error message for each connection.
 
I tried adding the code above into the syslog-ng.conf on my freenas, but am still seeing the errors.....

Did you remember to restart the syslog-ng service? And also to change the ip address to whatever one is showing up in your logs?
 
Yes, I tried with just one IP address and restarted the syslog-ng.
However, I am still seeing the connection lost for that connection.

I even tried changing the IP to one of the other connections, but still same outcome.
 
Any news on this topic? This is a quite annoying bug, which makes error handling and monitoring very difficult. The analysis from nowoe sounds solid. Has anyone (at proxmox) been able to test this?
 
Anything matching those parameters will be blocked. Note that I don't know what will happen if proxmox has a legitimate failure and drops connections, and you may lose those log messages. But, that may be worth the sanity of having clean logs.
Adding filter to syslog-ng on FreeNAS work for we. but I hope proxmox devs fix it from the source.
env: PVE 6.2-12; FreeNAS-11.3-U5
 
Last edited:
Sorry to revive this thread, but for me, blocking the logs with the suggested filter did not work either.

My logs look like

Code:
Dec  9 22:12:18 freenas 1 2020-12-09T22:12:18.751514+01:00 MYDOMAIN.tld ctld 2868 - - child process 63139 terminated with exit status 1
Dec  9 22:12:28 freenas 1 2020-12-09T22:12:28.797576+01:00 MYDOMAIN.tld ctld 63152 - - 172.16.0.1: read: connection lost
Dec  9 22:12:28 freenas 1 2020-12-09T22:12:28.798263+01:00 MYDOMAIN.tld ctld 2868 - - child process 63152 terminated with exit status 1

It seems like

program("ctld")

fails for whatever reason. I changed the whole filter to match against the message via

Code:
filter f_cut_ctld01 {
  message("ctld") and
  message("172.16.0.1: read: connection lost"); };
filter f_cut_ctld02 {
  message("ctld") and
  message("child process") and
  message("terminated with exit status 1"); };

log { source(src); filter(f_cut_ctld01); flags(final); };
log { source(src); filter(f_cut_ctld02); flags(final); };

and now the logs are finally gone.

To my setup: It's running TrueNAS core, which is joined a domain. Maybe this is of importance?
 
Last edited: