iSCSI Multipath: added a new storage, and an error is raised

dominique.fournier

Hi,
We are on Proxmox 6.4 with an iSCSI Dell Compellent storage. This storage uses multipath. It has worked really well with Proxmox for years. We have added a new SCv3020 storage and, since then, its log fills up with: "CTL:856522 SUB:CHELSIOT4 FNC:ActivateObjectCallback FNM:chelsioT4Connection.cxx FLN:555 MID:0 MSG:CHELSIOT4Connection CA Activate Failed: ControllerId=856522 (0x000D11CA) lp=2147549190 (0x80010006) ObjId=4544219 (0x004556db)"

This message appears 300,000 times a day; we never saw it on the old storage.

We have tried checking iSCSI, multipath, routing... without luck (roughly the checks sketched at the end of this post).

Do you have any idea? We are stuck.

The cluster consists of 4 Dell R710 and 2 Dell R640 servers on 10 Gb/s networking. The firmware of the network cards is up to date.
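
For reference, this is the kind of check we mean (a sketch only; the portal address is a placeholder):

Code:
iscsiadm -m session -P 1      # active iSCSI sessions and their state
multipath -ll                 # multipath topology; all paths should be active/ready
ip route get <portal-ip>      # which interface/route is used to reach the portal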
 
I think your best bet is to work with Dell support. From the limited information you provided, this appears to be a message from the NIC. While the firmware might be up to date, is the Kernel module/driver the one recommended by Dell?

Proxmox is just an application from the NIC/storage point of view. While PVE comes with a kernel that they feel is good for most people, it doesn't mean it's a perfect fit for all custom hardware.

Reach out to Dell and tell them the OS/kernel version you are using. I wouldn't volunteer the app (Proxmox) because it's irrelevant.

Good luck


Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
We have gone over all these parameters with Dell support, and they would like to know whether someone has already seen this problem with Proxmox, because they never have.
 
Sorry, sounds like you caught support on a bad day... It's almost as if you were running an Apache web server on top instead of Proxmox, and they told you to ask the Apache group whether anyone had seen this before.

It's hard to say based on that single line, but I would definitely investigate the kernel modules on the host. Also, check the network stats: is there a flow control/MTU/speed/etc. mismatch?
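
A few quick things to look at on the PVE host (the interface name is only an example; use the NIC carrying the iSCSI traffic):

Code:
ethtool eth5              # negotiated speed and duplex
ethtool -a eth5           # pause (flow control) settings
ip -d link show eth5      # configured MTU
ip -s link show eth5      # error/drop counters
# verify jumbo frames end to end (MTU 9000 => 8972-byte ICMP payload):
ping -M do -s 8972 -c 3 <portal-ip>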


 
We didn't see anything like that. MTU 9000 is set, 10 Gb/s everywhere with DAC cables, flow control on the switch (we don't know how to check that in Proxmox, but we have set "/sbin/ethtool -A eth5 autoneg off rx on tx on" in /etc/network/interfaces).
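
For reference, a sketch of how such a setting is usually hooked into /etc/network/interfaces (the stanza is simplified and the interface name is just an example):

Code:
auto eth5
iface eth5 inet manual
        mtu 9000
        # apply flow control once the link is up
        post-up /sbin/ethtool -A eth5 autoneg off rx on tx on

The resulting pause settings can then be read back with "/sbin/ethtool -a eth5".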
 
You are right, the log is from the storage. But Dell has already worked very hard on this topic, spent a lot of time with level 2, 3 and 4 engineers, and they are asking for help from the Proxmox side. Do you think Proxmox version 7 could change something?
 
Proxmox 7 comes with a new kernel, and there may be some relevant fixes that might help. It's also possible it may make things worse... It's impossible to say without getting to the root cause of the issue.
You don't need to upgrade to PVE 7 to get a newer kernel; you can just try the kernel itself. There are guides available on how to do it. If you only upgrade the kernel, there is always the option to go back to the prior version. You won't be able to downgrade if you upgrade to PVE 7.
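
If you want to try that route, a minimal sketch of what it looks like on PVE 6.4 (assuming the 5.11 opt-in kernel packages are still available in your configured repositories):

Code:
apt update
apt install pve-kernel-5.11   # opt-in kernel series for PVE 6.x (assumption)
reboot
# to fall back, pick the previous kernel in the GRUB menu,
# or remove the package again:
apt remove pve-kernel-5.11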

What I would do is find the compatibility guide for your storage system and make sure that you are on a supported OS and kernel release. Follow the configuration guide to set all the required parameters.
What you need to do is configure Debian/Ubuntu to work with your storage. Put Proxmox aside - it's just an application in this situation.
Good luck.


 
Hi guys
The error coming from the storage is due to Proxmox code: it is generated each time pvestatd tries to connect to the portal to test connectivity. It was not due to the iSCSI stack.

As a workaround, I removed the test in /usr/share/perl5/PVE/Storage/ISCSIPlugin.pm by adding "return 1;" before the line "return PVE::Network::tcp_ping($server, $port || 3260, 2);".
Since then, there are no more log entries on the storage.

We will see how Dell solves that, since an aborted TCP connection should not generate a log entry.
 
I rebooted the servers after applying the hack...
It's possible that while the symptom/error is the same, the cause is different. You'd need to track down whether something else is now causing the issue. You may need to run pvedaemon in debug mode, or collect a network trace to analyze.
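
A minimal sketch of such a capture, assuming the default iSCSI port 3260 (interface name and portal address are placeholders):

Code:
tcpdump -i eth5 -s 0 -w /tmp/iscsi-portal.pcap host <portal-ip> and port 3260
# then correlate the timestamps of short-lived TCP connections in the capture
# with the "CA Activate Failed" entries in the SAN log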


 
The only thing that changed is that we've updated the Proxmox servers.
I realize that; however, other things could have changed in the health checks and other parts of the code.
What I am saying is that a generic connectivity error could be caused by other things that have changed in the code.

Since you are using storage that is not widely deployed with Proxmox, and the error is generated on the storage side, it's impossible for PVE developers to determine what change could be causing your commercial storage heartburn. Your best course of action is to either troubleshoot the situation on the wire, correlating the network traffic with the errors in the SAN log, or open a case with the storage vendor.
Once the culprit is identified, PVE staff or the community may recommend a solution.



 
I am on PVE 7.3, and the patch is applied and working for me. I need to restart pvestatd as usual for it to take effect.
In my version, the file to modify is /usr/share/perl5/PVE/Storage/ISCSIPlugin.pm on line 68, adding a "return 1;" before the line "return PVE::Network::tcp_ping($server, $port || 3260, 2);"
 
Hi,
We are on Proxmox 7.4-3 with an iSCSI Dell Compellent SC4020 storage.
This storage uses multipath too.

There are many lines like these in the storage log:
CHELSIOConnection CA Activate Failed: ControllerId=81254 (0x00013D66) lp=1 (0x00000001) ObjId=478 (0x000001de)
CHELSIOConnection CA Activate Failed: ControllerId=81254 (0x00013D66) lp=2147614725 (0x80020005) ObjId=477 (0x000001dd)

About the hack in /usr/share/perl5/PVE/Storage/ISCSIPlugin.pm of adding "return 1;" before the line "return PVE::Network::tcp_ping($server, $port || 3260, 2);":

Which of the following is the correct way to apply it?

Case 1:
Code:
sub iscsi_test_portal {
    my ($portal) = @_;

    my ($server, $port) = PVE::Tools::parse_host_and_port($portal);
    return 0 if !$server;
    return 1;
    return PVE::Network::tcp_ping($server, $port || 3260, 2);
}

Or (case 2):
Code:
sub iscsi_test_portal {
    my ($portal) = @_;

    my ($server, $port) = PVE::Tools::parse_host_and_port($portal);
    return 0 if !$server;
    return 1 if return PVE::Network::tcp_ping($server, $port || 3260, 2);
}


Or (case 3):
Code:
sub iscsi_test_portal {
    my ($portal) = @_;

    my ($server, $port) = PVE::Tools::parse_host_and_port($portal);
    return 0 if !$server;
    return 1; return PVE::Network::tcp_ping($server, $port || 3260, 2);
}


Thanks for any help!
 

Attachments

  • Captura de Tela 2023-07-02 às 15.03.37.png (318.6 KB)
It is case 1:
Code:
    return 1;
    return PVE::Network::tcp_ping($server, $port || 3260, 2);

And restart the pvestatd service after the modification.
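
For example (pvestatd is a standard systemd unit on PVE):

Code:
systemctl restart pvestatd
systemctl status pvestatd   # confirm it came back up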

Dom
 
