kernel 5.15 `/usr/bin/iscsiadm --mode session --sid X --rescan` really slow

mikewilliams

Active Member
Jan 9, 2019
Hi everyone,

We're in the process of upgrading some Proxmox 6 installs to Proxmox 7 and in one environment we have a very serious issue.
In Proxmox 6, *and Proxmox 7 with kernel 5.11.22-7*, the `/usr/bin/iscsiadm --mode session --sid X --rescan` that `pvestatd` runs very often takes about 1-1.5 seconds per run. The web interface is all good, VMs start/stop/live-migrate without issue. Everything is normal.
With Proxmox 7 and kernel 5.15.35-1, however, that exact same command, against the iSCSI targets in this environment, takes 10-11 seconds. Now the web interface is sad: the status of the node and all the storage devices is unknown, the summary graphs are all blank, and no VM tasks can run.
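
To put numbers on the difference between kernels, a small timing wrapper can help. This is only a minimal sketch and not something from the original posts: `time_runs` is a hypothetical helper name, and it assumes GNU `date` (for `%N`) and `awk`, which a Debian-based install should have.

```shell
# Run a command several times and print the wall-clock time of each run.
# Usage: time_runs <count> <command> [args...]
time_runs() {
    local count="$1"; shift
    local i=1 start end
    while [ "$i" -le "$count" ]; do
        start=$(date +%s.%N)
        "$@" >/dev/null 2>&1
        end=$(date +%s.%N)
        # awk does the subtraction because shell arithmetic is integer-only
        awk -v s="$start" -v e="$end" -v n="$i" \
            'BEGIN { printf "run %d: %.2f s\n", n, e - s }'
        i=$((i + 1))
    done
}
```

For example, `time_runs 3 /usr/bin/iscsiadm --mode session --sid X --rescan`, with X replaced by a session id taken from `iscsiadm --mode session`, prints one timing line per run, which makes it easy to compare the same node booted into each kernel.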

Running the iSCSI rescan with `-d5` produces exactly the same 398 lines on Proxmox 6, Proxmox 7 with 5.11.22-7, and Proxmox 7 with 5.15.35-1; the output is literally identical. The only difference is how long the command takes to execute.

We manually downgraded the proxmox kernel from 5.15.35-1 to 5.11.22-7 as an experiment.

We've checked the other environment that's running Proxmox 7 and it too is affected by this slowness; however, that environment has far fewer LUNs exported to it, so the rescan finishes within about 1-1.5 seconds.
With `-d5` the "iscsiadm: rescanning device ......." lines flow past really slowly. In Proxmox 6 they fly by.


Can anyone help, maybe something to tweak or change?
Thanks
 
Check your network interfaces for errors. Get a network trace and look at it.
You said you booted to older kernel, but did not specify if it changed anything.

iscsiadm is part of a standard Linux package; it does not interact with any part of PVE when it's running. The parties involved are: the kernel, the network, and the userland iscsiadm. I suspect the network trace will be very informative if nothing else looks like the culprit.
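
A quick way to check an interface for errors before taking a trace is to read the kernel's per-interface counters. The following is a rough sketch under assumptions: `nic_errors` is a made-up helper name, the counter files are the standard ones under `/sys/class/net/<interface>/statistics`, and the optional second argument exists only so the function can be pointed at another directory.

```shell
# Print the non-zero error/drop counters for one interface.
# Usage: nic_errors <interface> [sysfs net dir, default /sys/class/net]
nic_errors() {
    local iface="$1" base="${2:-/sys/class/net}"
    local stats="$base/$iface/statistics" f value found=0
    for f in rx_errors tx_errors rx_dropped tx_dropped rx_crc_errors; do
        [ -r "$stats/$f" ] || continue        # skip counters this kernel does not expose
        value=$(cat "$stats/$f")
        if [ "$value" -ne 0 ]; then
            echo "$iface $f=$value"
            found=1
        fi
    done
    [ "$found" -eq 1 ] || echo "$iface: no errors or drops counted"
}
```

Calling `nic_errors [iscsi interface]` before and after a rescan shows whether any counter moved while the command was running.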


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
Hi, Mike is out of office, I can shed some light on this ...
I verified the network is fine, no drops. Cluster and storage are actually on the same switch (virtual chassis actually, four switches, nics staggered across them). Other nodes (still running Proxmox 6, haven't upgraded them yet) in the cluster attached to the same switch and storage are fine.
This worked with Proxmox 6 with all the updates right up until we upgraded to Proxmox 7.
However, in testing, Proxmox 7 with kernel 5.11.22-7 does indeed work.
Booting 7 with 5.15.35-1 breaks again.
Currently running on 5.11.22-7.
 
You are possibly suffering from the general hardware support breakage that many users experienced since the upgrade, i.e.:
https://forum.proxmox.com/threads/k..._sas-praid-cp400i-lsi3108.110144/#post-474149

You can try what has helped others if you want to use the new kernel:
Add to the kernel command line: intel_iommu=on iommu=pt

It's hard to make a direct link, but if the NIC driver/firmware is broken it can lead to unpredictable behavior.
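
After a reboot it is worth confirming that the parameters actually reached the running kernel. A small sketch, assuming nothing beyond `/proc/cmdline`; `check_cmdline` is a hypothetical helper name. On a GRUB-booted system the parameters usually go into `GRUB_CMDLINE_LINUX_DEFAULT` in `/etc/default/grub`, followed by `update-grub`; other boot setups differ.

```shell
# Report whether each expected parameter is present on a kernel command line.
# Usage: check_cmdline "<cmdline string>" param [param...]
check_cmdline() {
    local cmdline=" $1 "; shift     # pad with spaces so whole words can be matched
    local p
    for p in "$@"; do
        case "$cmdline" in
            *" $p "*) echo "$p: present" ;;
            *)        echo "$p: missing" ;;
        esac
    done
}
```

Typical use would be `check_cmdline "$(cat /proc/cmdline)" intel_iommu=on iommu=pt`.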


 
Installed the latest ixgbe driver from Intel (dated 5/20/2022, very new); it still didn't help.
Could not find a firmware update for these NICs.
 
You could grab a network trace and check the round-trip-times on requests.
Terminal1: tshark -i [iscsi interface] -f "tcp port 3260"
or for later analysis: tshark -i [iscsi interface] -w iscsi.cap
Terminal2: iscsiadm invocation.
If you saved the file: tshark -r iscsi.cap -Y 'ip.addr == [iscsi.target.ip]' -T fields -e tcp.analysis.ack_rtt

But it's more of a curiosity exercise, since you know which system change caused the issue. If you don't need the new kernel, use what works.
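
If you do take the trace, the per-packet RTT column from tshark is easier to compare between kernels once it is boiled down to a few numbers. This is just an illustrative sketch: `rtt_summary` is a hypothetical name, and it assumes one RTT value in seconds per line, with blank lines for packets that carry no RTT.

```shell
# Summarise a column of RTT values (seconds, one per line; blank lines are skipped).
rtt_summary() {
    awk 'NF {
            n++; sum += $1
            if ($1 > max) max = $1
         }
         END {
            if (n == 0) { print "no samples"; exit }
            printf "samples=%d avg_ms=%.3f max_ms=%.3f\n", n, sum / n * 1000, max * 1000
         }'
}
```

Pipe the field output into it, e.g. `tshark -r [capture file] -Y 'ip.addr == [iscsi.target.ip]' -T fields -e tcp.analysis.ack_rtt | rtt_summary`, once per kernel, and compare the two lines.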


 
Another option is for you to reach out to your storage vendor support and discuss it with them.
We've recently seen some storage vendors running outdated NFS implementations that got broken due to a kernel level change.

There are over 70 iSCSI-related changes between v5.11.22 and v5.15.35. It's possible things have become incompatible.

Code:
git log --oneline  v5.11.22...v5.15.35|egrep -i iscsi
50d46b5ce004 scsi: iscsi: Fix unbound endpoint error handling
578616ac3d87 scsi: iscsi: Fix conn cleanup and stop race during iscsid restart
485780af7ef1 scsi: iscsi: Fix endpoint reuse regression
cbd4f4e40944 scsi: iscsi: Fix offload conn cleanup when iscsid restarts
cc0082d45de1 scsi: iscsi: Move iscsi_ep_disconnect()
4f786e8f18c3 scsi: target: iscsi: Make sure the np under each tpg is unique
847050d40dc0 scsi: libiscsi: Fix UAF in iscsi_conn_get_param()/iscsi_conn_teardown()
592195692021 scsi: iscsi: Unblock session then wake up error handler
187a580c9e78 scsi: iscsi: Fix set_param() handling
258aad75c621 scsi: iscsi: Fix iscsi_task use after free
4e2855082925 scsi: iscsi: Adjust iface sysfs attr detection
799206c1302e iscsi_ibft: Fix isa_bus_to_virt not working under ARM
7fd1d00bf818 iscsi_ibft: fix warning in reserve_ibft_region()
342f43af70db iscsi_ibft: fix crash due to KASLR physical memory remapping
7b0ddc134608 scsi: be2iscsi: Fix use-after-free during IP updates
e746f3451ec7 scsi: iscsi: Fix iface sysfs attr detection
c7fa2c855e89 scsi: be2iscsi: Fix some missing space in some messages
030e4138d11f scsi: be2iscsi: Fix an error handling path in beiscsi_dev_probe()
79366f0a8de2 scsi: target: iscsi: Remove redundant continue statement
60a0d379f11b scsi: qedi: Pass send_iscsi_tmf task to abort
a1f3486b3b09 scsi: iscsi: Move pool freeing
99b0603313ee scsi: iscsi: Hold task ref during TMF timeout handling
7ce9fc5ecde0 scsi: iscsi: Flush block work before unblock
f6f964574470 scsi: iscsi: Fix completion check during abort races
bdd4aad7ff92 scsi: iscsi: Fix shost->max_id use
ec29d0ac29be scsi: iscsi: Fix conn use after free during resets
fda290c5ae98 scsi: iscsi: Get ref to conn during reset handling
d39df158518c scsi: iscsi: Have abort handler get ref to conn
b1d19e8c92cf scsi: iscsi: Add iscsi_cls_conn refcount helpers
788b71c54f21 scsi: iscsi: iscsi_tcp: Start socket shutdown during conn stop
c0920cd36f17 scsi: iscsi: iscsi_tcp: Set no linger
23d6fefbb3f6 scsi: iscsi: Fix in-kernel conn failure handling
9e5fe1700896 scsi: iscsi: Rel ref after iscsi_lookup_endpoint()
b25b957d2db1 scsi: iscsi: Use system_unbound_wq for destroy_work
06c203a5566b scsi: iscsi: Force immediate failure during shutdown
27e986289e73 scsi: iscsi: Drop suspend calls from ep_disconnect
891e2639deae scsi: iscsi: Stop queueing during ep_disconnect
1486a4f5c2f3 scsi: iscsi: Add task completion helper
0edca4fc633c scsi: be2iscsi: Remove redundant initialization
998da772fd86 scsi: target: iscsi: Drop unnecessary container_of()
6235bef6f990 scsi: target: iscsi: Switch to kmemdup_nul()
6c49d847de82 ice: Recognize 860 as iSCSI port in CEE mode
31c068e73da1 scsi: target: iscsi: Fix zero tag inside a trace event
0dcf8febcb7b scsi: iscsi: Fix iSCSI cls conn state
0352c3d3959a scsi: target: iscsi: Fix zero tag inside a trace event
7f13e0be3694 RDMA/iser: struct iscsi_iser_task is declared twice
9e67600ed6b8 scsi: iscsi: Fix race condition between login and sync thread
aeac8ce864d9 ice: Recognize 860 as iSCSI port in CEE mode
adb253433dc8 scsi: bnx2i: Make bnx2i_process_iscsi_error() simpler and more robust
a90a8c607570 scsi: be2iscsi: Demote incomplete/non-conformant kernel-doc header
f1d50e8ee5c9 scsi: be2iscsi: Ensure function follows directly after its header
42ae74da77d4 scsi: libiscsi: Fix iscsi_prep_scsi_cmd_pdu() error handling
ab4bab7a977d scsi: be2iscsi: Fix beiscsi_phys_port()'s name in header
a905a1dce8bf scsi: be2iscsi: Provide missing function name in header
1b8a7ee9308e scsi: be2iscsi: Fix incorrect naming of beiscsi_iface_config_vlan()
c22659fbb98b scsi: target: iscsi: Initialize arrays at declaration time
c4d81e7c53e7 scsi: target: iscsi: Remove unused macro PRINT_BUF
91ce84a3d789 scsi: target: iscsi: Remove unused macro TEXT_LEN
fdc1339a421d scsi: target: iscsi: Remove unused macro ISCSI_INST_LAST_FAILURE_TYPE
cbfa0cd44130 scsi: iscsi: Verify lengths on passthrough PDUs
99cfc479b678 scsi: iscsi: Ensure sysfs attributes are limited to PAGE_SIZE
3ada197fece7 scsi: iscsi: Restrict sessions and handles to admin capabilities
f9dbdf97a5bd scsi: iscsi: Verify lengths on passthrough PDUs
ec98ea7070e9 scsi: iscsi: Ensure sysfs attributes are limited to PAGE_SIZE
688e8128b7a9 scsi: iscsi: Restrict sessions and handles to admin capabilities
d39bfd0686fd scsi: iscsi: Drop session lock in iscsi_session_chkready()
5b0ec4cf0494 scsi: qla4xxx: Use iscsi_is_session_online()
c8447e4c2eb7 scsi: libiscsi: Reset max/exp cmdsn during recovery
25c400db2083 scsi: iscsi_tcp: Fix shost can_queue initialization
b4046922b3c0 scsi: libiscsi: Add helper to calculate max SCSI cmds per session
c435f0a9ecb7 scsi: libiscsi: Fix iSCSI host workq destruction
14936b1ed249 scsi: libiscsi: Fix iscsi_task use after free()
5923d64b7ab6 scsi: libiscsi: Drop taskqueuelock
d28d48c69977 scsi: libiscsi: Fix iscsi_prep_scsi_cmd_pdu() error handling
f88a10f80da9 scsi: target: iscsi: Redo iscsit_check_session_usage_count() return code
efc9d73063c1 scsi: target: iscsi: Avoid in_interrupt() usage in iscsit_check_session_usage_count()
433675486af4 scsi: target: iscsi: Avoid in_interrupt() usage in iscsit_close_session()
429c76133fbb IB/iser: Protect iscsi_max_lun module param using callback
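
Some subjects in the list above appear twice under different hashes (for example "scsi: iscsi: Verify lengths on passthrough PDUs"), most likely because the three-dot range lists commits from both sides of the comparison. To narrow the list down, it can help to count each subject once and group by subsystem prefix. A rough sketch, with `tally_subjects` as a hypothetical helper name and the assumption that the input is `git log --oneline` output:

```shell
# Tally "<hash> <subject>" lines by subsystem prefix, counting each subject once.
tally_subjects() {
    awk '{
            subject = $0
            sub(/^[0-9a-f]+ /, "", subject)   # drop the abbreviated hash
            if (seen[subject]++) next          # skip subjects already counted
            prefix = subject
            sub(/: [^:]*$/, "", prefix)        # keep everything before the last ": "
            count[prefix]++
         }
         END { for (p in count) printf "%d\t%s\n", count[p], p }' |
        sort -t "$(printf '\t')" -k1,1nr -k2,2
}
```

Feeding it the same pipeline, e.g. `git log --oneline v5.11.22...v5.15.35 | grep -Ei iscsi | tally_subjects`, shows at a glance how many changes touch the initiator code (`scsi: iscsi`, `scsi: libiscsi`) as opposed to target-side or unrelated driver code.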



 
