[SOLVED] Issue with iSCSI offline on one node in cluster

courtjesterau

New Member
Aug 31, 2023
Hi,

I have a 3-node cluster. All three nodes are connected to the same switch: two are in an LACP bond and one is on a single link. All three connect to an iSCSI NAS that provides storage for the VMs.

One day (after a hard lock-up) my 2nd node would no longer see the iSCSI storage, with the UI showing it as offline. I can ping the NAS's IP address from the node's shell, and the other two nodes in the cluster are working fine.

I have restarted the node several times, updated it, and removed and re-added the storage, but nothing has worked.

In the syslog I am getting a repeating message:

Apr 21 16:16:39 pve2 pvedaemon[1608]: command '/sbin/vgscan --ignorelockingfailure --mknodes' failed: exit code 5
Apr 21 16:16:39 pve2 pvedaemon[1608]: storage 'Qnas-iSCSI-Pool' is not online
Apr 21 16:16:42 pve2 pvedaemon[1607]: command '/sbin/vgscan --ignorelockingfailure --mknodes' failed: exit code 5
Apr 21 16:16:42 pve2 pvedaemon[1607]: storage 'Qnas-iSCSI-Pool' is not online
Apr 21 16:16:44 pve2 pvedaemon[1607]: command '/sbin/vgscan --ignorelockingfailure --mknodes' failed: exit code 5
Apr 21 16:16:44 pve2 pvedaemon[1607]: storage 'Qnas-iSCSI-Pool' is not online
Apr 21 16:16:46 pve2 pvestatd[1577]: command '/sbin/vgscan --ignorelockingfailure --mknodes' failed: exit code 5
Apr 21 16:16:46 pve2 pvestatd[1577]: storage 'Qnas-iSCSI-Pool' is not online


Any ideas on what to try next to get this up and running?
 

Attachment: proxmoxissue.png
Did you review your system log starting from boot point? "journalctl"
What does "pvesm status" say on a bad node and good node?
What do "iscsiadm -m node" and "iscsiadm -m session" say on good and bad node?
What happens when you do "pvesm scan iscsi <portal>"
Are there any errors on storage side? Are you sure the initiator is in the allowed group to access the target?
Is the MTU correct? Are you using Jumbo? If you do, can you ping with non-fragmented high size?
Can you do manual "iscsiadm discovery" against the target?
What is the content of your "/etc/pve/storage.cfg"?
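
For reference, these checks can be collected into one pass on each node (the commands are the ones named above; the portal IP is the one that appears later in this thread, so adjust to your environment):

journalctl -b | grep -iE 'iscsi|pvestatd'    # system log since boot, iSCSI-related lines
pvesm status                                 # storage status as PVE sees it
iscsiadm -m node                             # configured iSCSI node records
iscsiadm -m session                          # active iSCSI sessions
pvesm scan iscsi 172.16.103.5                # what PVE can discover on the portal
cat /etc/pve/storage.cfg                     # storage definitions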


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
I am new-ish to Proxmox, so I'm not sure about all of these questions, but I will try to provide the answers as best I can.


Are there any errors on storage side? Are you sure the initiator is in the allowed group to access the target?


There are no errors on the storage side. This setup ran for well over a year with three nodes exactly as it is now, and two of the nodes are still using the storage successfully.

The MTU is correct. I double-checked: it is standard 1500 everywhere, the MTU settings are the same on all three nodes, and I am not using jumbo frames. This was changed a while back, maybe 7 months ago (and it still worked fine for a period after the change).
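
A quick way to sanity-check the MTU end-to-end is a non-fragmenting ping from each node to the storage portal (172.16.103.5 in this setup); with a standard 1500-byte MTU the largest ICMP payload that fits is 1472 bytes:

ping -M do -s 1472 -c 3 172.16.103.5    # 1472 = 1500 - 20 (IP header) - 8 (ICMP header)
# with jumbo frames (MTU 9000) the equivalent payload would be 8972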


Good node details:


root@pve3:~# pvesm status

Name Type Status Total Used Available %
BKP02 pbs active 15475078144 1924850944 13550227200 12.44%
Qnas-iSCSI-Lun-0 lvm active 2147479552 1709244416 438235136 79.59%
Qnas-iSCSI-Pool iscsi active 0 0 0 0.00%
local dir active 98497780 11540784 81907448 11.72%
local-lvm lvmthin active 855855104 0 855855104 0.00%

root@pve3:~# iscsiadm -m session
tcp: [1] 172.16.103.5:3260,1 iqn.2004-04.com.qnap:ts-431xeu:iscsi.qnas.322a03 (non-flash)

root@pve3:~# iscsiadm -m node
172.16.103.5:3260,1 iqn.2004-04.com.qnap:ts-431xeu:iscsi.qnas.322a03

root@pve3:~# pvesm scan iscsi 172.16.103.5
iqn.2004-04.com.qnap:ts-431xeu:iscsi.qnas.322a03 172.16.103.5:3260

root@pve3:~# iscsiadm -m discovery
172.16.103.5:3260 via sendtargets

Storage cfg:
iscsi: Qnas-iSCSI-Pool
        portal 172.16.103.5
        target iqn.2004-04.com.qnap:ts-431xeu:iscsi.qnas.322a03
        content none

lvm: Qnas-iSCSI-Lun-0
        vgname Qnas-iSCSI-Lun-0
        content images,rootdir
        shared 1


Bad node details:

root@pve2:~# pvesm status

Command failed with status code 5.
command '/sbin/vgscan --ignorelockingfailure --mknodes' failed: exit code 5
storage 'Qnas-iSCSI-Pool' is not online
Name Type Status Total Used Available %
BKP02 pbs active 15475078144 1924850944 13550227200 12.44%
Qnas-iSCSI-Lun-0 lvm inactive 0 0 0 0.00%
Qnas-iSCSI-Pool iscsi inactive 0 0 0 0.00%
local dir active 40516856 18773012 19653452 46.33%
local-lvm lvmthin active 56545280 0 56545280 0.00%


root@pve2:~# iscsiadm -m session
iscsiadm: No active sessions.

root@pve2:~# iscsiadm -m node
[]:3260,4294967295

root@pve2:~# pvesm scan iscsi 172.16.103.5
iscsiadm: Could not stat /etc/iscsi/nodes//,3260,-1/default to delete node: No such file or directory
iscsiadm: Could not add/update [tcp:[hw=,ip=,net_if=,iscsi_if=default] 172.16.103.5,3260,1 iqn.2004-04.com.qnap:ts-431xeu:iscsi.qnas.322a03]
iqn.2004-04.com.qnap:ts-431xeu:iscsi.qnas.322a03 172.16.103.5:3260

root@pve2:~# iscsiadm -m discovery
172.16.103.5:3260 via sendtargets
172.16.100.7:3260 via sendtargets

Storage cfg:

iscsi: Qnas-iSCSI-Pool
        portal 172.16.103.5
        target iqn.2004-04.com.qnap:ts-431xeu:iscsi.qnas.322a03
        content none

lvm: Qnas-iSCSI-Lun-0
        vgname Qnas-iSCSI-Lun-0
        content images,rootdir
        shared 1
 
What is the output of "pveversion" from each of the nodes?
good:
root@pve3:~# iscsiadm -m discovery
172.16.103.5:3260 via sendtargets
bad:
root@pve2:~# iscsiadm -m discovery
172.16.103.5:3260 via sendtargets
172.16.100.7:3260 via sendtargets
Where does 100.7 come from? Is it reachable from the server? You may be suffering from a recent change in handling of multi-IP targets: if a non-reachable IP is reported by the target, the initiator (PVE) gets confused.

You can try to specify a particular portal:
iscsiadm --mode discovery --type sendtargets --portal 172.16.103.5

But first you should delete the strange iscsi node entry on "bad" server:
root@pve2:~# iscsiadm -m node
[]:3260,4294967295
iscsiadm -m node -o delete
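
If only one record is stale, the delete can also be targeted explicitly, using the target IQN and portal reported by "iscsiadm -m node", for example:

iscsiadm -m node -T iqn.2004-04.com.qnap:ts-431xeu:iscsi.qnas.322a03 -p 172.16.103.5 -o delete

Since the record here is malformed (empty target name), iscsiadm may refuse to touch it, in which case the stale entries under /etc/iscsi/nodes/ have to be removed by hand.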


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
172.16.100.7 is the management IP for the NAS; the Proxmox nodes have management IPs in this range as well. The NAS does not normally accept iSCSI connections on that subnet, but I did allow them yesterday while troubleshooting (and it connected / worked fine).

Trying to specify a particular portal, I get an error:

root@pve2:~# iscsiadm --mode discovery --type sendtargets --portal 172.16.103.5
iscsiadm: Could not stat /etc/iscsi/nodes//,3260,-1/default to delete node: No such file or directory
iscsiadm: Could not add/update [tcp:[hw=,ip=,net_if=,iscsi_if=default] 172.16.103.5,3260,1 iqn.2004-04.com.qnap:ts-431xeu:iscsi.qnas.322a03]
172.16.103.5:3260,1 iqn.2004-04.com.qnap:ts-431xeu:iscsi.qnas.322a03

When I try to delete the bad node entry I get this error:

root@pve2:~# iscsiadm -m node -o delete
iscsiadm: Could not stat /etc/iscsi/nodes//,3260,-1/default to delete node: No such file or directory
iscsiadm: Could not execute operation on all records: encountered iSCSI database failure
 
I have resolved it -- thanks for the help.

I removed the iSCSI storage from the node in the Datacenter view,

then on the node went into this directory: /etc/iscsi/nodes/

and removed all the listings manually,

then re-added the iSCSI storage for the node in the Datacenter storage manager,

and this has got it working now.
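
For anyone hitting this later, a rough sketch of the same fix from the shell (storage name, portal, and target are the ones from this thread; note that clearing /etc/iscsi/nodes removes ALL iSCSI node records on that node, and that pvesm remove/add edits the cluster-wide /etc/pve/storage.cfg, not just this node):

# on the affected node: clear the stale/corrupt iSCSI node records
rm -r /etc/iscsi/nodes/*
# remove and re-add the iSCSI storage definition
pvesm remove Qnas-iSCSI-Pool
pvesm add iscsi Qnas-iSCSI-Pool \
    --portal 172.16.103.5 \
    --target iqn.2004-04.com.qnap:ts-431xeu:iscsi.qnas.322a03 \
    --content none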
 
Thanks, this helped me solve the same problem too.
 
