[SOLVED] iSCSI multipath - does it need a dedicated subnet per IP?

czechsys

Renowned Member
Hi,

We are testing our new iSCSI storage. Based on the Proxmox iSCSI documentation, we currently have:

Code:
root@proxmox-mon-01:/etc/network# lsblk
NAME                                MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
sda                                   8:0    0 279.4G  0 disk 
|-sda1                                8:1    0  1007K  0 part 
|-sda2                                8:2    0   512M  0 part 
`-sda3                                8:3    0 278.9G  0 part 
  |-pve-swap                        253:0    0    16G  0 lvm   [SWAP]
  `-pve-root                        253:1    0    40G  0 lvm   /
sdb                                   8:16   0     1T  0 disk 
`-3674e9bf100ead535001541e000000000 253:2    0     1T  0 mpath
  |-vg--iscsi-fiotest               253:3    0    50G  0 lvm  
  |-vg--iscsi-vm--100--disk--0      253:4    0    10G  0 lvm  
  |-vg--iscsi-vm--101--disk--0      253:5    0    10G  0 lvm  
  |-vg--iscsi-vm--100--cloudinit    253:6    0     4M  0 lvm  
  |-vg--iscsi-vm--102--cloudinit    253:7    0     4M  0 lvm  
  `-vg--iscsi-vm--102--disk--0      253:8    0    10G  0 lvm  
sdc                                   8:32   0     1T  0 disk 
`-3674e9bf100ead535001541e000000000 253:2    0     1T  0 mpath
  |-vg--iscsi-fiotest               253:3    0    50G  0 lvm  
  |-vg--iscsi-vm--100--disk--0      253:4    0    10G  0 lvm  
  |-vg--iscsi-vm--101--disk--0      253:5    0    10G  0 lvm  
  |-vg--iscsi-vm--100--cloudinit    253:6    0     4M  0 lvm  
  |-vg--iscsi-vm--102--cloudinit    253:7    0     4M  0 lvm  
  `-vg--iscsi-vm--102--disk--0      253:8    0    10G  0 lvm  
sdd                                   8:48   0     1T  0 disk 
`-3674e9bf100ead535001541e000000000 253:2    0     1T  0 mpath
  |-vg--iscsi-fiotest               253:3    0    50G  0 lvm  
  |-vg--iscsi-vm--100--disk--0      253:4    0    10G  0 lvm  
  |-vg--iscsi-vm--101--disk--0      253:5    0    10G  0 lvm  
  |-vg--iscsi-vm--100--cloudinit    253:6    0     4M  0 lvm  
  |-vg--iscsi-vm--102--cloudinit    253:7    0     4M  0 lvm  
  `-vg--iscsi-vm--102--disk--0      253:8    0    10G  0 lvm  
sde                                   8:64   0     1T  0 disk 
`-3674e9bf100ead535001541e000000000 253:2    0     1T  0 mpath
  |-vg--iscsi-fiotest               253:3    0    50G  0 lvm  
  |-vg--iscsi-vm--100--disk--0      253:4    0    10G  0 lvm  
  |-vg--iscsi-vm--101--disk--0      253:5    0    10G  0 lvm  
  |-vg--iscsi-vm--100--cloudinit    253:6    0     4M  0 lvm  
  |-vg--iscsi-vm--102--cloudinit    253:7    0     4M  0 lvm  
  `-vg--iscsi-vm--102--disk--0      253:8    0    10G  0 lvm

Code:
root@proxmox-mon-01:/etc/network# multipath -l
3674e9bf100ead535001541e000000000 dm-2 HUAWEI,XSG1
size=1.0T features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=0 status=active
  |- 1:0:0:1 sdb 8:16 active undef running
  |- 2:0:0:1 sdc 8:32 active undef running
  |- 3:0:0:1 sdd 8:48 active undef running
  `- 4:0:0:1 sde 8:64 active undef running

Code:
auto bond0
iface bond0 inet manual
    ovs_bonds ens1f0 ens1f1
    ovs_type OVSBond
    ovs_bridge vmbr0
    ovs_options lacp=active bond_mode=balance-tcp

auto vmbr0
iface vmbr0 inet manual
    ovs_type OVSBridge
    ovs_ports bond0 mgmt4 iscsi1 iscsi2
    comment frontend networks

auto iscsi1
iface iscsi1 inet6 static
    address IPV6::10/64
    ovs_type OVSIntPort
    ovs_bridge vmbr0
    ovs_options tag=VLANID
    comment iscsi storage

#::11 is proxmox-03

auto iscsi2
iface iscsi2 inet6 static
    address IPV6::12/64
    ovs_type OVSIntPort
    ovs_bridge vmbr0
    ovs_options tag=VLANID
    comment iscsi storage

Fio tests (on the PVE host, a single VM, and two VMs) show that all iSCSI traffic flows over only one of the iscsiX links. Do we need a dedicated subnet per iscsiX interface, or can they all be on a single subnet? The storage has 4 dedicated physical interfaces, with IPs in the same subnet as on the PVE side. Disconnecting a link redirects the iSCSI flow to the other iscsiX interface.

TL;DR: The bond is 2x 10G and both links show iSCSI flow in the test, but with multipath only one iscsiX interface carries traffic (and is somehow capped at 10G).
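
For reference, the fio runs were roughly along these lines (exact parameters varied; the LV path is the fiotest volume from the lsblk output above):

Code:
# Sequential read test against the LVM test volume on top of the multipath device
fio --name=seqread --filename=/dev/vg-iscsi/fiotest --ioengine=libaio --direct=1 \
    --rw=read --bs=1M --iodepth=32 --numjobs=4 --runtime=60 --time_based --group_reporting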
 
I think you are overcomplicating things by layering multipath over an LACP bond. With each layer doing its own hashing, troubleshooting the results becomes exceedingly complicated.

That said, single-subnet addressing adds even more complexity - separate subnets are always the simplest solution for multipath (the underlying LACP bond aside).

You can examine your iSCSI sessions with "iscsiadm -m session -P3" and look at the source IP of each one. Chances are it is the same. You would need to implement policy routing to prevent Linux from selecting the same source IP to reach all destinations.
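
A minimal sketch of what source-based policy routing could look like with your current single-subnet layout (addresses are the placeholders from the config you posted; the table numbers are arbitrary):

Code:
# Route traffic sourced from each iscsiX address out through its own interface,
# so each iSCSI session keeps its own source IP / path.
ip -6 rule add from IPV6::10/128 table 110
ip -6 route add IPV6::/64 dev iscsi1 table 110

ip -6 rule add from IPV6::12/128 table 120
ip -6 route add IPV6::/64 dev iscsi2 table 120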



Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
We don't have experience with iSCSI, so there are some things we don't know yet. So:

1] Is LACP the problem? We don't have dedicated 10G cards for iSCSI on the PVE hosts right now.
2] If we switch from a single subnet to multiple subnets, will 1] be OK, or will we need to drop LACP?
3] Will multiple subnets require multiple initiators per PVE host?

This is the iscsiadm output:

Code:
root@proxmox-mon-01:~# iscsiadm -m session -P3
iSCSI Transport Class version 2.0-870
version 2.1.3
Target: iqn.2006-08.com.huawei:oceanstor:210074e9bfead535::20003:IPV6_GUA:0:0:0:a (non-flash)
Current Portal: [IPV6_GUA:0000:0000:0000:000a]:3260,4
Persistent Portal: [IPV6_GUA:0:0:0:a]:3260,4

**********
Interface:
**********
Iface Name: default
Iface Transport: tcp
Iface Initiatorname: iqn.1993-08.org.debian:01:28b9b142664
Iface IPaddress: [IPV6_GUA:0000:0000:0000:0012]

Iface HWaddress: default
Iface Netdev: default
SID: 1
iSCSI Connection State: LOGGED IN
iSCSI Session State: LOGGED_IN
Internal iscsid Session State: NO CHANGE
*********
Timeouts:
*********
Recovery Timeout: 5
Target Reset Timeout: 30
LUN Reset Timeout: 30
Abort Timeout: 15
*****
CHAP:
*****
username: <empty>
password: ********
username_in: <empty>
password_in: ********
************************
Negotiated iSCSI params:
************************
HeaderDigest: None
DataDigest: None
MaxRecvDataSegmentLength: 262144
MaxXmitDataSegmentLength: 262144
FirstBurstLength: 73728
MaxBurstLength: 262144
ImmediateData: Yes
InitialR2T: Yes
MaxOutstandingR2T: 1
************************
Attached SCSI devices:
************************
Host Number: 1 State: running
scsi1 Channel 00 Id 0 Lun: 0
scsi1 Channel 00 Id 0 Lun: 1
Attached scsi disk sdb State: running
Target: iqn.2006-08.com.huawei:oceanstor:210074e9bfead535::22007:IPV6_GUA:0:0:0:b (non-flash)
Current Portal: [IPV6_GUA:0000:0000:0000:000b]:3260,8200
Persistent Portal: [IPV6_GUA:0:0:0:b]:3260,8200

**********
Interface:
**********
Iface Name: default
Iface Transport: tcp
Iface Initiatorname: iqn.1993-08.org.debian:01:28b9b142664
Iface IPaddress: [IPV6_GUA:0000:0000:0000:0012]

Iface HWaddress: default
Iface Netdev: default
SID: 2
iSCSI Connection State: LOGGED IN
iSCSI Session State: LOGGED_IN
Internal iscsid Session State: NO CHANGE
*********
Timeouts:
*********
Recovery Timeout: 5
Target Reset Timeout: 30
LUN Reset Timeout: 30
Abort Timeout: 15
*****
CHAP:
*****
username: <empty>
password: ********
username_in: <empty>
password_in: ********
************************
Negotiated iSCSI params:
************************
HeaderDigest: None
DataDigest: None
MaxRecvDataSegmentLength: 262144
MaxXmitDataSegmentLength: 262144
FirstBurstLength: 73728
MaxBurstLength: 262144
ImmediateData: Yes
InitialR2T: Yes
MaxOutstandingR2T: 1
************************
Attached SCSI devices:
************************
Host Number: 2 State: running
scsi2 Channel 00 Id 0 Lun: 0
scsi2 Channel 00 Id 0 Lun: 1
Attached scsi disk sdc State: running
Target: iqn.2006-08.com.huawei:oceanstor:210074e9bfead535::1020003:IPV6_GUA:0:0:0:c (non-flash)
Current Portal: [IPV6_GUA:0000:0000:0000:000c]:3260,14
Persistent Portal: [IPV6_GUA:0:0:0:c]:3260,14

**********
Interface:
**********
Iface Name: default
Iface Transport: tcp
Iface Initiatorname: iqn.1993-08.org.debian:01:28b9b142664
Iface IPaddress: [IPV6_GUA:0000:0000:0000:0012]

Iface HWaddress: default
Iface Netdev: default
SID: 3
iSCSI Connection State: LOGGED IN
iSCSI Session State: LOGGED_IN
Internal iscsid Session State: NO CHANGE
*********
Timeouts:
*********
Recovery Timeout: 5
Target Reset Timeout: 30
LUN Reset Timeout: 30
Abort Timeout: 15
*****
CHAP:
*****
username: <empty>
password: ********
username_in: <empty>
password_in: ********
************************
Negotiated iSCSI params:
************************
HeaderDigest: None
DataDigest: None
MaxRecvDataSegmentLength: 262144
MaxXmitDataSegmentLength: 262144
FirstBurstLength: 73728
MaxBurstLength: 262144
ImmediateData: Yes
InitialR2T: Yes
MaxOutstandingR2T: 1
************************
Attached SCSI devices:
************************
Host Number: 3 State: running
scsi3 Channel 00 Id 0 Lun: 0
scsi3 Channel 00 Id 0 Lun: 1
Attached scsi disk sdd State: running
Target: iqn.2006-08.com.huawei:oceanstor:210074e9bfead535::1022007:IPV6_GUA:0:0:0:d (non-flash)
Current Portal: [IPV6_GUA:0000:0000:0000:000d]:3260,8210
Persistent Portal: [IPV6_GUA:0:0:0:d]:3260,8210

**********
Interface:
**********
Iface Name: default
Iface Transport: tcp
Iface Initiatorname: iqn.1993-08.org.debian:01:28b9b142664
Iface IPaddress: [IPV6_GUA:0000:0000:0000:0012]

Iface HWaddress: default
Iface Netdev: default
SID: 4
iSCSI Connection State: LOGGED IN
iSCSI Session State: LOGGED_IN
Internal iscsid Session State: NO CHANGE
*********
Timeouts:
*********
Recovery Timeout: 5
Target Reset Timeout: 30
LUN Reset Timeout: 30
Abort Timeout: 15
*****
CHAP:
*****
username: <empty>
password: ********
username_in: <empty>
password_in: ********
************************
Negotiated iSCSI params:
************************
HeaderDigest: None
DataDigest: None
MaxRecvDataSegmentLength: 262144
MaxXmitDataSegmentLength: 262144
FirstBurstLength: 73728
MaxBurstLength: 262144
ImmediateData: Yes
InitialR2T: Yes
MaxOutstandingR2T: 1
************************
Attached SCSI devices:
************************
Host Number: 4 State: running
scsi4 Channel 00 Id 0 Lun: 0
scsi4 Channel 00 Id 0 Lun: 1
Attached scsi disk sde State: running

Is the source IP the "Iface IPaddress"?
 
1] Is LACP the problem? We don't have dedicated 10G cards for iSCSI on the PVE hosts right now.
In general - no, it's not a problem. I surmise that you are perhaps planning for network HA for other applications and intend to move iSCSI to a dedicated network eventually. As I said, the double hashing of packets may lead to unexpected results, i.e. even with an updated IP scheme you may see all packets land on a single physical interface.
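If you want to see how balance-tcp is actually distributing flows across the physical links, OVS can show the per-hash assignments (assuming the bond is still named bond0 as in your config):

Code:
# Show per-hash load distribution across the bond members
ovs-appctl bond/show bond0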
2] If we switch from a single subnet to multiple subnets, will 1] be OK, or will we need to drop LACP?
I suspect you will be OK. However, network design needs to take all parts of the equation into consideration: server, switches, clients. Only you have all the information.
3] Will multiple subnets require multiple initiators per PVE host?
In general, each client IP is an initiator. How everything gets configured depends in large part on your storage.
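As a rough sketch (the iface names here are examples, not necessarily your NIC names), open-iscsi lets you bind a logical iface to each storage interface so that each session logs in with its own source address:

Code:
# Create one open-iscsi iface per storage interface and bind it to the netdev
iscsiadm -m iface -I iscsi1 --op=new
iscsiadm -m iface -I iscsi1 --op=update -n iface.net_ifacename -v iscsi1
iscsiadm -m iface -I iscsi2 --op=new
iscsiadm -m iface -I iscsi2 --op=update -n iface.net_ifacename -v iscsi2

# Discover through both ifaces, then log in to all discovered nodes
iscsiadm -m discovery -t sendtargets -p [PORTAL_IPV6]:3260 -I iscsi1 -I iscsi2
iscsiadm -m node --loginall=all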
Is the source IP the "Iface IPaddress"?
Yes. As expected, all sessions use the same source IP, which explains your traffic distribution.
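A quick way to cross-check this outside of iscsiadm (assuming the standard iSCSI port 3260) is to list the established TCP connections and their local addresses:

Code:
# Established connections to the target port; the local address column is the source IP in use
ss -tn 'dport = :3260'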


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
After testing with a dedicated network card, I am only able to get multipath working with dedicated subnets, regardless of whether it goes via Open vSwitch/LACP or a dedicated card without any LACP.
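Conceptually, the addressing now looks like this (prefixes and VLAN IDs are placeholders; one subnet/VLAN per path, reusing the OVS layout from the first post):

Code:
auto iscsi1
iface iscsi1 inet6 static
    address IPV6_A::10/64
    ovs_type OVSIntPort
    ovs_bridge vmbr0
    ovs_options tag=VLANID_A
    comment iscsi storage path A

auto iscsi2
iface iscsi2 inet6 static
    address IPV6_B::12/64
    ovs_type OVSIntPort
    ovs_bridge vmbr0
    ovs_options tag=VLANID_B
    comment iscsi storage path B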

Marking as solved for now.
 
