iSCSI Performance tests

chrisalavoine · Mar 18, 2011

Hi all,

All my hosts are running Proxmox 1.7:

Code:

pve-manager: 1.7-11 (pve-manager/1.7/5470)
running kernel: 2.6.32-4-pve
proxmox-ve-2.6.32: 1.7-30
pve-kernel-2.6.32-4-pve: 2.6.32-30
pve-kernel-2.6.18-2-pve: 2.6.18-5
qemu-server: 1.1-28
pve-firmware: 1.0-10
libpve-storage-perl: 1.0-16
vncterm: 0.9-2
vzctl: 3.0.24-1pve4
vzdump: 1.2-10
vzprocps: 2.0.11-1dso2
vzquota: 3.0.11-1
pve-qemu-kvm: 0.13.0-3
ksm-control-daemon: 1.0-4

I have 4 x Dell R410 connected to a Dell Equallogic PS4000XV SAN (16 x 450GB 15K SAS) at one site, and a single Dell R510 with 6TB's of local storage (6 x 2TB 7.2K SATA's broken into 3 x 2TB RAID 1's)

I've been running some dd tests as we're thinking of expanding the SAN and have got some rather worrying results.

Have tested on various SAN guests (ubuntu, CentOS) and am getting the following results across the board:

dd if=/dev/zero of=/data/13GBfile bs=128k count=100K conv=fdatasync
102400+0 records in
102400+0 records out
13421772800 bytes (13 GB) copied, 214.117 seconds, 62.7 MB/s

And the same test on my supposedly inferior R510 setup:

dd if=/dev/zero of=/data/13GBfile bs=128k count=100K conv=fdatasync
102400+0 records in
102400+0 records out
13421772800 bytes (13 GB) copied, 90.393 s, 148MB/s

Have read a lot about BBU recently and my R410's don't have BBU (not enough room with the extra NIC's for the SAN connections). The R510 does have a BBU and I wonder if this is where the problem lies. Although, my understanding is that if your storage is on a fast SAN you shouldn't need BBU on the host?

Any help or guidance much appreciated.

c

MrJacK · Mar 18, 2011

Could you post your multipath.conf configuration file and also the result of cmd "multipath -ll" ?
Are you using IDE or VIRTIO disks for the guest ?

chrisalavoine · Mar 18, 2011

MrJacK said:
Could you post your multipath.conf configuration file and also the result of cmd "multipath -ll" ?
Are you using IDE or VIRTIO disks for the guest ?

Hiya,

Thanks for the swift reply.

multipath -ll is as follows:

36090a068109cd9ce8fc0d42900008022dm-10 EQLOGIC ,100E-00
[size=50G][features=0][hwhandler=0]
\_ round-robin 0 [prio=1][active]
\_ 12:0:0:0 sde 8:64 [active][ready]
36090a068109c591a81bc84010000a0fedm-9 EQLOGIC ,100E-00
[size=500G][features=0][hwhandler=0]
\_ round-robin 0 [prio=1][active]
\_ 14:0:0:0 sdg 8:96 [active][ready]
36090a068109c99838fd77488000060a5dm-39 ,
[size=200G][features=0][hwhandler=0]
\_ round-robin 0 [prio=0][enabled]
\_ #:#:#:# - #:# [failed][faulty]
36090a068109ca9e9c0bed41b00008052dm-7 EQLOGIC ,100E-00
[size=1.1T][features=0][hwhandler=0]
\_ round-robin 0 [prio=1][active]
\_ 17:0:0:0 sdi 8:128 [active][ready]
36090a068109c39b0d7cd4461000040cbdm-6 EQLOGIC ,100E-00
[size=100G][features=0][hwhandler=0]
\_ round-robin 0 [prio=1][active]
\_ 13:0:0:0 sdf 8:80 [active][ready]
36090a068109cf9a0a2c0342b0000c09cdm-13 ,
[size=420G][features=0][hwhandler=0]
\_ round-robin 0 [prio=0][enabled]
\_ #:#:#:# - #:# [failed][faulty]
36090a068109c09cc8fc0a429000000a7dm-3 EQLOGIC ,100E-00
[size=800G][features=0][hwhandler=0]
\_ round-robin 0 [prio=1][active]
\_ 8:0:0:0 sdc 8:32 [active][ready]
36090a068109c79ce63c254330000e040dm-5 EQLOGIC ,100E-00
[size=30G][features=0][hwhandler=0]
\_ round-robin 0 [prio=1][active]
\_ 7:0:0:0 sdd 8:48 [active][ready]
36090a068109c999bded7248a000000d1dm-42 EQLOGIC ,100E-00
[size=210G][features=0][hwhandler=0]
\_ round-robin 0 [prio=1][active]
\_ 20:0:0:0 sdj 8:144 [active][ready]

I currently don't have a multipath.conf file installed as I had read somewhere that it wasn't always needed and seeing as my multipath command seemed to be working I didn't bother.

chrisalavoine · Mar 18, 2011

Forgot to mention. All guests are using VIRTIO drivers for both disk and interface.

Am also using "cache=none" on all my guest configs.

MrJacK · Mar 18, 2011

multipath.conf file is VERY important.
Try something like this :

blacklist {
devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
devnode "^hd[a-z]"
devnode "^sda"
devnode "^sda[0-9]"
}
devices {
device {
vendor "EQLOGIC"
product "100E-00"
path_grouping_policy multibus
getuid_callout "/lib/udev/scsi_id /block/%n"
path_checker readsector0
failback immediate
no_path_retry fail
path_selector "round-robin 0"
rr_min_io 8
rr_weight priorities
}
}

DON'T DO THIS ON A HOST WITH RUNNING GUEST !!

The must will be to try on a new LUN if you can.
Since i don't have a SAN like you i can't test before.

To apply modifications :
/etc/init.d/multipath-tools reload

And after that see result again with multipath -ll

spirit · Mar 18, 2011

don-t forget to tune

you network card to MTU 9000
and your /etc/sysctl.conf,

here mine (i can saturate my 2 gigabit links with multipath, around 220 mb/s)

Code:

# turns TCP timestamp support off, default 1, reduces CPU use
net.ipv4.tcp_timestamps = 0
# turn SACK support off, default on
net.ipv4.tcp_sack = 0
### tunning taille de fenetre

# maximum receive socket buffer size, default 131071
net.core.rmem_max = 16777216
# maximum send socket buffer size, default 131071
net.core.wmem_max = 16777216

# default receive socket buffer size, default 65535
net.core.rmem_default = 524287
# default send socket buffer size, default 65535
net.core.wmem_default = 524287

# maximum amount of option memory buffers, default 10240
net.core.optmem_max = 524287
# number of unprocessed input packets before kernel starts dropping them, default 300
net.core.netdev_max_backlog = 300000

net.ipv4.tcp_rmem = 4096 524287 16777216
net.ipv4.tcp_wmem = 4096 524287 16777216


net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_syncookies = 0
net.ipv4.tcp_no_metrics_save = 1
net.ipv4.tcp_moderate_rcvbuf = 1

chrisalavoine · Mar 19, 2011

Hiya,

Thanks for all these tips!

Didn't have any joy with the multipath.conf settings. I don't get any response from "multipath -ll" with those settings. Is this because I have 4 hosts in a cluster? Do all hosts have to be identical?

Both my SAN network connections are set to 9000 MTU

Will try out those /etc/sysctl.conf settings.

Thanks,
c

copymaster · Mar 23, 2011

hi!

Well i don't know about the multipathing things (but i am learning actually)

But what from what i know i can say: A BBU is definitive a great performance point

I realised it nearly the same way..

I had several servers, some with and some without BBU. The controllers without BBU had two problems (maybe you can check this)

1) Write caching was enabled (which is no good without BBU, it comes to great TIMEOUTS, as you can see by using ATOP)
2) Without writecache enabled, the timeouts where not so great but still there, the problem is how the controller works. depends on the controller

greetings

chrisalavoine · Mar 23, 2011

copymaster said:
hi!

Well i don't know about the multipathing things (but i am learning actually)

But what from what i know i can say: A BBU is definitive a great performance point

I realised it nearly the same way..

I had several servers, some with and some without BBU. The controllers without BBU had two problems (maybe you can check this)

1) Write caching was enabled (which is no good without BBU, it comes to great TIMEOUTS, as you can see by using ATOP)
2) Without writecache enabled, the timeouts where not so great but still there, the problem is how the controller works. depends on the controller

greetings

Hi there,

You don't mention whether or what type of SAN you have. My understanding was that if your SAN held all your VM Storage then a BBU on the host would be redundant.

Unfortunately, all my Dell R410's can't be fitted with a BBU as the PCI slot is used to give me extra NIC connections to the SAN. Problem.

Chris.

chrisalavoine · Mar 23, 2011

Hi,

Have made some progress. My multipath -ll now looks like this:

36090a068109cd9ce8fc0d42900008022dm-17 EQLOGIC ,100E-00
[size=50G][features=1 queue_if_no_path][hwhandler=0]
\_ round-robin 0 [prio=2][active]
\_ 24:0:0:0 sds 65:32 [active][ready]
\_ 23:0:0:0 sdq 65:0 [active][ready]
36090a068109c591a81bc84010000a0fedm-3 EQLOGIC ,100E-00
[size=500G][features=1 queue_if_no_path][hwhandler=0]
\_ round-robin 0 [prio=2][active]
\_ 10:0:0:0 sde 8:64 [active][ready]
\_ 6:0:0:0 sdb 8:16 [active][ready]
36090a068109ca9e9c0bed41b00008052dm-4 EQLOGIC ,100E-00
[size=1.1T][features=1 queue_if_no_path][hwhandler=0]
\_ round-robin 0 [prio=2][active]
\_ 11:0:0:0 sdf 8:80 [active][ready]
\_ 7:0:0:0 sdc 8:32 [active][ready]
36090a068109c39b0d7cd4461000040cbdm-11 EQLOGIC ,100E-00
[size=100G][features=1 queue_if_no_path][hwhandler=0]
\_ round-robin 0 [prio=2][active]
\_ 14:0:0:0 sdi 8:128 [active][ready]
\_ 13:0:0:0 sdh 8:112 [active][ready]
36090a068109c09cc8fc0a429000000a7dm-14 EQLOGIC ,100E-00
[size=800G][features=1 queue_if_no_path][hwhandler=0]
\_ round-robin 0 [prio=2][active]
\_ 19:0:0:0 sdp 8:240 [active][ready]
\_ 20:0:0:0 sdn 8:208 [active][ready]
36090a068109c19fd9dd8248d00002067dm-12 EQLOGIC ,100E-00
[size=30G][features=1 queue_if_no_path][hwhandler=0]
\_ round-robin 0 [prio=2][active]
\_ 18:0:0:0 sdm 8:192 [active][ready]
\_ 17:0:0:0 sdk 8:160 [active][ready]
36090a068109c59e992d604840000a07cdm-6 EQLOGIC ,100E-00
[size=850G][features=1 queue_if_no_path][hwhandler=0]
\_ round-robin 0 [prio=2][active]
\_ 8:0:0:0 sdd 8:48 [active][ready]
\_ 12:0:0:0 sdg 8:96 [active][ready]
36090a068109c79ce63c254330000e040dm-13 EQLOGIC ,100E-00
[size=30G][features=1 queue_if_no_path][hwhandler=0]
\_ round-robin 0 [prio=2][active]
\_ 16:0:0:0 sdj 8:144 [active][ready]
\_ 15:0:0:0 sdl 8:176 [active][ready]
36090a068109c999bded7248a000000d1dm-16 EQLOGIC ,100E-00
[size=210G][features=1 queue_if_no_path][hwhandler=0]
\_ round-robin 0 [prio=2][active]
\_ 21:0:0:0 sdo 8:224 [active][ready]
\_ 22:0:0:0 sdr 65:16 [active][ready]

My problem now is that I can't seem to get my second NIC (eth3) to join the party. The most I've been able to achieve with dd is 114 MB/Sec. I think I should be able to double this if my second NIC was working.

When I look at atop whilst running the dd command I can see two disks being used, and eth2 is maxed out in red whereas eth3 doesn't appear.

I feel like I'm pretty close now. Any one have any suggestions?

Chris.

chrisalavoine · Mar 23, 2011

More info:

I've added the SAN ifaces as follows:

iscsiadm -m iface -I eth3 -o new
iscsiadm -m iface -I eth2 -o new

and

iscsiadm -m iface -I eth3 --op=update --name iface.net_ifacename -v eth3
iscsiadm -m iface -I eth2 --op=update --name iface.net_ifacename -v eth2

This was recommended on the Equallogic site.

I then try to connect to a target:

iscsiadm -m node --targetname "iqn.2001-05.com.equallogic:0-8a0906-fd199c106-672000008d24d89d-test" --portal "192.168.20.11:3260" --login

eth2 seems to log in fine but it gets stuck on eth3 and times out.

I can still start the VM that resides on this volume but when I look at atop on the host only eth2 is being used.

spirit · Mar 24, 2011

do you have 2 differents target ip in 2 differents network ?

exemple :

192.168.20.11 (255.255.255.0)

and

192.162.21.11 (255.255.255.0).

You really need 2 differents networks (path) for multipath.

chrisalavoine · Mar 24, 2011

spirit said:
do you have 2 differents target ip in 2 differents network ?

exemple :

192.168.20.11 (255.255.255.0)

and

192.162.21.11 (255.255.255.0).

You really need 2 differents networks (path) for multipath.

Hi there,

Thanks for that! No, I didn't realise they had to be on different subnets. I have a second switch but it's on the same subnet. Are you certain that works with Equallogic SAN? I haven't seen anything in the docs about having separate subnets.

Chris.

MrJacK · Mar 24, 2011

Maybe this can help you :

http://attachments.wetpaintserv.us/pnQ0y4ZuIGot8nntAORFyg215741

This is for RHEL, but most can be applied to an other distro.

chrisalavoine · Mar 24, 2011

MrJacK said:
Maybe this can help you :

http://attachments.wetpaintserv.us/pnQ0y4ZuIGot8nntAORFyg215741

This is for RHEL, but most can be applied to an other distro.

Heh - this is the doc I've been working from. I can't see any mention of having two subnets though.

Chris.

chrisalavoine · Mar 25, 2011

Should I be able to ping my SAN from the secondary NIC? (because at the moment I can't).

This is my network conf:

auto lo
iface lo inet loopback

auto eth1
iface eth1 inet static
address 192.168.20.120
netmask 255.255.255.0
network 192.168.20.0
broadcast 192.168.20.255
mtu 9000

auto eth3
iface eth3 inet static
address 192.168.20.122
netmask 255.255.255.0
network 192.168.20.0
broadcast 192.168.20.255
mtu 9000

auto vmbr0
iface vmbr0 inet static
address 192.168.16.253
netmask 255.255.252.0
gateway 192.168.16.1
bridge_ports eth0
bridge_stp off
bridge_fd 0

If I ping -I eth1 I get a response from 192.168.20.11 (my SAN group). However, if I ping -I eth3 I get nothing. I guess this is where the problem lies.

I have added the interfaces using the "iscsiadm -m iface -I eth3 -o new" as detailed above. Not sure what else to try really.

Chris.

chrisalavoine · Mar 28, 2011

As an addendum to this:

I used to have eth1 and eth3 plugged into separate Cisco switches with an LACP aggregation link between them, but have recently removed this and have both plugged into the same switch (just to remove a layer of complexity until I get multipathing working ok).

spirit · Mar 28, 2011

if you shutdown your eth1 (ifdown eth1), can you ping your san with eth3 ?

chrisalavoine · Mar 28, 2011

spirit said:
if you shutdown your eth1 (ifdown eth1), can you ping your san with eth3 ?

Yes. If I shutdown eth1 I can ping using eth3.

spirit · Mar 28, 2011

try with 2 subnets

i'm pretty sure your packets always going out through eth1.

iSCSI Performance tests

Renowned Member

MrJacK

Guest

Renowned Member

Renowned Member

MrJacK

Guest

Distinguished Member

Renowned Member

Member

Renowned Member

Renowned Member

Renowned Member

Distinguished Member

Renowned Member

MrJacK

Guest

Renowned Member

Renowned Member

Renowned Member

Distinguished Member

Renowned Member

Distinguished Member

We value your privacy