Storage (SAN Fibre Channel) and load sharing

Goran Skular

New Member
Mar 17, 2016
Hi,
does anybody have experience using Proxmox with a SAN? I have SAN storage connected via Fibre Channel HBA cards (dual, with multipath) to multiple PVE servers, all sharing the same storage group / PVs for LVM storage.

What bothers me is that a single virtual machine or PVE host can saturate the SAN, leaving the other machines unresponsive.

Is there a way to change the I/O scheduler or something similar? Disk throttling is not a solution... some fair load sharing would be way better.


PVE Version: 4.1-15/8cd55b52 is using deadline:
Code:
$ cat /sys/block/sda/queue/scheduler
noop [deadline] cfq
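
If it helps, switching the scheduler per block device at runtime is easy enough to test; I assume something like the following would be the way (sda is just an example path device), but I'm not sure it actually helps with fairness:
Code:
# show the available and active scheduler for a device
cat /sys/block/sda/queue/scheduler

# switch that device to CFQ at runtime (not persistent across reboots)
echo cfq > /sys/block/sda/queue/scheduler

# with CFQ, ionice could then deprioritize a heavy writer, e.g.
ionice -c2 -n7 dd if=/dev/zero of=/root/testfile bs=1M count=4096 oflag=direct

As far as I understand, though, the scheduler only arbitrates within a single host, so it would not give fair sharing across several PVE nodes hitting the same SAN.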
 
Hi,

I'm running my PVE cluster with SAN-backed storage over LVM as well. I have multiple LUNs with multiple volume groups (RAID10 and RAID50 volumes from the SAN) and have never encountered such a problem. I'm also not able to reproduce it.
  • What did you do in your VM to saturate your links?
  • Have you tried Disk Throttling (Hardware tab of the VM)? There is a CLI sketch right below this list.
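
On the CLI that throttle ends up as options on the VM's drive; from memory it is something along these lines (VM ID, storage name and disk are placeholders, exact option names per "man qm"):
Code:
# cap an existing virtio disk of VM 101 at 100 MB/s read and write
qm set 101 --virtio0 san_lvm:vm-101-disk-1,mbps_rd=100,mbps_wr=100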
 
Thanks,

I made only two LUNs (on two different RAID volumes), as I do not have a large number of disks available. Six PVE hosts with a dozen guest machines are using the same LUN with shared LVM storage on it. I thought that PVE would manage fair usage distribution among them.

Under normal workload everything is fine, but when I am restoring a backup image... everything stops. I can also easily reproduce it by doing a dd write to disk inside a guest or on a host.
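
To reproduce it I don't need anything fancy; a plain sequential write like this (path and size are just examples) is enough to make the other machines crawl:
Code:
# sequential write test inside a guest or directly on a PVE host; direct I/O bypasses the page cache
dd if=/dev/zero of=/root/ddtest.bin bs=1M count=10240 oflag=direct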

I tried Disk Throttling, and it's a good workaround for limiting guests... to make sure they don't eat all the resources. But I am wondering if there is a better solution. Also, restoring from a backup cannot be limited by disk throttling.
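
The only host-side idea I have so far is to cap the restore process itself with the kernel's blkio throttling. Untested, just a sketch; device numbers, paths and the VMID are placeholders:
Code:
# find the major:minor of the device to throttle (e.g. the dm-X multipath device)
lsblk -o NAME,MAJ:MIN

# create a blkio cgroup and cap writes on that device to ~100 MB/s (253:2 is only an example)
mkdir /sys/fs/cgroup/blkio/restore_limit
echo "253:2 104857600" > /sys/fs/cgroup/blkio/restore_limit/blkio.throttle.write_bps_device

# move the current shell into the cgroup so the restore inherits the limit
echo $$ > /sys/fs/cgroup/blkio/restore_limit/tasks
qmrestore /mnt/backup/vzdump-qemu-101.vma.lzo 101

No idea yet whether that also catches buffered writeback, so take it with a grain of salt.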
 
I do dd's inside my VMs regularly (writing zeros) and have never had problems. I also back up my whole cluster at once, so every node reads from the SAN.

Here is an excerpt from my multipath output:
Code:
EVA6400_PROXMOX_DATA_FAST_02 (3AABB....ZZ) dm-2 HP,HSV400
size=2.0T features='1 queue_if_no_path' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=12 status=active
| |- 10:0:1:2 sdg  8:96   active ready running
| |- 10:0:3:2 sdo  8:224  active ready running
| |- 11:0:0:2 sds  65:32  active ready running
| `- 11:0:2:2 sdaa 65:160 active ready running
`-+- policy='round-robin 0' prio=2 status=enabled
  |- 10:0:0:2 sdc  8:32   active ready running
  |- 10:0:2:2 sdk  8:160  active ready running
  |- 11:0:1:2 sdw  65:96  active ready running
  `- 11:0:3:2 sdae 65:224 active ready running
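
For what it's worth, the grouping above comes from a fairly standard multipath.conf; something along these lines (generic values, not necessarily exactly what I run):
Code:
cat /etc/multipath.conf
defaults {
    path_grouping_policy group_by_prio
    path_selector        "round-robin 0"
    rr_min_io            100
    no_path_retry        queue
}

multipaths {
    multipath {
        wwid  3AABB....ZZ
        alias EVA6400_PROXMOX_DATA_FAST_02
    }
}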

If I read via dd, I can see that the round-robin is working and all paths are used. If I create multiple read streams via dd on different volumes of the same VG, the bandwidth is aggregated and maxes out at a little under 600 MB/s:

Code:
----total-cpu-usage---- -dsk/total----dsk/sdf-----dsk/sdn-----dsk/sdr-----dsk/sdz-- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read  writ: read  writ: read  writ: read  writ: read  writ| recv  send|  in   out | int   csw
  0   4  87   8   0   0| 552M  100k: 138M    0 : 138M    0 : 138M    0 : 138M    0 |  34k   50k|   0     0 |  13k   20k
  1   4  87   8   0   0| 558M  180k: 140M    0 : 140M    0 : 140M    0 : 139M    0 |  11k   20k|   0     0 |  13k   20k
  1   4  87   8   0   0| 593M 1288k: 148M    0 : 148M    0 : 148M    0 : 148M    0 |  74k  263k|   0     0 |  16k   25k
  1   5  86   9   0   0| 571M   60k: 144M    0 : 141M    0 : 143M    0 : 142M    0 |  71k  136k|   0     0 |  16k   26k
  0   4  88   8   0   0| 571M   16k: 143M    0 : 143M    0 : 143M    0 : 143M    0 |1637B 1992B|   0     0 |  13k   20k
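
The test itself is nothing special, roughly like this (LV names are placeholders for volumes on the same volume group, and the device list for dstat is simply the sd* paths of the LUN I read from):
Code:
# four parallel sequential readers on different LVs of the same volume group
for lv in vm-101-disk-1 vm-102-disk-1 vm-103-disk-1 vm-104-disk-1; do
    dd if=/dev/FAST/$lv of=/dev/null bs=1M iflag=direct &
done
wait

# meanwhile, in a second shell: per-device throughput (dstat defaults plus the chosen disks)
# dstat -D total,sdf,sdn,sdr,sdz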

While these operations run, I do not notice any slowness in other VMs.

Maybe there are some configuration issues on the switch or the SAN?
 
I'll try to find something, and if I do, I'll post it here.
Can you tell me how you designed the LUNs? For those speeds you must have a lot of spindles. Also, do you use one LUN/PV for multiple guests (just letting PVE create new LVs), or do you always provision new ones as needed?
Thank you!
 
Goran Skular said:
I'll try to find something, and if I do, I'll post it here.

That would be nice for others.

Goran Skular said:
Can you tell me how you designed the LUNs? For those speeds you must have a lot of spindles. Also, do you use one LUN/PV for multiple guests (just letting PVE create new LVs), or do you always provision new ones as needed?
Thank you!

I currently have only 48 (15k rpm) disks in use and 4 LUNs for my cluster (I started with 2), with only two volume groups (FAST and SLOW) on RAID10 and RAID50. I configured two storage entries (one for each VG) and let Proxmox create all logical volumes for me on the fly. Works like a charm.
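
In case it helps, creating those two entries boils down to something like this (storage IDs are just my naming, exact options per the pvesm man page):
Code:
# one shared LVM storage per volume group; Proxmox then allocates LVs for guests on demand
pvesm add lvm san_fast --vgname FAST --content images --shared 1
pvesm add lvm san_slow --vgname SLOW --content images --shared 1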

One of the nodes in the cluster also has local storage (6x 960 GB SSD) that peaks at about 2.5 GB/s with multiple streams, so my SAN is not that fast :-D
 
