High IO delay with iSCSI

Ronny

Sep 12, 2017
hi,

My config:
3-node cluster (4.4.1) with 1 OMV iSCSI storage (OpenMediaVault) -> 14x 1TB SSD RAID-5

Every time I restore a VM (from NFS or local) to the iSCSI storage, the Proxmox node shows very high IO delay (around 20-50%). Sometimes other VMs on this storage (and on this cluster) hang and remount their volumes read-only!

Last week this also happened with a Windows Server 2016 SQL VM: an SQL DB restore inside the VM caused 50% IO delay.

I think my config is very simple:
iSCSI (not used directly)
LVM on this iSCSI (shared)
No multipath; I use LACP (Balance SLB) on the Proxmox side (2 interfaces) and on the storage side (4 interfaces)

Any suggestions?
 
The VMs remounting their volumes read-only is an indication that you're trying to push too much IO through the pipe.

I would advise you to monitor the network latency while doing a restore (with ping) and the IO latency with ioping.

How many VMs do you have, and what kind of link do you have to your storage?
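To compare idle latency with latency during a restore, it helps to collect the round-trip times rather than eyeball them. A minimal sketch, assuming Linux iputils-style `ping` output; the address `10.0.0.10` and the sample lines are made up for illustration, and `run_ping` is only a convenience wrapper around the system `ping` command:

```python
import re
import statistics
import subprocess
from typing import Dict, List

# Matches the "time=0.21 ms" part of a Linux iputils ping reply line.
RTT_PATTERN = re.compile(r"time[=<]([\d.]+)\s*ms")


def parse_rtts(ping_output: str) -> List[float]:
    """Return all round-trip times (in ms) found in ping output."""
    return [float(value) for value in RTT_PATTERN.findall(ping_output)]


def summarize(rtts: List[float]) -> Dict[str, float]:
    """Min / average / max of the collected round-trip times."""
    return {"min": min(rtts), "avg": statistics.mean(rtts), "max": max(rtts)}


def run_ping(host: str, count: int = 20) -> str:
    """Run the system ping and return its text output (needs a reachable host)."""
    result = subprocess.run(
        ["ping", "-c", str(count), host], capture_output=True, text=True
    )
    return result.stdout


# Illustrative sample lines only, not measurements from a real system.
SAMPLE = """\
64 bytes from 10.0.0.10: icmp_seq=1 ttl=64 time=0.21 ms
64 bytes from 10.0.0.10: icmp_seq=2 ttl=64 time=1.87 ms
"""

if __name__ == "__main__":
    print(summarize(parse_rtts(SAMPLE)))
```

Running it once while the node is idle and once during a restore (replacing `SAMPLE` with `run_ping("<storage address>")`) gives two summaries that can be compared directly.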
 
Hi manu,

thanks for your answer.

I will run a test with ping and ioping, from the Proxmox node to the storage, right?

There are around 15-20 VMs per node. All nodes are connected with two 1 Gbit links in an LACP bond. The storage is connected the same way, but with 4 Gbit.
 
That can really be a lot of VMs for such a small pipe.

Also, reading the LACP entry on Wikipedia:
"This selects the same NIC slave for each destination MAC address"

which means all your outgoing writes to the iSCSI target might go through a single 1 Gbit link...
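A rough back-of-the-envelope check of what that leaves per VM. This is only a sketch under simplifying assumptions: all storage traffic from a node lands on one 1 Gbit bond member, the link runs at raw line rate with no protocol overhead, and the VMs share it equally. The VM counts are the 15-20 per node mentioned above.

```python
LINK_GBIT = 1.0          # one bond member (assumed to carry all iSCSI traffic)
VMS_PER_NODE = (15, 20)  # range given in the thread


def link_mb_per_s(gbit: float) -> float:
    """Raw line rate in MB/s (decimal units, protocol overhead ignored)."""
    return gbit * 1000 / 8


def per_vm_share(gbit: float, vms: int) -> float:
    """Equal share of the link per VM in MB/s (idealized)."""
    return link_mb_per_s(gbit) / vms


if __name__ == "__main__":
    print(f"link: {link_mb_per_s(LINK_GBIT):.0f} MB/s")
    for vms in VMS_PER_NODE:
        share = per_vm_share(LINK_GBIT, vms)
        print(f"{vms} VMs: {share:.2f} MB/s each")
```

A single restore that writes at line rate can take that whole budget on its own, which would leave very little for the other VMs on the node.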

I think your system is in danger of a "boot storm" if you reboot all the nodes at the same time, as each VM needs to read between 300 MB (Linux) and 1 GB on boot. Your system will never be able to sustain that.
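To put numbers on the boot storm for one node: a sketch that estimates the minimum time during which a single 1 Gbit link is fully saturated by boot reads alone. It assumes perfectly sequential reads at raw line rate (125 MB/s), so real boots with random IO and protocol overhead would take longer; the VM counts and per-VM sizes are the ranges quoted in this thread.

```python
LINK_MB_PER_S = 125.0  # 1 Gbit/s raw line rate, decimal units (idealized)


def min_boot_read_seconds(
    vms: int, mb_per_vm: float, link_mb_per_s: float = LINK_MB_PER_S
) -> float:
    """Lower bound on the time needed to read all boot data over one link."""
    return vms * mb_per_vm / link_mb_per_s


if __name__ == "__main__":
    best = min_boot_read_seconds(vms=15, mb_per_vm=300)    # small Linux guests
    worst = min_boot_read_seconds(vms=20, mb_per_vm=1000)  # 1 GB per guest
    print(f"best case:  {best:.0f} s of a fully saturated link")
    print(f"worst case: {worst:.0f} s of a fully saturated link")
```

For that whole window every other IO from the node queues behind the boot reads, and that is with one node only; three nodes rebooting together compete for the storage side as well.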
 
Hi Manu,

ping (node to storage):
idle: 0.150 ms
during VM restore: 2.200 ms (!)

ioping: on the Proxmox node I only see the VM LVs and not the "real" LVM of the storage. How can I monitor the iSCSI connection?

OK, I see LACP is not the right way for iSCSI.
But will it be better with iSCSI MPIO? Or what else can I do?
My preferred way would be to use Ceph, in the future.
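On the question of monitoring the iSCSI connection itself: on a Linux initiator the LUN normally shows up as an ordinary SCSI disk, so its counters appear in `/proc/diskstats` next to the local disks. A minimal sketch that derives the average IO latency between two snapshots, the same way an "await" figure is usually computed. The device name `sdb` and the sample counter values are assumptions for illustration; the real device name has to be looked up on the node.

```python
from typing import Dict, Optional, Tuple

Counters = Tuple[int, int, int, int]  # reads, ms_reading, writes, ms_writing


def parse_diskstats(text: str) -> Dict[str, Counters]:
    """Map device name -> (reads, ms reading, writes, ms writing)."""
    stats: Dict[str, Counters] = {}
    for line in text.splitlines():
        f = line.split()
        if len(f) < 14:
            continue
        # f[2]=name, f[3]=reads completed, f[6]=ms spent reading,
        # f[7]=writes completed, f[10]=ms spent writing
        stats[f[2]] = (int(f[3]), int(f[6]), int(f[7]), int(f[10]))
    return stats


def avg_latency_ms(
    before: Dict[str, Counters], after: Dict[str, Counters], device: str
) -> Optional[float]:
    """Average ms per completed IO between two snapshots, None if idle."""
    r0, rt0, w0, wt0 = before[device]
    r1, rt1, w1, wt1 = after[device]
    ios = (r1 - r0) + (w1 - w0)
    if ios == 0:
        return None
    return ((rt1 - rt0) + (wt1 - wt0)) / ios


def read_diskstats() -> Dict[str, Counters]:
    """Take a live snapshot (Linux only)."""
    with open("/proc/diskstats") as fh:
        return parse_diskstats(fh.read())


# Made-up snapshots for a hypothetical iSCSI disk "sdb".
BEFORE = "8 16 sdb 1000 0 8000 500 2000 0 16000 4000 0 3000 4500"
AFTER = "8 16 sdb 1100 0 8800 700 2400 0 19200 14000 0 9000 14700"

if __name__ == "__main__":
    latency = avg_latency_ms(parse_diskstats(BEFORE), parse_diskstats(AFTER), "sdb")
    print(f"average IO latency: {latency:.1f} ms")
```

Taking two `read_diskstats()` snapshots a few seconds apart, once idle and once during a restore, would show whether the latency on the iSCSI disk rises together with the ping times.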
 
I have tested with iSCSI MPIO (2 Gbit in different networks on both sides), but I still get 20-50% IO delay/wa.

Any suggestions?