Hey,
Over the weekend I brought our second server online so I could redo the first one.
I moved all VMs/CTs to the second server and wiped the first, since everything seemed fine.
However, whenever even the slightest IO operation occurs (e.g. opening a browser inside a VM), the IO delay skyrockets to 90+%. The same happens, of course, when backups run. It drops back to 0% when the systems are idle.
I really need some idea of what could be causing this: once the IO delay reaches 30+%, websites become unavailable and other services slow to a crawl!
The only real differences between the two servers are that the first one booted via iSCSI rather than from a local disk, and that it didn't use multipath to connect to the main storage (I think).
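While a stall is happening, this is what I watch to see which device the wait actually lands on (a minimal check, assuming the sysstat package is installed):
Code:
# -N resolves device-mapper names (pm2_main_mpath) instead of dm-1,
# -x extended stats, -m MB/s, refresh every 2 seconds
iostat -xmN 2
During a stall, high await/%util on the mpath device but not on the underlying path devices (or vice versa) should at least narrow down which layer is queueing.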
Configuration:
Hard drives:
- Boot drive: 2x SATA NAS SSDs in RAIDZ1 (nothing besides the OS and some ISOs is stored here)
- VM/CT storage: Fujitsu Eternus DX100 via 2x 10 Gbit/s fiber over iSCSI (multipathed, which I think could be the problem; see the per-path read test below) - 3 TB RAID6, LVM-thin
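To check that both fiber paths actually deliver, a simple read-only test against each path device separately (purely illustrative; sdd/sde are the two paths to the main LUN, as shown in the multipath -ll output below):
Code:
# read 1 GiB directly from each iSCSI path, bypassing the page cache (read-only, safe)
dd if=/dev/sdd of=/dev/null bs=1M count=1024 iflag=direct
dd if=/dev/sde of=/dev/null bs=1M count=1024 iflag=direct
multipath.conf: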
Code:
defaults {
    user_friendly_names yes
    polling_interval 2
    path_selector "round-robin 0"
    path_grouping_policy multibus
    path_checker readsector0
    rr_min_io 100
    rr_weight priorities
    failback immediate
    no_path_retry queue
}
blacklist {
    devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
    devnode "^hd[a-z][[0-9]*]"
    wwid ".*"
}
blacklist_exceptions {
    wwid "3600000e00d28000000281cc800000000"
    wwid "3600000e00d28000000281cc800010000"
}
multipaths {
    multipath {
        # id retrieved with the utility /lib/udev/scsi_id
        wwid 3600000e00d28000000281cc800000000
        alias pm2_main_mpath
    }
    multipath {
        wwid 3600000e00d28000000281cc800010000
        alias pm2_ssd_mpath # nothing is on here yet
    }
}
# Default from multipath -t
devices {
    device {
        vendor "FUJITSU"
        product "ETERNUS_DX(H|L|M|400|8000)"
        path_grouping_policy "group_by_prio"
        prio "alua"
        failback "immediate"
        no_path_retry 10
    }
}
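Since the defaults section (multibus, round-robin, readsector0) conflicts with the Fujitsu device section (group_by_prio, alua), I'd like to confirm which values the daemon actually applies; the merged runtime config can be dumped with:
Code:
# show the configuration multipathd is actually running with
multipathd show config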
multipath -ll
Code:
pm2_main_mpath (3600000e00d28000000281cc800000000) dm-1 FUJITSU,ETERNUS_DXL
size=3.2T features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=50 status=active
| `- 12:0:0:0 sde 8:64 active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
`- 11:0:0:0 sdd 8:48 active ready running
pm2_ssd_mpath (3600000e00d28000000281cc800010000) dm-0 FUJITSU,ETERNUS_DXL
size=366G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=50 status=active
| `- 11:0:0:1 sdf 8:80 active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
`- 12:0:0:1 sdg 8:96 active ready running
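To separate the array/fabric from the host-side stack, a raw sequential read against the assembled multipath device might help (a sketch, assuming fio is installed; read-only, so it won't touch the data):
Code:
fio --name=mpath-read --readonly --filename=/dev/mapper/pm2_main_mpath \
    --rw=read --bs=1M --direct=1 --ioengine=libaio --iodepth=16 \
    --runtime=30 --time_based
If this saturates the 10 Gbit/s links while VM IO still crawls, the problem is more likely above the multipath layer (LVM-thin or the guests) than in the paths themselves.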
LVM creation:
Code:
pvcreate /dev/mapper/pm2_main_mpath
vgcreate VMs_PM2 /dev/mapper/pm2_main_mpath
lvcreate -L 3.5T --thinpool main_thinpl_pm2 VMs_PM2
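One thing I still want to rule out is the thin pool itself (the 3.5T in the lvcreate above must be a typo on my part, since the LUN is only 3.2T): a nearly full data or metadata LV can stall every write on the pool.
Code:
# show data/metadata fill level of the thin pool
lvs -a -o +metadata_percent VMs_PM2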
Standard VM config: (most of the VMs are the same)
Code:
agent: 1
bios: ovmf
boot: order=scsi0
cores: 10
cpu: host,flags=+aes
efidisk0: Thin_2:vm-101-disk-0,efitype=4m,format=raw,pre-enrolled-keys=1,size=528K
machine: pc-i440fx-6.1
memory: 16384
meta: creation-qemu=6.1.0,ctime=1640785552
name: exchange
net0: virtio=36:B1:CE:95:59:E3,bridge=vmbr0
numa: 1
onboot: 1
ostype: win10
scsi0: Thin_2:vm-101-disk-1,cache=writeback,discard=on,format=raw,size=700G
scsihw: virtio-scsi-pci
smbios1: uuid=8672338d-9c84-4758-a99c-1ef44c798e4b
sockets: 1
startup: order=2
vga: qxl
vmgenid: 30360784-fb3d-462d-b814-e3fb088992d1
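One experiment I could still try per VM is dropping the writeback cache, since cache=writeback on top of a thin pool can amplify flush storms (hypothetical change, using VM 101 as the example):
Code:
# switch the disk to cache=none; all other options kept as in the config above
qm set 101 --scsi0 Thin_2:vm-101-disk-1,cache=none,discard=on,format=raw,size=700G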