URGENT - High IO delay 90+% on newly configured system

Ray_

Member
Nov 5, 2021
Hey,

Over the weekend I brought our 2nd server online so I could redo the first one.
I moved all VMs/CTs to the 2nd server and deleted the first one, since everything seemed fine.
However, whenever even the slightest IO operation occurs (e.g. opening a browser inside a guest), the IO delay skyrockets to 90+%. The same happens, of course, when backups are created. It goes back to 0% when the systems are idle.

I really need some idea of what could cause this, because once the IO delay reaches 30+%, websites become unavailable and other services slow to a crawl!
The only real differences between the two servers are that the first one booted via iSCSI rather than from a local disk, and that it didn't use multipath to connect to the main storage (I think).

Configuration:

Hard drives:
  1. Boot drive:
    2x SATA NAS SSDs in RAIDZ1 (nothing besides some ISOs and the OS is stored here)
  2. VM/CT storage:
    Fujitsu Eternus DX100 via 2x 10 Gbit/s fibre over iSCSI (multipathed, which I think could be the problem) - 3 TB RAID6, LVM-Thin
Multipath config:
Code:
defaults {
       user_friendly_names     yes
       polling_interval        2
       path_selector           "round-robin 0"
       path_grouping_policy    multibus
       path_checker            readsector0
       rr_min_io               100
       rr_weight               priorities
       failback                immediate
       no_path_retry           queue
}
blacklist {
        devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
        devnode "^hd[a-z][[0-9]*]"
        wwid    ".*"
}

blacklist_exceptions {
        wwid "3600000e00d28000000281cc800000000"
        wwid "3600000e00d28000000281cc800010000"
}
 
multipaths {
        multipath {
                # id retrieved with the utility /lib/udev/scsi_id
                wwid                    3600000e00d28000000281cc800000000
                alias                   pm2_main_mpath
        }
        multipath {
                wwid                    3600000e00d28000000281cc800010000
                # nothing is stored on this LUN yet
                alias                   pm2_ssd_mpath
        }
}

# Default from multipath -t
devices {
        device {
                vendor                  "FUJITSU"
                product                 "ETERNUS_DX(H|L|M|400|8000)"
                path_grouping_policy    "group_by_prio"
                prio                    "alua"
                failback                "immediate"
                no_path_retry           10
        }
}
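
For reference (not part of the original config above): after editing /etc/multipath.conf, multipathd has to re-read it before the aliases and the Fujitsu device settings take effect. A minimal sketch, assuming the config lives in the default /etc/multipath.conf:
Code:
# make multipathd re-read /etc/multipath.conf
systemctl reload multipathd      # or: multipathd reconfigure

# the topology should then look like the output below, with two paths per LUN
multipath -ll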

multipath -ll
Code:
pm2_main_mpath (3600000e00d28000000281cc800000000) dm-1 FUJITSU,ETERNUS_DXL
size=3.2T features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=50 status=active
| `- 12:0:0:0 sde 8:64  active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
  `- 11:0:0:0 sdd 8:48  active ready running
pm2_ssd_mpath (3600000e00d28000000281cc800010000) dm-0 FUJITSU,ETERNUS_DXL
size=366G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=50 status=active
| `- 11:0:0:1 sdf 8:80  active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
  `- 12:0:0:1 sdg 8:96  active ready running

LVM creation:
Code:
pvcreate /dev/mapper/pm2_main_mpath
vgcreate VMs_PM2 /dev/mapper/pm2_main_mpath
lvcreate -L 3.5T --thinpool main_thinpl_pm2 VMs_PM2
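
A quick way to double-check the result and to register the pool in Proxmox (a sketch only - the storage name "Thin_2" is taken from the VM config below, adjust it if yours differs):
Code:
# confirm that PV, VG and thin pool sit on the multipath device
pvs /dev/mapper/pm2_main_mpath
vgs VMs_PM2
lvs -a VMs_PM2

# register the thin pool as a Proxmox storage
pvesm add lvmthin Thin_2 --vgname VMs_PM2 --thinpool main_thinpl_pm2 --content images,rootdir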

Standard VM config: (most of the VMs are the same)
Code:
agent: 1
bios: ovmf
boot: order=scsi0
cores: 10
cpu: host,flags=+aes
efidisk0: Thin_2:vm-101-disk-0,efitype=4m,format=raw,pre-enrolled-keys=1,size=528K
machine: pc-i440fx-6.1
memory: 16384
meta: creation-qemu=6.1.0,ctime=1640785552
name: exchange
net0: virtio=36:B1:CE:95:59:E3,bridge=vmbr0
numa: 1
onboot: 1
ostype: win10
scsi0: Thin_2:vm-101-disk-1,cache=writeback,discard=on,format=raw,size=700G
scsihw: virtio-scsi-pci
smbios1: uuid=8672338d-9c84-4758-a99c-1ef44c798e4b
sockets: 1
startup: order=2
vga: qxl
vmgenid: 30360784-fb3d-462d-b814-e3fb088992d1
 
Where does the IO delay "skyrocket"? In the VM or on the Proxmox host?

One issue (which might not be related to this problem) that catches my attention: your CPU (E5-2620 v3) has 6 cores + HT and the server has 2 sockets, yet your VM configuration uses NUMA with 1 socket and 10 cores. That layout can never properly match the NUMA topology of the hardware (which will probably be 2 NUMA nodes with 12 logical CPUs each - but remember that half of those are HT siblings, not as capable as "real" cores), because the VM will be forced to use 4 HT threads from one socket. In the VM, either use 1 socket / 6 cores or use 2 sockets / 10 cores.
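
A rough sketch of what that could look like with qm (VMID 101 and the 10 vCPUs are taken from the config above; this is one possible reading of the "2 sockets" suggestion, not something tested on this exact setup):
Code:
# split the 10 vCPUs across two virtual sockets so the guest's NUMA
# layout can line up with the two physical sockets
qm set 101 --sockets 2 --cores 5 --numa 1

# alternative: stay on one socket and use only the 6 physical cores
qm set 101 --sockets 1 --cores 6 --numa 0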
 
Where does the IO delay "skyrocket"? In the VM or on the Proxmox host?

On the host - that's why it's affecting all VMs.
I just set up my second server and it has the same problem. The difference there: no multipathing, only a single regular iSCSI connection.
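
For anyone following along, a quick way to see on the host which block device is actually saturating while the IO delay spikes (this assumes the sysstat package, which is not installed by default):
Code:
apt install sysstat
iostat -xm 2
# watch r_await/w_await and %util for the sd* paths and the dm-* multipath
# device while the spike happens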
 
If anyone else has a similar problem, check these two things (a rough sketch of both follows below):
1. Check whether the MTU towards your storage is correct. In my case, Proxmox used 1500 and the storage used 1300. I changed everything to jumbo frames (MTU 9000).
2. The cache mode of the virtual disks. I had writeback active on most of my VMs, which caused high IO delay whenever the cache filled up and had to be flushed to storage. Changing it to none reduced the write speed, but stabilized the IO delay considerably.
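
A rough sketch of both checks (the interface name ens2f0 and the storage IP are placeholders; VMID 101 and the drive line are taken from the config earlier in the thread):
Code:
# 1. MTU: check the storage-facing NIC, then verify jumbo frames end-to-end
ip link show ens2f0
# set "mtu 9000" on that interface in /etc/network/interfaces, then:
ping -M do -s 8972 <storage-ip>     # 8972 + 28 bytes of headers = 9000, must not fragment

# 2. cache mode: re-specify the drive line from "qm config 101" with cache=none
qm set 101 --scsi0 Thin_2:vm-101-disk-1,cache=none,discard=on,format=raw,size=700G
# the VM needs to be restarted (or the disk re-attached) for the new cache mode to apply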
 
Check whether the MTU towards your storage is correct. In my case, Proxmox used 1500 and the storage used 1300. I changed everything to jumbo frames (MTU 9000).
Could you explain how to check the MTU towards the storage that you mention?
My issue was that I forgot to unset the Ceph noout flag after I did maintenance - duh... but I found your post and am wondering what exactly you mean. MTU to storage...?
 
but I found your post and am wondering what exactly you mean. MTU to storage...?
Normally with iSCSI you have a dedicated storage network with a dedicated storage NIC, which is an ordinary network card. You can check it with ifconfig or ip a.
You should check the MTU on your SAN as well, but how to do that depends heavily on the SAN itself.
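
Not from the reply above, but one way to see which NIC actually carries the iSCSI traffic before checking its MTU (this assumes open-iscsi's iscsiadm, which Proxmox uses for iSCSI storage):
Code:
# list active iSCSI sessions with their target portal and interface
iscsiadm -m session -P 1

# then look at the MTU of the interface on that portal's subnet ("ip a")
ip addr show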
 
