Disk slows down after resize (Ceph)

mart.v

Mar 21, 2018
Hi all,

I'm facing a strange problem. I'm using the latest Proxmox with a Ceph storage backend (SSD only), a 10 Gbit network, KVM virtualization, and CentOS in the guest.

When I create a fresh VM with a 10 GB Ceph disk attached (cache disabled, virtio drivers), I get roughly these speeds in fio:

READ: bw=115MiB/s (120MB/s), 115MiB/s-115MiB/s (120MB/s-120MB/s), io=1209MiB (1267MB), run=10552-10552msec
WRITE: bw=49.0MiB/s (51.4MB/s), 49.0MiB/s-49.0MiB/s (51.4MB/s-51.4MB/s), io=518MiB (543MB), run=10552-10552msec

After resizing the storage to 100 GB (I only resized the attached image in the Proxmox interface; I did not touch the filesystem or the partition table, so inside the guest there is still a 10 GB partition), the fio benchmark drops to:

READ: bw=20.7MiB/s (21.7MB/s), 20.7MiB/s-20.7MiB/s (21.7MB/s-21.7MB/s), io=504MiB (529MB), run=24359-24359msec
WRITE: bw=9039KiB/s (9256kB/s), 9039KiB/s-9039KiB/s (9256kB/s-9256kB/s), io=215MiB (225MB), run=24359-24359msec
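Because the write result switches from MiB/s to KiB/s, the size of the drop is easier to see once both runs are normalized. A small Python sketch that pulls the `bw=` value out of the fio summary lines posted above and compares them (the helper name `bw_mib` is hypothetical):

```python
import re

# Multipliers that convert fio's binary bandwidth units into MiB/s.
_UNIT_TO_MIB = {"KiB/s": 1 / 1024, "MiB/s": 1.0, "GiB/s": 1024.0}

def bw_mib(line: str) -> float:
    """Return the aggregate bandwidth from a fio summary line, in MiB/s."""
    m = re.search(r"bw=([\d.]+)(KiB/s|MiB/s|GiB/s)", line)
    if m is None:
        raise ValueError(f"no bw= field in: {line!r}")
    return float(m.group(1)) * _UNIT_TO_MIB[m.group(2)]

# Summary lines copied from the 8k randrw runs (10 GB vs. 100 GB image).
fresh = {
    "READ": "READ: bw=115MiB/s (120MB/s)",
    "WRITE": "WRITE: bw=49.0MiB/s (51.4MB/s)",
}
resized = {
    "READ": "READ: bw=20.7MiB/s (21.7MB/s)",
    "WRITE": "WRITE: bw=9039KiB/s (9256kB/s)",
}

for op in ("READ", "WRITE"):
    before, after = bw_mib(fresh[op]), bw_mib(resized[op])
    print(f"{op}: {before:.1f} -> {after:.1f} MiB/s, {before / after:.1f}x slower")
```

Both reads and writes come out roughly 5.6 times slower after the resize, so the mix itself is not shifting.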

No other changes were made to the system (no reboot, etc.). Proxmox is running in test mode, and no other VMs affect cluster performance (i.e., there is no other workload).

Thank you for any tips or advice.
 
This is strange. For new space, it could be explained by the objects not yet being allocated in Ceph. But since you haven't resized the filesystem, that can't be the cause.

Do you still have the performance problem after a stop/start of the VM?
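On the allocation point: an RBD image is thin-provisioned and striped over fixed-size RADOS objects, and an object is only created when something first writes to its range. A rough count of how many objects the resize adds, assuming the default 4 MiB object size (the pool's actual setting was not posted) and reading the disk sizes as GiB:

```python
# Assumptions (not stated in the thread): the default RBD object size of
# 4 MiB, and the "10 GB" / "100 GB" disk sizes taken as GiB.
GIB = 1024**3
OBJECT_SIZE = 4 * 1024**2  # bytes per RADOS object backing the image

def rbd_objects(image_bytes: int, object_size: int = OBJECT_SIZE) -> int:
    """Number of fixed-size objects needed to cover an RBD image."""
    return -(-image_bytes // object_size)  # ceiling division

before = rbd_objects(10 * GIB)
after = rbd_objects(100 * GIB)
print(f"{before} objects at 10 GiB, {after} at 100 GiB, {after - before} added by the resize")
```

None of the added objects exist in the pool until some I/O writes into that part of the image.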
 
Thanks for the reply. Yes, the problem persists after a stop/start. I ran the test multiple times and the results were similar.

I'm running only a 3-node cluster with 2 OSDs per host. The drives are Intel S4500, but I don't know whether this is relevant to my problem.
 
What does your fio benchmark look like?
 
fio --filename=/dev/sda --direct=1 --rw=randrw --refill_buffers --norandommap --randrepeat=0 --ioengine=libaio --bs=8k --rwmixread=70 --iodepth=16 --numjobs=16 --runtime=60 --group_reporting --name=8k7030test

But I only started investigating this issue after I noticed that normal disk work is very slow, so I don't believe the benchmark itself is the problem.
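One detail of this command may matter: it targets the raw device `/dev/sda` and passes no `--size`, and in that case fio uses the full size of the device. After the resize, the random offsets are therefore spread over 100 GB rather than the 10 GB partition, even though the guest filesystem was never grown. A back-of-the-envelope sketch, assuming fio's random offsets are roughly uniform across the device (`fraction_beyond` is a hypothetical helper):

```python
# Parameters copied from the fio command above; the device sizes are the
# image sizes before and after the resize, assumed to be exact GiB.
GIB = 1024**3
IODEPTH, NUMJOBS = 16, 16

def fraction_beyond(old_bytes: int, device_bytes: int) -> float:
    """Share of uniformly random offsets that fall past the old image end."""
    return max(device_bytes - old_bytes, 0) / device_bytes

print("outstanding I/Os:", IODEPTH * NUMJOBS)
for size_gib in (10, 100):
    frac = fraction_beyond(10 * GIB, size_gib * GIB)
    print(f"{size_gib} GiB device: {frac:.0%} of offsets beyond the first 10 GiB")
```

On the resized image, about 90% of the random I/O would land in the region added by the resize.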
 
When I increase the block size to 128k, I get these speeds on the 10 GB VM:
READ: bw=592MiB/s (621MB/s), 592MiB/s-592MiB/s (621MB/s-621MB/s), io=16.9GiB (18.2GB), run=29251-29251msec
WRITE: bw=253MiB/s (265MB/s), 253MiB/s-253MiB/s (265MB/s-265MB/s), io=7393MiB (7752MB), run=29251-29251msec

On the "big" 100 GB VM, the results are:
READ: bw=152MiB/s (159MB/s), 152MiB/s-152MiB/s (159MB/s-159MB/s), io=5414MiB (5677MB), run=35724-35724msec
WRITE: bw=65.0MiB/s (68.2MB/s), 65.0MiB/s-65.0MiB/s (68.2MB/s-68.2MB/s), io=2322MiB (2435MB), run=35724-35724msec
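Converting the read bandwidth into operations per second puts the 8k and 128k runs side by side, and it shows that the relative slowdown is smaller at 128k than at 8k. A minimal sketch that uses only the read figures posted in this thread:

```python
# Read bandwidth in MiB/s per block size (KiB), copied from the runs above.
READ_MIB_S = {
    8: {"10 GB": 115.0, "100 GB": 20.7},
    128: {"10 GB": 592.0, "100 GB": 152.0},
}

def iops(mib_per_s: float, bs_kib: int) -> float:
    """Operations per second implied by a bandwidth at a fixed block size."""
    return mib_per_s * 1024 / bs_kib

for bs_kib, runs in READ_MIB_S.items():
    small, big = runs["10 GB"], runs["100 GB"]
    print(f"{bs_kib}k: {iops(small, bs_kib):.0f} -> {iops(big, bs_kib):.0f} read IOPS, "
          f"{small / big:.1f}x slower")
```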
 
Can you try one benchmark with 100% reads and another with 100% writes, to see the difference?
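To separate reads from writes while changing nothing else, the original 8k command can be split into a pure `randread` run and a pure `randwrite` run; `--rwmixread` is dropped because it only applies to mixed workloads. A small Python sketch that prints both command lines (the job names are made up, and the `randwrite` run writes to the raw device, as the original `randrw` test already does):

```python
import shlex

# Options copied from the original 8k 70/30 test; only --rw and the job
# name change between the variants.
BASE = [
    "fio", "--filename=/dev/sda", "--direct=1", "--refill_buffers",
    "--norandommap", "--randrepeat=0", "--ioengine=libaio", "--bs=8k",
    "--iodepth=16", "--numjobs=16", "--runtime=60", "--group_reporting",
]

def variant(rw: str) -> list[str]:
    """Build a fio argv that differs from the original only in access pattern."""
    return BASE + [f"--rw={rw}", f"--name=8k-{rw}"]

for rw in ("randread", "randwrite"):
    print(shlex.join(variant(rw)))
```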

Hmm, interesting. It seems that there is no difference:

fio --name=seqwrite --rw=write --direct=1 --ioengine=libaio --bs=32k --numjobs=4 --size=2G --runtime=600 --group_reporting
This gives me about 50-60 MB/s on both VMs.


fio --name=seqread --rw=read --direct=1 --ioengine=libaio --bs=8k --numjobs=8 --size=1G --runtime=600 --group_reporting
This gives me about 100-110 MB/s on both VMs.

But the original command (fio --filename=/dev/sda --direct=1 --rw=randrw --refill_buffers --norandommap --randrepeat=0 --ioengine=libaio --bs=8k --rwmixread=70 --iodepth=16 --numjobs=16 --runtime=60 --group_reporting --name=8k7030test) still shows a big difference.
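The two follow-up runs differ from the original in more than the read/write mix: they are sequential, they run at fio's default `--iodepth=1`, and, having no `--filename`, they use files on the guest filesystem, which keeps their I/O inside the 10 GB partition. A quick Python tabulation of the settings as posted (the dictionary layout and helper names are just for illustration):

```python
# Settings as posted in this thread. fio defaults --iodepth to 1 when it is
# not given; without --filename each job clone gets its own file, sized by
# --size, in the current directory.
TESTS = {
    "seqwrite": {"rw": "write", "target": "files", "numjobs": 4, "iodepth": 1, "size_gib": 2},
    "seqread": {"rw": "read", "target": "files", "numjobs": 8, "iodepth": 1, "size_gib": 1},
    "8k7030test": {"rw": "randrw", "target": "/dev/sda", "numjobs": 16, "iodepth": 16, "size_gib": None},
}

def inflight(t: dict) -> int:
    """Upper bound on outstanding I/Os: one queue of --iodepth per job."""
    return t["numjobs"] * t["iodepth"]

def footprint(t: dict) -> str:
    """Region the test touches: the whole raw device or the files fio lays out."""
    if t["size_gib"] is None:
        return "whole " + t["target"]
    return f"{t['numjobs'] * t['size_gib']} GiB of files"

for name, t in TESTS.items():
    print(f"{name:<11} rw={t['rw']:<7} in-flight={inflight(t):<4} {footprint(t)}")
```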