[7.2-7] IO problems

otto001 · Aug 8, 2022

Hi,

I have been using pve now for some years to run some linux VMs on my home server.
It has a hw-raid mainly for data plus two ssds (also as raid on a different controller) for os and most of the VMs.
Now, I am restoring a VM from backup using qmrestore and the host (including VMs) is almost inaccessible, really bad lags (I am talking of minutes in which the system does not respond to input on ssh sessions, mail server and other systems like home automation too).
I have:

Code:

root@maul:~# cat /sys/block/sdb/queue/scheduler
[mq-deadline] none

the same for sda which is the data raid (sdb is boot).
Would be nice to get some hints how to avoid this, cause its annoying...
The hardware of the host is not too bad (X10DRH-ILN4, 64GB RAM, E5-2637 v3), so I do not think this is the problem.

Best regards and thanks in advance,
Otto

otto001 · Aug 8, 2022

Hm. Update:
seems that the proxmox-wiki here https://pve.proxmox.com/wiki/IO_Scheduler is outdated. If I got it right,cfq is not supported anymore by kernel since version 5.
Furthermore it seems that in some situation it makes sense to set different schedulers for different setups (mix of SSDs and regular disks/raid).
Does anyone know best practise for setting scheduler by disk with proxmox?
Best regards and thanks in advance,
Otto

LnxBil · Aug 8, 2022

otto001 said:
The hardware of the host is not too bad (X10DRH-ILN4, 64GB RAM, E5-2637 v3), so I do not think this is the problem.

The problem is the hardware, but not the one you provided: the disks. Your disks and/or controller is not able to handle the load. What disks do you use? With qmrestore, you should use the --bwlimit option.

otto001 · Aug 8, 2022

Hi,

thanks for your answer. Well, I forgot to mention that I also have lags during normal operation which should not occure in my eyes.
bwlimit is a good hint for future qmrestore operations, thank you!
regarding the hardware:
I am using a MegaRaid SAS 9240-4i with two Sandisk SSDs on 6.0Gb/s in a raid-1 as root-disks and for most of my VMs.
Additionally (and this is where the dump was stored in my case) I am using an Adaptec 5805 controller with 3 TOSHIBA HDWD130 disks configured as RAID5. So I think this should be a quite stable configuration if there are no really heavy IO-operations?

otto001 · Aug 8, 2022

Ahm. and 114 minutes for a restore of a 44GB disk from the "normal" disk raid to the ssds? plus really heavy lags? I can not imagine that my hardware is SO slow....

LnxBil · Aug 8, 2022

otto001 said:
Ahm. and 114 minutes for a restore of a 44GB disk from the "normal" disk raid to the ssds? plus really heavy lags? I can not imagine that my hardware is SO slow....

What type of sandisk? I've never seen enterprise hardware from sandisk.

otto001 · Aug 8, 2022

of course it is no enterprise hardware. it is a home-server. nevertheless this should not matter here. nearly 2 hours for 44GB??

otto001 · Aug 8, 2022

just for completeness:

Code:

SSD-Raid (RAID1)
root@maul:~# hdparm -Ttv /dev/sdb1
/dev/sdb1:
 multcount     =  0 (off)
 readonly      =  0 (off)
 readahead     = 8192 (on)
 geometry      = 115718/255/63, sectors = 1859012607, start = 1
 Timing cached reads:   25806 MB in  1.99 seconds = 12972.64 MB/sec
 Timing buffered disk reads: 1926 MB in  3.00 seconds = 641.97 MB/sec

root@maul:~# hdparm -Ttv /dev/sda1
classical-disk raid (RAID5)
/dev/sda1:
 multcount     =  0 (off)
 readonly      =  0 (off)
 readahead     = 256 (on)
 geometry      = 728421/255/63, sectors = 11702085599, start = 2048
 Timing cached reads:   26966 MB in  1.99 seconds = 13558.92 MB/sec
 Timing buffered disk reads: 1038 MB in  3.00 seconds = 345.95 MB/sec

should be both ok for my needs.

LnxBil · Aug 8, 2022

otto001 said:
should be both ok for my needs.

Sequential read is not what you need here. Please do a proper fio test.

otto001 said:
of course it is no enterprise hardware. it is a home-server.

My home servers all have enterprise SSD, because consumer and prosumer suck very bad and your "error pattern" is exacly a non-enterprise SSD.

otto001 · Aug 8, 2022

what would you suggest as "proper"?

LnxBil · Aug 8, 2022

otto001 said:
what would you suggest as "proper"?

Any enterprise hardware that you cannot buy in a normal store. For older hardware like your X10, I can recommend just to buy used enterprise SSDs, e.g. Samsung SM/PM series. An older list can be found here.

otto001 · Aug 8, 2022

LnxBil said:
Any enterprise hardware that you cannot buy in a normal store. For older hardware like your X10, I can recommend just to buy used enterprise SSDs, e.g. Samsung SM/PM series. An older list can be found here.

The question was regarding your proper fio test. Which parameters would you suggest as proper?

LnxBil · Aug 8, 2022

otto001 said:
The question was regarding your proper fio test. Which parameters would you suggest as proper?

The parameters that are stated in the linked list of SSDs. You can see how your SSD ranks afterwards.

otto001 · Aug 9, 2022

I changed the command a little bit as my SSDs are in use, so I used a dir for testing, but I am not sure how to interpret what I see:

Code:

root@maul:~# fio --name=4kwrite --ioengine=libaio --directory=/root/fio --blocksize=4k --readwrite=write --filesize=1G --end_fsync=1 --numjobs=4 --iodepth=128 --direct=1 --group_reporting
4kwrite: (g=0): rw=write, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=128
...
fio-3.25
Starting 4 processes
4kwrite: Laying out IO file (1 file / 1024MiB)
4kwrite: Laying out IO file (1 file / 1024MiB)
4kwrite: Laying out IO file (1 file / 1024MiB)
4kwrite: Laying out IO file (1 file / 1024MiB)
Jobs: 2 (f=2): [_(1),W(1),F(1),_(1)][100.0%][w=668KiB/s][w=167 IOPS][eta 00m:00s]
4kwrite: (groupid=0, jobs=4): err= 0: pid=1197755: Tue Aug  9 07:17:00 2022
  write: IOPS=1502, BW=6012KiB/s (6156kB/s)(4096MiB/697701msec); 0 zone resets
    slat (usec): min=2, max=4264.8k, avg=299.39, stdev=19604.64
    clat (usec): min=314, max=13905k, avg=338931.49, stdev=871262.07
     lat (usec): min=332, max=13905k, avg=339230.98, stdev=872110.35
    clat percentiles (msec):
     |  1.00th=[    5],  5.00th=[   15], 10.00th=[   16], 20.00th=[   18],
     | 30.00th=[   20], 40.00th=[   22], 50.00th=[   26], 60.00th=[   35],
     | 70.00th=[   52], 80.00th=[  144], 90.00th=[ 1133], 95.00th=[ 2039],
     | 99.00th=[ 4329], 99.50th=[ 5134], 99.90th=[ 7819], 99.95th=[ 9060],
     | 99.99th=[10134]
   bw (  KiB/s): min=   32, max=113816, per=100.00%, avg=13246.63, stdev=5870.58, samples=2533
   iops        : min=    8, max=28454, avg=3311.40, stdev=1467.64, samples=2533
  lat (usec)   : 500=0.01%, 750=0.01%, 1000=0.01%
  lat (msec)   : 2=0.05%, 4=0.74%, 10=1.71%, 20=30.05%, 50=36.63%
  lat (msec)   : 100=9.51%, 250=2.36%, 500=2.31%, 750=2.83%, 1000=2.58%
  lat (msec)   : 2000=6.12%, >=2000=5.11%
  cpu          : usr=0.08%, sys=0.20%, ctx=49168, majf=0, minf=276
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
     issued rwts: total=0,1048576,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=128

Run status group 0 (all jobs):
  WRITE: bw=6012KiB/s (6156kB/s), 6012KiB/s-6012KiB/s (6156kB/s-6156kB/s), io=4096MiB (4295MB), run=697701-697701msec

Disk stats (read/write):
  sdb: ios=2890/77596, merge=401/1001812, ticks=90234/23409767, in_queue=23500000, util=92.21%

LnxBil · Aug 9, 2022

With 6 MB/s, the SSD is on the lower end of the consumer spectrum (as you can see the list on the page) and therefore the reason for the I/O problems.

otto001 · Aug 9, 2022

Thanks for the clarification - and damn.
I will have to break into my piggy bank for new ssds....

LnxBil · Aug 9, 2022

otto001 said:
Thanks for the clarification - and damn.
I will have to break into my piggy bank for new ssds....

I'm very satisfied with my used enterprise SSD from Samsung. I use 120, 240 and 480 GB versions. Just look at ebay.

otto001 · Aug 9, 2022

which models do you exactly use (480GB)?
I think I will have to split up a little bit. luckily this raid-controller still has 2 ports available...

LnxBil · Aug 9, 2022

otto001 said:
which models do you exactly use (480GB)?

Samsung PM863 and SM863 (depending on the use case). Both are relatively cheap on ebay and work great (at work, we have also the 960GB model running for over 10 years without any incident)

otto001 · Aug 9, 2022

Thanks! Just ordered two used SM863. Usually I do never buy used harddisks. will see, if the new disks help to avoid those lags. As I am also running my firewall and home automation on this host, they have been really, really annoying from time to time...

[7.2-7] IO problems

Well-Known Member

Well-Known Member

Distinguished Member

Well-Known Member

Well-Known Member

Distinguished Member

Well-Known Member

Well-Known Member

Distinguished Member

Well-Known Member

Distinguished Member

Well-Known Member

Distinguished Member

Well-Known Member

Distinguished Member

Well-Known Member

Distinguished Member

Well-Known Member

Distinguished Member

Well-Known Member

We value your privacy