Extremely low IO performance (fsyncs/second)

DrStone

New Member
Nov 22, 2014
Hi @all.

I have IO problems with my Proxmox server. If a single VM has an I/O peak, IO-wait increases extremely for the whole system.

At the beginning there was only one VM, so I cannot say whether it got slower with the Proxmox upgrades (starting at 3.0) or with more VMs being added.

Now the system is extremely slow most of the time, so I looked a bit deeper and found:

#1 Software RAID is not supported - damn - I didn't realize that at the beginning
#2 ext4 sometimes has poor performance
#3 I have abysmal fsyncs/sec values

So, #1 is bad, but I cannot change it immediately (it is an online hosted server), so I hope it is not the main reason for the bad IO performance.

#2: ext4 seems slower than ext3, but not by as much as I am seeing here, right?
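
As a side note, fsync rates on ext3/ext4 also depend heavily on whether write barriers are enabled, so it is worth checking what is actually in effect. A minimal check, assuming the root filesystem is the one pveperf tests:
Code:
# show the mount options of the root filesystem; barriers are on by default,
# "nobarrier" or "barrier=0" in the options column would mean they are disabled
grep ' / ' /proc/mounts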

And the main reason I found for #3 was wrong disk alignment. So I checked this and got different results:

parted says the alignment should be OK:
Code:
root@abba:/# parted -s /dev/sda unit s print
Model: ATA TOSHIBA DT01ACA3 (scsi)
Disk /dev/sda: 5860533168s
Sector size (logical/physical): 512B/4096B
Partition Table: gpt

Number  Start    End          Size         File system  Name  Flags
 3      2048s    4095s        2048s                           bios_grub
 1      4096s    528383s      524288s                         raid
 2      528384s  5860533134s  5860004751s                     raid
The built-in alignment test was successful as well.
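
For reference, by "built-in test" I mean parted's align-check subcommand, e.g. (using the partition numbers from the listing above):
Code:
# check the two raid partitions against the optimal alignment of the disk;
# parted prints "N aligned" when a partition passes
parted /dev/sda align-check optimal 1
parted /dev/sda align-check optimal 2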

But with fdisk it seems wrong:
Code:
root@abba:/# fdisk -c -u -l /dev/sda
WARNING: GPT (GUID Partition Table) detected on '/dev/sda'! The util fdisk doesn't support GPT. Use GNU Parted.
Disk /dev/sda: 3000.6 GB, 3000592982016 bytes
256 heads, 63 sectors/track, 363376 cylinders, total 5860533168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0x00000000


   Device Boot      Start         End      Blocks   Id  System
/dev/sda1               1  4294967295  2147483647+  ee  GPT
Partition 1 does not start on physical sector boundary.
Maybe because fdisk doesn't support GPT?
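
A manual cross-check that does not depend on either tool, assuming the 512-byte logical / 4096-byte physical sectors shown above (a partition is 4K-aligned when its start sector is divisible by 8):
Code:
# print the start sector of every sda partition and whether it is divisible by 8
for p in /sys/block/sda/sda[0-9]*; do
  start=$(cat "$p/start")
  echo "$(basename "$p"): start=$start aligned=$((start % 8 == 0))"
done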

Which result is correct?


Here is my pveperf output:

Without any VMs running:
Code:
root@abba:/# pveperf
CPU BOGOMIPS:      54397.28
REGEX/SECOND:      1734493
HD SIZE:           4.96 GB (/dev/mapper/vg0-abba_root)
BUFFERED READS:    179.83 MB/sec
AVERAGE SEEK TIME: 6.55 ms
FSYNCS/SECOND:     531.51
DNS EXT:           64.11 ms

With some VMs running, but no traffic/workload on the VMs:
Code:
root@abba:/# pveperf
CPU BOGOMIPS:      54402.00
REGEX/SECOND:      1581485
HD SIZE:           4.96 GB (/dev/mapper/vg0-abba_root)
BUFFERED READS:    174.91 MB/sec
AVERAGE SEEK TIME: 9.88 ms
FSYNCS/SECOND:     9.86
DNS EXT:           57.10 ms
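
Side note: pveperf also takes a path argument, so the same test can be pointed at the storage that actually holds the VM disks instead of the small root LV shown above (/var/lib/vz is only the default local storage path and may differ here):
Code:
# run the benchmark against the directory holding the VM images
pveperf /var/lib/vz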



Here is my version dump:
Code:
root@abba:/# pveversion -v
proxmox-ve-2.6.32: 3.3-138 (running kernel: 2.6.32-20-pve)
pve-manager: 3.3-5 (running version: 3.3-5/bfebec03)
pve-kernel-2.6.32-20-pve: 2.6.32-100
pve-kernel-2.6.32-32-pve: 2.6.32-136
pve-kernel-2.6.32-33-pve: 2.6.32-138
pve-kernel-2.6.32-30-pve: 2.6.32-130
pve-kernel-2.6.32-29-pve: 2.6.32-126
pve-kernel-2.6.32-26-pve: 2.6.32-114
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.7-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.10-1
pve-cluster: 3.0-15
qemu-server: 3.3-3
pve-firmware: 1.1-3
libpve-common-perl: 3.0-19
libpve-access-control: 3.0-15
libpve-storage-perl: 3.0-25
pve-libspice-server1: 0.12.4-3
vncterm: 1.1-8
vzctl: 4.0-1pve6
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 2.1-10
ksm-control-daemon: 1.1-1
glusterfs-client: 3.5.2-1

Here are my running VMs (only one is really used, the others see very little traffic):
Code:
root@abba:/# qm list|grep running
      VMID NAME                 STATUS     MEM(MB)    BOOTDISK(GB) PID
       100 XXXX           running    6144               4.00 8241
       101 XXXX       running    512                1.00 8409
       103 XXXX          running    1024               5.00 8548
       104 XXXX             running    768                1.00 8646
       107 XXXX               running    3072               1.00 8742
       108 XXXX               running    512                1.00 10803

Can someone help me to find a possible reason for this?


:thorsten
 
About fdisk: yes, don't use it, use parted. (The alignment is OK.)

Is it hardware RAID with a cache (how many disks), or a single disk? (500 fsyncs/s is not bad for one disk.)

About your VMs: what is the cache mode for the disks?


Can you run "iostat -x 1" with and without the VMs running?

Also check that you are not swapping ("free -m").
 
Hi spirit, thanks for your help.

About fdisk: yes, don't use it, use parted. (The alignment is OK.)
Great news, thanks.

Is it hardware RAID with a cache (how many disks), or a single disk? (500 fsyncs/s is not bad for one disk.)

It's a software RAID 1. I mostly saw values above 2000 elsewhere, so I thought something was wrong here.
So if 500 fsyncs/s without VMs is OK/normal for my setup, and with the VMs running the value drops to 12, it should be a problem with something the VMs are doing, right? But that was the first place I looked, and no process is running wild :confused:
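
For completeness, one more thing that can drag fsyncs down on mdraid (just a possibility, assuming a standard two-disk md RAID 1 like the md0/md1 visible in the iostat output below) is an ongoing resync or check; the array state is visible with:
Code:
# a resync/check in progress shows up as a progress bar per array
cat /proc/mdstat
# more detail for the array carrying the VM storage
mdadm --detail /dev/md1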

About your VMs: what is the cache mode for the disks?
No cache - I read that this is the best setting for performance, right?
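
For reference, the effective cache mode shows up in the VM config and can be changed with qm set; a sketch, where the bus and volume names are only placeholders and not my actual config:
Code:
# show the disk lines of a VM; an explicit cache= option appears here if one is set
qm config 100 | grep -E '^(virtio|ide|scsi|sata)'
# example of switching one disk to writeback (placeholder volume name)
qm set 100 --virtio0 local:vm-100-disk-1,cache=writeback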

Can you run "iostat -x 1" with and without the VMs running?

It's difficult to give you meaningful values here, because they change a lot.
Here is a snapshot taken during a peak, with the VMs running:
Code:
Linux 2.6.32-20-pve (abba.buss-networks.de)     11/25/2014     _x86_64_    (8 CPU)


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           3.67    0.00    2.28    3.27    0.00   90.78


Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda              15.18     1.18  343.56   26.62 20003.53   292.79   109.66     1.80    4.87    4.70    6.99   1.08  40.08
sdb              27.39     4.47    4.79  321.22   143.96 19358.30   119.64     2.52    7.74    2.29    7.82   0.51  16.48
md0               0.00     0.00    0.85    0.00     3.40     0.00     8.00     0.00    0.00    0.00    0.00   0.00   0.00
md1               0.00     0.00   90.00   27.80  1073.19   292.79    23.19     0.00    0.00    0.00    0.00   0.00   0.00
dm-0              0.00     0.00   35.74    5.36   155.07    22.84     8.66     0.12    2.85    2.19    7.26   0.32   1.33
dm-1              0.00     0.00    0.60    0.00     2.41     0.00     8.00     0.00    5.74    5.74    0.00   5.72   0.34
dm-2              0.00     0.00    4.13    1.42    29.68    12.32    15.14     0.03    5.41    4.41    8.29   2.11   1.17
dm-3              0.00     0.00    1.65    0.23     6.61     0.94     8.00     0.24  128.94    3.55 1011.14   2.54   0.48
dm-4              0.00     0.00    0.60    0.00     2.40     0.00     8.00     0.01    9.37    9.37    0.00   9.32   0.56
dm-5              0.00     0.00    0.60    0.00     2.40     0.00     8.00     0.00    5.05    5.05    0.00   5.02   0.30
dm-6              0.00     0.00    0.60    0.00     2.40     0.00     8.00     0.00    4.81    4.81    0.00   4.77   0.29
dm-7              0.00     0.00    0.60    0.00     2.40     0.00     8.00     0.00    5.26    5.26    0.00   5.23   0.31
dm-8              0.00     0.00    0.60    0.00     2.40     0.00     8.00     0.00    5.08    5.08    0.00   5.05   0.30
dm-9              0.00     0.00    0.60    0.00     2.40     0.00     8.00     0.00    4.48    4.48    0.00   4.44   0.27
dm-10             0.00     0.00    0.60    0.00     2.40     0.00     8.00     0.00    4.96    4.96    0.00   4.93   0.30
dm-11             0.00     0.00    0.60    0.00     2.40     0.00     8.00     0.00    4.64    4.64    0.00   4.58   0.28
dm-14             0.00     0.00    0.60    0.00     2.40     0.00     8.00     0.00    4.64    4.64    0.00   4.55   0.27
dm-15             0.00     0.00    0.60    0.00     2.40     0.00     8.00     0.00    4.12    4.12    0.00   3.99   0.24
dm-16             0.00     0.00    0.60    0.00     2.40     0.00     8.00     0.00    7.31    7.31    0.00   7.15   0.43
dm-20             0.00     0.00    0.60    0.00     2.40     0.00     8.00     0.00    3.83    3.83    0.00   3.75   0.23
dm-21             0.00     0.00    0.60    0.00     2.40     0.00     8.00     0.00    7.68    7.68    0.00   7.54   0.45
dm-22             0.00     0.00    0.60    0.00     2.40     0.00     8.00     0.00    3.98    3.98    0.00   3.90   0.23
dm-23             0.00     0.00    0.60    0.00     2.40     0.00     8.00     0.00    4.07    4.07    0.00   4.00   0.24
dm-24             0.00     0.00    0.60    0.00     2.40     0.00     8.00     0.00    4.01    4.01    0.00   3.92   0.24
dm-27             0.00     0.00    0.60    0.00     2.40     0.00     8.00     0.01    9.37    9.37    0.00   8.97   0.54
dm-28             0.00     0.00   30.15   12.44   787.82   135.48    43.36     0.48   11.32   14.57    3.44   5.85  24.91
dm-12             0.00     0.00    0.60    0.00     2.40     0.00     8.07     0.00    7.11    7.11    0.00   7.00   0.42
dm-13             0.00     0.00    0.72    0.00     2.89     0.00     7.97     0.00    4.45    4.45    0.14   4.34   0.31
dm-18             0.00     0.00    1.01    0.70    11.81     6.00    20.85     0.01    7.55    8.98    5.48   6.86   1.17
dm-19             0.00     0.00    0.91    7.35     8.51   113.01    29.44     0.12   13.99    9.89   14.49  13.62  11.25
dm-25             0.00     0.00    0.76    0.15     6.80     0.83    16.89     0.01   12.72   11.38   19.69  11.27   1.02
dm-26             0.00     0.00    0.68    0.00     3.30     0.00     9.69     0.01    7.84    7.82   29.25   7.69   0.52

Values without running VMs I can add tonight, when I can shut down the VMs.


Also check that you are not swapping ("free -m").

No swapping, RAM is available, also inside the VMs:
Code:
             total       used       free     shared    buffers     cached
Mem:         15730       9707       6023          0         12        142
-/+ buffers/cache:       9551       6178
Swap:         4095          0       4095

Maybe my problem is uptime-related. The server froze tonight and the uptime is now 6 hours. Everything seems OK now; pveperf even shows 239 fsyncs/s while all VMs are running (yesterday and before it was 12-25 fsyncs/s).

:thorsten