pveperf LVM storage performance drop from 1.7 to 2.1 pvetest

mmenaz

Hi, I have a server with 2 x 500 GB SATA drives in RAID1, where Proxmox is installed, and 4 x 15K SAS drives in RAID10 as additional storage.
The system was on version 1.7; I've now updated the motherboard BIOS and the LSI RAID controller firmware, destroyed both arrays, and reinstalled from scratch.
The problem is that pveperf is now about 30% slower on both arrays:
Code:
pve-root:  2620.33 -> now 1865.88 fsyncs/second
sas-data:  2744.75 -> now 1957.99 fsyncs/second

What could be wrong?
It WAS:

Code:
# pveversion  -v
pve-manager: 1.7-10 (pve-manager/1.7/5323)
running kernel: 2.6.32-4-pve

# pveperf
CPU BOGOMIPS:      34133.63
REGEX/SECOND:      767232
HD SIZE:           94.49 GB (/dev/mapper/pve-root)
BUFFERED READS:    105.17 MB/sec
AVERAGE SEEK TIME: 9.75 ms
FSYNCS/SECOND:     2620.33
DNS EXT:           250.23 ms
DNS INT:           452.42 ms (mydomain.it)
# 

# pveperf /mnt/test
CPU BOGOMIPS:      34133.63
REGEX/SECOND:      797590
HD SIZE:           59.06 GB (/dev/mapper/sas--data-testhd)
BUFFERED READS:    371.24 MB/sec
AVERAGE SEEK TIME: 3.66 ms
FSYNCS/SECOND:     2744.75
DNS EXT:           112.42 ms
DNS INT:           63.21 ms (mydomain.it)

while now it is:
Code:
# pveversion -v
pve-manager: 2.1-12 (pve-manager/2.1/be112d89)
running kernel: 2.6.32-13-pve
proxmox-ve-2.6.32: 2.1-71
pve-kernel-2.6.32-11-pve: 2.6.32-66
pve-kernel-2.6.32-13-pve: 2.6.32-71
lvm2: 2.02.95-1pve2
clvm: 2.02.95-1pve2
corosync-pve: 1.4.3-1
openais-pve: 1.1.4-2
libqb: 0.10.1-2
redhat-cluster-pve: 3.1.92-2
resource-agents-pve: 3.9.2-3
fence-agents-pve: 3.1.8-1
pve-cluster: 1.0-27
qemu-server: 2.0-45
pve-firmware: 1.0-17
libpve-common-perl: 1.0-28
libpve-access-control: 1.0-24
libpve-storage-perl: 2.0-23
vncterm: 1.0-2
vzctl: 3.0.30-2pve5
vzprocps: 2.0.11-2
vzquota: 3.0.12-3
pve-qemu-kvm: 1.1-6
ksm-control-daemon: 1.1-1
root@proxmox:~# 

root@proxmox:~# pveperf
CPU BOGOMIPS:      34129.69
REGEX/SECOND:      846053
HD SIZE:           94.49 GB (/dev/mapper/pve-root)
BUFFERED READS:    87.22 MB/sec
AVERAGE SEEK TIME: 9.78 ms
FSYNCS/SECOND:     1865.88
DNS EXT:           143.12 ms
DNS INT:           195.53 ms (mydomain.it)

root@proxmox:~# pveperf /mnt/test
CPU BOGOMIPS:      34129.69
REGEX/SECOND:      862642
HD SIZE:           59.06 GB (/dev/mapper/sas--data-testhd)
BUFFERED READS:    358.05 MB/sec
AVERAGE SEEK TIME: 3.79 ms
FSYNCS/SECOND:     1957.99
DNS EXT:           137.76 ms
DNS INT:           146.19 ms (mydomain.it)

The host has been up for more than a day, so the RAID sync has completed.
The LVM storage was partitioned with parted -a optimal /dev/sdb.
Are the results from the two pveperf versions simply not comparable? Any tips?
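For reference, the partition was created and checked roughly along these lines (a sketch, the partition boundaries here are illustrative, not necessarily the exact ones I used):
Code:
# create one optimally aligned partition spanning the whole disk
parted -s -a optimal /dev/sdb mklabel gpt mkpart primary 0% 100%
# ask parted to verify the alignment it chose for partition 1
parted /dev/sdb align-check optimal 1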
 
Thanks, but I did a bare-metal installation from the 2.1 ISO and added the pvetest repo, since I need kvm 1.1.
So it's all ext3 (the LVM volume I created was formatted with ext3), and I ran the tests with the same steps (I just followed the old notes I took about it).
This is an installation where I/O performance is important, and it's a dual-socket server that had trouble with 64-bit Win2003 guests in the past (BSODs), probably due to badly managed latencies.
So I hope that with the new kvm 1.1 and new firmware things can run well now (one Win2003 guest is a SQL server, we badly need more memory, and AWE is better than nothing but not as good as pure 64-bit addressing).
I've issued this command to check whether the LVM sector alignment is OK, and I found:
Code:
# pvs -o+pe_start --units s
  PV         VG        Fmt  Attr PSize        PFree        1st PE
  /dev/sda2  pve       lvm2 a--   974643200S    33546240S  2048S
  /dev/sdb1  sas-data  lvm2 a--  1755955200S  1441382400S  2048S

which is different from a 1.9 installation I have (on a different server, though):
Code:
proxmox:~# pvs -o+pe_start --units s
  PV         VG   Fmt  Attr PSize       PFree       1st PE 
  /dev/sda2  pve  lvm2 a-   1951391744S      8380416S    384S
  /dev/sdb   sas   lvm2 a-   1754521600S 1544790016S    384S

I'm wondering whether the different default alignment could be the problem, or whether the new firmware has problems, or Proxmox does (by the way, I updated to the latest pvetest kernel tonight, 2.6.32-13-pve, but got the same result).
I have no idea how to troubleshoot this, which tests are worth running, or how to interpret the results.
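In case it helps, here is a rough sanity check of the PE start against the stripe size (the 64 KiB below is just an assumed stripe size, use the controller's real value):
Code:
# PE start: 2048 sectors * 512 B = 1 MiB on the new install, 384 sectors * 512 B = 192 KiB on the old one
# an offset that is a multiple of the RAID stripe size should be fine, e.g. for 64 KiB stripes:
echo $(( 2048 * 512 % (64 * 1024) ))   # 0 -> aligned
echo $((  384 * 512 % (64 * 1024) ))   # 0 -> also aligned
So at least with a 64 KiB stripe size both offsets come out aligned; other stripe sizes would need the same arithmetic.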
 
I've reinstalled 1.7 from scratch and got the same high performance as originally, so it's not a matter of BIOS or controller settings.
Then I upgraded to 1.9 and the performance dropped again, so it's not something new to 2.x.
I'm wondering whether it's a matter of a new driver or a new kernel.
Code:
RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS 1078 (rev 04)
I've seen that 1.7 had:
Code:
# modinfo  megaraid_sas
version:        00.00.04.01
parm:           poll_mode_io:Complete cmds from IO path, (default=0) (int)

while 1.9 has:
Code:
version:        00.00.05.40-rh2
parm:           poll_mode_io:Complete cmds from IO path, (default=0) (int)
parm:           max_sectors:Maximum number of sectors per IO command (int)
parm:           msix_disable:Disable MSI-X interrupt handling. Default: 0 (int)

So is it a matter of different default parameters? How can I tell which ones are being passed? Is the driver buggy?
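In the meantime, one way to see the values the loaded module is actually using should be to read them back from sysfs (just a quick sketch):
Code:
# print the current value of every megaraid_sas module parameter
for p in /sys/module/megaraid_sas/parameters/*; do
    printf '%s = %s\n' "$(basename "$p")" "$(cat "$p")"
done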
I'm wondering whether it is related to this thread:
http://forum.proxmox.com/threads/8843-Finally-stable!-LSI-Megaraid-Proxmox-2

An FSYNCS/SECOND drop from 2889.92 to 1780.17 is not something you fail to notice or can simply ignore.
Thanks a lot
 
My guess is that this is simply due to recent changes in the ext3 fs. AFAIK they fixed a few issues, and that makes fsync slower.
 
Thanks for your answer dietmar, but does that mean you have noticed the issue too? Just me and you, and no one else on this forum?
Googling for "ext3 fsync slower" does not produce recent results (at least in the top pages).
Is it time to switch to ext4 (which may have been improved in the meantime), or is ext3 still better?
If you are right, VMs on LVM storage (not file based) should not be affected, but the ones in pve-data are (which makes it urgent to be able to leave space for LVM storage at least at install time, see http://forum.proxmox.com/threads/10...-install-and-or-minimum-free-space-on-pve-LVM).
What is your suggestion?
Thanks a lot
 
ext3 and ext4 are both great; each has advantages and disadvantages. We think that for the majority of our users ext3 is still the better choice, which is why we use ext3 by default in our ISO installer.

Depending on your needs and storage subsystem, ext4 can also be better, so just pick the best fit for your needs.
 
I've tested the same hardware with ext3 and ext4:
ext3: FSYNCS/SECOND: 1920.67
ext4: FSYNCS/SECOND: 311.88
I doubt I will ever need something six times slower :)
In any case my question is: is this an LSI driver/kernel issue, or is Dietmar right and you have also noticed this performance drop on all your installations?
I have a test PC with an Adaptec controller; I will try to install 1.7 there too, then upgrade to 1.9, and compare the results (and report back).
In virtualization, disk performance is very important, and a 30% drop is not good. I'm wondering whether some "fine tuning" is possible to make it work as reliably (paramount) and as fast (very important) as before.
I would hate to have a server not running at its full potential just because of my ignorance and/or misconfiguration.
Best regards
 
You need to take a deeper look at the mount options; each kernel can have different defaults (see cat /proc/mounts).

And just to note, pveperf is just a small mini benchmark tool to give a quick overview. If you really want to compare different storage settings you need to do real benchmarking.
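For example, something like this with fio gets closer to the fsync-heavy pattern pveperf measures (the parameters and test file below are only an illustration, adjust them to your workload):
Code:
# synchronous 4k random writes with an fsync after every write, run for 60 seconds
fio --name=fsync-test --filename=/mnt/test/fio.tmp --size=1G \
    --rw=randwrite --bs=4k --ioengine=sync --fsync=1 \
    --runtime=60 --time_based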
 
I've tested the same hardware with ext3 and ext4:
ext3: FSYNCS/SECOND: 1920.67
ext4: FSYNCS/SECOND: 311.88
I doubt I will ever need something six times slower :)

In fact, they fixed even more bugs in ext4.

All in all, I do not understand what they do, so we use ext3 for now.

xfs also seems good (fsync performance), most of the time better than ext4.
 
You need to take a deeper look at the mount options; each kernel can have different defaults (see cat /proc/mounts).
I've created a VM with Proxmox 1.7 just to have a look at this, but it seems there are no relevant differences in the options:
Code:
pve-manager/1.7/5323
/dev/mapper/pve-root / ext3 rw,relatime,errors=remount-ro,data=ordered 0 0

pve-manager/2.1/be112d89
/dev/mapper/pve-root / ext3 rw,relatime,errors=remount-ro,barrier=0,data=ordered 0 0
the only difference being barrier=0 (barriers disabled) in 2.1, but that is an option meant to give maximum performance (even if with some risk of data loss, but we all have a UPS and a BBU RAID controller, don't we? ;))
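To rule barriers out explicitly, one could remount with barriers forced on and off and re-run pveperf (a test-only sketch):
Code:
# temporarily re-enable barriers on the root fs, measure, then restore the original setting
mount -o remount,barrier=1 /
pveperf
mount -o remount,barrier=0 /
pveperf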

And just to note, pveperf is just a small mini benchmark tool to give a quick overview. If you really want to compare different storage settings you need to do real benchmarking.

Yes, but it measures some performance figures, and they dropped abruptly from 1.7 to 1.9. My problem is to understand whether this is specific to that hardware/kernel module, or simply "from 1.9 onwards the FSYNC numbers are different, don't compare them with previous values".
Let's see what happens with the Adaptec controller.
I would love a pveperf --deep option with tests specific to virtualization performance :)
Thanks a lot
 
I've tested on a different server, 4 SATA drives in RAID5, an Adaptec 5805 with solid-state BBU, and the performance DOES NOT DROP:

Code:
root@proxmox:~# lspci | grep -i raid
04:00.0 RAID bus controller: Adaptec AAC-RAID (rev 09)

1.7:
proxmox:~# pveperf
CPU BOGOMIPS:      38399.03
REGEX/SECOND:      857571
HD SIZE:           94.49 GB (/dev/mapper/pve-root)
BUFFERED READS:    255.85 MB/sec
AVERAGE SEEK TIME: 8.84 ms
FSYNCS/SECOND:     3471.91
DNS EXT:           58.26 ms
DNS INT:           30.26 ms

1.9:
proxmox:~# pveversion -v
pve-manager: 1.9-26 (pve-manager/1.9/6567)
running kernel: 2.6.32-7-pve

proxmox:~# pveperf
CPU BOGOMIPS:      38399.54
REGEX/SECOND:      892874
HD SIZE:           94.49 GB (/dev/mapper/pve-root)
BUFFERED READS:    254.87 MB/sec
AVERAGE SEEK TIME: 8.83 ms
FSYNCS/SECOND:     3413.13
DNS EXT:           98.88 ms
DNS INT:           28.18 ms

2.1:
root@proxmox:~# pveversion -v
pve-manager: 2.1-1 (pve-manager/2.1/f9b0f63a)
running kernel: 2.6.32-11-pve

root@proxmox:~# pveperf
CPU BOGOMIPS:      38398.61
REGEX/SECOND:      959202
HD SIZE:           94.49 GB (/dev/mapper/pve-root)
BUFFERED READS:    234.00 MB/sec
AVERAGE SEEK TIME: 8.83 ms
FSYNCS/SECOND:     3417.36
DNS EXT:           51.05 ms
DNS INT:           62.52 ms

So it seems like a bug related to the LSI controller; I don't know whether it's specific to that model or not.
Unfortunately I'm not skilled enough with drivers to try to find a more up-to-date one, if one exists, compile it and test.
LSI is very widely used, especially rebranded (e.g. by Dell); I hope this server will be fast enough and that a future release will fix it, sigh.
By the way, the server with the LSI is dual-socket, just in case that is what triggers the driver problem, or in case it points you to some parameter to suggest.
(I've also tried the deadline scheduler, but the results for this test are the same.)
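(For the record, the scheduler was switched at runtime more or less like this, with sdb being the SAS array; adjust the device name as needed.)
Code:
echo deadline > /sys/block/sdb/queue/scheduler
cat /sys/block/sdb/queue/scheduler   # the active scheduler is shown in [brackets]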
best regards
 
I've disabled the second CPU (two old Intel Xeon E5506 @ 2.13GHz) and pveperf has improved (but is still not as good as on 1.7).
I'm reporting it just in case it sheds some light on the problem:
With 2 CPUs it was:
Code:
FSYNCS/SECOND:     1890.19
With 1 CPU it is:
Code:
FSYNCS/SECOND:     2407.44

sigh!
 
I've disabled the second CPU (two old Intel Xeon E5506 @ 2.13GHz) and pveperf has improved (but is still not as good as on 1.7).

I have three Dell 2950s with the Perc6i (aka LSI MegaRAID) and dual Xeon E5420 @ 2.5GHz.
I followed some of the things mentioned here and they helped fsync by about 15%-20%:
http://kb.lsi.com/KnowledgebaseArticle16607.aspx

Near the bottom of that article, CPU affinity is mentioned, with a link to this: http://kb.lsi.com/KnowledgebaseArticle16667.aspx
With lots of CPU cores and dual CPUs that could actually help improve IO.
If the IO is occurring on one CPU and then, for some odd reason, starts happening on the other CPU, the data may be missing from the second CPU's cache, so the IO operation takes longer to perform.
Setting the CPU affinity locks operations to a specific CPU, eliminating the performance hit from processes floating around.

Not sure if it will help, but it might be worth seeing whether CPU affinity makes a difference.
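Something along these lines (an untested sketch; look up the real IRQ number on your box first) would pin the controller's interrupt to CPU0:
Code:
# find the controller's IRQ line (the megaraid_sas driver usually shows up as "megasas")
grep -i megasas /proc/interrupts
# pin that IRQ to CPU0 (bitmask 1 = CPU0); replace <IRQ> with the number found above
echo 1 > /proc/irq/<IRQ>/smp_affinity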
 
