Proxmox very slow (IO, reproducible?)

xkuba

New Member
Nov 13, 2009
Hi,

we're evaluating Proxmox (OpenVZ containers) and found a strange issue.

I've run several tests with PostgreSQL's benchmarking tool pgbench. Below are my findings. Maybe it's a similar issue to the one mentioned in other threads.

1) When I run pgbench on the host itself, it's always very slow - running at 10-20% of the expected performance.

2) When I run pgbench in an OpenVZ container, it's usually slow (roughly 20% of the expected performance), but sometimes I get the performance expected for the hardware it's running on.

We're running several servers with PostgreSQL and have never seen anything similar. We have a server with nearly the same hardware that runs fine (with plain Debian).

Can anyone else reproduce this?

I'll try a non-Proxmox kernel next week - any other suggestions on how to investigate further?

How to reproduce(?):

Server - 4 cores, 4GB RAM - I guess 2 cores and, say, 1GB would also demonstrate the issue (but I haven't tried).

Install PostgreSQL 8.3 from the Debian packages (postgresql-8.3, postgresql-contrib-8.3).

Edit

/etc/postgresql/8.3/main/postgresql.conf

shared_buffers = 512MB
fsync = off            # uncomment this line and change on to off

Run as root:

sysctl kernel.shmmax=600000000

/etc/init.d/postgresql-8.3 restart

su postgres

cd /usr/lib/postgresql/8.3/bin

./createdb pgbench
./pgbench -i -s100 pgbench
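(A side note: the sysctl setting above is not persistent. If you want it to survive a reboot, the usual way is an entry in /etc/sysctl.conf - standard sysctl usage, nothing Proxmox-specific:)

Code:
# /etc/sysctl.conf - raise the SysV shared memory limit for shared_buffers
kernel.shmmax = 600000000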

And now repeat this test several times:

./pgbench -c50 -t200 pgbench

From time to time, restart PostgreSQL:

/etc/init.d/postgresql-8.3 restart

Sometimes the first run after the restart is OK performance-wise - but only in the container, never on the host.

The expected result for a modern multicore server is 1000-2000 tps, but most of the time I was getting 200-400 tps.

pgbench is known for irreproducible results, so anything above 1000 tps is good and anything below 500 tps is a serious problem (on modern hardware with enough RAM and fsync off).
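To make the repeated runs less tedious, a small loop like this can collect the numbers (just a sketch - it assumes you are still the postgres user in /usr/lib/postgresql/8.3/bin as set up above):

Code:
#!/bin/sh
# Run the benchmark ten times and print only the tps lines.
for i in 1 2 3 4 5 6 7 8 9 10; do
    ./pgbench -c50 -t200 pgbench | grep '^tps'
done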

Looking at vmstat, I think the issue is IO-related.

I've tried both the cfq and deadline IO schedulers, but the result is the same.
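(For anyone repeating this: the active scheduler can be checked and switched at runtime via sysfs - sda is just my disk here, adjust as needed:)

Code:
cat /sys/block/sda/queue/scheduler        # the active scheduler is shown in brackets
echo deadline > /sys/block/sda/queue/scheduler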

bonnie++ performance on the host is OK; in the container it's lower (why? simfs overhead?).

For the test I've used a single SATA disk - but that shouldn't matter, because fsync=off should have made everything go through the kernel buffers/cache...
 
The issue is much worse when using the stock Debian kernel - maybe it's a problem in the hardware/drivers. I'll investigate and report back, and will install stock Debian on this machine...
 
> The issue is much worse when using the stock Debian kernel - maybe it's a problem in the hardware/drivers. I'll investigate and report back, and will install stock Debian on this machine...

Which Debian kernel did you test exactly?
 
Hi Dietmar,

thanks for your response.

Output of pveperf:

CPU BOGOMIPS: 22678.03
REGEX/SECOND: 903607
HD SIZE: 94.49 GB (/dev/pve/root)
BUFFERED READS: 99.26 MB/sec
AVERAGE SEEK TIME: 8.93 ms
FSYNCS/SECOND: 716.72
DNS EXT: 24.06 ms
DNS INT: 0.68 ms (xxx)

The Debian kernel I tried instead of 2.6.24-8-pve to test pgbench on the host was 2.6.26-2-amd64. I didn't change any other setting, only the kernel.

The results are really strange to me:

2.6.26-2-amd64 on the host (Proxmox): ~100 tps
2.6.24-8-pve on the host: ~200-500 tps
OpenVZ guest (Lenny from the Proxmox template): ~200-500 tps, sometimes 1000-2000 tps

The result I'd expect on this hardware is 1000-2000 tps (or more).

I would understand the results if they were exactly the other way around...

[...more testing...]

I've tried the same test but initialized the test database with:

pgbench -i -s10 pgbench

So the dataset is 10x smaller - 160MB instead of 1.6GB.

Now, on the host I get results >2000 tps (sometimes slower, but that's OK with an almost-default postgresql.conf). I get similar numbers in the guest.

Why is there a difference between the 160MB and the 1.6GB dataset? Both should fit comfortably in the kernel buffer cache (the machine has 4GB RAM), and there should be no real IO involved in this test (fsync=off in postgresql.conf).

Watching the vmstat output, the difference is that with the larger dataset there is 100% iowait and almost no other activity (including reads/writes to disk):

vmstat 1

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
0 54 0 2347928 28648 1263592 0 0 1064 0 142 3325 2 1 0 97
1 53 0 2345488 28648 1265856 0 0 1080 0 141 2982 2 1 0 97
0 54 0 2343028 28648 1268196 0 0 1168 0 148 3067 2 1 0 97
0 54 0 2328776 28648 1270828 0 0 1136 4168 168 3277 4 1 0 95
0 51 0 2329004 28648 1273552 0 0 1072 0 145 3340 1 2 0 97
0 52 0 2326240 28648 1275732 0 0 1072 0 135 3203 1 1 0 97
0 53 0 2323428 28648 1278044 0 0 1112 0 140 3275 2 2 0 96

How is Proxmox configured with regard to the kernel buffer cache? Is there a setting somewhere? And how does the kernel buffer cache work in an OpenVZ environment? Say I have 4GB RAM and limit an OpenVZ guest to 2GB RAM - how much buffer cache can a process in the guest use? Only free RAM up to 2GB, or is the cache shared among all containers, so theoretically up to 4GB?
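For what it's worth, the per-container memory limits can at least be inspected through the beancounters (104 is my container's ID) - though as far as I can tell this shows OpenVZ's own accounting, not how the page cache is shared:

Code:
# run on the host; compare the "held" and "limit" columns,
# e.g. for privvmpages / physpages / oomguarpages
vzctl exec 104 cat /proc/user_beancounters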

I know this thread is turning more and more into a "how to optimize PostgreSQL on Proxmox" question, but maybe other people and other use cases might benefit too.

Actually, the pgbench test was meant only as a proof-of-concept test of Proxmox+OpenVZ+PostgreSQL suitability before we dig deeper, e.g. trying our production environment (a self-compiled PostgreSQL, a tweaked postgresql.conf, xfs without LVM as the database filesystem, etc.). Maybe we've reached that point now. But the results with the bigger dataset still puzzle me.

Kuba
 
Hi xkuba,
I ran the same test, but only directly on the host.
Here are my results for comparison:
Code:
...
9990000 tuples done.
10000000 tuples done.
set primary key...
NOTICE:  ALTER TABLE / ADD PRIMARY KEY will create implicit index "branches_pkey" for table "branches"
NOTICE:  ALTER TABLE / ADD PRIMARY KEY will create implicit index "tellers_pkey" for table "tellers"
NOTICE:  ALTER TABLE / ADD PRIMARY KEY will create implicit index "accounts_pkey" for table "accounts"
vacuum...done.
postgres@bigproxmox:/usr/lib/postgresql/8.3/bin$ ./pgbench -c50 -t200 pgbench
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 100
number of clients: 50
number of transactions per client: 200
number of transactions actually processed: 10000/10000
tps = 699.752358 (including connections establishing)
tps = 703.655688 (excluding connections establishing)
postgres@bigproxmox:/usr/lib/postgresql/8.3/bin$ ./pgbench -c50 -t200 pgbench
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 100
number of clients: 50
number of transactions per client: 200
number of transactions actually processed: 10000/10000
tps = 1716.904140 (including connections establishing)
tps = 1740.707364 (excluding connections establishing)
postgres@bigproxmox:/usr/lib/postgresql/8.3/bin$ ./pgbench -c50 -t200 pgbench
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 100
number of clients: 50
number of transactions per client: 200
number of transactions actually processed: 10000/10000
tps = 531.975590 (including connections establishing)
tps = 534.224306 (excluding connections establishing)
postgres@bigproxmox:/usr/lib/postgresql/8.3/bin$ ./pgbench -c50 -t200 pgbench
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 100
number of clients: 50
number of transactions per client: 200
number of transactions actually processed: 10000/10000
tps = 1008.464445 (including connections establishing)
tps = 1016.658250 (excluding connections establishing)
After printing the result, the RAID disks are very busy (approx. 40 sec.) - if I start the second benchmark directly after the first one, I also get only ~200 tps.

The test was made on a 4-core machine with 4GB RAM and a fast RAID controller but slow disks, running the new testing kernel (2.6.24-9-pve). I didn't restart the database between the tests.
Code:
CPU BOGOMIPS:      24082.13
REGEX/SECOND:      614078
HD SIZE:           94.49 GB (/dev/pve/root)
BUFFERED READS:    203.51 MB/sec
AVERAGE SEEK TIME: 11.82 ms
FSYNCS/SECOND:     3331.46
DNS EXT:           133.78 ms
DNS INT:           67.98 ms

Hope this helps,

Udo
 
> After printing the result, the RAID disks are very busy (approx. 40 sec.) - if I start the second benchmark directly after the first one, I also get only ~200 tps.

This is expected - PostgreSQL is deferring some work (autovacuum, bgwriter).
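One way to keep that deferred work from skewing the next run might be to force a checkpoint and let the disks settle in between (just a sketch, run as the postgres user from the same directory as above):

Code:
./psql -c 'CHECKPOINT;' pgbench   # flush dirty buffers now instead of mid-benchmark
sleep 60                          # give the RAID time to catch up before the next run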

Thanks for your input!
 
After more testing:

Now I've got good performance on the host system: I'm using xfs with noatime for the database files (/var/lib/postgresql), so I've eliminated LVM, ext3 and simfs from the test. The benchmarks above now give approximately 3000 tps for the small dataset and 2000 tps for the big one. These are the expected numbers. So far so good.
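(For reference, the corresponding fstab entry looks roughly like this - /dev/sdb1 matches the device shown below, adjust to your own layout:)

Code:
# /etc/fstab
/dev/sdb1   /var/lib/postgresql   xfs   noatime   0   0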

But performance in the OpenVZ container still varies greatly and is much worse: small dataset <2000 tps, big dataset <1000 tps, usually ~200 tps. The vmstat output of the problematic runs is still the same as shown above - 100% iowait, with hardly any disk reads or writes.

I've mounted the xfs partition using mount --bind on the host. In the container I see this:

/dev/sdb1 on /mnt/data type xfs (rw,noatime,relatime,noquota)

My questions are:

1) How do I mount xfs into the guest container with noatime (and without relatime)? I've tried mount --bind -o noatime, but relatime is still there. I've also tried vzctl set 104 --noatime yes --save, but nothing changed. (One idea I haven't tried yet is sketched after these questions.)

2) How do I use a block device exclusively in a container?

I've tried this:
vzctl set 104 --devnodes sdb:rw --save
vzctl set 104 --devnodes sdb1:rw --save
But then the mount command in the container fails with:

mount: unknown filesystem type 'xfs'

I can't see xfs in /proc/filesystems - how do I enable a filesystem/kernel module in a container?

xfs works fine on the host system.

3) I'm still not sure how the kernel buffer cache works in an OpenVZ environment - is it global for all containers or local to each one?

4) Is there a template for a 64-bit Lenny OpenVZ container? This is another difference between my host and guest.

5) Any other performance tips regarding IO in OpenVZ? Is it possible to disable quotas, auditing/accounting, etc. if I don't need them?
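Regarding question 1, one thing I still want to try is remounting the bind mount with the flag - on some kernels noatime has to be applied in a second remount step rather than on the --bind itself. Untested here, so just a sketch; the target path is my container's private area and is an assumption:

Code:
# create the bind mount first, then remount it with noatime
mount --bind /mnt/data /var/lib/vz/private/104/mnt/data
mount -o remount,bind,noatime /var/lib/vz/private/104/mnt/data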

Thank you very much.

Kuba
 
After some more poking I've finally got acceptable performance from Proxmox+OpenVZ+PostgreSQL. The bottom line is that an unoptimized PostgreSQL runs much better on the host than in a container; for a properly tuned PostgreSQL the difference is not measurable (using my benchmark).

Running PostgreSQL in a 64-bit container is 10% faster than in a 32-bit container.

I don't know why an untuned PostgreSQL runs so much slower in a container than on the host - maybe it has something to do with CPU/IO scheduling and/or caching...

I'd still like to know the answers to my questions above.

So far I've discovered this:

ad 1) Doesn't seem to be an issue - but I'd still like to know.

ad 3) From what I've read, the buffer cache is common to all containers - can someone confirm?

ad 4) http://wiki.openvz.org/Download/template/precreated
 
