Horrible performance after upgrade from 1.3 to 1.4

Which version of Proxmox was originally released with the kernel 2.6.24-5-pve? Was it 1.1, 1.2, or something else?


I confirm that it still happens with pve-kernel-2.6.24-7-pve_2.6.24-8_amd64.deb
 
2.6.24-5-pve has been running like a charm for 10 hours now!

I guess you are referring to pve-kernel-2.6.24-5-pve_2.6.24-6_amd64.deb?

Earlier you said that it improved the situation a bit. Over the course of those 10 hours, when you run the command sar -P ALL (assuming you installed sysstat), do you still see load spikes?
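Since it is load spikes you are after, the queue/load report may be more telling than the per-CPU one; assuming the stock Debian sysstat package, something like:

Code:
sar -q                          # run-queue length and load averages for today
sar -q -f /var/log/sysstat/sa24 # same, but from the saved log of the 24th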
 
I am not so sure that I am seeing much improvement. I seem to still see some spikes...

However, my sar -P command also seems to have stopped collecting at the last reboot... I tried to uninstall and reinstall it, and even rebooted the server, but it still only shows the stats up to when I last changed the kernel to pve-kernel-2.6.24-5-pve_2.6.24-6_amd64.deb.

So I do not have a stat history to be able to view how the server has been doing.

This happened to me once before, last week, and then it started working again on its own... not sure what is going on.
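One thing I still need to rule out: on Debian the sysstat collector is disabled by default, so maybe the reinstall reset that setting. I will check something like:

Code:
grep ENABLED /etc/default/sysstat   # collection only runs if this says ENABLED="true"
ls -l /var/log/sysstat/             # the daily sa* files should carry today's date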
 
I would love to see an official statement.
There are several threads with the same issue.
On some hosts the RAID controllers have so much performance that I did not even notice the problem.

The host which I downgraded yesterday still works, no complaints anymore.
Now I do not know if I should downgrade the other machine, which can bear the I/O peaks but needs hours to back up a VM, or wait for a fix.
 
I would love to see an official statement.
There are several threads with the same issue.

I guess about five users have reported the problem so far. But nobody has found a way to reliably reproduce the behaviour, so I can't test and fix it.

Now I do not know if I should downgrade the other machine, which can bear the I/O peaks but needs hours to back up a VM, or wait for a fix.

So you can reproduce the behaviour with a backup? What kind of backup do you use - vzdump?
 
I guess about five users have reported the problem so far. But nobody has found a way to reliably reproduce the behaviour, so I can't test and fix it.
The question still is whether this is a configuration issue or a general problem.
I feel relieved that the one host is working again, because I really got into trouble with the users.
Maybe one could try a fresh Proxmox installation without a RAID controller, and maybe a slow external USB disk, to see if the problem comes up?
Maybe some Windows and Linux KVMs for testing.

So you can reproduce the behaviour with a backup? What kind of backup do you use - vzdump?
Don't take this backup issue too seriously; after the problems with that slow host I checked the I/O delays on the other hosts, and the vzdump backup appeared slower.
It took about 5 hours for a hosting-panel VM, which caught my attention.
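Next time a backup runs I will watch the disk directly; assuming the sysstat tools are installed, something like this should show whether the disk itself is the bottleneck:

Code:
iostat -x 5   # extended device stats every 5 seconds while vzdump runs
              # consistently high await / %util points at a saturated disk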

Don't get me wrong, Dietmar, I know how much work you guys put into this, but if there really is a performance bug, then it would be a serious issue for virtualization!
 
Don't get me wrong, Dietmar, I know how much work you guys put into this, but if there really is a performance bug, then it would be a serious issue for virtualization!

Again, that is why I am asking how to reproduce the behaviour. If 5 people have serious performance problems, it should be possible for one of you to find a way to reproduce it?

I can't fix something that works for me.
 
Again, that is why I am asking how to reproduce the behaviour. If 5 people have serious performance problems, it should be possible for one of you to find a way to reproduce it?

I can't fix something that works for me.

Good point, Dietmar,

However, I would like to clarify a couple of points. We have this issue on a production server that only runs OpenVZ, no KVM. We have not noticed any I/O issue at all, as we run an Adaptec RAID controller with 512 MB of cache, which at no load gives us between 500 and 600 MB/s of bandwidth. Our issue is clearly caused by many processes, mostly mysql, that spike every now and then. The only problem we see is these load spikes. Our I/O delay is usually 0.05% and rarely goes up.

So it is possible that something is not managing the processes very well.


In order to reproduce this issue, I think you would need to set up a test environment with 10 OpenVZ VPSs that are live and get traffic. Maybe it is time to create such a benchmark environment. You can install some templates like WordPress, SugarCRM, etc. that use databases, run some queries that heavily use the database, and hit the VPSs with traffic (maybe use the ab tool to call a page that makes an intensive database call). This is, in my view, the proper way to emulate a production environment.
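For the traffic part, even a plain ApacheBench run against a database-heavy page in each container would do; the URL and the numbers here are only placeholders:

Code:
# 10000 requests, 50 concurrent, against a DB-heavy page in one container
ab -n 10000 -c 50 "http://192.0.2.10/index.php?heavy=1"
# run one per container in parallel and watch the host's load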

We are on the verge of losing customers because of these spikes that slow down all the other VPSs, and until you can really build load-testing environments, you might never get Proxmox seriously considered for production environments with a lot of traffic.

I am extremely frustrated right now because it is not even easy to migrate some VPSs to another host. Some of our users pointed their domains at their IPs. A few days ago I was trying to migrate a couple of VPSs back to the previous host, so I could test there instead of on a production server, but it was not picking up the IPs, even though that host has the proper values for vmbr0.0 and vmbr0.1, which cover the IP ranges of the VPSs.

It is very frustrating that two Proxmox servers cannot pick up the IPs properly. Now if I migrate a VPS to the other host, I have to change its IP to a range that is defined only on the other host and not on the main one. I rebooted that old host many times; it comes online and the interface settings are right, but you cannot ping the VPS.
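For reference, this is roughly how I verified the settings on both hosts (the bridge name is just what my boxes use):

Code:
brctl show          # bridges and the ports attached to them
ip addr show vmbr0  # addresses configured on the bridge
arp -n              # local ARP cache, to see if the VPS IPs resolve at all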

Let me know your thoughts.
I wonder if we should just go back to straight OpenVZ...
 
In order to reproduce this issue, I think you would need to set up a test environment with 10 OpenVZ VPSs that are live and get traffic. Maybe it is time to create such a benchmark environment.

I already have such an environment. But as I said before, I can't reproduce the behaviour.
 
I shut down the Windows XP VM on the bigger machine and, as I expected, the vzdump backup took only 2 hrs instead of 5 hrs.

I have 3 Proxmox servers running, and two have the latest build installed. Those two servers have this problem.
The bigger machine can cope with the I/O load because of its better hardware.
On the Windows VMs, the swap file is deactivated.

Edit: on the third machine I have pve-kernel-2.6.24-7-pve running with three Windows 2003 KVMs and no delays...
 
Please run:

Code:
ps aux | grep " D"

It will show you the processes in "uninterruptible sleep" state (i.e. waiting for IO).
Each such process adds one to the load average; these processes will (usually) not take much CPU, so they may not be obvious to spot in htop/top.

For example, it can return:

Code:
# ps aux | grep " D"
root       998  0.0  0.0      0     0 ?        D<   Oct24   0:14 [kjournald]
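If the spikes are short-lived, it also helps to log this over time and inspect the log after the next spike; a quick sketch:

Code:
# append D-state processes with a timestamp every 10 seconds
while true; do
    date
    ps -eo state,pid,comm | awk '$1 ~ /^D/'
    sleep 10
done >> /tmp/dstate.log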
 
It was 1.2. But I just uploaded the version released with 1.1 again:

ftp://pve.proxmox.com/debian/dists/lenny/pve/binary-amd64/pve-kernel-2.6.24-2-pve_2.6.24-5_amd64.deb

Please can you test that too?
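For reference, installing it should just be:

Code:
wget ftp://pve.proxmox.com/debian/dists/lenny/pve/binary-amd64/pve-kernel-2.6.24-2-pve_2.6.24-5_amd64.deb
dpkg -i pve-kernel-2.6.24-2-pve_2.6.24-5_amd64.deb

The package postinst should add a boot entry; double-check /boot/grub/menu.lst before you reboot.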

@Dietmar,

I just downloaded the pve-kernel-2.6.24-2-pve_2.6.24-5_amd64.deb kernel, installed it, and rebooted the production server, and as soon as the server was up (I was working through KVM), neither the server nor the containers were reachable over the internet. I could ping the server IP, but I could not load the Proxmox interface and I could not load any website on any container.

I wonder if this is an incompatibility between that kernel and Proxmox 1.4...

So I reverted to Proxmox 1.2 kernel for now....
 
I wonder if this is an incompatibility between that kernel and Proxmox 1.4...

No, it works here. Maybe the hardware is incompatible - did you run PVE 1.1 on this host before?

So I reverted to Proxmox 1.2 kernel for now....

Testing this kernel would be quite helpful, because that's the one which comes with 1.1.

Anyway, I have run almost every benchmark I know in the past few days, and I can't measure any slowdown.
 
@dietmar

This host never ran Proxmox 1.1. It was purchased, Proxmox 1.3 was installed on it, and it was then upgraded to 1.4 beta and then to 1.4.

The previous host was running 1.1 and was then updated to 1.4. Today I am going to reinstall 1.2 on the previous host and hope that the VPSs are reachable. Then I will transfer some VPSs from this host onto it and check the performance issue again.
 
Well, it's been some time since I made the change. I'm now running the prior kernel and everything is working just fine, as before. This confirms to me that there is indeed a problem with the latest kernel in 1.4. Some notes:

Code:
vms1:~# uname -a
Linux vms1 2.6.24-7-pve #1 SMP PREEMPT Fri Aug 21 09:07:39 CEST 2009 x86_64 GNU/Linux
Code:
vms1:~# uptime
 10:34:14 up 17 days, 10:10,  1 user,  load average: 0.49, 0.34, 0.33
Now, how to proceed? Is there an official way to submit bugs/errata?
 
