PVE server randomly freezes

earcesilas

Jul 20, 2010
Hello all,

We have a problem with Proxmox VE 1.5 (pveversion: pve-manager/1.5/4822):

We have random freezes (up to ~2 min per freeze, several times a day).

During those freezes the whole system is completely frozen (the Proxmox server and all the containers), and we are unable to open connections (ssh/apache/...).
We're only using OpenVZ virtualisation with Debian guests (all VMs are up-to-date Lenny installs), and the average load of the server outside the freezes is under 5%.

I don't see anything relevant in the logs and... I'm stuck :(

Do you have any ideas?

Regards,
 
A network problem? Try monitoring the network traffic during those freezes: is there any traffic at all?
 
Thanks for your answer,

It doesn't look like a network problem: this morning we had a ~6 min freeze, and during it we were unable to access the server even physically (with keyboard and screen).

We installed RRDtool scripts (run every minute via crontab) to monitor CPU usage, and here are the results:

[CPU usage graph not preserved]

The freeze occurs from ~10:08 to ~10:14 (the blank period on the graph), which means that cron jobs aren't executed during the freezes.
 
No, no hint in any log.

For info, we installed the following cron job:
*/1 * * * * /bin/date >> /folio/crashlog
- Here's a sample of normal output:
Wed Jul 21 15:28:01 CEST 2010
Wed Jul 21 15:29:01 CEST 2010
Wed Jul 21 15:30:01 CEST 2010
Wed Jul 21 15:31:01 CEST 2010
Wed Jul 21 15:32:01 CEST 2010
Wed Jul 21 15:33:01 CEST 2010
Wed Jul 21 15:34:01 CEST 2010
Wed Jul 21 15:35:01 CEST 2010
Wed Jul 21 15:36:01 CEST 2010
- And here's a sample of output around a freeze:
Wed Jul 21 15:42:01 CEST 2010
Wed Jul 21 15:43:01 CEST 2010
Wed Jul 21 15:44:01 CEST 2010
Wed Jul 21 15:45:10 CEST 2010
Wed Jul 21 15:48:30 CEST 2010
Wed Jul 21 15:48:30 CEST 2010
Wed Jul 21 15:48:30 CEST 2010
Wed Jul 21 15:49:01 CEST 2010
Wed Jul 21 15:50:01 CEST 2010
Wed Jul 21 15:51:01 CEST 2010
Wed Jul 21 15:52:01 CEST 2010
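The stall in the second sample is easy to miss by eye: the 15:45 entry is 9 seconds late, nothing is written for over three minutes, and then three entries (presumably the delayed 15:46, 15:47 and 15:48 runs) all land at 15:48:30. Below is a minimal Python sketch for scanning a longer log such as /folio/crashlog automatically. It assumes every line has exactly the /bin/date format shown above (English day and month names, one time-zone token); the function names and the 5-second slack are illustrative choices, not part of the original setup.

```python
import sys
from datetime import datetime, timedelta


def parse_date_line(line):
    """Parse one line written by /bin/date, e.g. 'Wed Jul 21 15:45:10 CEST 2010'."""
    parts = line.split()
    if len(parts) != 6:
        raise ValueError(f"unexpected log line: {line!r}")
    # strptime's %Z does not reliably accept names such as CEST, so the zone
    # token is dropped. Assumption: every entry uses the same zone and the
    # C/English locale (day and month names as in the samples above).
    del parts[4]
    return datetime.strptime(" ".join(parts), "%a %b %d %H:%M:%S %Y")


def find_stalls(lines, period=timedelta(minutes=1), slack=timedelta(seconds=5)):
    """Return (last_entry_before, first_entry_after) pairs around each stall.

    cron should append one entry per period; a gap longer than period + slack
    means the job was not started on time. Back-to-back late gaps are merged
    into a single window.
    """
    stamps = [parse_date_line(line) for line in lines if line.strip()]
    stalls = []
    for prev, cur in zip(stamps, stamps[1:]):
        if cur - prev > period + slack:
            if stalls and stalls[-1][1] == prev:
                stalls[-1] = (stalls[-1][0], cur)  # extend the previous window
            else:
                stalls.append((prev, cur))
    return stalls


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as log:  # e.g. /folio/crashlog
        for start, end in find_stalls(log):
            print(f"no on-time cron entry between {start} and {end} ({end - start})")
```

On the freeze sample above this reports a single window from 15:44:01 to 15:48:30, i.e. the last on-time entry before the freeze and the first entry after it; on the normal sample it reports nothing.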

We have also disabled DBS (demand-based switching) for the CPUs (two Xeon 5650s) in the BIOS, but the freezes continue.

We're now trying a 2.6.18 kernel. Do you have any feedback about the 2.6.24-11-pve kernel and Xeon 5650 CPUs?

Thanks for your help.
 
Do you have any feedback about the 2.6.24-11-pve kernel and Xeon 5650 CPUs?
Er, yes: based on your post, the feedback is "don't use them together" :)

Sorry, that wasn't very helpful at all. I have nothing useful to contribute; I just couldn't resist a patch of sunshine on a very, very bad day. I'll go and read up on proper netiquette now.
 
FWIW, I found 2.6.32 a bit snappier for my KVM-based machines. I don't use OpenVZ, which isn't supported on 2.6.32.
 
OK, I can confirm it's the kernel (2.6.18-2-pve = OK; 2.6.24-11-pve = FREEZES).
Do you think it could be related to the "PREEMPT" kernel provided by Proxmox?
 
I have just spent about a week hunting down a very similar issue.

We have a cluster of two very lightly loaded quad-core Xeon machines with 8 GB RAM, the latest 2.6.18 Proxmox kernel, 1 KVM VM and 4 OpenVZ containers.

The problem first showed up as database-backed web pages hosted in a container occasionally loading with distinct delays, sometimes mid-page. These delays could last up to a minute, during which an ssh login, or even a simple "ls" in an already established ssh connection, would take longer than expected, while ping responses stayed timely. A script running "time ssh root@host sleep 1" every 10 seconds showed that this usually took around 1.2 seconds but sometimes considerably longer, all while atop and htop reported the host as 799% idle (quad core with Hyper-Threading, i.e. 8 logical CPUs, so 800% would be completely idle).
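The periodic ssh check described above can be sketched in Python as follows. This is a minimal sketch, not the poster's actual script: the function names, the 2-second threshold (picked only because normal runs took about 1.2 seconds) and the command-line interface are assumptions. Each run is timed with a monotonic clock, so wall-clock adjustments do not distort the measurement, and only slow runs are reported.

```python
import subprocess
import sys
import time


def time_command(cmd):
    """Run cmd once and return its elapsed time in seconds."""
    start = time.monotonic()
    # No timeout: during a stall we want to measure how long it really takes.
    subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL, check=False)
    return time.monotonic() - start


def probe(cmd, interval=10.0, threshold=2.0, rounds=None):
    """Run cmd repeatedly and yield (timestamp, seconds) for each slow run.

    Sleeps `interval` seconds between runs, so the period is interval plus
    the run time. Assumption: threshold=2.0 s is a hypothetical cut-off.
    rounds=None runs forever.
    """
    done = 0
    while rounds is None or done < rounds:
        elapsed = time_command(cmd)
        if elapsed > threshold:
            yield time.strftime("%Y-%m-%d %H:%M:%S"), elapsed
        done += 1
        if rounds is None or done < rounds:
            time.sleep(interval)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file name): python3 probe.py ssh root@host sleep 1
    for stamp, seconds in probe(sys.argv[1:]):
        print(f"{stamp} slow run: {seconds:.1f} s", flush=True)
```

Printing only the slow runs keeps the output short, so the timestamps can be compared with other logs from the host and its containers.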

It turned out that a Java application (JRuby, even) running in one of the containers was effectively freezing the whole host for up to a minute at a time. We have disabled that application (it was non-essential), and so far, so good.
 
I have a possibly related problem with one of our Proxmox/OpenVZ host machines.
Every time a client runs something like this...

rsync -avvr rsync://ftp.funet.fi/ftp/pub/mirrors/ftp.debian.org/debian/pool/non-free/n/ .
The host's OpenVZ containers all freeze (not the host itself). Not only that, it somehow causes so much interference that a couple of other nearby hosts slow to a crawl.
It's not the NIC; I've already replaced it, with no effect. I doubt it's the new gigabit switch either.
 
