I/O problem

For extra information:
No USB drive was ever connected to the nodes in our cluster. We installed Proxmox from CD via the remote KVM option in the IMS.
 
i have a USB stick attached to the server just to be able to reinstall squeeze remotely when everything breaks.

can this be the problem?

now my server hangs every week :( not sure, but as far as i remember it's mostly on sundays
 
after reading https://www.multifake.net/2011/12/debian-squeeze-lvm-udev-etc-buggy/ , I wonder if the issue is related to having a USB drive attached... I often have a 500GB drive, formatted ext3, attached to Proxmox 2.0 test servers. [we use it to transport pve dumps off site]

I doubt USB is the issue; the problem multifake.net mentioned is that his USB disk seemed to stop working.
If I/O is stalled to any disk, USB or not, that could cause this problem.
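If you want to check whether I/O to some disk is stalling, two quick standard checks (not from this thread, just the usual tools):

dmesg | grep -i "blocked for more than"   # hung-task warnings appear in the kernel log when I/O stalls long enough
iostat -x 2                               # per-device utilisation and wait times (iostat comes with the sysstat package)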
 
i'm afraid i will have to go back to good old 1.x/lenny soon, because i use this in (not so important, but still) production and i'm tired of this error.
if i read the linked info correctly, only squeeze is affected.
 
I also doubt that having a usb disk attached is the cause of the issue.

The still-undetermined cause of the issue could be something that has already been fixed in Debian testing [wheezy].
 
me too, yesterday..

i will go back to lenny but i have questions first:

1. i will install lenny on a new partition (i have a spare raid volume just for this).
my vms have lvm storages, and these are on a separate raid volume.
will i be able to directly add these volumes to 1.9 without creating/restoring a backup? (see the sketch after this list)

2. my backup disk has an ext4 filesystem on it.
can this be mounted on lenny, or do i need to format it with ext3?
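A sketch of what both steps could look like; the storage name "vmstore", VG name "vmdata", device /dev/sdb1 and mount point are placeholders, not from this thread:

# 1) on the 1.9 node, point it at the existing VG by adding an entry
#    to /etc/pve/storage.cfg, e.g.:
#      lvm: vmstore
#              vgname vmdata
#              content images
# 2) before reformatting the backup disk, check whether the kernel you
#    boot on lenny supports ext4 at all:
grep ext4 /proc/filesystems || modprobe ext4
mount -t ext4 /dev/sdb1 /mnt/backup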

thanks
u
 
today the backup went fine. I delayed the Proxmox backup to 8:00, so the VMs had already finished their own backup tasks first.

So for more information:
The VMs back up their user files to the same NAS every day.
The cluster backs up the complete VMs 3 times a week.

It looks like we get the problems when the user-file backup and the complete-VM backup run at the same time.

This is just a hunch, but today it worked. I will keep you posted on whether it was just luck or indeed a problem of running too many backups at the same time...
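For anyone wanting to stagger the jobs the same way, a sketch of the schedule change (on 2.0 the schedule lives in /etc/pve/vzdump.cron; the times, days and options below are assumptions, check vzdump --help for your version):

# /etc/pve/vzdump.cron -- full-VM backup moved to 8:00, after the nightly
# in-guest user-file backups to the NAS have finished
0 8 * * 1,3,5   root vzdump --all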
 
We can confirm that the new kernel 2.6.32-6-pve has I/O issues. We have been using software raid10 built from SATA hdds since forever, and it was never a problem. On the new 2.0 beta the loaded server stopped responding to I/O within 4-8 hours; the load goes up (100, 200, 1000, ...) until the whole system hangs and stops responding completely. The only I/O requests still being serviced are those from the disk cache in RAM; output blocks completely. Reverting to kernel 2.6.32-4-pve solves this issue, however we then lack the new RH kernel features. All I want to say is: there are I/O issues in this kernel.

EDIT: We use a new Intel Core i7 Extreme 3.3GHz (15MB cache, socket 2011) and 32GB RAM
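For others who want to try the same downgrade, roughly (package name assumed from the pve-kernel naming scheme; verify with apt-cache search pve-kernel):

apt-get install pve-kernel-2.6.32-4-pve
# select it in /etc/default/grub (GRUB_DEFAULT), then:
update-grub
reboot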
 
Ok, from reading more on the internet, it could be a bug related to the type of CPU.

So what kind of CPUs are you using?
We use dual Intel(R) Xeon(R) CPU L5630 @ 2.13GHz in our IMS blades.

Maybe we can find some common ground and pinpoint the I/O problem.
 
i have 2x xeon x5450 cpu.

does this mean that i can install 2.6.32-4-pve on my 2.0 node?

Kernel - yes. However, you will need to create and copy the CT config files with the 1.x version, or you will get crazy resource values available to the CTs :D
 
Additionally, today we noticed another problem, now with kernel 2.6.32-4-pve and an LVM snapshot on 2.0 beta. The whole VG stopped responding. Before the VG stopped, an lvremove command was issued, and we got this:

device-mapper: remove ioctl failed: Device or resource busy
Unable to deactivate pve-backup1104-cow (254:5)
Failed to resume backup1104.
libdevmapper exiting with 1 device(s) still suspended.
Node /dev/mapper/pve-backup1104-cow was not removed by udev. Falling back to direct node removal.

After that, the VG stopped responding.
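Not from the thread, but when libdevmapper exits with devices still suspended, one possible way to get the VG answering again (device names taken from the error output above):

dmsetup info -c                   # list device-mapper devices, look for any left in a suspended state
dmsetup resume pve-backup1104     # resume the stuck snapshot volume so queued I/O can complete
lvremove /dev/pve/backup1104      # then retry tearing the snapshot down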
 
then, using 2.6.32-4 is not a solution :(

Not exactly. When using kernel 2.6.32-6-pve, the system was losing I/O every 4-8 hours even without using LVM snapshots. Reverting to kernel 2.6.32-4-pve solved that problem; the system was stable. This means kernel 2.6.32-6-pve has I/O issues, at least at the software raid (or LVM on top of software raid) level.

The fact that removing an LVM snapshot forced the whole VG to stop responding means this is more likely a problem in the new LVM utilities. I will try 2.6.18-6-pve on PVE 2.0 beta and snapshot the LVM; that kernel worked flawlessly with LVM backups for months. If the VG stops responding again, it is not a kernel issue - it is an LVM utilities issue :)
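A minimal snapshot create/remove cycle to test this per kernel (VG "pve" and origin LV "data" are assumptions - adjust to your layout):

lvcreate -s -L 1G -n testsnap /dev/pve/data
lvremove -f /dev/pve/testsnap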
 
The same problem here with 2.6.35-2-pve, but "only" sometimes, while backing up to a USB disk.
 
