I/O problem

For extra information:
No USB drive was ever connected to the nodes in our cluster. We installed Proxmox from CD via the remote KVM option in the IMS.
 
i have a USB stick attached to the server just to be able to reinstall squeeze remotely when everything breaks.

can this be the problem?

now my server hangs every week :( not sure, but as far as i remember it's mostly on sundays
 
after reading https://www.multifake.net/2011/12/debian-squeeze-lvm-udev-etc-buggy/ , I wonder if the issue is related to having a USB drive attached... I often have a 500GB drive, formatted ext3, attached to Proxmox 2.0 test servers. [we use it to transport pve dumps off site]

I doubt USB is the issue; the problem multifake.net mentioned is that his USB disk seemed to stop working.
If I/O is stalled to any disk, USB or not, that could cause this problem.
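If you want to check whether I/O to some disk is stalling, two quick standard checks (not from this thread, just the usual tools):

dmesg | grep -i "blocked for more than"   # hung-task warnings appear in the kernel log when I/O stalls long enough
iostat -x 2                               # per-device utilisation and wait times (iostat comes with the sysstat package)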
 
i'm afraid i will have to go back to good old 1.x/lenny soon, because i use this in (not so important, but still) production and i'm tired of this error.
if i read the linked info correctly, only squeeze is affected.
 
I also doubt that having a usb disk attached is the cause of the issue.

The still-undetermined cause of the issue could be something that has already been fixed in Debian testing [wheezy].
 
me too, yesterday..

i will go back to lenny but i have questions first:

1. i will install lenny on a new partition (i have a spare raid volume just for this).
my vms have lvm storages, and these are on a separate raid volume.
will i be able to directly add these volumes to 1.9 without creating/restoring a backup? (see the sketch after this list)

2. my backup disk has an ext4 filesystem on it.
can this be mounted on lenny, or do i need to format it with ext3?
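A sketch of what both steps could look like; the storage name "vmstore", VG name "vmdata", device /dev/sdb1 and mount point are placeholders, not from this thread:

# 1) on the 1.9 node, point it at the existing VG by adding an entry
#    to /etc/pve/storage.cfg, e.g.:
#      lvm: vmstore
#              vgname vmdata
#              content images
# 2) before reformatting the backup disk, check whether the kernel you
#    boot on lenny supports ext4 at all:
grep ext4 /proc/filesystems || modprobe ext4
mount -t ext4 /dev/sdb1 /mnt/backup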

thanks
u
 
today the backup went fine. I delayed the Proxmox backup to 8:00, so the VMs had already finished their own backup tasks first.

So for more information:
The VMs back up their user files to the same NAS every day.
The cluster backs up the complete VMs 3 times a week.

It looks like we get the problems when the user-file backup and the complete-VM backup run at the same time.

This is just a hunch, but today it worked. I will keep you posted on whether it was just luck or indeed a problem of running too many backups at the same time...
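For anyone wanting to stagger the jobs the same way, a sketch of the schedule change (on 2.0 the schedule lives in /etc/pve/vzdump.cron; the times, days and options below are assumptions, check vzdump --help for your version):

# /etc/pve/vzdump.cron -- full-VM backup moved to 8:00, after the nightly
# in-guest user-file backups to the NAS have finished
0 8 * * 1,3,5   root vzdump --all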
 
We can confirm that the new kernel 2.6.32-6-pve has I/O issues. We have been using software raid10 built from SATA hdds since forever, and it was never a problem. On the new 2.0 beta the loaded server stopped responding to I/O within 4-8 hours; the load goes up (100, 200, 1000, ...) until the whole system hangs and stops responding completely. The only I/O requests still being serviced are those from the disk cache in RAM; output blocks completely. Reverting to kernel 2.6.32-4-pve solves this issue, however we then lack the new RH kernel features. All I want to say is: there are I/O issues in this kernel.

EDIT: We use a new Intel Core i7 Extreme 3.3GHz (15MB cache, socket 2011) and 32GB RAM
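For others who want to try the same downgrade, roughly (package name assumed from the pve-kernel naming scheme; verify with apt-cache search pve-kernel):

apt-get install pve-kernel-2.6.32-4-pve
# select it in /etc/default/grub (GRUB_DEFAULT), then:
update-grub
reboot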
 
Ok, from reading more on the internet, it could be a bug related to the type of CPU.

So what kind of CPUs are you using?
We use dual Intel(R) Xeon(R) CPU L5630 @ 2.13GHz in our IMS blades.

Maybe we can find some common ground and pinpoint the I/O problem.
 
i have 2x xeon x5450 cpu.

does this mean that i can install 2.6.32-4-pve on my 2.0 node?

Kernel - yes. However, you will need to create and copy the CT config files with the 1.x version, or you will get crazy resource values available to the CTs :D
 
Additionally, today we noticed another problem, now with kernel 2.6.32-4-pve and an LVM snapshot on 2.0 beta. The whole VG stopped responding. Before the VG stopped, an lvremove command was issued, and we got this:

device-mapper: remove ioctl failed: Device or resource busy
Unable to deactivate pve-backup1104-cow (254:5)
Failed to resume backup1104.
libdevmapper exiting with 1 device(s) still suspended.
Node /dev/mapper/pve-backup1104-cow was not removed by udev. Falling back to direct node removal.

After that, the VG stopped responding.
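Not from the thread, but when libdevmapper exits with devices still suspended, one possible way to get the VG answering again (device names taken from the error output above):

dmsetup info -c                   # list device-mapper devices, look for any left in a suspended state
dmsetup resume pve-backup1104     # resume the stuck snapshot volume so queued I/O can complete
lvremove /dev/pve/backup1104      # then retry tearing the snapshot down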
 
then, using 2.6.32-4 is not a solution :(

Not exactly. When using kernel 2.6.32-6-pve, the system was losing I/O every 4-8 hours even without using LVM snapshots. Reverting to kernel 2.6.32-4-pve solved that problem; the system was stable. This means kernel 2.6.32-6-pve has I/O issues, at least at the software raid (or LVM on top of software raid) level.

The fact that removing an LVM snapshot forced the whole VG to stop responding means this is more likely a problem in the new LVM utilities. I will try 2.6.18-6-pve on PVE 2.0 beta and snapshot the LVM; that kernel worked flawlessly with LVM backups for months. If the VG stops responding again, it is not a kernel issue - it is an LVM utilities issue :)
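A minimal snapshot create/remove cycle to test this per kernel (VG "pve" and origin LV "data" are assumptions - adjust to your layout):

lvcreate -s -L 1G -n testsnap /dev/pve/data
lvremove -f /dev/pve/testsnap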
 
The same problem here with 2.6.35-2-pve, but "only" sometimes, while backing up to a USB disk.
 
