Kernel oops under heavy I/O

kefear

Oct 8, 2010
We experience server (kernel?) problems during I/O peaks (backup). After the event the server is still running, but every new process hangs. Stack trace attached.

pveversion
Code:
30 14:02 db-hosting-4 ~ # pveversion --verbose
pve-manager: 1.6-2 (pve-manager/1.6/5087)
running kernel: 2.6.32-3-pve
proxmox-ve-2.6.32: 1.6-13
pve-kernel-2.6.32-3-pve: 2.6.32-13
pve-kernel-2.6.32-2-pve: 2.6.32-8
qemu-server: 1.1-18
pve-firmware: 1.0-7
libpve-storage-perl: 1.0-13
vncterm: 0.9-2
vzctl: 3.0.24-1pve4
vzdump: 1.2-7
vzprocps: 2.0.11-1dso2
vzquota: 3.0.11-1
pve-qemu-kvm: 0.12.5-1
ksm-control-daemon: 1.0-4

Hardware is an HP BL460c blade with two Intel Xeon CPUs and 32 GB RAM. Storage is on an external array attached via FC. The system is Debian 5.0.6.

Any help would be appreciated.
 

Please upgrade to the latest stable release.
http://pve.proxmox.com/wiki/Downloads

KVM guest Debian:
If you use virtio, make sure you run 2.6.32 inside the guest. The default 2.6.26 kernel is known to have issues with virtio.
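
A minimal sketch for checking this inside a guest, assuming a POSIX shell. The function name `kernel_at_least_2_6_32` is hypothetical and only illustrates the version comparison; it is not part of any Proxmox or Debian tool.

```shell
# Hypothetical helper: succeed if the kernel release string in $1
# (for example the output of `uname -r`) is at least 2.6.32.
kernel_at_least_2_6_32() {
    ver=${1%%-*}                 # strip a suffix such as "-5-amd64"
    old_ifs=$IFS; IFS=.
    set -- $ver                  # split "2.6.32" into 2, 6, 32
    IFS=$old_ifs
    major=${1:-0}; minor=${2:-0}; patch=${3:-0}
    [ "$major" -gt 2 ] && return 0
    [ "$major" -lt 2 ] && return 1
    [ "$minor" -gt 6 ] && return 0
    [ "$minor" -lt 6 ] && return 1
    [ "$patch" -ge 32 ]
}

# Usage inside the guest:
#   kernel_at_least_2_6_32 "$(uname -r)" && echo "guest kernel is 2.6.32 or newer"
#   lsmod | grep virtio    # shows whether virtio modules are loaded
```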
 
Thanks for your response. We run 2.6.32 with virtio in our guests, and we're going to upgrade the host kernel tonight.

Just out of curiosity, are you aware of this 'bug'? Or is upgrading to the latest stable kernel just 'first shot' advice, with deeper digging to follow if that doesn't resolve it?

Anyway, keep up the good work!
 
You are using quite an old kernel, and yes, it's just a general hint. By the way, we just announced the latest KVM 0.14.1 (in pve test), so you can probably test that one.
 
Hello,
I have a very similar configuration (BL460c G6, HP EVA 6000/8000), but I'm still on the stable 2.6.18 kernel.
You're using pve-kernel-2.6.32-3-pve; there is an update: 2.6.32-4-pve.

It looks like your storage hangs. With FC, you are almost certainly using multipathing.
Can you show us your config (/etc/multipath.conf) and the multipath state after the oops (run "multipath -ll")?
 
Code:
defaults {
    udev_dir			/dev
    polling_interval		3
    selector			"round-robin 0"
    path_grouping_policy	failover
    getuid_callout		"/lib/udev/scsi_id -g -u -s /block/%n"
    path_checker		directio
    rr_min_io			100
    failback			immediate
    features			"1 queue_if_no_path"
}

blacklist {
    devnode	"^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
    devnode	"^hd[a-z][[0-9]*]"
    devnode	"^cciss"
}

devices {
    device {
	vendor	"3PARdata"
	product	"VV"
    }

    device {
	vendor	"Promise"
	product	"VTrak E610f"
    }
}

Code:
30 17:18 db-hosting-4 ~ # multipath -ll
222e60001550c2509dm-7 Promise ,VTrak E610f   
[size=137G][features=1 queue_if_no_path][hwhandler=0]
\_ round-robin 0 [prio=1][active]
 \_ 0:0:0:7 sdg 8:96  [active][ready]
\_ round-robin 0 [prio=1][enabled]
 \_ 1:0:0:7 sdo 8:224 [active][ready]
22224000155a8c5c5dm-0 Promise ,VTrak E610f   
[size=137G][features=1 queue_if_no_path][hwhandler=0]
\_ round-robin 0 [prio=1][active]
 \_ 0:0:0:1 sda 8:0   [active][ready]
\_ round-robin 0 [prio=1][enabled]
 \_ 1:0:0:1 sdi 8:128 [active][ready]
222f30001559c7d22dm-5 Promise ,VTrak E610f   
[size=137G][features=1 queue_if_no_path][hwhandler=0]
\_ round-robin 0 [prio=1][active]
 \_ 0:0:0:5 sde 8:64  [active][ready]
\_ round-robin 0 [prio=1][enabled]
 \_ 1:0:0:5 sdm 8:192 [active][ready]
22209000155c958c6dm-6 Promise ,VTrak E610f   
[size=1.0T][features=1 queue_if_no_path][hwhandler=0]
\_ round-robin 0 [prio=1][active]
 \_ 0:0:1:1 sdh 8:112 [active][ready]
\_ round-robin 0 [prio=1][enabled]
 \_ 1:0:1:1 sdp 8:240 [active][ready]
2229f000155f68c42dm-1 Promise ,VTrak E610f   
[size=137G][features=1 queue_if_no_path][hwhandler=0]
\_ round-robin 0 [prio=1][active]
 \_ 0:0:0:2 sdb 8:16  [active][ready]
\_ round-robin 0 [prio=1][enabled]
 \_ 1:0:0:2 sdj 8:144 [active][ready]
2229f00015582edaddm-3 Promise ,VTrak E610f   
[size=137G][features=1 queue_if_no_path][hwhandler=0]
\_ round-robin 0 [prio=1][active]
 \_ 0:0:0:4 sdd 8:48  [active][ready]
\_ round-robin 0 [prio=1][enabled]
 \_ 1:0:0:4 sdl 8:176 [active][ready]
2221c0001558623eadm-2 Promise ,VTrak E610f   
[size=137G][features=1 queue_if_no_path][hwhandler=0]
\_ round-robin 0 [prio=1][active]
 \_ 0:0:0:3 sdc 8:32  [active][ready]
\_ round-robin 0 [prio=1][enabled]
 \_ 1:0:0:3 sdk 8:160 [active][ready]
2220e000155b1a68cdm-4 Promise ,VTrak E610f   
[size=137G][features=1 queue_if_no_path][hwhandler=0]
\_ round-robin 0 [prio=1][active]
 \_ 0:0:0:6 sdf 8:80  [active][ready]
\_ round-robin 0 [prio=1][enabled]
 \_ 1:0:0:6 sdn 8:208 [active][ready]
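
With 16 path lines to read, the path states can also be checked mechanically. The following is a minimal sketch, assuming the older `multipath -ll` output format shown above, where each path line carries an H:C:T:L address, an `sdX` device name and a bracketed state. The function name `count_bad_paths` is hypothetical. This matters here because `features "1 queue_if_no_path"` makes I/O queue instead of fail when no path is usable, so a map that lost all its paths would look like a hang rather than an error.

```shell
# Hypothetical helper: read `multipath -ll` output on stdin and print the
# number of path lines that are NOT reported as [active][ready].
# Path lines are recognised by their H:C:T:L address followed by an sdX name.
count_bad_paths() {
    awk '/[0-9]+:[0-9]+:[0-9]+:[0-9]+[ \t]+sd[a-z]+/ && !/\[active\]\[ready\]/ { bad++ }
         END { print bad + 0 }'
}

# Usage on the host:
#   multipath -ll | count_bad_paths
```

A result of 0 means every listed path is active and ready, which is what the output above shows.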

FC paths are OK; the problem seems to be in kernel space. As I said, the server is running 'fine', but all new processes that were created after the event are frozen.
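
One way to see which processes are frozen is to list those in uninterruptible sleep (state `D`), which usually means they are waiting inside the kernel, often on I/O. This is a minimal sketch, assuming procps `ps` is available on the host; the function name `list_blocked` is hypothetical.

```shell
# Hypothetical helper: read `ps` output on stdin and print the processes whose
# STAT column starts with "D" (uninterruptible sleep). The header line is skipped.
# Expects the column layout produced by: ps -eo pid,stat,wchan:32,comm
list_blocked() {
    awk 'NR > 1 && $2 ~ /^D/ { print }'
}

# Usage on the affected host:
#   ps -eo pid,stat,wchan:32,comm | list_blocked
```

The WCHAN column gives a hint about where in the kernel each process is sleeping. If SysRq is enabled, running `echo w > /proc/sysrq-trigger` as root should additionally dump the stacks of blocked tasks to the kernel log.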
 
KVM guest Debian:
If you use virtio, make sure you run 2.6.32 inside the guest. The default 2.6.26 kernel is known to have issues with virtio.

Can you provide a reference for this claim? I can't seem to find anything on Google about 2.6.26 and virtio issues.
 
I remember a posting on the KVM list, but I suggest you just test it to see whether it solves the issue for you.