kernel bug every few minutes.

keinstein

New Member
Nov 2, 2009
I keep hitting a kernel bug:
Code:
Unable to handle kernel NULL pointer dereference at 0000000000000001 RIP:
 [<ffffffff881f4884>] :intel_agp:____versions+0x6c4/0xffffffffffffe9bc
PGD df46f067 PUD 0
Oops: 0002 [1] PREEMPT SMP
CPU: 1
Modules linked in: kvm_intel kvm vzethdev vznetdev simfs vzrst vzcpt tun vzdquota vzmon vzdev xt_tcpudp xt_length ipt_ttl xt_tcpmss xt_TCPMSS iptable_mangle iptable_filter xt_multiport xt_limit ipt_tos ipt_REJECT ip_tables x_tables nfsd lockd nfs_acl auth_rpcgss sunrpc exportfs ipv6 bridge xfs loop nvidiafb fb_ddc iTCO_wdt evdev pcspkr i2c_algo_bit i2c_i801 parport_pc parport evbug vgastate iTCO_vendor_support intel_agp i2c_core button sha256_generic aes_generic aes_x86_64 cbc blkcipher dm_crypt dm_mirror dm_snapshot dm_mod raid456 md_mod async_xor async_memcpy async_tx xor pata_jmicron sd_mod pata_acpi ata_generic sata_sil ahci floppy libata ehci_hcd uhci_hcd scsi_mod e1000e usbcore thermal processor fan
Pid: 2361, comm: md0_raid5 Not tainted 2.6.24-8-pve #1 ovz005
RIP: 0010:[<ffffffff881f4884>]  [<ffffffff881f4884>] :intel_agp:____versions+0x6c4/0xffffffffffffe9bc
RSP: 0018:ffff81011512dc08  EFLAGS: 00010202
RAX: 0000000000000001 RBX: ffff81007dbd93e0 RCX: 0000000000000004
RDX: ffff81007da22b80 RSI: 0000000000000003 RDI: ffff81007da22b80
RBP: 0000000000000001 R08: 0000000000000002 R09: ffff810001000180
R10: 0000000000000002 R11: ffffffff881f4810 R12: ffff81007da22b80
R13: ffff8100a2bc5a10 R14: 0000000000000000 R15: ffff810117829200
FS:  0000000000000000(0000) GS:ffff81011b402880(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000001 CR3: 00000000df415000 CR4: 00000000000026e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process md0_raid5 (pid: 2361, veid=0, threadinfo ffff81011512c000, task ffff81011513d1c0)
Stack:  ffffffff881b48c4 ffff81011695a200 ffff81007be6f380 ffff8101151cc058
 0000000000000000 000000000000000c ffff8101151cc000 000000000000000b
 ffffffff88168037 0000000000000000 ffffffff8816c7dd ffffffff806e9e80
Call Trace:
 [<ffffffff881b48c4>] :dm_crypt:crypt_endio+0x94/0x110
 [<ffffffff88168037>] :raid456:return_io+0x37/0x50
 [<ffffffff8816c7dd>] :raid456:handle_stripe+0x73d/0x2c90
 [<ffffffff88168e02>] :raid456:__release_stripe+0x152/0x190
 [<ffffffff881709aa>] :raid456:raid5d+0x33a/0x3f0
 [<ffffffff8815310b>] :md_mod:md_thread+0x4b/0x130
 [<ffffffff8025c230>] autoremove_wake_function+0x0/0x30
 [<ffffffff881530c0>] :md_mod:md_thread+0x0/0x130
 [<ffffffff8025be87>] kthread+0x47/0x90
 [<ffffffff8020d4e8>] child_rip+0xa/0x12
 [<ffffffff8025be40>] kthread+0x0/0x90
 [<ffffffff8020d4de>] child_rip+0x0/0x12


Code: 00 00 00 00 70 63 69 5f 62 75 73 5f 77 72 69 74 65 5f 63 6f
RIP  [<ffffffff881f4884>] :intel_agp:____versions+0x6c4/0xffffffffffffe9bc
 RSP <ffff81011512dc08>
CR2: 0000000000000001
---[ end trace 2f331a7687f2bdde ]---
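
One thing that can be checked from the trace itself: the bytes on the "Code:" line look like ASCII text rather than machine instructions. A minimal Python sketch that decodes them, using only the bytes copied from the oops above (the function name is just an illustrative choice):

```python
# Bytes copied verbatim from the "Code:" line of the oops above.
CODE_LINE = "00 00 00 00 70 63 69 5f 62 75 73 5f 77 72 69 74 65 5f 63 6f"


def decode_code_bytes(hex_line: str) -> str:
    """Render a hex dump as text, showing non-printable bytes as '.'."""
    raw = bytes.fromhex(hex_line)  # fromhex ignores the spaces between bytes
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)


if __name__ == "__main__":
    print(decode_code_bytes(CODE_LINE))  # prints "....pci_bus_write_co"
```

The result looks like the start of a symbol name, cut off by the length of the dump. That would fit the faulting address lying inside the module's ____versions table, which is data rather than code, so the CPU may have jumped through a bad function pointer. This is only an inference from the dump, not a diagnosis.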

I have a 16 TB mdadm RAID5 array, bind-mounted with "mount -o bind" from the host into one of the VMs. The VM writes to that array at about 5 MB/s,
then it suddenly crashes.

The same happens when the host writes directly to the array.

Super annoying; it has happened 20 times now. I fear for my data.

Is the kernel buggy?

Is there a more recent kernel anywhere? I only use OpenVZ, no KVM right now.

Or: I heard Linux-VServer has better (newer) kernels than OpenVZ. Should I try that?

thanks!
 
We do not support software raid. I strongly recommend that you use a hardware raid controller instead.