.32 kernel crashing on our hardware.

mvrhov

I tried to upgrade our server from Proxmox 1.5 with the 2.6.24 kernel to 1.8 with the 2.6.32 kernel.
The Proxmox upgrade itself went well, but we are having problems with the .32 kernel.
The first crash is in the vzwdog module:

Code:
Apr 23 00:30:00 master kernel: disk_io:  104       0 cciss/c0d0 1877 409 56530 8272 111 126 1896 192 0 6088 8456
Apr 23 00:30:00 master kernel: 104       1 cciss/c0d0p1 44 264 1434 188 1 0 8 4 0 124 192
Apr 23 00:30:00 master kernel: 104       2 cciss/c0d0p2 1808 85 54756 8032 110 126 1888 188 0 5956 8212
Apr 23 00:30:00 master kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
Apr 23 00:30:00 master kernel: IP: [<ffffffff811705ee>] disk_part_iter_next+0x59/0xb6
Apr 23 00:30:00 master kernel: PGD 0 
Apr 23 00:30:00 master kernel: Oops: 0000 [#1] SMP 
Apr 23 00:30:00 master kernel: last sysfs file: /sys/devices/pci0000:00/0000:00:1e.0/0000:01:04.6/class
Apr 23 00:30:00 master kernel: CPU 12 
Apr 23 00:30:00 master kernel: Modules linked in: vzrst vzcpt vzwdog vzdquota vzmon vzdev xt_tcpudp ipt_ULOG nf_nat_ftp iptable_nat nf_nat xt_state xt_length xt_hl xt_tcpmss xt_TCPMSS iptable_mangle iptable_filter xt_owner xt_multiport xt_mac xt_limit ipt_ecn xt_recent nf_conntrack_ftp nf_conntrack_irc nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ipt_LOG xt_DSCP xt_dscp ipt_REJECT ip_tables x_tables ipmi_devintf ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi bridge stp bonding ipmi_si ipmi_msghandler usbhid container snd_pcm evdev hid snd_timer snd soundcore snd_page_alloc hpilo psmouse serio_raw power_meter pcspkr processor button ext3 jbd mbcache dm_mirror dm_region_hash dm_log dm_snapshot ehci_hcd uhci_hcd usbcore nls_base bnx2 cciss thermal fan thermal_sys [last unloaded: scsi_wait_scan]
Apr 23 00:30:00 master kernel: Pid: 2639, comm: vzwdog Not tainted 2.6.32-4-pve #1 feoktistov ProLiant DL360 G6
Apr 23 00:30:00 master kernel: RIP: 0010:[<ffffffff811705ee>]  [<ffffffff811705ee>] disk_part_iter_next+0x59/0xb6
Apr 23 00:30:00 master kernel: RSP: 0018:ffff88031b4dfcc0  EFLAGS: 00010246
Apr 23 00:30:00 master kernel: RAX: 0000000000000008 RBX: ffff88031b4dfd60 RCX: 0000000000000001
Apr 23 00:30:00 master kernel: RDX: 0000000000000000 RSI: ffff88031c111bc0 RDI: ffff88031b4dfd60
Apr 23 00:30:00 master kernel: RBP: ffff88031b4dffd8 R08: 0000000000000002 R09: ffffffff81403814
Apr 23 00:30:00 master kernel: R10: 0000000000000004 R11: 0000000000000710 R12: ffffffffa02cd000
Apr 23 00:30:00 master kernel: R13: 0000000000000000 R14: 0000000000000001 R15: ffffffff8150a5c0
Apr 23 00:30:00 master kernel: FS:  0000000000000000(0000) GS:ffff88000f580000(0000) knlGS:0000000000000000
Apr 23 00:30:00 master kernel: CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
Apr 23 00:30:00 master kernel: CR2: 0000000000000010 CR3: 0000000001001000 CR4: 00000000000006e0
Apr 23 00:30:00 master kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Apr 23 00:30:00 master kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Apr 23 00:30:00 master kernel: Process vzwdog (pid: 2639, veid=0, threadinfo ffff88031b4de000, task ffff88031c7db000)
Apr 23 00:30:00 master kernel: Stack:
Apr 23 00:30:00 master kernel: ffff88031b4dfd60 ffff88031b4dffd8 ffffffffa02cd000 ffffffffa02cd428
Apr 23 00:30:00 master kernel: <0> 000000000000d5e4 0000000000001f60 000000000000006e 000000000000007e
Apr 23 00:30:00 master kernel: <0> 0000000000000760 ffffffff000000bc 0000000000000000 0000000000001744
Apr 23 00:30:00 master kernel: Call Trace:
Apr 23 00:30:00 master kernel: [<ffffffffa02cd000>] ? show_one_disk_io+0x0/0x46d [vzwdog]
Apr 23 00:30:00 master kernel: [<ffffffffa02cd428>] ? show_one_disk_io+0x428/0x46d [vzwdog]
Apr 23 00:30:00 master kernel: [<ffffffff8131386d>] ? printk+0x4e/0x59
Apr 23 00:30:00 master kernel: [<ffffffffa02cd000>] ? show_one_disk_io+0x0/0x46d [vzwdog]
Apr 23 00:30:00 master kernel: [<ffffffff812140d2>] ? class_for_each_device+0x7f/0xac
Apr 23 00:30:00 master kernel: [<ffffffffa02cd46d>] ? wdog_loop+0x0/0x36b [vzwdog]
Apr 23 00:30:00 master kernel: [<ffffffffa02cd64a>] ? wdog_loop+0x1dd/0x36b [vzwdog]
Apr 23 00:30:00 master kernel: [<ffffffffa02cd46d>] ? wdog_loop+0x0/0x36b [vzwdog]
Apr 23 00:30:00 master kernel: [<ffffffff81066742>] ? kthread+0xc0/0xca
Apr 23 00:30:00 master kernel: [<ffffffff81011c6a>] ? child_rip+0xa/0x20
Apr 23 00:30:00 master kernel: [<ffffffff81066682>] ? kthread+0x0/0xca
Apr 23 00:30:00 master kernel: [<ffffffff81011c60>] ? child_rip+0x0/0x20
Apr 23 00:30:00 master kernel: Code: 2a 83 c9 ff a8 0c b8 00 00 00 00 41 89 cc 0f 44 c8 48 8d 72 20 49 b8 ff ff ff ff 08 00 00 00 48 bf 00 00 00 00 08 00 00 00 eb 4d <8b> 4a 10 41 bc 01 00 00 00 eb db 48 63 c2 48 8d 04 c6 48 8b 28 
Apr 23 00:30:00 master kernel: RIP  [<ffffffff811705ee>] disk_part_iter_next+0x59/0xb6
Apr 23 00:30:00 master kernel: RSP <ffff88031b4dfcc0>
Apr 23 00:30:00 master kernel: CR2: 0000000000000010
Apr 23 00:30:00 master kernel: ---[ end trace 377c218aff8daffa ]---
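For anyone reading the trace, here is a rough sketch of how the faulting address can be cross-checked against the register dump. It assumes the byte marked `<8b>` in the "Code:" line starts the faulting instruction and that it is a plain `mov` with an 8-bit displacement; the helper name `decode_mov_disp8` is made up for this post and handles only that one encoding.

```python
# Values copied from the oops above.
RDX = 0x0000000000000000   # register dump: RDX
CR2 = 0x0000000000000010   # faulting address reported in CR2

# Assumption: the three bytes starting at the <8b> marker are the
# complete faulting instruction.
faulting_bytes = bytes([0x8B, 0x4A, 0x10])

REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]
REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def decode_mov_disp8(insn: bytes):
    """Decode `mov r32, disp8(base)` (opcode 0x8b, ModRM mod=01, no SIB)."""
    opcode, modrm, disp = insn
    if opcode != 0x8B:
        raise ValueError("only MOV r32, r/m32 is handled")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 0b01 or rm == 0b100:
        raise ValueError("expected an 8-bit displacement without a SIB byte")
    if disp >= 0x80:           # disp8 is a signed byte
        disp -= 0x100
    return REG32[reg], REG64[rm], disp


dest, base, disp = decode_mov_disp8(faulting_bytes)
fault_address = RDX + disp     # base register is rdx in this case

print(f"mov {disp:#x}(%{base}),%{dest} -> fault at {fault_address:#018x}")
```

Under those assumptions the load reads 0x10 bytes past a NULL pointer held in RDX, which agrees with the "NULL pointer dereference at 0000000000000010" line. It does not say which pointer was NULL or why.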
The other one is:
Code:
Apr 23 00:31:06 master kernel: ------------[ cut here ]------------
Apr 23 00:31:06 master kernel: WARNING: at mm/page_alloc.c:1828 __alloc_pages_nodemask+0x183/0x6a8()
Apr 23 00:31:06 master kernel: Hardware name: ProLiant DL360 G6
Apr 23 00:31:06 master kernel: Modules linked in: vzethdev vznetdev simfs vzrst vzcpt vzwdog vzdquota vzmon vzdev xt_tcpudp ipt_ULOG nf_nat_ftp iptable_nat nf_nat xt_state xt_length xt_hl xt_tcpmss xt_TCPMSS iptable_mangle iptable_filter xt_owner xt_multiport xt_mac xt_limit ipt_ecn xt_recent nf_conntrack_ftp nf_conntrack_irc nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ipt_LOG xt_DSCP xt_dscp ipt_REJECT ip_tables x_tables ipmi_devintf ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi bridge stp bonding ipmi_si ipmi_msghandler usbhid container snd_pcm evdev hid snd_timer snd soundcore snd_page_alloc hpilo psmouse serio_raw power_meter pcspkr processor button ext3 jbd mbcache dm_mirror dm_region_hash dm_log dm_snapshot ehci_hcd uhci_hcd usbcore nls_base bnx2 cciss thermal fan thermal_sys [last unloaded: scsi_wait_scan]
Apr 23 00:31:06 master kernel: Pid: 13308, comm: mountall Tainted: G      D    2.6.32-4-pve #1
Apr 23 00:31:06 master kernel: Call Trace:
Apr 23 00:31:06 master kernel: [<ffffffff810bd037>] ? __alloc_pages_nodemask+0x183/0x6a8
Apr 23 00:31:06 master kernel: [<ffffffff810bd037>] ? __alloc_pages_nodemask+0x183/0x6a8
Apr 23 00:31:06 master kernel: [<ffffffff8104e21c>] ? warn_slowpath_common+0x77/0xa3
Apr 23 00:31:06 master kernel: [<ffffffff810bd037>] ? __alloc_pages_nodemask+0x183/0x6a8
Apr 23 00:31:06 master kernel: [<ffffffff810e99a7>] ? new_slab+0x104/0x236
Apr 23 00:31:06 master kernel: [<ffffffff81100735>] ? __d_path+0x116/0x1e0
Apr 23 00:31:06 master kernel: [<ffffffff810bc4e1>] ? __get_free_pages+0x9/0x46
Apr 23 00:31:06 master kernel: [<ffffffff810e9435>] ? __kmalloc+0x3f/0x17f
Apr 23 00:31:06 master kernel: [<ffffffff81109324>] ? seq_read+0x226/0x388
Apr 23 00:31:06 master kernel: [<ffffffff810f2122>] ? vfs_read+0xa6/0xff
Apr 23 00:31:06 master kernel: [<ffffffff810f2295>] ? sys_read+0x49/0xc4
Apr 23 00:31:06 master kernel: [<ffffffff81037623>] ? ia32_sysret+0x0/0x5
Apr 23 00:31:06 master kernel: ---[ end trace 377c218aff8daffb ]---
And I have no idea where it's coming from.

Regards,
Miha
 
This is the second part, as the forum wouldn't accept the full length of the post.

This is the one I got after I decided it was best to restart and boot back into the .24 kernel:
Code:
Apr 23 00:42:10 master kernel: BUG: unable to handle kernel paging request at 00000000dead111c
Apr 23 00:42:10 master kernel: IP: [<ffffffff81074aa6>] ub_task_put+0x15/0xf4
Apr 23 00:42:10 master kernel: PGD 61ba2e067 PUD 0 
Apr 23 00:42:10 master kernel: Oops: 0000 [#2] SMP 
Apr 23 00:42:10 master kernel: last sysfs file: /sys/devices/pci0000:00/0000:00:1e.0/0000:01:04.6/class
Apr 23 00:42:10 master kernel: CPU 3 
Apr 23 00:42:10 master kernel: Modules linked in: kvm_intel kvm simfs vzwdog(-) vzdquota vzmon vzdev xt_tcpudp ipt_ULOG nf_nat_ftp iptable_nat nf_nat xt_state xt_length xt_hl xt_tcpmss xt_TCPMSS iptable_mangle iptable_filter xt_owner xt_multiport xt_mac xt_limit ipt_ecn xt_recent nf_conntrack_ftp nf_conntrack_irc nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ipt_LOG xt_DSCP xt_dscp ipt_REJECT ip_tables x_tables ipmi_devintf ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi bridge stp bonding ipmi_si ipmi_msghandler usbhid container snd_pcm evdev hid snd_timer snd soundcore snd_page_alloc hpilo psmouse serio_raw power_meter pcspkr processor button ext3 jbd mbcache dm_mirror dm_region_hash dm_log dm_snapshot ehci_hcd uhci_hcd usbcore nls_base bnx2 cciss thermal fan thermal_sys [last unloaded: vzrst]
Apr 23 00:42:10 master kernel: Pid: 22096, comm: modprobe Tainted: G      D W  2.6.32-4-pve #1 feoktistov ProLiant DL360 G6
Apr 23 00:42:10 master kernel: RIP: 0010:[<ffffffff81074aa6>]  [<ffffffff81074aa6>] ub_task_put+0x15/0xf4
Apr 23 00:42:10 master kernel: RSP: 0018:ffff8805f6933e98  EFLAGS: 00010287
Apr 23 00:42:10 master kernel: RAX: 00000000dead100c RBX: ffff88031c7db000 RCX: 0000000000000200
Apr 23 00:42:10 master kernel: RDX: 0000000000000000 RSI: 0000000000000200 RDI: ffff88031c7db000
Apr 23 00:42:10 master kernel: RBP: ffff88031c7db000 R08: 0000000000000000 R09: ffffffff8150a600
Apr 23 00:42:10 master kernel: R10: 0000000100000000 R11: ffffffff813e1c43 R12: 0000000000000000
Apr 23 00:42:10 master kernel: R13: 00000000dead100c R14: 0000000000000000 R15: 0000000000000000
Apr 23 00:42:10 master kernel: FS:  00007f7adc23c6e0(0000) GS:ffff88032dc40000(0000) knlGS:0000000000000000
Apr 23 00:42:10 master kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Apr 23 00:42:10 master kernel: CR2: 00000000dead111c CR3: 000000061690a000 CR4: 00000000000026e0
Apr 23 00:42:10 master kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Apr 23 00:42:10 master kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Apr 23 00:42:10 master kernel: Process modprobe (pid: 22096, veid=0, threadinfo ffff8805f6932000, task ffff8805cac63000)
Apr 23 00:42:10 master kernel: Stack:
Apr 23 00:42:10 master kernel: ffff88031c7db000 ffff88031c7db000 0000000000000000 00007fffa584d650
Apr 23 00:42:10 master kernel: <0> 0000000000000000 ffffffff8104dbe8 ffff88031c7db010 ffffffff810667c2
Apr 23 00:42:10 master kernel: <0> 00000000fffffff5 ffffffffa02cda10 0000000000000080 ffffffffa02cd7e8
Apr 23 00:42:10 master kernel: Call Trace:
Apr 23 00:42:10 master kernel: [<ffffffff8104dbe8>] ? __put_task_struct+0x5d/0xc5
Apr 23 00:42:10 master kernel: [<ffffffff810667c2>] ? kthread_stop+0x76/0xa2
Apr 23 00:42:10 master kernel: [<ffffffffa02cd7e8>] ? wdog_exit+0x10/0x1f [vzwdog]
Apr 23 00:42:10 master kernel: [<ffffffff81081804>] ? sys_delete_module+0x1d2/0x258
Apr 23 00:42:10 master kernel: [<ffffffff810d4f6c>] ? sys_munmap+0x4d/0x59
Apr 23 00:42:10 master kernel: [<ffffffff81010c12>] ? system_call_fastpath+0x16/0x1b
Apr 23 00:42:10 master kernel: Code: a7 90 00 00 00 48 83 c4 28 89 d8 5b 5d 41 5c 41 5d 41 5e 41 5f c3 41 56 41 55 41 54 55 53 48 8b 87 10 07 00 00 48 89 fb 49 89 c5 <48> 8b 80 10 01 00 00 48 85 c0 75 f1 4d 8d 65 50 4c 89 e7 e8 91 
Apr 23 00:42:10 master kernel: RIP  [<ffffffff81074aa6>] ub_task_put+0x15/0xf4
Apr 23 00:42:10 master kernel: RSP <ffff8805f6933e98>
Apr 23 00:42:10 master kernel: CR2: 00000000dead111c
Apr 23 00:42:10 master kernel: ---[ end trace 377c218aff8daffc ]---
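A rough cross-check of this oops against the first one, as a sketch only. It assumes the seven bytes starting at the `<48>` marker are the faulting instruction (a 64-bit `mov` with a 32-bit displacement) and that RDI still holds the first argument of `ub_task_put`, as in the usual x86-64 calling convention. The helper name `decode_mov_disp32` is made up for this post.

```python
# Values copied from the oops above.
RAX = 0x00000000DEAD100C
RDI = 0xFFFF88031C7DB000          # assumed: task_struct passed to ub_task_put
CR2 = 0x00000000DEAD111C

# "task ffff88031c7db000" from the first oops (vzwdog, pid 2639).
VZWDOG_TASK = 0xFFFF88031C7DB000

# Assumption: these bytes, starting at the <48> marker, are the whole
# faulting instruction.
faulting_bytes = bytes([0x48, 0x8B, 0x80, 0x10, 0x01, 0x00, 0x00])

REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def decode_mov_disp32(insn: bytes):
    """Decode `mov r64, disp32(base)` (REX.W 0x48, opcode 0x8b, mod=10, no SIB)."""
    rex, opcode, modrm = insn[0], insn[1], insn[2]
    if rex != 0x48 or opcode != 0x8B:
        raise ValueError("only REX.W MOV r64, r/m64 is handled")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 0b10 or rm == 0b100:
        raise ValueError("expected a 32-bit displacement without a SIB byte")
    disp = int.from_bytes(insn[3:7], "little", signed=True)
    return REG64[reg], REG64[rm], disp


dest, base, disp = decode_mov_disp32(faulting_bytes)
fault_address = RAX + disp        # base register is rax in this case

print(f"mov {disp:#x}(%{base}),%{dest} -> fault at {fault_address:#018x}")
print("same task as the crashed vzwdog thread:", RDI == VZWDOG_TASK)
```

If that reading is right, the fault is a load at offset 0x110 from the bogus pointer 0xdead100c held in RAX, and the task being released is the same vzwdog thread that died in the first oops. That suggests, but does not prove, that this oops is fallout from unloading vzwdog after its kernel thread had already crashed, rather than a separate problem.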
pveversion output, run on the .24 kernel:
Code:
pveversion -V
pve-manager: 1.8-15 (pve-manager/1.8/5754)
running kernel: 2.6.24-8-pve
proxmox-ve-2.6.32: 1.8-32
pve-kernel-2.6.32-4-pve: 2.6.32-32
pve-kernel-2.6.24-8-pve: 2.6.24-16
qemu-server: 1.1-30
pve-firmware: 1.0-11
libpve-storage-perl: 1.0-17
vncterm: 0.9-2
vzctl: 3.0.24-1pve4
vzdump: 1.2-11
vzprocps: 2.0.11-2
vzquota: 3.0.11-1
pve-qemu-kvm: 0.14.0-3
ksm-control-daemon: 1.0-5

Regards,
Miha
 
Try 2.6.18. Does that work?
 
We had been using .24 for the past 18 months, and with .32 crashing I went back to it; .24 is working just fine with 1.8.
We also have a development environment built from standard PC components (Phenom II X6), and there Proxmox 1.8 with the .32 kernel runs without a hitch.
But somehow .32 doesn't like the HP ProLiant DL360 G6.

Regards,
Miha