PVE 2.3 - 2.6.32.19 kernel panics

dlasher

Was running 2.6.32-12, upgraded to 2.6.32-19, and since then I've had four kernel panics/hard crashes in the last 4 days. :(

I can't find anywhere in /var/log where those are dumped, so I'm at the mercy of my memory. Each crash was triggered by an individual OpenVZ container, with lots of complaints about XFS writes. (Multiple containers read from and write to an xxTB array formatted as XFS.)

I've rebooted under 2.6.32-16; we'll see if I get crashes there. So, a couple of questions:

1. Where can I find, if existing, better logs written to disk to help address this issue?
2. What can I set now, if anything, to make better logs be generated if it crashes again?
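
For the first question, a minimal sketch of how one might search the current and rotated kernel logs for crash traces. The marker strings and the `/var/log/kern.log*` glob are assumptions, not something confirmed for this host, and a hard panic may never reach disk at all, so this only finds what syslog managed to write before the crash.

```python
import glob
import gzip
import re

# Assumed markers: strings that commonly appear in kernel crash output.
CRASH_MARKERS = re.compile(r"Kernel panic|Oops|BUG:|WARNING: at|Call Trace:")


def open_log(path):
    """Open a plain or gzip-compressed log file as text."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", errors="replace")
    return open(path, "r", errors="replace")


def find_crash_lines(pattern="/var/log/kern.log*"):
    """Yield (path, line_number, line) for every line matching a crash marker."""
    for path in sorted(glob.glob(pattern)):
        with open_log(path) as handle:
            for number, line in enumerate(handle, start=1):
                if CRASH_MARKERS.search(line):
                    yield path, number, line.rstrip("\n")


# Example use (run as root so the rotated logs are readable):
#   for path, number, line in find_crash_lines():
#       print(f"{path}:{number}: {line}")
```

If nothing turns up for the time of a crash, that suggests the panic output only went to the console and was never written to the logs.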


Code:
root@pmx4:/var/log# pveversion -v
pve-manager: 2.3-13 (pve-manager/2.3/7946f1f1)
running kernel: 2.6.32-16-pve
proxmox-ve-2.6.32: 2.3-95
pve-kernel-2.6.32-11-pve: 2.6.32-66
pve-kernel-2.6.32-16-pve: 2.6.32-80
pve-kernel-2.6.32-12-pve: 2.6.32-68
pve-kernel-2.6.32-19-pve: 2.6.32-95
lvm2: 2.02.95-1pve2
clvm: 2.02.95-1pve2
corosync-pve: 1.4.4-4
openais-pve: 1.1.4-2
libqb: 0.10.1-2
redhat-cluster-pve: 3.1.93-2
resource-agents-pve: 3.9.2-3
fence-agents-pve: 3.1.9-1
pve-cluster: 1.0-36
qemu-server: 2.3-20
pve-firmware: 1.0-21
libpve-common-perl: 1.0-49
libpve-access-control: 1.0-26
libpve-storage-perl: 2.3-7
vncterm: 1.0-4
vzctl: 4.0-1pve2
vzprocps: 2.0.11-2
vzquota: 3.1-1
pve-qemu-kvm: 1.4-10
ksm-control-daemon: 1.1-1
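
Since the thread hinges on which kernel is running versus which ones are installed, here is a small sketch that pulls both out of `pveversion -v` output like the listing above. The function names are hypothetical, and the parsing assumes the `key: value` layout shown in this post.

```python
import re


def parse_pveversion(text):
    """Return (running_kernel, installed_kernels) from `pveversion -v` output."""
    running = None
    installed = []
    for line in text.splitlines():
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "running kernel":
            running = value
        else:
            # Package names look like "pve-kernel-2.6.32-16-pve".
            match = re.fullmatch(r"pve-kernel-(.+-pve)", key)
            if match:
                installed.append(match.group(1))
    return running, installed


def kernel_sort_key(name):
    """Sort '2.6.32-16-pve' style names numerically rather than lexically."""
    return [int(part) for part in re.findall(r"\d+", name)]


sample = """\
running kernel: 2.6.32-16-pve
pve-kernel-2.6.32-11-pve: 2.6.32-66
pve-kernel-2.6.32-16-pve: 2.6.32-80
pve-kernel-2.6.32-12-pve: 2.6.32-68
pve-kernel-2.6.32-19-pve: 2.6.32-95
"""
running, installed = parse_pveversion(sample)
print("running:", running)
print("installed:", sorted(installed, key=kernel_sort_key))
```

The numeric sort key matters once version components reach different digit counts, where a plain string sort would misorder them.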
 
:(

Did find a few issues under 2.6.32-16 as well, so I'll reboot under 2.6.32-12. Dug things like this out of the kern.log from two days ago:

Code:
May  8 15:12:47 pmx4 kernel: WARNING: at fs/xfs/linux-2.6/xfs_aops.c:1125 xfs_vm_releasepage+0xb6/0xc0 [xfs]() (Tainted: G        WC ---------------   )
May  8 15:12:47 pmx4 kernel: Hardware name: empty
May  8 15:12:47 pmx4 kernel: Modules linked in: usb_storage dm_snapshot powernow_k8 mperf cpufreq_stats vzethdev vznetdev pio_nfs pio_direct pfmt_raw pfmt_ploop1 ploop simfs vzrst nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 vzcpt nf_conntrack vzdquota vzmon vzdev ip6t_REJECT ip6table_mangle ip6table_filter ip6_tables xt_length xt_hl xt_tcpmss xt_TCPMSS iptable_mangle iptable_filter xt_multiport xt_limit vhost_net xt_dscp macvtap macvlan ipt_REJECT tun ip_tables kvm_amd kvm cpufreq_conservative cpufreq_powersave cpufreq_ondemand freq_table vzevent ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi fuse scsi_transport_iscsi nfsd nfs lockd fscache nfs_acl auth_rpcgss sunrpc bonding ipv6 8021q garp xfs lm85 hwmon_vid ipmi_si ipmi_msghandler radeon snd_pcsp ttm drm_kms_helper snd_pcm snd_timer drm amd64_edac_mod i2c_piix4 tpm_tis snd edac_core tpm k10temp soundcore i2c_algo_bit edac_mce_amd i2c_core shpchp tpm_bios snd_page_alloc ext3 jbd mbcache ata_generic sg pata_acpi e100 3c59x
May  8 15:12:47 pmx4 kernel: e1000e arcmsr tg3 mii pata_serverworks 3w_9xxx sata_svw [last unloaded: scsi_wait_scan]
May  8 15:12:47 pmx4 kernel: Pid: 100, comm: kswapd1 veid: 0 Tainted: G        WC ---------------    2.6.32-16-pve #1
May  8 15:12:47 pmx4 kernel: Call Trace:
May  8 15:12:47 pmx4 kernel: [<ffffffff8106c658>] ? warn_slowpath_common+0x88/0xc0
May  8 15:12:47 pmx4 kernel: [<ffffffff8106c6aa>] ? warn_slowpath_null+0x1a/0x20
May  8 15:12:47 pmx4 kernel: [<ffffffffa041be06>] ? xfs_vm_releasepage+0xb6/0xc0 [xfs]
May  8 15:12:47 pmx4 kernel: [<ffffffff81124552>] ? try_to_release_page+0x32/0x50
May  8 15:12:47 pmx4 kernel: [<ffffffff8113dd0a>] ? pagevec_strip+0x7a/0x80
May  8 15:12:47 pmx4 kernel: [<ffffffff811419c3>] ? move_active_pages_to_lru+0x1e3/0x2b0
May  8 15:12:47 pmx4 kernel: [<ffffffff81143d7d>] ? shrink_active_list+0x32d/0x4a0
May  8 15:12:47 pmx4 kernel: [<ffffffff81144531>] ? shrink_zone+0x641/0x900
May  8 15:12:47 pmx4 kernel: [<ffffffff81133c5d>] ? zone_watermark_ok_safe+0xad/0xc0
May  8 15:12:47 pmx4 kernel: [<ffffffff81145e19>] ? balance_pgdat+0x739/0x820
May  8 15:12:47 pmx4 kernel: [<ffffffff81141a90>] ? isolate_pages_global+0x0/0x530
May  8 15:12:47 pmx4 kernel: [<ffffffff81146031>] ? kswapd+0x131/0x3a0
May  8 15:12:47 pmx4 kernel: [<ffffffff81095d60>] ? autoremove_wake_function+0x0/0x40
May  8 15:12:47 pmx4 kernel: [<ffffffff81145f00>] ? kswapd+0x0/0x3a0
May  8 15:12:47 pmx4 kernel: [<ffffffff81095786>] ? kthread+0x96/0xa0
May  8 15:12:47 pmx4 kernel: [<ffffffff8100c20a>] ? child_rip+0xa/0x20
May  8 15:12:47 pmx4 kernel: [<ffffffff810956f0>] ? kthread+0x0/0xa0
May  8 15:12:47 pmx4 kernel: [<ffffffff8100c200>] ? child_rip+0x0/0x20
May  8 15:12:47 pmx4 kernel: ---[ end trace 862b353c6f05549e ]---
 
Looks like whatever this is started under 2.6.32-16, but it doesn't happen under 2.6.32-12. Now that I'm digging, I see I missed complaints after the upgrade from -12 to -16. Lots of these:

Code:
Apr 28 23:18:02 pmx4 kernel: ------------[ cut here ]------------
Apr 28 23:18:02 pmx4 kernel: WARNING: at fs/xfs/linux-2.6/xfs_aops.c:1125 xfs_vm_releasepage+0xb6/0xc0 [xfs]() (Tainted: G        WC ---------------   )
Apr 28 23:18:02 pmx4 kernel: Hardware name: empty
Apr 28 23:18:02 pmx4 kernel: Modules linked in: dm_snapshot powernow_k8 mperf cpufreq_stats vzethdev vznetdev pio_nfs pio_direct pfmt_raw pfmt_ploop1 ploop simfs vzrst nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 vzcpt nf_conntrack vzdquota vzmon vzdev ip6t_REJECT ip6table_mangle ip6table_filter ip6_tables xt_length xt_hl xt_tcpmss xt_TCPMSS iptable_mangle iptable_filter xt_multiport xt_limit vhost_net xt_dscp macvtap macvlan ipt_REJECT tun ip_tables kvm_amd kvm cpufreq_conservative cpufreq_powersave cpufreq_ondemand freq_table vzevent ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi fuse scsi_transport_iscsi nfsd nfs lockd fscache nfs_acl auth_rpcgss sunrpc bonding ipv6 8021q garp xfs lm85 hwmon_vid ipmi_si ipmi_msghandler radeon snd_pcsp ttm drm_kms_helper snd_pcm snd_timer drm amd64_edac_mod i2c_piix4 tpm_tis snd edac_core tpm k10temp soundcore i2c_algo_bit edac_mce_amd i2c_core shpchp tpm_bios snd_page_alloc ext3 jbd mbcache ata_generic sg pata_acpi e100 3c59x e1000e arcm
Apr 28 23:18:02 pmx4 kernel: sr tg3 mii pata_serverworks 3w_9xxx sata_svw [last unloaded: scsi_wait_scan]
Apr 28 23:18:02 pmx4 kernel: Pid: 99, comm: kswapd0 veid: 0 Tainted: G        WC ---------------    2.6.32-16-pve #1
Apr 28 23:18:02 pmx4 kernel: Call Trace:
Apr 28 23:18:02 pmx4 kernel: [<ffffffff8106c658>] ? warn_slowpath_common+0x88/0xc0
Apr 28 23:18:02 pmx4 kernel: [<ffffffff8106c6aa>] ? warn_slowpath_null+0x1a/0x20
Apr 28 23:18:02 pmx4 kernel: [<ffffffffa041be06>] ? xfs_vm_releasepage+0xb6/0xc0 [xfs]
Apr 28 23:18:02 pmx4 kernel: [<ffffffff81124552>] ? try_to_release_page+0x32/0x50
Apr 28 23:18:02 pmx4 kernel: [<ffffffff8113dd0a>] ? pagevec_strip+0x7a/0x80
Apr 28 23:18:02 pmx4 kernel: [<ffffffff811419c3>] ? move_active_pages_to_lru+0x1e3/0x2b0
Apr 28 23:18:02 pmx4 kernel: [<ffffffff81143d7d>] ? shrink_active_list+0x32d/0x4a0
Apr 28 23:18:02 pmx4 kernel: [<ffffffff81144531>] ? shrink_zone+0x641/0x900
Apr 28 23:18:02 pmx4 kernel: [<ffffffff81133c5d>] ? zone_watermark_ok_safe+0xad/0xc0
Apr 28 23:18:02 pmx4 kernel: [<ffffffff81145e19>] ? balance_pgdat+0x739/0x820
Apr 28 23:18:02 pmx4 kernel: [<ffffffff81141a90>] ? isolate_pages_global+0x0/0x530
Apr 28 23:18:02 pmx4 kernel: [<ffffffff81146031>] ? kswapd+0x131/0x3a0
Apr 28 23:18:02 pmx4 kernel: [<ffffffff81095d60>] ? autoremove_wake_function+0x0/0x40
Apr 28 23:18:02 pmx4 kernel: [<ffffffff81145f00>] ? kswapd+0x0/0x3a0
Apr 28 23:18:02 pmx4 kernel: [<ffffffff81095786>] ? kthread+0x96/0xa0
Apr 28 23:18:02 pmx4 kernel: [<ffffffff8100c20a>] ? child_rip+0xa/0x20
Apr 28 23:18:02 pmx4 kernel: [<ffffffff810956f0>] ? kthread+0x0/0xa0
Apr 28 23:18:02 pmx4 kernel: [<ffffffff8100c200>] ? child_rip+0x0/0x20
Apr 28 23:18:02 pmx4 kernel: ---[ end trace 862b353c6f029903 ]---
Apr 28 23:18:02 pmx4 kernel: ------------[ cut here ]------------
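
To see which kernel each of these warnings was logged under, a rough sketch that tallies warnings per kernel version and source location. It assumes the trace layout shown above: a `WARNING: at file:line` line followed later by a `Pid: ...` line that ends with the kernel version. The function name and regular expressions are illustrative only.

```python
import re
from collections import Counter

WARNING_RE = re.compile(r"WARNING: at (\S+:\d+)")
# The kernel version sits at the end of the "Pid:" line, just before " #1".
PID_RE = re.compile(r"Pid: \d+, comm: \S+ .*?(\d+\.\d+\.\d+-\d+-pve) #")


def count_warnings(lines):
    """Count (kernel_version, warning_site) pairs found in kern.log lines."""
    counts = Counter()
    pending_site = None
    for line in lines:
        warning = WARNING_RE.search(line)
        if warning:
            # Remember the site until the matching "Pid:" line appears.
            pending_site = warning.group(1)
            continue
        pid = PID_RE.search(line)
        if pid and pending_site is not None:
            counts[(pid.group(1), pending_site)] += 1
            pending_site = None
    return counts


# Example use against a log file:
#   with open("/var/log/kern.log", errors="replace") as handle:
#       for (kernel, site), n in count_warnings(handle).most_common():
#           print(n, kernel, site)
```

Run across the rotated logs as well, this would show whether any traces were actually written while -19 was booted or only under -16.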

- - - Updated - - -


Perhaps: http://oss.sgi.com/archives/xfs/2010-11/msg00251.html
 
Ran clean over the weekend on -12, so something ugly crawled in from -16 forward. How can I best help troubleshoot this issue?
 
All your logs show 2.6.32-16-pve, not 2.6.32-19?

Test with the latest, ideally 3.0rc1.
 
Thanks for the reply. It started happening under -16; I rolled to -19 to see if that would solve it, it didn't, so I rolled back to -12 and have been stable since.

What's the safest way to try "3.0rc1"?
 
There is no 3.0rc1 anymore; it's 3.0 stable now. See today's announcement.
 
