Crash in proxmox with pve-kernel-2.6.32-4-pve

nmartin. · Jul 19, 2011

Hi,

I have a Proxmox server running kernel "Linux 2.6.32-4-pve #1 SMP Mon May 9 12:59:57 CEST 2011"

# pveversion -v
pve-manager: 1.8-18 (pve-manager/1.8/6070)
running kernel: 2.6.32-4-pve
pve-kernel-2.6.32-4-pve: 2.6.32-33
qemu-server: 1.1-30
pve-firmware: 1.0-11
libpve-storage-perl: 1.0-17
vncterm: 0.9-2
vzctl: 3.0.28-1pve1
vzdump: 1.2-14
vzprocps: 2.0.11-2
vzquota: 3.0.11-1dso1

The server is quite new and has recent and powerful components (2xSSD hard drives / ...)

We have 3 containers running (Ubuntu 10.04.2), and it appears the whole server is randomly rebooting. The following trace can be seen in the logs:

Code:

Jul 19 18:04:17 server kernel: warning: `vzctl' uses 32-bit capabilities (legacy support in use)
Jul 19 18:04:17 server kernel: CT: 101: started
Jul 19 18:04:17 server kernel: ------------[ cut here ]------------
Jul 19 18:04:17 server kernel: WARNING: at mm/page_alloc.c:1828 __alloc_pages_nodemask+0x183/0x6a8()
Jul 19 18:04:17 server kernel: Hardware name:         
Jul 19 18:04:17 server kernel: Modules linked in: vzethdev vznetdev simfs vzrst vzcpt vzdquota vzmon vzdev ip6t_REJECT ip6table_mangle ip6table_filter ip6_tables xt_tcpudp xt_length xt_hl xt_tcpmss xt_TCPMSS iptable_mangle iptable_filter xt_multiport xt_limit xt_dscp ipt_REJECT ip_tables x_tables vzevent ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi dummy bridge evdev psmouse serio_raw button i2c_i801 i2c_core snd_pcm snd_timer processor shpchp snd soundcore pci_hotplug xhci snd_page_alloc pcspkr ext3 jbd mbcache dm_snapshot thermal fan thermal_sys 8021q garp stp pata_via megaraid_sas 3w_xxxx 3w_9xxx uhci_hcd ehci_hcd usbcore nls_base qlge ixgbe dca sata_nv via686a ahci mptctl mptsas scsi_transport_sas mptspi mptscsih mptbase dm_crypt raid456 async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx raid0 raid1 md_mod dm_mirror dm_region_hash dm_log sata_via ata_piix sata_sis pata_sis libata sym53c8xx megaraid aic7xxx scsi_transport_s
Jul 19 18:04:17 server kernel: i atl1 sky2 skge r8169 e1000e e1000 via_rhine sis900 8139too e100 mii [last unloaded: scsi_wait_scan]
Jul 19 18:04:17 server kernel: Pid: 2533, comm: mountall Not tainted 2.6.32-4-pve #1
Jul 19 18:04:17 server kernel: Call Trace:
Jul 19 18:04:17 server kernel:  [<ffffffff810bd03f>] ? __alloc_pages_nodemask+0x183/0x6a8
Jul 19 18:04:17 server kernel:  [<ffffffff810bd03f>] ? __alloc_pages_nodemask+0x183/0x6a8
Jul 19 18:04:17 server kernel:  [<ffffffff8104e21c>] ? warn_slowpath_common+0x77/0xa3
Jul 19 18:04:17 server kernel:  [<ffffffff810bd03f>] ? __alloc_pages_nodemask+0x183/0x6a8
Jul 19 18:04:17 server kernel:  [<ffffffff810e99d3>] ? new_slab+0x104/0x236
Jul 19 18:04:17 server kernel:  [<ffffffff81100761>] ? __d_path+0x116/0x1e0
Jul 19 18:04:17 server kernel:  [<ffffffff810bc4e9>] ? __get_free_pages+0x9/0x46
Jul 19 18:04:17 server kernel:  [<ffffffff810e9461>] ? __kmalloc+0x3f/0x17f
Jul 19 18:04:17 server kernel:  [<ffffffff81109350>] ? seq_read+0x226/0x388
Jul 19 18:04:17 server kernel:  [<ffffffff810f214e>] ? vfs_read+0xa6/0xff
Jul 19 18:04:17 server kernel:  [<ffffffff810f22c1>] ? sys_read+0x49/0xc4
Jul 19 18:04:17 server kernel:  [<ffffffff81010c12>] ? system_call_fastpath+0x16/0x1b
Jul 19 18:04:17 server kernel: ---[ end trace 657b1fa50e8d82ad ]---

We are unable to clearly identify what makes the server reboot: it can reboot 6 times during the night (no activity at all / nearly no services started) and then run fine for a whole day (jboss / tomcat / mysql / apache / ... started & running). However, we have seen that starting the services in container 101 gives a very good chance of seeing the whole machine reboot within the next few seconds (the container hosts a Tomcat server; I would estimate the chance of a full reboot at 80%), while the other containers seem OK (but again, it's not a 100% accurate rule).
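To check whether these warnings line up with the reboots, the syslog can be scanned for every kernel WARNING together with the process that triggered it. The following is only a minimal sketch, assuming the log lines have the same layout as the trace above; the function name `find_kernel_warnings` and the two regular expressions are illustrative and not part of any Proxmox tool.

```python
import re

# Matches the "WARNING: at <file>:<line> <function>" line emitted by the kernel.
# Assumes the syslog prefix "Mon DD HH:MM:SS host kernel:" seen in the trace above.
WARNING_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s[\d:]{8})\s+\S+\s+kernel:\s+"
    r"WARNING: at (?P<where>\S+)\s+(?P<func>\S+)"
)
# Matches the "Pid: <n>, comm: <name>" line that follows inside the same trace.
PID_RE = re.compile(r"kernel:\s+Pid:\s+(?P<pid>\d+),\s+comm:\s+(?P<comm>\S+)")


def find_kernel_warnings(lines):
    """Return one dict per kernel WARNING found in an iterable of syslog lines."""
    warnings = []
    current = None  # the warning whose Pid line has not been seen yet
    for line in lines:
        m = WARNING_RE.search(line)
        if m:
            current = {
                "time": m.group("stamp"),
                "where": m.group("where"),
                "func": m.group("func"),
                "pid": None,
                "comm": None,
            }
            warnings.append(current)
            continue
        m = PID_RE.search(line)
        if m and current is not None:
            current["pid"] = int(m.group("pid"))
            current["comm"] = m.group("comm")
            current = None
    return warnings
```

It can be fed an open log file, for example `find_kernel_warnings(open(path))`, where `path` is wherever the host keeps its kernel log (the location depends on the syslog configuration). Comparing the returned timestamps with the reboot times would show whether the warning always precedes a reboot or also appears on its own.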

Any idea anyone?

Thanks a lot

tom · Jul 19, 2011

use the 2.6.18 kernel branch for stable OpenVZ.
see http://pve.proxmox.com/wiki/Proxmox_VE_Kernel#Kernel_2.6.18

and as a side-note, your pveversion shows missing packages, so how did you install?
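A check like this can be repeated by parsing the `pveversion -v` output and comparing it against a list of package names. This is a rough sketch only: it assumes the "name: version" layout shown in the first post, the helper names are made up for illustration, and the list of expected packages has to be supplied by the reader because the thread does not say which ones are missing.

```python
def parse_pveversion(text):
    """Parse `pveversion -v` style output ("name: version" per line) into a dict."""
    packages = {}
    for line in text.splitlines():
        # Split on the first colon only; versions may contain further punctuation.
        name, sep, version = line.partition(":")
        if sep and name.strip() and not name.startswith("#"):
            packages[name.strip()] = version.strip()
    return packages


def missing_packages(packages, expected):
    """Return the names in `expected` that do not appear in the parsed output.

    `expected` is a caller-supplied list; no reference list is assumed here.
    """
    return [name for name in expected if name not in packages]
```

The "running kernel" entry is parsed like any other line, so `parse_pveversion(output)["running kernel"]` gives the kernel the host actually booted, which may differ from the installed `pve-kernel-*` packages.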

nmartin. · Jul 19, 2011

Sadly it's not hosted on our premises and we have no choice of Proxmox version when ordering the server; this is the version offered by default.

tom · Jul 19, 2011

then ask the guys who installed your server why they installed what are probably the wrong packages ...

nmartin. · Jul 19, 2011

Those guys are OVH. The company is ranked 6th worldwide in the hosting business ... 1st in Europe ... I've asked them if we could have 2.6.18 when installing though.

Is there anything else I can do? (install other packages / etc)

tom · Jul 19, 2011

nmartin. said:
... (install other packages / etc)

That's exactly what I recommended.

nmartin. · Jul 20, 2011

By this I mean "without reinstalling everything" as it needs to run quite quickly.
Am I able to "update" this mountall package proxmox is complaining about?

I've tried various things, without success so far.

tom · Jul 20, 2011

nmartin. said:
By this I mean "without reinstalling everything"

I did not tell you to reinstall everything; I recommend you re-read the thread.

nmartin. · Jul 20, 2011

Finally I "downgraded" the kernel by following this procedure (which looks specific to the proxmox version OVH is installing):

http://forum.proxmox.com/threads/65...e-to-2.6.32-4-pve-failed...?p=37476#post37476

It apparently works fine now, the stack trace in the logs is gone and the server has not rebooted yet ...

Keeping fingers crossed, Thanks a lot Tom.
