Hi,
I have a Proxmox VE 1.8 server running kernel "Linux 2.6.32-4-pve #1 SMP Mon May 9 12:59:57 CEST 2011".
# pveversion -v
pve-manager: 1.8-18 (pve-manager/1.8/6070)
running kernel: 2.6.32-4-pve
pve-kernel-2.6.32-4-pve: 2.6.32-33
qemu-server: 1.1-30
pve-firmware: 1.0-11
libpve-storage-perl: 1.0-17
vncterm: 0.9-2
vzctl: 3.0.28-1pve1
vzdump: 1.2-14
vzprocps: 2.0.11-2
vzquota: 3.0.11-1dso1
The server is quite new and has recent, powerful components (2xSSD hard drives / ...).
We have 3 containers running (Ubuntu 10.04.2), and the whole server appears to reboot at random. The following trace can be seen in the logs:
Code:
Jul 19 18:04:17 server kernel: warning: `vzctl' uses 32-bit capabilities (legacy support in use)
Jul 19 18:04:17 server kernel: CT: 101: started
Jul 19 18:04:17 server kernel: ------------[ cut here ]------------
Jul 19 18:04:17 server kernel: WARNING: at mm/page_alloc.c:1828 __alloc_pages_nodemask+0x183/0x6a8()
Jul 19 18:04:17 server kernel: Hardware name:
Jul 19 18:04:17 server kernel: Modules linked in: vzethdev vznetdev simfs vzrst vzcpt vzdquota vzmon vzdev ip6t_REJECT ip6table_mangle ip6table_filter ip6_tables xt_tcpudp xt_length xt_hl xt_tcpmss xt_TCPMSS iptable_mangle iptable_filter xt_multiport xt_limit xt_dscp ipt_REJECT ip_tables x_tables vzevent ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi dummy bridge evdev psmouse serio_raw button i2c_i801 i2c_core snd_pcm snd_timer processor shpchp snd soundcore pci_hotplug xhci snd_page_alloc pcspkr ext3 jbd mbcache dm_snapshot thermal fan thermal_sys 8021q garp stp pata_via megaraid_sas 3w_xxxx 3w_9xxx uhci_hcd ehci_hcd usbcore nls_base qlge ixgbe dca sata_nv via686a ahci mptctl mptsas scsi_transport_sas mptspi mptscsih mptbase dm_crypt raid456 async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx raid0 raid1 md_mod dm_mirror dm_region_hash dm_log sata_via ata_piix sata_sis pata_sis libata sym53c8xx megaraid aic7xxx scsi_transport_s
Jul 19 18:04:17 server kernel: i atl1 sky2 skge r8169 e1000e e1000 via_rhine sis900 8139too e100 mii [last unloaded: scsi_wait_scan]
Jul 19 18:04:17 server kernel: Pid: 2533, comm: mountall Not tainted 2.6.32-4-pve #1
Jul 19 18:04:17 server kernel: Call Trace:
Jul 19 18:04:17 server kernel: [<ffffffff810bd03f>] ? __alloc_pages_nodemask+0x183/0x6a8
Jul 19 18:04:17 server kernel: [<ffffffff810bd03f>] ? __alloc_pages_nodemask+0x183/0x6a8
Jul 19 18:04:17 server kernel: [<ffffffff8104e21c>] ? warn_slowpath_common+0x77/0xa3
Jul 19 18:04:17 server kernel: [<ffffffff810bd03f>] ? __alloc_pages_nodemask+0x183/0x6a8
Jul 19 18:04:17 server kernel: [<ffffffff810e99d3>] ? new_slab+0x104/0x236
Jul 19 18:04:17 server kernel: [<ffffffff81100761>] ? __d_path+0x116/0x1e0
Jul 19 18:04:17 server kernel: [<ffffffff810bc4e9>] ? __get_free_pages+0x9/0x46
Jul 19 18:04:17 server kernel: [<ffffffff810e9461>] ? __kmalloc+0x3f/0x17f
Jul 19 18:04:17 server kernel: [<ffffffff81109350>] ? seq_read+0x226/0x388
Jul 19 18:04:17 server kernel: [<ffffffff810f214e>] ? vfs_read+0xa6/0xff
Jul 19 18:04:17 server kernel: [<ffffffff810f22c1>] ? sys_read+0x49/0xc4
Jul 19 18:04:17 server kernel: [<ffffffff81010c12>] ? system_call_fastpath+0x16/0x1b
Jul 19 18:04:17 server kernel: ---[ end trace 657b1fa50e8d82ad ]---
We are unable to clearly identify what makes the server reboot: it can reboot 6 times during the night (no activity at all, nearly no services started) and then run fine for a whole day (jboss / tomcat / mysql / apache / ... started and running). However, we have seen that starting the services in container 101 gives a very good chance of seeing the whole machine reboot within the next few seconds (this container hosts a Tomcat server; I would estimate the chance of a full reboot at 80%), while the other containers seem OK (but again, this is not a 100% accurate rule).
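To check how closely the container starts and the kernel warnings line up, the log could be scanned with something like the sketch below. It is only an illustration: the `summarize` helper and the regular expressions are my own assumptions, written against the line format of the trace above, and they have not been run against the full log.

```python
import re

# Matches syslog kernel lines such as:
#   Jul 19 18:04:17 server kernel: CT: 101: started
LINE_RE = re.compile(r"^(\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+\S+\s+kernel:\s+(.*)$")
CT_START_RE = re.compile(r"^CT: (\d+): started$")
WARNING_RE = re.compile(r"^WARNING: at (\S+)")


def summarize(lines):
    """Return (timestamp, kind, detail) tuples for container starts and kernel warnings."""
    events = []
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if not m:
            continue  # not a kernel line in the expected format
        stamp, msg = m.groups()
        ct = CT_START_RE.match(msg)
        if ct:
            events.append((stamp, "ct_start", ct.group(1)))
            continue
        warn = WARNING_RE.match(msg)
        if warn:
            events.append((stamp, "warning", warn.group(1)))
    return events
```

It could be fed an open log file, for example `summarize(open("/var/log/syslog", errors="replace"))` (the path is an assumption and depends on the syslog configuration), and the resulting list compared with the times at which the machine rebooted.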
Any ideas, anyone?
Thanks a lot