Proxmox crash

Marek Panek

New Member
Dec 11, 2010
Hi,

Today I encountered two crashes on one machine:

Aug 31 08:29:18 vps5 kernel: general protection fault: 0000 [#1] SMP
Aug 31 08:29:18 vps5 kernel: last sysfs file: /sys/kernel/mm/ksm/run
Aug 31 08:29:18 vps5 kernel: CPU 0
Aug 31 08:29:18 vps5 kernel: Modules linked in: nfs lockd fscache nfs_acl auth_rpcgss sunrpc vhost_net kvm_intel kvm ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi bridge stp snd_pcm snd_timer snd soundcore psmouse tpm_tis snd_page_alloc tpm tpm_bios serio_raw i2c_i801 i7core_edac pcspkr edac_core joydev ioatdma dca usbhid hid megaraid_sas e1000e [last unloaded: scsi_wait_scan]
Aug 31 08:29:18 vps5 kernel:
Aug 31 08:29:18 vps5 kernel: Pid: 51, comm: events/0 Not tainted 2.6.35-1-pve #1 X8DTT-H/X8DTT-H
Aug 31 08:29:18 vps5 kernel: RIP: 0010:[<ffffffffa02604d4>] [<ffffffffa02604d4>] kvm_set_irq+0x65/0x109 [kvm]
Aug 31 08:29:18 vps5 kernel: RSP: 0018:ffff8803321e9d20 EFLAGS: 00010246
Aug 31 08:29:18 vps5 kernel: RAX: ffff88004d2e0cc0 RBX: ffff88004d2e0960 RCX: 0000000000000001
Aug 31 08:29:18 vps5 kernel: RDX: 2f736b6e696c6572 RSI: 0000000000000000 RDI: ffff88004d2e0cc0
Aug 31 08:29:18 vps5 kernel: RBP: ffff8803321e9e00 R08: ffff8803321e8000 R09: 00000000ffffffff
Aug 31 08:29:18 vps5 kernel: R10: ffff880001e15800 R11: ffff88032e4d96f0 R12: 000000000000001b
Aug 31 08:29:18 vps5 kernel: R13: 0000000000000001 R14: 0000000000000000 R15: ffff8803321e0000
Aug 31 08:29:18 vps5 kernel: FS: 0000000000000000(0000) GS:ffff880001e00000(0000) knlGS:0000000000000000
Aug 31 08:29:18 vps5 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Aug 31 08:29:18 vps5 kernel: CR2: 00000000004316d0 CR3: 000000062a42f000 CR4: 00000000000026e0
Aug 31 08:29:18 vps5 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Aug 31 08:29:18 vps5 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Aug 31 08:29:18 vps5 kernel: Process events/0 (pid: 51, threadinfo ffff8803321e8000, task ffff8803321e0000)
Aug 31 08:29:18 vps5 kernel: Stack:
Aug 31 08:29:18 vps5 kernel: ffff880001e15800 ffff8803321e0000 ffff8803321e9d70 ffff88004d2e0cc0
Aug 31 08:29:18 vps5 kernel: <0> ffff880001e15800 ffff88062fa01180 ffff880001e15800 0000000000000000
Aug 31 08:29:18 vps5 kernel: <0> ffff88062fa01180 ffff880001e15800 ffff8803321e9e20 ffffffff814b393d
Aug 31 08:29:18 vps5 kernel: Call Trace:
Aug 31 08:29:18 vps5 kernel: [<ffffffff814b393d>] ? schedule+0x58f/0x5f4
Aug 31 08:29:18 vps5 kernel: [<ffffffff814b554e>] ? common_interrupt+0xe/0x13
Aug 31 08:29:18 vps5 kernel: [<ffffffffa02610d3>] irqfd_inject+0x25/0x3a [kvm]
Aug 31 08:29:18 vps5 kernel: [<ffffffff8106739b>] worker_thread+0x1a9/0x24d
Aug 31 08:29:18 vps5 kernel: [<ffffffff814b393d>] ? schedule+0x58f/0x5f4
Aug 31 08:29:18 vps5 kernel: [<ffffffffa02610ae>] ? irqfd_inject+0x0/0x3a [kvm]
Aug 31 08:29:18 vps5 kernel: [<ffffffff8106b0d0>] ? autoremove_wake_function+0x0/0x3d
Aug 31 08:29:18 vps5 kernel: [<ffffffff810671f2>] ? worker_thread+0x0/0x24d
Aug 31 08:29:18 vps5 kernel: [<ffffffff8106abe8>] kthread+0x82/0x8a
Aug 31 08:29:18 vps5 kernel: [<ffffffff8100ab24>] kernel_thread_helper+0x4/0x10
Aug 31 08:29:18 vps5 kernel: [<ffffffff8106ab66>] ? kthread+0x0/0x8a
Aug 31 08:29:18 vps5 kernel: [<ffffffff8100ab20>] ? kernel_thread_helper+0x0/0x10
Aug 31 08:29:18 vps5 kernel: Code: 85 db 74 19 48 8b 7b 08 44 89 f1 44 89 ea 44 89 e6 ff 13 48 83 c3 10 48 83 3b 00 eb e5 48 8b 85 38 ff ff ff 48 8b 90 38 24 00 00 <44> 3b a2 28 01 00 00 72 0b 31 db 41 83 cc ff 45 31 ff eb 77 44
Aug 31 08:29:18 vps5 kernel: RIP [<ffffffffa02604d4>] kvm_set_irq+0x65/0x109 [kvm]
Aug 31 08:29:18 vps5 kernel: RSP <ffff8803321e9d20>
Aug 31 08:29:18 vps5 kernel: ---[ end trace 3085ba688ddd7b6a ]---




Aug 31 18:50:55 vps5 kernel: BUG: unable to handle kernel paging request at 0000000000002438
Aug 31 18:50:55 vps5 kernel: IP: [<ffffffffa021a4cd>] kvm_set_irq+0x5e/0x109 [kvm]
Aug 31 18:50:55 vps5 kernel: PGD 32ec01067 PUD 32ec02067 PMD 0
Aug 31 18:50:55 vps5 kernel: Oops: 0000 [#1] SMP
Aug 31 18:50:55 vps5 kernel: last sysfs file: /sys/kernel/mm/ksm/run
Aug 31 18:50:55 vps5 kernel: CPU 0
Aug 31 18:50:55 vps5 kernel: Modules linked in: vhost_net kvm_intel kvm ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi bridge stp snd_pcm snd_timer snd soundcore tpm_tis tpm tpm_bios snd_page_alloc psmouse serio_raw i2c_i801 ioatdma pcspkr joydev dca i7core_edac edac_core usbhid hid megaraid_sas e1000e [last unloaded: scsi_wait_scan]
Aug 31 18:50:55 vps5 kernel:
Aug 31 18:50:55 vps5 kernel: Pid: 51, comm: events/0 Not tainted 2.6.35-1-pve #1 X8DTT-H/X8DTT-H
Aug 31 18:50:55 vps5 kernel: RIP: 0010:[<ffffffffa021a4cd>] [<ffffffffa021a4cd>] kvm_set_irq+0x5e/0x109 [kvm]
Aug 31 18:50:55 vps5 kernel: RSP: 0018:ffff880332211d20 EFLAGS: 00010246
Aug 31 18:50:55 vps5 kernel: RAX: 0000000000000000 RBX: ffff88023c7f5aa0 RCX: 0000000000000001
Aug 31 18:50:55 vps5 kernel: RDX: 000000000000001b RSI: 0000000000000000 RDI: 0000000000000000
Aug 31 18:50:55 vps5 kernel: RBP: ffff880332211e00 R08: ffff880332210000 R09: ffff8803303596f0
Aug 31 18:50:55 vps5 kernel: R10: ffff880001e15800 R11: ffff8803303596f0 R12: 000000000000001b
Aug 31 18:50:55 vps5 kernel: R13: 0000000000000001 R14: 0000000000000000 R15: ffff880332208000
Aug 31 18:50:55 vps5 kernel: FS: 0000000000000000(0000) GS:ffff880001e00000(0000) knlGS:0000000000000000
Aug 31 18:50:55 vps5 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Aug 31 18:50:55 vps5 kernel: CR2: 0000000000002438 CR3: 000000032ec00000 CR4: 00000000000026e0
Aug 31 18:50:55 vps5 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Aug 31 18:50:55 vps5 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Aug 31 18:50:55 vps5 kernel: Process events/0 (pid: 51, threadinfo ffff880332210000, task ffff880332208000)
Aug 31 18:50:55 vps5 kernel: Stack:
Aug 31 18:50:55 vps5 kernel: ffff880001e15800 ffff880332208000 ffff880332211d70 0000000000000000
Aug 31 18:50:55 vps5 kernel: <0> ffff880001e15800 ffff8803310d8e00 ffff880001e15800 0000000000000000
Aug 31 18:50:55 vps5 kernel: <0> ffff8803310d8e00 ffff880001e15800 ffff880332211e20 ffffffff814b393d
Aug 31 18:50:55 vps5 kernel: Call Trace:
Aug 31 18:50:55 vps5 kernel: [<ffffffff814b393d>] ? schedule+0x58f/0x5f4
Aug 31 18:50:55 vps5 kernel: [<ffffffff814b554e>] ? common_interrupt+0xe/0x13
Aug 31 18:50:55 vps5 kernel: [<ffffffffa021b0d3>] irqfd_inject+0x25/0x3a [kvm]
Aug 31 18:50:55 vps5 kernel: [<ffffffff8106739b>] worker_thread+0x1a9/0x24d
Aug 31 18:50:55 vps5 kernel: [<ffffffff814b393d>] ? schedule+0x58f/0x5f4
Aug 31 18:50:55 vps5 kernel: [<ffffffffa021b0ae>] ? irqfd_inject+0x0/0x3a [kvm]
Aug 31 18:50:55 vps5 kernel: [<ffffffff8106b0d0>] ? autoremove_wake_function+0x0/0x3d
Aug 31 18:50:55 vps5 kernel: [<ffffffff810671f2>] ? worker_thread+0x0/0x24d
Aug 31 18:50:55 vps5 kernel: [<ffffffff8106abe8>] kthread+0x82/0x8a
Aug 31 18:50:55 vps5 kernel: [<ffffffff8100ab24>] kernel_thread_helper+0x4/0x10
Aug 31 18:50:55 vps5 kernel: [<ffffffff8106ab66>] ? kthread+0x0/0x8a
Aug 31 18:50:55 vps5 kernel: [<ffffffff8100ab20>] ? kernel_thread_helper+0x0/0x10
Aug 31 18:50:55 vps5 kernel: Code: 8b 1d f8 91 02 00 48 85 db 74 19 48 8b 7b 08 44 89 f1 44 89 ea 44 89 e6 ff 13 48 83 c3 10 48 83 3b 00 eb e5 48 8b 85 38 ff ff ff <48> 8b 90 38 24 00 00 44 3b a2 28 01 00 00 72 0b 31 db 41 83 cc
Aug 31 18:50:55 vps5 kernel: RIP [<ffffffffa021a4cd>] kvm_set_irq+0x5e/0x109 [kvm]
Aug 31 18:50:55 vps5 kernel: RSP <ffff880332211d20>
Aug 31 18:50:55 vps5 kernel: CR2: 0000000000002438
Aug 31 18:50:55 vps5 kernel: ---[ end trace 7e6d3e149ac90278 ]---



vps5:/var/log# pveperf
CPU BOGOMIPS: 68268.86
REGEX/SECOND: 829450
HD SIZE: 94.49 GB (/dev/mapper/pve-root)
BUFFERED READS: 200.81 MB/sec
AVERAGE SEEK TIME: 5.62 ms
FSYNCS/SECOND: 1552.81
DNS EXT: 73.54 ms
DNS INT: 27.99 ms (lh.pl)
vps5:/var/log# pveversion -v
pve-manager: 1.8-18 (pve-manager/1.8/6070)
running kernel: 2.6.35-1-pve
proxmox-ve-2.6.35: 1.8-11
pve-kernel-2.6.32-4-pve: 2.6.32-33
pve-kernel-2.6.35-1-pve: 2.6.35-11
qemu-server: 1.1-30
pve-firmware: 1.0-11
libpve-storage-perl: 1.0-17
vncterm: 0.9-2
vzctl: 3.0.28-1pve1
vzdump: 1.2-14
vzprocps: 2.0.11-2
vzquota: 3.0.11-1
pve-qemu-kvm: 0.14.1-1
ksm-control-daemon: 1.0-6
vps5:/var/log# uname -a
Linux vps5 2.6.35-1-pve #1 SMP Tue May 10 14:14:39 CEST 2011 x86_64 GNU/Linux


Can anyone give me a hint on how to prevent these crashes?
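To compare the two crashes quickly, the faulting location can be pulled out of each trace. The following is a minimal sketch, not an official tool: the function name `faulting_sites` and the regular expression are my own assumptions, and the sample lines are copied from the two traces above. It only looks at the `RIP [<...>]` summary line near the end of each oops.

```python
import re

# Matches the summary line printed near the end of an oops, e.g.
#   "RIP [<ffffffffa02604d4>] kvm_set_irq+0x65/0x109 [kvm]"
# The "RIP: 0010:[<...>]" register line has a colon after RIP and is skipped.
RIP_RE = re.compile(
    r"RIP\s+\[<[0-9a-f]+>\]\s+"
    r"(?P<func>[\w.]+)\+0x(?P<off>[0-9a-f]+)/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def faulting_sites(log_text):
    """Return (function, offset, module) for every oops summary line found."""
    sites = []
    for line in log_text.splitlines():
        match = RIP_RE.search(line)
        if match:
            sites.append(
                (match.group("func"), int(match.group("off"), 16), match.group("module"))
            )
    return sites


# Lines copied verbatim from the two traces in this post.
SAMPLE = """\
Aug 31 08:29:18 vps5 kernel: RIP: 0010:[<ffffffffa02604d4>] [<ffffffffa02604d4>] kvm_set_irq+0x65/0x109 [kvm]
Aug 31 08:29:18 vps5 kernel: RIP [<ffffffffa02604d4>] kvm_set_irq+0x65/0x109 [kvm]
Aug 31 18:50:55 vps5 kernel: RIP [<ffffffffa021a4cd>] kvm_set_irq+0x5e/0x109 [kvm]
"""

if __name__ == "__main__":
    for func, offset, module in faulting_sites(SAMPLE):
        print(f"{func}+{offset:#x} [{module}]")
```

Both crashes land in `kvm_set_irq` in the `kvm` module, only 7 bytes apart, so they look like the same code path failing rather than two unrelated problems.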
 
I suggest you switch to the 2.6.32 kernel (our upcoming 2.6.32 kernel will also support KSM). It is better tested, more stable, and includes newer drivers.
 
Hmm,
I have five PVE hosts running the 2.6.35 kernel and they run without trouble. Some time ago, Windows SMP I/O performance was much better with 2.6.35 than with 2.6.32. Is this also OK with the upcoming 2.6.32?

Udo
 
We plan to release a new 2.6.32 kernel to our pvetest repository next week, so it would be nice if you could run such tests.
 
Hi,

Is it a problem with KSM?
Aug 31 18:50:55 vps5 kernel: BUG: unable to handle kernel paging request at 0000000000002438
...

Aug 31 18:50:55 vps5 kernel: last sysfs file: /sys/kernel/mm/ksm/run

I am also running 4 PVE hosts with the 2.6.35-1 kernel without problems, but since they don't use more than 50% of RAM, they don't use KSM.

Alain
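Whether KSM is actually merging pages on a given host can be checked from sysfs. Below is a small sketch under stated assumptions: `run`, `pages_shared` and `pages_sharing` are the standard KSM entries under `/sys/kernel/mm/ksm`, while the helper names `read_ksm_status` and `ksm_is_merging` are made up for this example. As far as I know, the `last sysfs file` line in an oops only records the most recently opened sysfs file, so on its own it does not prove that KSM is involved.

```python
from pathlib import Path


def read_ksm_status(base="/sys/kernel/mm/ksm"):
    """Read a few KSM counters; a value is None if the file is missing or unreadable."""
    status = {}
    for name in ("run", "pages_shared", "pages_sharing"):
        try:
            status[name] = int((Path(base) / name).read_text().strip())
        except (OSError, ValueError):
            status[name] = None
    return status


def ksm_is_merging(status):
    """True if KSM is switched on (run == 1) and at least one page is being shared."""
    return status.get("run") == 1 and (status.get("pages_sharing") or 0) > 0


if __name__ == "__main__":
    current = read_ksm_status()
    print(current)
    print("KSM actively merging:", ksm_is_merging(current))
```

Running this on both a crashing host and a stable one would show whether KSM usage really differs between them.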
 
It looks like switching to the stable 2.6.32 kernel solved the problem. Oddly enough, I have 5 other identical servers running 2.6.35 with uptimes of more than 100 days.
 
More or less the same here:

====
...
Sep 8 18:42:17 lxdmz4 kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000128
Sep 8 18:42:17 lxdmz4 kernel: IP: [<ffffffffa02974d4>] kvm_set_irq+0x65/0x109 [kvm]
Sep 8 18:42:17 lxdmz4 kernel: PGD 602b8d067 PUD 602150067 PMD 0
Sep 8 18:42:17 lxdmz4 kernel: Oops: 0000 [#1] SMP
Sep 8 18:42:17 lxdmz4 kernel: last sysfs file: /sys/kernel/mm/ksm/run
Sep 8 18:42:17 lxdmz4 kernel: CPU 0
Sep 8 18:42:17 lxdmz4 kernel: Modules linked in: vhost_net kvm_intel kvm mptctl mptbase ipmi_devintf ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi bridge 8021q garp stp bonding snd_pcm ipmi_si snd_timer snd soundcore tpm_tis tpm psmouse snd_page_alloc tpm_bios hpilo ipmi_msghandler pcspkr serio_raw i7core_edac edac_core shpchp bnx2x crc32c libcrc32c mdio power_meter usbhid hid cciss tg3 qla2xxx scsi_transport_fc scsi_tgt [last unloaded: scsi_wait_scan]
Sep 8 18:42:17 lxdmz4 kernel:
Sep 8 18:42:17 lxdmz4 kernel: Pid: 51, comm: events/0 Not tainted 2.6.35-1-pve #1 /ProLiant BL460c G6
Sep 8 18:42:17 lxdmz4 kernel: RIP: 0010:[<ffffffffa02974d4>] [<ffffffffa02974d4>] kvm_set_irq+0x65/0x109 [kvm]
Sep 8 18:42:17 lxdmz4 kernel: RSP: 0018:ffff880c04711d20 EFLAGS: 00010246
Sep 8 18:42:17 lxdmz4 kernel: RAX: ffff880b34fa3d40 RBX: ffff880b34fa3020 RCX: 0000000000000001
Sep 8 18:42:17 lxdmz4 kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880b34fa3d40
Sep 8 18:42:17 lxdmz4 kernel: RBP: ffff880c04711e00 R08: ffff880c04710000 R09: 0000000000000000
Sep 8 18:42:17 lxdmz4 kernel: R10: ffff880638215800 R11: ffff8805f0b6fb50 R12: 000000000000001a
Sep 8 18:42:17 lxdmz4 kernel: R13: 0000000000000001 R14: 0000000000000000 R15: ffff880c04708000
Sep 8 18:42:17 lxdmz4 kernel: FS: 0000000000000000(0000) GS:ffff880638200000(0000) knlGS:0000000000000000
Sep 8 18:42:17 lxdmz4 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Sep 8 18:42:17 lxdmz4 kernel: CR2: 0000000000000128 CR3: 00000005f0874000 CR4: 00000000000026e0
Sep 8 18:42:17 lxdmz4 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Sep 8 18:42:17 lxdmz4 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Sep 8 18:42:17 lxdmz4 kernel: Process events/0 (pid: 51, threadinfo ffff880c04710000, task ffff880c04708000)
Sep 8 18:42:17 lxdmz4 kernel: Stack:
Sep 8 18:42:17 lxdmz4 kernel: ffff880638215800 ffff880c04708000 ffff880c04711d70 ffff880b34fa3d40
Sep 8 18:42:17 lxdmz4 kernel: <0> ffff880638215800 ffff880603a69f80 ffff880638215800 0000000000000000
Sep 8 18:42:17 lxdmz4 kernel: <0> ffff880603a69f80 ffff880638215800 ffff880c04711e20 ffffffff814b294b
Sep 8 18:42:17 lxdmz4 kernel: Call Trace:
Sep 8 18:42:17 lxdmz4 kernel: [<ffffffff814b294b>] ? schedule+0x59d/0x602
Sep 8 18:42:17 lxdmz4 kernel: [<ffffffff814b454e>] ? common_interrupt+0xe/0x13
Sep 8 18:42:17 lxdmz4 kernel: [<ffffffffa02980d3>] irqfd_inject+0x25/0x3a [kvm]
Sep 8 18:42:17 lxdmz4 kernel: [<ffffffff8106706b>] worker_thread+0x1a9/0x24d
Sep 8 18:42:17 lxdmz4 kernel: [<ffffffff814b294b>] ? schedule+0x59d/0x602
Sep 8 18:42:17 lxdmz4 kernel: [<ffffffffa02980ae>] ? irqfd_inject+0x0/0x3a [kvm]
Sep 8 18:42:17 lxdmz4 kernel: [<ffffffff8106ada0>] ? autoremove_wake_function+0x0/0x3d
Sep 8 18:42:17 lxdmz4 kernel: [<ffffffff81066ec2>] ? worker_thread+0x0/0x24d
Sep 8 18:42:17 lxdmz4 kernel: [<ffffffff8106a8b8>] kthread+0x82/0x8a
Sep 8 18:42:17 lxdmz4 kernel: [<ffffffff8100ab24>] kernel_thread_helper+0x4/0x10
Sep 8 18:42:17 lxdmz4 kernel: [<ffffffff8106a836>] ? kthread+0x0/0x8a
Sep 8 18:42:17 lxdmz4 kernel: [<ffffffff8100ab20>] ? kernel_thread_helper+0x0/0x10
Sep 8 18:42:17 lxdmz4 kernel: Code: 85 db 74 19 48 8b 7b 08 44 89 f1 44 89 ea 44 89 e6 ff 13 48 83 c3 10 48 83 3b 00 eb e5 48 8b 85 38 ff ff ff 48 8b 90 38 24 00 00 <44> 3b a2 28 01 00 00 72 0b 31 db 41 83 cc ff 45 31 ff eb 77 44
Sep 8 18:42:17 lxdmz4 kernel: RIP [<ffffffffa02974d4>] kvm_set_irq+0x65/0x109 [kvm]
Sep 8 18:42:17 lxdmz4 kernel: RSP <ffff880c04711d20>
Sep 8 18:42:17 lxdmz4 kernel: CR2: 0000000000000128
Sep 8 18:42:17 lxdmz4 kernel: ---[ end trace 9a5dbc0410980395 ]---
...
=====================

Moving to the 2.6.32-6 kernel from pvetest (2.6.32-42), and also updating to the latest pve and kvm packages, solved the problem.
Note: live migration did not work between upgraded and non-upgraded nodes, so I had to stop all VMs.

bye,
rob
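The traces in this thread can be cross-checked against each other by decoding the faulting instruction, which is the byte marked `<xx>` in each `Code:` line. This is only a sketch: the helper `faulting_displacement` is hypothetical, and it assumes the instruction has the layout REX prefix, opcode, ModRM, 32-bit displacement, which holds for the two encodings seen here (`44 3b a2 ...` and `48 8b 90 ...`) but not for x86 instructions in general.

```python
def faulting_displacement(code_line):
    """Return the 32-bit displacement of the instruction marked <xx> in a Code: line."""
    tokens = code_line.split("Code:", 1)[1].split()
    start = next(i for i, tok in enumerate(tokens) if tok.startswith("<"))
    insn = [int(tok.strip("<>"), 16) for tok in tokens[start:start + 7]]
    if len(insn) < 7 or insn[0] & 0xF0 != 0x40:
        raise ValueError("expected a REX-prefixed instruction of at least 7 bytes")
    modrm = insn[2]
    # mod == 0b10 means "register + disp32"; rm == 0b100 would add a SIB byte.
    if modrm >> 6 != 0b10 or modrm & 0b111 == 0b100:
        raise ValueError("unexpected ModRM encoding")
    return int.from_bytes(bytes(insn[3:7]), "little")


# Code: lines copied verbatim from the second vps5 trace and the lxdmz4 trace.
VPS5_SECOND = (
    "Aug 31 18:50:55 vps5 kernel: Code: 8b 1d f8 91 02 00 48 85 db 74 19 48 8b 7b 08 "
    "44 89 f1 44 89 ea 44 89 e6 ff 13 48 83 c3 10 48 83 3b 00 eb e5 48 8b 85 38 ff ff ff "
    "<48> 8b 90 38 24 00 00 44 3b a2 28 01 00 00 72 0b 31 db 41 83 cc"
)
LXDMZ4 = (
    "Sep 8 18:42:17 lxdmz4 kernel: Code: 85 db 74 19 48 8b 7b 08 44 89 f1 44 89 ea "
    "44 89 e6 ff 13 48 83 c3 10 48 83 3b 00 eb e5 48 8b 85 38 ff ff ff 48 8b 90 38 24 00 00 "
    "<44> 3b a2 28 01 00 00 72 0b 31 db 41 83 cc ff 45 31 ff eb 77 44"
)

if __name__ == "__main__":
    print(hex(faulting_displacement(VPS5_SECOND)))
    print(hex(faulting_displacement(LXDMZ4)))
    # RDX from the first vps5 trace, viewed as raw little-endian bytes.
    print(int("2f736b6e696c6572", 16).to_bytes(8, "little"))
```

The decoded displacements equal the reported CR2 values (0x2438 on vps5 with RAX = 0, 0x128 on lxdmz4 with RDX = 0), so both are loads through a zero base pointer at adjacent steps of `kvm_set_irq`. In the first vps5 trace RDX holds bytes that read as ASCII text instead of a kernel address, which would fit a stale or overwritten pointer. That last point is an inference from the dumps, not something confirmed in this thread.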
 
