Proxmox crash

Marek Panek

Dec 11, 2010
Hi,

Today I encountered two crashes on one machine:

Aug 31 08:29:18 vps5 kernel: general protection fault: 0000 [#1] SMP
Aug 31 08:29:18 vps5 kernel: last sysfs file: /sys/kernel/mm/ksm/run
Aug 31 08:29:18 vps5 kernel: CPU 0
Aug 31 08:29:18 vps5 kernel: Modules linked in: nfs lockd fscache nfs_acl auth_rpcgss sunrpc vhost_net kvm_intel kvm ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi bridge stp snd_pcm snd_timer snd soundcore psmouse tpm_tis snd_page_alloc tpm tpm_bios serio_raw i2c_i801 i7core_edac pcspkr edac_core joydev ioatdma dca usbhid hid megaraid_sas e1000e [last unloaded: scsi_wait_scan]
Aug 31 08:29:18 vps5 kernel:
Aug 31 08:29:18 vps5 kernel: Pid: 51, comm: events/0 Not tainted 2.6.35-1-pve #1 X8DTT-H/X8DTT-H
Aug 31 08:29:18 vps5 kernel: RIP: 0010:[<ffffffffa02604d4>] [<ffffffffa02604d4>] kvm_set_irq+0x65/0x109 [kvm]
Aug 31 08:29:18 vps5 kernel: RSP: 0018:ffff8803321e9d20 EFLAGS: 00010246
Aug 31 08:29:18 vps5 kernel: RAX: ffff88004d2e0cc0 RBX: ffff88004d2e0960 RCX: 0000000000000001
Aug 31 08:29:18 vps5 kernel: RDX: 2f736b6e696c6572 RSI: 0000000000000000 RDI: ffff88004d2e0cc0
Aug 31 08:29:18 vps5 kernel: RBP: ffff8803321e9e00 R08: ffff8803321e8000 R09: 00000000ffffffff
Aug 31 08:29:18 vps5 kernel: R10: ffff880001e15800 R11: ffff88032e4d96f0 R12: 000000000000001b
Aug 31 08:29:18 vps5 kernel: R13: 0000000000000001 R14: 0000000000000000 R15: ffff8803321e0000
Aug 31 08:29:18 vps5 kernel: FS: 0000000000000000(0000) GS:ffff880001e00000(0000) knlGS:0000000000000000
Aug 31 08:29:18 vps5 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Aug 31 08:29:18 vps5 kernel: CR2: 00000000004316d0 CR3: 000000062a42f000 CR4: 00000000000026e0
Aug 31 08:29:18 vps5 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Aug 31 08:29:18 vps5 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Aug 31 08:29:18 vps5 kernel: Process events/0 (pid: 51, threadinfo ffff8803321e8000, task ffff8803321e0000)
Aug 31 08:29:18 vps5 kernel: Stack:
Aug 31 08:29:18 vps5 kernel: ffff880001e15800 ffff8803321e0000 ffff8803321e9d70 ffff88004d2e0cc0
Aug 31 08:29:18 vps5 kernel: <0> ffff880001e15800 ffff88062fa01180 ffff880001e15800 0000000000000000
Aug 31 08:29:18 vps5 kernel: <0> ffff88062fa01180 ffff880001e15800 ffff8803321e9e20 ffffffff814b393d
Aug 31 08:29:18 vps5 kernel: Call Trace:
Aug 31 08:29:18 vps5 kernel: [<ffffffff814b393d>] ? schedule+0x58f/0x5f4
Aug 31 08:29:18 vps5 kernel: [<ffffffff814b554e>] ? common_interrupt+0xe/0x13
Aug 31 08:29:18 vps5 kernel: [<ffffffffa02610d3>] irqfd_inject+0x25/0x3a [kvm]
Aug 31 08:29:18 vps5 kernel: [<ffffffff8106739b>] worker_thread+0x1a9/0x24d
Aug 31 08:29:18 vps5 kernel: [<ffffffff814b393d>] ? schedule+0x58f/0x5f4
Aug 31 08:29:18 vps5 kernel: [<ffffffffa02610ae>] ? irqfd_inject+0x0/0x3a [kvm]
Aug 31 08:29:18 vps5 kernel: [<ffffffff8106b0d0>] ? autoremove_wake_function+0x0/0x3d
Aug 31 08:29:18 vps5 kernel: [<ffffffff810671f2>] ? worker_thread+0x0/0x24d
Aug 31 08:29:18 vps5 kernel: [<ffffffff8106abe8>] kthread+0x82/0x8a
Aug 31 08:29:18 vps5 kernel: [<ffffffff8100ab24>] kernel_thread_helper+0x4/0x10
Aug 31 08:29:18 vps5 kernel: [<ffffffff8106ab66>] ? kthread+0x0/0x8a
Aug 31 08:29:18 vps5 kernel: [<ffffffff8100ab20>] ? kernel_thread_helper+0x0/0x10
Aug 31 08:29:18 vps5 kernel: Code: 85 db 74 19 48 8b 7b 08 44 89 f1 44 89 ea 44 89 e6 ff 13 48 83 c3 10 48 83 3b 00 eb e5 48 8b 85 38 ff ff ff 48 8b 90 38 24 00 00 <44> 3b a2 28 01 00 00 72 0b 31 db 41 83 cc ff 45 31 ff eb 77 44
Aug 31 08:29:18 vps5 kernel: RIP [<ffffffffa02604d4>] kvm_set_irq+0x65/0x109 [kvm]
Aug 31 08:29:18 vps5 kernel: RSP <ffff8803321e9d20>
Aug 31 08:29:18 vps5 kernel: ---[ end trace 3085ba688ddd7b6a ]---
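For reference, this first trace is a general protection fault rather than a page fault. The byte marked <44> in its Code: line starts cmp r12d, [rdx+0x128], and RDX holds 2f736b6e696c6572. On x86-64 an access through a non-canonical address raises #GP instead of a page fault, and that value is not canonical. Read as little-endian bytes it is ASCII text, which suggests the pointer came from memory that no longer held the expected data. A minimal Python sketch, assuming 48-bit virtual addresses (4-level paging), checks both points using the register values from the dump above:

```python
def is_canonical_48(addr: int) -> bool:
    """True if addr is canonical for 48-bit virtual addresses (bits 63..47 all equal)."""
    upper = addr >> 47
    return upper == 0 or upper == 0x1FFFF


def as_le_ascii(value: int) -> str:
    """Interpret a 64-bit register value as 8 little-endian bytes of ASCII text."""
    return value.to_bytes(8, "little").decode("ascii", errors="replace")


# Register values copied from the first oops (08:29:18).
rdx = 0x2F736B6E696C6572
rax = 0xFFFF88004D2E0CC0

print(is_canonical_48(rax))  # True: a kernel pointer in the canonical upper half
print(is_canonical_48(rdx))  # False: dereferencing it raises #GP
print(as_le_ascii(rdx))      # relinks/
```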




Aug 31 18:50:55 vps5 kernel: BUG: unable to handle kernel paging request at 0000000000002438
Aug 31 18:50:55 vps5 kernel: IP: [<ffffffffa021a4cd>] kvm_set_irq+0x5e/0x109 [kvm]
Aug 31 18:50:55 vps5 kernel: PGD 32ec01067 PUD 32ec02067 PMD 0
Aug 31 18:50:55 vps5 kernel: Oops: 0000 [#1] SMP
Aug 31 18:50:55 vps5 kernel: last sysfs file: /sys/kernel/mm/ksm/run
Aug 31 18:50:55 vps5 kernel: CPU 0
Aug 31 18:50:55 vps5 kernel: Modules linked in: vhost_net kvm_intel kvm ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi bridge stp snd_pcm snd_timer snd soundcore tpm_tis tpm tpm_bios snd_page_alloc psmouse serio_raw i2c_i801 ioatdma pcspkr joydev dca i7core_edac edac_core usbhid hid megaraid_sas e1000e [last unloaded: scsi_wait_scan]
Aug 31 18:50:55 vps5 kernel:
Aug 31 18:50:55 vps5 kernel: Pid: 51, comm: events/0 Not tainted 2.6.35-1-pve #1 X8DTT-H/X8DTT-H
Aug 31 18:50:55 vps5 kernel: RIP: 0010:[<ffffffffa021a4cd>] [<ffffffffa021a4cd>] kvm_set_irq+0x5e/0x109 [kvm]
Aug 31 18:50:55 vps5 kernel: RSP: 0018:ffff880332211d20 EFLAGS: 00010246
Aug 31 18:50:55 vps5 kernel: RAX: 0000000000000000 RBX: ffff88023c7f5aa0 RCX: 0000000000000001
Aug 31 18:50:55 vps5 kernel: RDX: 000000000000001b RSI: 0000000000000000 RDI: 0000000000000000
Aug 31 18:50:55 vps5 kernel: RBP: ffff880332211e00 R08: ffff880332210000 R09: ffff8803303596f0
Aug 31 18:50:55 vps5 kernel: R10: ffff880001e15800 R11: ffff8803303596f0 R12: 000000000000001b
Aug 31 18:50:55 vps5 kernel: R13: 0000000000000001 R14: 0000000000000000 R15: ffff880332208000
Aug 31 18:50:55 vps5 kernel: FS: 0000000000000000(0000) GS:ffff880001e00000(0000) knlGS:0000000000000000
Aug 31 18:50:55 vps5 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Aug 31 18:50:55 vps5 kernel: CR2: 0000000000002438 CR3: 000000032ec00000 CR4: 00000000000026e0
Aug 31 18:50:55 vps5 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Aug 31 18:50:55 vps5 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Aug 31 18:50:55 vps5 kernel: Process events/0 (pid: 51, threadinfo ffff880332210000, task ffff880332208000)
Aug 31 18:50:55 vps5 kernel: Stack:
Aug 31 18:50:55 vps5 kernel: ffff880001e15800 ffff880332208000 ffff880332211d70 0000000000000000
Aug 31 18:50:55 vps5 kernel: <0> ffff880001e15800 ffff8803310d8e00 ffff880001e15800 0000000000000000
Aug 31 18:50:55 vps5 kernel: <0> ffff8803310d8e00 ffff880001e15800 ffff880332211e20 ffffffff814b393d
Aug 31 18:50:55 vps5 kernel: Call Trace:
Aug 31 18:50:55 vps5 kernel: [<ffffffff814b393d>] ? schedule+0x58f/0x5f4
Aug 31 18:50:55 vps5 kernel: [<ffffffff814b554e>] ? common_interrupt+0xe/0x13
Aug 31 18:50:55 vps5 kernel: [<ffffffffa021b0d3>] irqfd_inject+0x25/0x3a [kvm]
Aug 31 18:50:55 vps5 kernel: [<ffffffff8106739b>] worker_thread+0x1a9/0x24d
Aug 31 18:50:55 vps5 kernel: [<ffffffff814b393d>] ? schedule+0x58f/0x5f4
Aug 31 18:50:55 vps5 kernel: [<ffffffffa021b0ae>] ? irqfd_inject+0x0/0x3a [kvm]
Aug 31 18:50:55 vps5 kernel: [<ffffffff8106b0d0>] ? autoremove_wake_function+0x0/0x3d
Aug 31 18:50:55 vps5 kernel: [<ffffffff810671f2>] ? worker_thread+0x0/0x24d
Aug 31 18:50:55 vps5 kernel: [<ffffffff8106abe8>] kthread+0x82/0x8a
Aug 31 18:50:55 vps5 kernel: [<ffffffff8100ab24>] kernel_thread_helper+0x4/0x10
Aug 31 18:50:55 vps5 kernel: [<ffffffff8106ab66>] ? kthread+0x0/0x8a
Aug 31 18:50:55 vps5 kernel: [<ffffffff8100ab20>] ? kernel_thread_helper+0x0/0x10
Aug 31 18:50:55 vps5 kernel: Code: 8b 1d f8 91 02 00 48 85 db 74 19 48 8b 7b 08 44 89 f1 44 89 ea 44 89 e6 ff 13 48 83 c3 10 48 83 3b 00 eb e5 48 8b 85 38 ff ff ff <48> 8b 90 38 24 00 00 44 3b a2 28 01 00 00 72 0b 31 db 41 83 cc
Aug 31 18:50:55 vps5 kernel: RIP [<ffffffffa021a4cd>] kvm_set_irq+0x5e/0x109 [kvm]
Aug 31 18:50:55 vps5 kernel: RSP <ffff880332211d20>
Aug 31 18:50:55 vps5 kernel: CR2: 0000000000002438
Aug 31 18:50:55 vps5 kernel: ---[ end trace 7e6d3e149ac90278 ]---
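The second trace is an ordinary page fault with CR2 = 0x2438. The bytes at RIP (marked <48>) are 48 8b 90 38 24 00 00, which encode mov rdx, [rax+0x2438]; with RAX = 0 the address is 0 + 0x2438, i.e. a NULL pointer plus a field offset. The Python sketch below decodes only that one encoding form (REX prefix, one-byte opcode, ModRM with mod=10, 32-bit displacement, no SIB). decode_disp32_operand is a hypothetical helper for illustration, not a disassembler; objdump or gdb are the right tools for real analysis.

```python
import struct

# 64-bit register names indexed by the ModRM rm field (extended to 4 bits by REX.B).
REGS = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
        "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]


def decode_disp32_operand(code: bytes):
    """Decode REX + 1-byte opcode + ModRM(mod=10) + disp32 into (base_reg, disp).

    Hypothetical helper: handles only this encoding form, not SIB addressing.
    """
    rex, _opcode, modrm = code[0], code[1], code[2]
    if rex & 0xF0 != 0x40:
        raise ValueError("expected a REX prefix")
    if modrm >> 6 != 0b10 or (modrm & 0b111) == 0b100:
        raise ValueError("only mod=10 without SIB is handled")
    rm = (modrm & 0b111) | ((rex & 0b0001) << 3)  # REX.B extends the base register
    (disp,) = struct.unpack_from("<i", code, 3)    # signed 32-bit displacement
    return REGS[rm], disp


# Bytes at RIP in the second oops (18:50:55), marked <48> in the Code: line.
base, disp = decode_disp32_operand(bytes.fromhex("488b9038240000"))
rax = 0x0  # RAX from the register dump
fault = rax + disp
print(base, hex(disp), hex(fault))  # rax 0x2438 0x2438
assert fault == 0x2438  # matches CR2
```

Applied to 44 3b a2 28 01 00 00 (the <44> instruction from the first trace), the same helper returns ('rdx', 0x128). That cmp directly follows the mov in the Code: bytes: in the first trace the mov loaded garbage into RDX, and in a later report in this thread RDX is 0 and CR2 is exactly 0x128. Taken together, all three oopses appear to fail on a pointer loaded from offset 0x2438 that was either NULL or garbage.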



vps5:/var/log# pveperf
CPU BOGOMIPS: 68268.86
REGEX/SECOND: 829450
HD SIZE: 94.49 GB (/dev/mapper/pve-root)
BUFFERED READS: 200.81 MB/sec
AVERAGE SEEK TIME: 5.62 ms
FSYNCS/SECOND: 1552.81
DNS EXT: 73.54 ms
DNS INT: 27.99 ms (lh.pl)
vps5:/var/log# pveversion -v
pve-manager: 1.8-18 (pve-manager/1.8/6070)
running kernel: 2.6.35-1-pve
proxmox-ve-2.6.35: 1.8-11
pve-kernel-2.6.32-4-pve: 2.6.32-33
pve-kernel-2.6.35-1-pve: 2.6.35-11
qemu-server: 1.1-30
pve-firmware: 1.0-11
libpve-storage-perl: 1.0-17
vncterm: 0.9-2
vzctl: 3.0.28-1pve1
vzdump: 1.2-14
vzprocps: 2.0.11-2
vzquota: 3.0.11-1
pve-qemu-kvm: 0.14.1-1
ksm-control-daemon: 1.0-6
vps5:/var/log# uname -a
Linux vps5 2.6.35-1-pve #1 SMP Tue May 10 14:14:39 CEST 2011 x86_64 GNU/Linux


Can anyone give me a hint to prevent these crashes?
 
I suggest you switch to the 2.6.32 kernel (our upcoming 2.6.32 kernel will also support KSM); it is better tested, more stable and includes newer drivers.
 
Hmm,
I have five PVE hosts running the 2.6.35 kernel without trouble. Some time ago, Windows SMP I/O performance was much better with 2.6.35 than with 2.6.32. Is that also fine with the upcoming 2.6.32 kernel?

Udo
 
We plan to release a new 2.6.32 kernel to our pvetest repository next week, so it would be great if you could run such tests.
 
Hi,

Is this a problem with KSM?
Aug 31 18:50:55 vps5 kernel: BUG: unable to handle kernel paging request at 0000000000002438
...

Aug 31 18:50:55 vps5 kernel: last sysfs file: /sys/kernel/mm/ksm/run

I am also running four PVE hosts with the 2.6.35-1 kernel without problems, but since they never use more than 50% of their RAM, they don't use KSM.
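Whether KSM is actually merging pages on a host can be checked in sysfs: /sys/kernel/mm/ksm/run is 0 (stopped), 1 (running) or 2 (unmerge all pages), and pages_sharing counts how many page references have been deduplicated. The Python sketch below reads those files. The directory argument exists only so the function can be pointed at a copy, and the "KSM active" rule (running and pages_sharing > 0) is an assumption, not an official definition.

```python
from pathlib import Path

RUN_STATES = {0: "stopped", 1: "running", 2: "unmerging"}


def ksm_status(ksm_dir: str = "/sys/kernel/mm/ksm") -> dict:
    """Read the KSM run state and merge counters from sysfs."""
    base = Path(ksm_dir)

    def read_int(name: str) -> int:
        return int((base / name).read_text().strip())

    run = read_int("run")
    return {
        "run": RUN_STATES.get(run, f"unknown ({run})"),
        "pages_shared": read_int("pages_shared"),    # shared (merged) pages in use
        "pages_sharing": read_int("pages_sharing"),  # extra sites sharing them, roughly pages saved
    }


# Only query the live system when the KSM sysfs interface is present.
if __name__ == "__main__" and Path("/sys/kernel/mm/ksm/run").exists():
    status = ksm_status()
    active = status["run"] == "running" and status["pages_sharing"] > 0
    print(status, "KSM active" if active else "KSM idle")
```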

Alain
 
It looks like switching to the stable 2.6.32 kernel solved the problem. Oddly, though, I have five other identical servers running 2.6.35 with uptimes of more than 100 days.
 
More or less the same here:

====
...
Sep 8 18:42:17 lxdmz4 kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000128
Sep 8 18:42:17 lxdmz4 kernel: IP: [<ffffffffa02974d4>] kvm_set_irq+0x65/0x109 [kvm]
Sep 8 18:42:17 lxdmz4 kernel: PGD 602b8d067 PUD 602150067 PMD 0
Sep 8 18:42:17 lxdmz4 kernel: Oops: 0000 [#1] SMP
Sep 8 18:42:17 lxdmz4 kernel: last sysfs file: /sys/kernel/mm/ksm/run
Sep 8 18:42:17 lxdmz4 kernel: CPU 0
Sep 8 18:42:17 lxdmz4 kernel: Modules linked in: vhost_net kvm_intel kvm mptctl mptbase ipmi_devintf ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi bridge 8021q garp stp bonding snd_pcm ipmi_si snd_timer snd soundcore tpm_tis tpm psmouse snd_page_alloc tpm_bios hpilo ipmi_msghandler pcspkr serio_raw i7core_edac edac_core shpchp bnx2x crc32c libcrc32c mdio power_meter usbhid hid cciss tg3 qla2xxx scsi_transport_fc scsi_tgt [last unloaded: scsi_wait_scan]
Sep 8 18:42:17 lxdmz4 kernel:
Sep 8 18:42:17 lxdmz4 kernel: Pid: 51, comm: events/0 Not tainted 2.6.35-1-pve #1 /ProLiant BL460c G6
Sep 8 18:42:17 lxdmz4 kernel: RIP: 0010:[<ffffffffa02974d4>] [<ffffffffa02974d4>] kvm_set_irq+0x65/0x109 [kvm]
Sep 8 18:42:17 lxdmz4 kernel: RSP: 0018:ffff880c04711d20 EFLAGS: 00010246
Sep 8 18:42:17 lxdmz4 kernel: RAX: ffff880b34fa3d40 RBX: ffff880b34fa3020 RCX: 0000000000000001
Sep 8 18:42:17 lxdmz4 kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880b34fa3d40
Sep 8 18:42:17 lxdmz4 kernel: RBP: ffff880c04711e00 R08: ffff880c04710000 R09: 0000000000000000
Sep 8 18:42:17 lxdmz4 kernel: R10: ffff880638215800 R11: ffff8805f0b6fb50 R12: 000000000000001a
Sep 8 18:42:17 lxdmz4 kernel: R13: 0000000000000001 R14: 0000000000000000 R15: ffff880c04708000
Sep 8 18:42:17 lxdmz4 kernel: FS: 0000000000000000(0000) GS:ffff880638200000(0000) knlGS:0000000000000000
Sep 8 18:42:17 lxdmz4 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Sep 8 18:42:17 lxdmz4 kernel: CR2: 0000000000000128 CR3: 00000005f0874000 CR4: 00000000000026e0
Sep 8 18:42:17 lxdmz4 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Sep 8 18:42:17 lxdmz4 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Sep 8 18:42:17 lxdmz4 kernel: Process events/0 (pid: 51, threadinfo ffff880c04710000, task ffff880c04708000)
Sep 8 18:42:17 lxdmz4 kernel: Stack:
Sep 8 18:42:17 lxdmz4 kernel: ffff880638215800 ffff880c04708000 ffff880c04711d70 ffff880b34fa3d40
Sep 8 18:42:17 lxdmz4 kernel: <0> ffff880638215800 ffff880603a69f80 ffff880638215800 0000000000000000
Sep 8 18:42:17 lxdmz4 kernel: <0> ffff880603a69f80 ffff880638215800 ffff880c04711e20 ffffffff814b294b
Sep 8 18:42:17 lxdmz4 kernel: Call Trace:
Sep 8 18:42:17 lxdmz4 kernel: [<ffffffff814b294b>] ? schedule+0x59d/0x602
Sep 8 18:42:17 lxdmz4 kernel: [<ffffffff814b454e>] ? common_interrupt+0xe/0x13
Sep 8 18:42:17 lxdmz4 kernel: [<ffffffffa02980d3>] irqfd_inject+0x25/0x3a [kvm]
Sep 8 18:42:17 lxdmz4 kernel: [<ffffffff8106706b>] worker_thread+0x1a9/0x24d
Sep 8 18:42:17 lxdmz4 kernel: [<ffffffff814b294b>] ? schedule+0x59d/0x602
Sep 8 18:42:17 lxdmz4 kernel: [<ffffffffa02980ae>] ? irqfd_inject+0x0/0x3a [kvm]
Sep 8 18:42:17 lxdmz4 kernel: [<ffffffff8106ada0>] ? autoremove_wake_function+0x0/0x3d
Sep 8 18:42:17 lxdmz4 kernel: [<ffffffff81066ec2>] ? worker_thread+0x0/0x24d
Sep 8 18:42:17 lxdmz4 kernel: [<ffffffff8106a8b8>] kthread+0x82/0x8a
Sep 8 18:42:17 lxdmz4 kernel: [<ffffffff8100ab24>] kernel_thread_helper+0x4/0x10
Sep 8 18:42:17 lxdmz4 kernel: [<ffffffff8106a836>] ? kthread+0x0/0x8a
Sep 8 18:42:17 lxdmz4 kernel: [<ffffffff8100ab20>] ? kernel_thread_helper+0x0/0x10
Sep 8 18:42:17 lxdmz4 kernel: Code: 85 db 74 19 48 8b 7b 08 44 89 f1 44 89 ea 44 89 e6 ff 13 48 83 c3 10 48 83 3b 00 eb e5 48 8b 85 38 ff ff ff 48 8b 90 38 24 00 00 <44> 3b a2 28 01 00 00 72 0b 31 db 41 83 cc ff 45 31 ff eb 77 44
Sep 8 18:42:17 lxdmz4 kernel: RIP [<ffffffffa02974d4>] kvm_set_irq+0x65/0x109 [kvm]
Sep 8 18:42:17 lxdmz4 kernel: RSP <ffff880c04711d20>
Sep 8 18:42:17 lxdmz4 kernel: CR2: 0000000000000128
Sep 8 18:42:17 lxdmz4 kernel: ---[ end trace 9a5dbc0410980395 ]---
...
=====================

Moving to the 2.6.32-6 kernel from pvetest (2.6.32-42), and also updating to the latest PVE and KVM packages, solved the problem.
Note: live migration did not work between upgraded and non-upgraded nodes, so I had to stop all VMs.
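Since live migration failed between upgraded and non-upgraded nodes, comparing pveversion -v output across nodes before migrating may catch mismatches early. A minimal Python sketch, assuming the one "name: version" per line format shown earlier in this thread; collecting the output from each node (for example over ssh) is left out, and the node_new values are hypothetical placeholders:

```python
def parse_pveversion(text: str) -> dict:
    """Parse `pveversion -v` output ("name: version" per line) into a dict."""
    versions = {}
    for line in text.splitlines():
        name, sep, version = line.partition(":")
        if sep:  # skip lines without a "name: version" separator
            versions[name.strip()] = version.strip()
    return versions


def version_mismatches(a: dict, b: dict) -> dict:
    """Return {package: (version_a, version_b)} for packages that differ or are missing."""
    return {
        pkg: (a.get(pkg), b.get(pkg))
        for pkg in sorted(a.keys() | b.keys())
        if a.get(pkg) != b.get(pkg)
    }


# node_old lines are taken from the listing above; node_new is hypothetical.
node_old = """running kernel: 2.6.35-1-pve
qemu-server: 1.1-30
pve-qemu-kvm: 0.14.1-1"""
node_new = """running kernel: 2.6.32-6-pve
qemu-server: 1.1-30
pve-qemu-kvm: 0.14.1-1"""

print(version_mismatches(parse_pveversion(node_old), parse_pveversion(node_new)))
# {'running kernel': ('2.6.35-1-pve', '2.6.32-6-pve')}
```

An empty result only means the listed packages match; it does not by itself guarantee that live migration will work.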

bye,
rob