Hi all,
My Proxmox server became unresponsive for a few minutes and the VMs stopped working. When I managed to log back in (after 10 minutes or so), /var/log/messages was full of:
Code:
Jul 18 13:59:45 pvemaster kernel: __ratelimit: 3204726 callbacks suppressed
Jul 18 13:59:50 pvemaster kernel: __ratelimit: 3268999 callbacks suppressed
Jul 18 13:59:55 pvemaster kernel: __ratelimit: 3194078 callbacks suppressed
Jul 18 14:00:00 pvemaster kernel: __ratelimit: 3179136 callbacks suppressed
Jul 18 14:00:05 pvemaster kernel: __ratelimit: 3137230 callbacks suppressed
Jul 18 14:00:10 pvemaster kernel: __ratelimit: 3359240 callbacks suppressed
Jul 18 14:00:15 pvemaster kernel: __ratelimit: 3346463 callbacks suppressed
Jul 18 14:00:20 pvemaster kernel: __ratelimit: 3169146 callbacks suppressed
Jul 18 14:00:25 pvemaster kernel: __ratelimit: 2857914 callbacks suppressed
and so on, for hours.
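To give a sense of scale, here is a minimal Python sketch that totals the suppressed callbacks reported in those lines. It only assumes the line format shown above; the function name `total_suppressed` is made up for illustration and is not part of any Proxmox tooling.

```python
import re

# Matches syslog lines such as:
# "Jul 18 13:59:45 pvemaster kernel: __ratelimit: 3204726 callbacks suppressed"
RATELIMIT_RE = re.compile(r"__ratelimit: (\d+) callbacks suppressed")


def total_suppressed(lines):
    """Return (number of matching lines, sum of suppressed callbacks)."""
    count = 0
    total = 0
    for line in lines:
        match = RATELIMIT_RE.search(line)
        if match:
            count += 1
            total += int(match.group(1))
    return count, total


if __name__ == "__main__":
    # Two lines copied from the log excerpt above, plus one unrelated line.
    sample = [
        "Jul 18 13:59:45 pvemaster kernel: __ratelimit: 3204726 callbacks suppressed",
        "Jul 18 13:59:50 pvemaster kernel: __ratelimit: 3268999 callbacks suppressed",
        "Jul 18 17:24:40 pvemaster kernel: kjournald starting. Commit interval 5 seconds",
    ]
    print(total_suppressed(sample))
```

To run it over the real log, pass an open file instead of the sample list, for example `total_suppressed(open("/var/log/messages"))`.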
Then I got this:
Code:
Jul 18 17:24:40 pvemaster kernel: kjournald starting. Commit interval 5 seconds
Jul 18 17:24:40 pvemaster kernel: EXT3 FS on dm-83, internal journal
Jul 18 17:24:40 pvemaster kernel: EXT3-fs: mounted filesystem with ordered data mode.
Jul 18 17:28:00 pvemaster kernel: kjournald starting. Commit interval 5 seconds
Jul 18 17:28:00 pvemaster kernel: EXT3 FS on dm-83, internal journal
Jul 18 17:28:00 pvemaster kernel: EXT3-fs: mounted filesystem with ordered data mode.
Jul 18 17:32:41 pvemaster kernel: kvm D 000000004020ae46 0 14111 1 0x00000000
Jul 18 17:32:41 pvemaster kernel: ffff88041cdedbc8 0000000000000082 0000000000000000 ffff8801faf12860
Jul 18 17:32:41 pvemaster kernel: 0000000000000001 000ffffffffff000 ffff88041cdedb98 000fffffffe00000
Jul 18 17:32:41 pvemaster kernel: ffff88041cdedbf8 000000000000fb08 ffff88041cdedfd8 ffff8804d792c4d0
Jul 18 17:32:41 pvemaster kernel: Call Trace:
Jul 18 17:32:41 pvemaster kernel: [<ffffffff8156fed5>] __down_write_nested+0x95/0xd0
Jul 18 17:32:41 pvemaster kernel: [<ffffffff8156ff1b>] __down_write+0xb/0x10
Jul 18 17:32:41 pvemaster kernel: [<ffffffff8156f11e>] down_write+0x1e/0x30
Jul 18 17:32:41 pvemaster kernel: [<ffffffffa01bc414>] kvm_set_memory_region+0x34/0x70 [kvm]
Jul 18 17:32:41 pvemaster kernel: [<ffffffffa01bc46d>] kvm_vm_ioctl_set_memory_region+0x1d/0x30 [kvm]
Jul 18 17:32:41 pvemaster kernel: [<ffffffffa01bd650>] kvm_vm_ioctl+0x400/0xf60 [kvm]
Jul 18 17:32:41 pvemaster kernel: [<ffffffffa021428e>] ? vmx_vcpu_put+0xe/0x10 [kvm_intel]
Jul 18 17:32:41 pvemaster kernel: [<ffffffffa01c090b>] ? kvm_arch_vcpu_put+0x1b/0x50 [kvm]
Jul 18 17:32:41 pvemaster kernel: [<ffffffffa01bcd3b>] ? vcpu_put+0x2b/0x40 [kvm]
Jul 18 17:32:41 pvemaster kernel: [<ffffffffa01cd957>] ? kvm_arch_vcpu_ioctl_run+0x3c7/0xd50 [kvm]
Jul 18 17:32:41 pvemaster kernel: [<ffffffff81096843>] ? futex_wake+0x123/0x140
Jul 18 17:32:41 pvemaster kernel: [<ffffffffa01ba1a0>] ? kvm_vcpu_ioctl+0x160/0x640 [kvm]
Jul 18 17:32:41 pvemaster kernel: [<ffffffff8104c9e6>] ? update_stats_wait_end+0xb6/0xf0
Jul 18 17:32:41 pvemaster kernel: [<ffffffff810117dd>] ? __switch_to+0xcd/0x320
Jul 18 17:32:41 pvemaster kernel: [<ffffffff81154486>] vfs_ioctl+0x36/0xb0
Jul 18 17:32:41 pvemaster kernel: [<ffffffff8115462a>] do_vfs_ioctl+0x8a/0x5b0
Jul 18 17:32:41 pvemaster kernel: [<ffffffff810999f9>] ? sys_futex+0x89/0x160
Jul 18 17:32:41 pvemaster kernel: [<ffffffff81154bf1>] sys_ioctl+0xa1/0xb0
Jul 18 17:32:41 pvemaster kernel: [<ffffffff810131f2>] system_call_fastpath+0x16/0x1b
Jul 18 17:32:41 pvemaster kernel: kvm D ffffffffa01bb342 0 14113 1 0x00000000
Jul 18 17:32:41 pvemaster kernel: ffff88056c68dc78 0000000000000082 0000000000000000 00007fff0f5b6e50
Jul 18 17:32:41 pvemaster kernel: ffff88056c68dc98 ffffffff8156d728 ffff88056c68dbf8 000000000016fbf9
Jul 18 17:32:41 pvemaster kernel: 0000000000000000 000000000000fb08 ffff88056c68dfd8 ffff8804d7928000
Jul 18 17:32:41 pvemaster kernel: Call Trace:
Jul 18 17:32:41 pvemaster kernel: [<ffffffff8156d728>] ? thread_return+0x51/0x6d9
Jul 18 17:32:41 pvemaster kernel: [<ffffffff8156ffb5>] __down_read+0x95/0xce
Jul 18 17:32:41 pvemaster kernel: [<ffffffff8156f14e>] down_read+0x1e/0x30
Jul 18 17:32:41 pvemaster kernel: [<ffffffffa01cdf94>] kvm_arch_vcpu_ioctl_run+0xa04/0xd50 [kvm]
Jul 18 17:32:41 pvemaster kernel: [<ffffffff81096843>] ? futex_wake+0x123/0x140
Jul 18 17:32:41 pvemaster kernel: [<ffffffff8113580a>] ? kfree+0xca/0x110
Jul 18 17:32:41 pvemaster kernel: [<ffffffffa01ba332>] kvm_vcpu_ioctl+0x2f2/0x640 [kvm]
Jul 18 17:32:41 pvemaster kernel: [<ffffffff8101195d>] ? __switch_to+0x24d/0x320
Jul 18 17:32:41 pvemaster kernel: [<ffffffff81013cee>] ? apic_timer_interrupt+0xe/0x20
Jul 18 17:32:41 pvemaster kernel: [<ffffffff81154486>] vfs_ioctl+0x36/0xb0
Jul 18 17:32:41 pvemaster kernel: [<ffffffff8115460f>] ? do_vfs_ioctl+0x6f/0x5b0
Jul 18 17:32:41 pvemaster kernel: [<ffffffff8115462a>] do_vfs_ioctl+0x8a/0x5b0
Jul 18 17:32:41 pvemaster kernel: [<ffffffff810999f9>] ? sys_futex+0x89/0x160
Jul 18 17:32:41 pvemaster kernel: [<ffffffff81154bf1>] sys_ioctl+0xa1/0xb0
Jul 18 17:32:41 pvemaster kernel: [<ffffffff81143fd7>] ? sys_lseek+0x57/0x90
Jul 18 17:32:41 pvemaster kernel: [<ffffffff810131f2>] system_call_fastpath+0x16/0x1b
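In case it helps anyone reading the trace, here is a rough Python sketch that pulls out which processes are reported in D (uninterruptible sleep) state. It assumes the task header lines follow the layout seen above (command, state, PC, free stack, PID, parent PID, flags); that layout is my reading of the dump rather than something I have confirmed, and the function name `blocked_tasks` is hypothetical.

```python
import re

# Assumed header layout, as in:
# "... kernel: kvm D 000000004020ae46 0 14111 1 0x00000000"
#              comm state PC          free pid  ppid flags
TASK_RE = re.compile(
    r"kernel:\s+(\S+)\s+([A-Z])\s+[0-9a-f]{16}\s+\d+\s+(\d+)\s+(\d+)\s+0x[0-9a-f]+"
)


def blocked_tasks(lines):
    """Return (command, pid, parent pid) for every task header in D state."""
    tasks = []
    for line in lines:
        match = TASK_RE.search(line)
        if match and match.group(2) == "D":
            tasks.append((match.group(1), int(match.group(3)), int(match.group(4))))
    return tasks


if __name__ == "__main__":
    # Lines copied from the trace above: two task headers and two other lines.
    sample = [
        "Jul 18 17:32:41 pvemaster kernel: kvm D 000000004020ae46 0 14111 1 0x00000000",
        "Jul 18 17:32:41 pvemaster kernel: Call Trace:",
        "Jul 18 17:32:41 pvemaster kernel: [<ffffffff8156f11e>] down_write+0x1e/0x30",
        "Jul 18 17:32:41 pvemaster kernel: kvm D ffffffffa01bb342 0 14113 1 0x00000000",
    ]
    for command, pid, ppid in blocked_tasks(sample):
        print(command, pid, ppid)
```

The PIDs it reports can then be matched against the running kvm processes to see which VMs are affected.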
pveversion -v:
Code:
pve-manager: 1.5-10 (pve-manager/1.5/4822)
running kernel: 2.6.32-2-pve
proxmox-ve-2.6.32: 1.5-7
pve-kernel-2.6.32-2-pve: 2.6.32-7
pve-kernel-2.6.18-2-pve: 2.6.18-5
qemu-server: 1.1-16
pve-firmware: 1.0-5
libpve-storage-perl: 1.0-13
vncterm: 0.9-2
vzctl: 3.0.23-1pve11
vzdump: 1.2-5
vzprocps: 2.0.11-1dso2
vzquota: 3.0.11-1
pve-qemu-kvm: 0.12.4-1
ksm-control-daemon: 1.0-3
The web site is still down, but I can SSH into the box now. This is very bad news, as this is my production server and the master for the cluster.
Help!