Proxmox server died - lots of "__ratelimit" in kernel log

yatesco

Hi all,

My Proxmox server became unresponsive and the VMs stopped working. When I finally managed to log back in (after 10 minutes or so), /var/log/messages was full of:

Code:
Jul 18 13:59:45 pvemaster kernel: __ratelimit: 3204726 callbacks suppressed
Jul 18 13:59:50 pvemaster kernel: __ratelimit: 3268999 callbacks suppressed
Jul 18 13:59:55 pvemaster kernel: __ratelimit: 3194078 callbacks suppressed
Jul 18 14:00:00 pvemaster kernel: __ratelimit: 3179136 callbacks suppressed
Jul 18 14:00:05 pvemaster kernel: __ratelimit: 3137230 callbacks suppressed
Jul 18 14:00:10 pvemaster kernel: __ratelimit: 3359240 callbacks suppressed
Jul 18 14:00:15 pvemaster kernel: __ratelimit: 3346463 callbacks suppressed
Jul 18 14:00:20 pvemaster kernel: __ratelimit: 3169146 callbacks suppressed
Jul 18 14:00:25 pvemaster kernel: __ratelimit: 2857914 callbacks suppressed
etc. for hours
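
(Side note: those __ratelimit lines only say that the kernel suppressed several million log messages per interval; they don't say what the messages actually were. If this happens again, printk rate limiting can be relaxed temporarily to capture the underlying message. A hedged sketch using the standard sysctls; beware that this can flood /var/log:)

Code:
# show the current printk rate-limit settings
sysctl kernel.printk_ratelimit kernel.printk_ratelimit_burst

# an interval of 0 disables rate limiting, so the suppressed
# messages are logged in full (can fill the disk quickly)
sysctl -w kernel.printk_ratelimit=0

# restore the default (one burst per 5 seconds) afterwards
sysctl -w kernel.printk_ratelimit=5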

Then I got this:

Code:
Jul 18 17:24:40 pvemaster kernel: kjournald starting.  Commit interval 5 seconds
Jul 18 17:24:40 pvemaster kernel: EXT3 FS on dm-83, internal journal
Jul 18 17:24:40 pvemaster kernel: EXT3-fs: mounted filesystem with ordered data mode.
Jul 18 17:28:00 pvemaster kernel: kjournald starting.  Commit interval 5 seconds
Jul 18 17:28:00 pvemaster kernel: EXT3 FS on dm-83, internal journal
Jul 18 17:28:00 pvemaster kernel: EXT3-fs: mounted filesystem with ordered data mode.
Jul 18 17:32:41 pvemaster kernel: kvm           D 000000004020ae46     0 14111      1 0x00000000
Jul 18 17:32:41 pvemaster kernel: ffff88041cdedbc8 0000000000000082 0000000000000000 ffff8801faf12860
Jul 18 17:32:41 pvemaster kernel: 0000000000000001 000ffffffffff000 ffff88041cdedb98 000fffffffe00000
Jul 18 17:32:41 pvemaster kernel: ffff88041cdedbf8 000000000000fb08 ffff88041cdedfd8 ffff8804d792c4d0
Jul 18 17:32:41 pvemaster kernel: Call Trace:
Jul 18 17:32:41 pvemaster kernel: [<ffffffff8156fed5>] __down_write_nested+0x95/0xd0
Jul 18 17:32:41 pvemaster kernel: [<ffffffff8156ff1b>] __down_write+0xb/0x10
Jul 18 17:32:41 pvemaster kernel: [<ffffffff8156f11e>] down_write+0x1e/0x30
Jul 18 17:32:41 pvemaster kernel: [<ffffffffa01bc414>] kvm_set_memory_region+0x34/0x70 [kvm]
Jul 18 17:32:41 pvemaster kernel: [<ffffffffa01bc46d>] kvm_vm_ioctl_set_memory_region+0x1d/0x30 [kvm]
Jul 18 17:32:41 pvemaster kernel: [<ffffffffa01bd650>] kvm_vm_ioctl+0x400/0xf60 [kvm]
Jul 18 17:32:41 pvemaster kernel: [<ffffffffa021428e>] ? vmx_vcpu_put+0xe/0x10 [kvm_intel]
Jul 18 17:32:41 pvemaster kernel: [<ffffffffa01c090b>] ? kvm_arch_vcpu_put+0x1b/0x50 [kvm]
Jul 18 17:32:41 pvemaster kernel: [<ffffffffa01bcd3b>] ? vcpu_put+0x2b/0x40 [kvm]
Jul 18 17:32:41 pvemaster kernel: [<ffffffffa01cd957>] ? kvm_arch_vcpu_ioctl_run+0x3c7/0xd50 [kvm]
Jul 18 17:32:41 pvemaster kernel: [<ffffffff81096843>] ? futex_wake+0x123/0x140
Jul 18 17:32:41 pvemaster kernel: [<ffffffffa01ba1a0>] ? kvm_vcpu_ioctl+0x160/0x640 [kvm]
Jul 18 17:32:41 pvemaster kernel: [<ffffffff8104c9e6>] ? update_stats_wait_end+0xb6/0xf0
Jul 18 17:32:41 pvemaster kernel: [<ffffffff810117dd>] ? __switch_to+0xcd/0x320
Jul 18 17:32:41 pvemaster kernel: [<ffffffff81154486>] vfs_ioctl+0x36/0xb0
Jul 18 17:32:41 pvemaster kernel: [<ffffffff8115462a>] do_vfs_ioctl+0x8a/0x5b0
Jul 18 17:32:41 pvemaster kernel: [<ffffffff810999f9>] ? sys_futex+0x89/0x160
Jul 18 17:32:41 pvemaster kernel: [<ffffffff81154bf1>] sys_ioctl+0xa1/0xb0
Jul 18 17:32:41 pvemaster kernel: [<ffffffff810131f2>] system_call_fastpath+0x16/0x1b
Jul 18 17:32:41 pvemaster kernel: kvm           D ffffffffa01bb342     0 14113      1 0x00000000
Jul 18 17:32:41 pvemaster kernel: ffff88056c68dc78 0000000000000082 0000000000000000 00007fff0f5b6e50
Jul 18 17:32:41 pvemaster kernel: ffff88056c68dc98 ffffffff8156d728 ffff88056c68dbf8 000000000016fbf9
Jul 18 17:32:41 pvemaster kernel: 0000000000000000 000000000000fb08 ffff88056c68dfd8 ffff8804d7928000
Jul 18 17:32:41 pvemaster kernel: Call Trace:
Jul 18 17:32:41 pvemaster kernel: [<ffffffff8156d728>] ? thread_return+0x51/0x6d9
Jul 18 17:32:41 pvemaster kernel: [<ffffffff8156ffb5>] __down_read+0x95/0xce
Jul 18 17:32:41 pvemaster kernel: [<ffffffff8156f14e>] down_read+0x1e/0x30
Jul 18 17:32:41 pvemaster kernel: [<ffffffffa01cdf94>] kvm_arch_vcpu_ioctl_run+0xa04/0xd50 [kvm]
Jul 18 17:32:41 pvemaster kernel: [<ffffffff81096843>] ? futex_wake+0x123/0x140
Jul 18 17:32:41 pvemaster kernel: [<ffffffff8113580a>] ? kfree+0xca/0x110
Jul 18 17:32:41 pvemaster kernel: [<ffffffffa01ba332>] kvm_vcpu_ioctl+0x2f2/0x640 [kvm]
Jul 18 17:32:41 pvemaster kernel: [<ffffffff8101195d>] ? __switch_to+0x24d/0x320
Jul 18 17:32:41 pvemaster kernel: [<ffffffff81013cee>] ? apic_timer_interrupt+0xe/0x20
Jul 18 17:32:41 pvemaster kernel: [<ffffffff81154486>] vfs_ioctl+0x36/0xb0
Jul 18 17:32:41 pvemaster kernel: [<ffffffff8115460f>] ? do_vfs_ioctl+0x6f/0x5b0
Jul 18 17:32:41 pvemaster kernel: [<ffffffff8115462a>] do_vfs_ioctl+0x8a/0x5b0
Jul 18 17:32:41 pvemaster kernel: [<ffffffff810999f9>] ? sys_futex+0x89/0x160
Jul 18 17:32:41 pvemaster kernel: [<ffffffff81154bf1>] sys_ioctl+0xa1/0xb0
Jul 18 17:32:41 pvemaster kernel: [<ffffffff81143fd7>] ? sys_lseek+0x57/0x90
Jul 18 17:32:41 pvemaster kernel: [<ffffffff810131f2>] system_call_fastpath+0x16/0x1b
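
Reading those traces: both kvm threads are in state D (uninterruptible sleep), blocked in down_write()/down_read() on what appears to be the guest's memory-map semaphore, i.e. waiting on something (most likely storage I/O) that never completes. A hedged sketch for spotting such tasks the next time the box wedges:

Code:
# list processes stuck in uninterruptible sleep (state D)
# and the kernel function they are waiting in
ps -eo pid,state,wchan:35,cmd | awk '$2 == "D"'

# ask the kernel to dump a trace of every blocked task to the
# log (the same kind of trace as above); needs sysrq enabled
echo w > /proc/sysrq-trigger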

pveversion -v:
Code:
pve-manager: 1.5-10 (pve-manager/1.5/4822)
running kernel: 2.6.32-2-pve
proxmox-ve-2.6.32: 1.5-7
pve-kernel-2.6.32-2-pve: 2.6.32-7
pve-kernel-2.6.18-2-pve: 2.6.18-5
qemu-server: 1.1-16
pve-firmware: 1.0-5
libpve-storage-perl: 1.0-13
vncterm: 0.9-2
vzctl: 3.0.23-1pve11
vzdump: 1.2-5
vzprocps: 2.0.11-1dso2
vzquota: 3.0.11-1
pve-qemu-kvm: 0.12.4-1
ksm-control-daemon: 1.0-3

The web interface is still down, but I can SSH into the box now. This is very bad news, as this is my production server and the cluster master.

Help :)
 
The only additional thing I was doing was a full VM backup on all three nodes to a remote NFS server.
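
(If the backup itself is what starves the host for I/O, vzdump can be throttled. A hedged sketch; the 40000 KB/s value is only an example, and --bwlimit may not exist in every vzdump version:)

Code:
# one-off run with read bandwidth capped at ~40 MB/s
vzdump --all --bwlimit 40000

# or make the limit permanent for all scheduled backups
echo "bwlimit: 40000" >> /etc/vzdump.conf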
 
I just had the same problem. It occurred during a backup to an NFS share.

The odd thing is that 192.168.0.83 is the address of the local host itself, so the box is mounting its own NFS export. Here is the output of mount | grep backup:
Code:
192.168.0.83:/vms/backup on /mnt/pve/kraz_backup type nfs (rw,addr=192.168.0.83)
And here are the contents of the /etc/exports file:
Code:
/vms/backup        192.168.0.0/24(rw,sync,no_root_squash)
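
That loopback is the suspicious part: mounting a host's own NFS export back onto itself is a classic deadlock under memory pressure, because the NFS client needs free memory to flush dirty pages while the NFS server on the same box needs memory to receive them. If the target directory is local anyway, NFS can be bypassed entirely. A hedged sketch, reusing the paths from the mount/exports output above (note that Proxmox may remount the NFS storage on its own, so the cleaner long-term fix is to define the path as plain directory storage):

Code:
# drop the loopback NFS mount
umount /mnt/pve/kraz_backup

# bind-mount the local directory in its place - same data, no NFS
mount --bind /vms/backup /mnt/pve/kraz_backup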

pveversion -v:
Code:
pve-manager: 1.9-24 (pve-manager/1.9/6542)
running kernel: 2.6.32-6-pve
proxmox-ve-2.6.32: 1.9-47
pve-kernel-2.6.32-6-pve: 2.6.32-47
qemu-server: 1.1-32
pve-firmware: 1.0-14
libpve-storage-perl: 1.0-19
vncterm: 0.9-2
vzctl: 3.0.29-2pve1
vzdump: 1.2-16
vzprocps: 2.0.11-2
vzquota: 3.0.11-1
pve-qemu-kvm: 0.15.0-1
ksm-control-daemon: 1.0-6
Log files attached: debug.txt, messages.txt.zip, dmesg.txt.zip
 
