Hello, I have seen this problem happening two times on two different and unrelated servers.
Here is my configuration:
Dedicated server with proxmox 4.2
only 1 VM running inside, with Debian 8.4, that takes ~80% RAM and 6 of 8 physical cores.
At 3:00 AM a NFS backup starts; after that I can no longer reach the website hosted on the VM.
Symptoms are these:
-zabbix agent inside the VM reports low cpu usage.
-ram is not full
-proxmox server views 100% for each core (so 600% CPU usage)
-no KVM connection possible
-no TCP connection possible (network is natted and I used KVM port forwarding)
-after KVM stop and restart, everything is back to normal.
In my log I see lines like:
Jun 3 10:24:10 ns326125 kernel: [ 480.345124] lzop D ffff88082904fa58 0 1950 1926 0x00000000
Jun 3 10:24:10 ns326125 kernel: [ 480.345127] ffff88082904fa58 ffffffffffffffff ffff88085b9c1b80 ffff88084d15d280
Jun 3 10:24:10 ns326125 kernel: [ 480.345129] ffff880829050000 ffff88087fc97180 7fffffffffffffff ffffffff81845170
Jun 3 10:24:10 ns326125 kernel: [ 480.345130] ffff88082904fbb0 ffff88082904fa70 ffffffff81844885 0000000000000000
Jun 3 10:24:10 ns326125 kernel: [ 480.345131] Call Trace:
Jun 3 10:24:10 ns326125 kernel: [ 480.345137] [<ffffffff81845170>] ? bit_wait_timeout+0xa0/0xa0
Jun 3 10:24:10 ns326125 kernel: [ 480.345139] [<ffffffff81844885>] schedule+0x35/0x80
Jun 3 10:24:10 ns326125 kernel: [ 480.345141] [<ffffffff81847ac5>] schedule_timeout+0x235/0x2d0
Jun 3 10:24:10 ns326125 kernel: [ 480.345143] [<ffffffff811eadf1>] ? ___slab_alloc+0x341/0x430
Jun 3 10:24:10 ns326125 kernel: [ 480.345146] [<ffffffff810f568c>] ? ktime_get+0x3c/0xb0
Jun 3 10:24:10 ns326125 kernel: [ 480.345147] [<ffffffff81845170>] ? bit_wait_timeout+0xa0/0xa0
Jun 3 10:24:10 ns326125 kernel: [ 480.345149] [<ffffffff81843d9b>] io_schedule_timeout+0xbb/0x140
Jun 3 10:24:10 ns326125 kernel: [ 480.345150] [<ffffffff8184518b>] bit_wait_io+0x1b/0x70
Jun 3 10:24:10 ns326125 kernel: [ 480.345151] [<ffffffff81844c3f>] __wait_on_bit+0x5f/0x90
Jun 3 10:24:10 ns326125 kernel: [ 480.345153] [<ffffffff81845170>] ? bit_wait_timeout+0xa0/0xa0
Jun 3 10:24:10 ns326125 kernel: [ 480.345154] [<ffffffff81844cf1>] out_of_line_wait_on_bit+0x81/0xb0
Jun 3 10:24:10 ns326125 kernel: [ 480.345157] [<ffffffff810c3dc0>] ? autoremove_wake_function+0x40/0x40
Jun 3 10:24:10 ns326125 kernel: [ 480.345168] [<ffffffffc08f7c94>] nfs_wait_on_request+0x34/0x40 [nfs]
Jun 3 10:24:10 ns326125 kernel: [ 480.345174] [<ffffffffc08fc9ce>] nfs_updatepage+0x15e/0x920 [nfs]
Jun 3 10:24:10 ns326125 kernel: [ 480.345179] [<ffffffffc08ecdd4>] nfs_write_end+0x154/0x500 [nfs]
Jun 3 10:24:10 ns326125 kernel: [ 480.345181] [<ffffffff813fed9f>] ? iov_iter_copy_from_user_atomic+0x8f/0x230
Jun 3 10:24:10 ns326125 kernel: [ 480.345184] [<ffffffff8118cf94>] generic_perform_write+0x114/0x1c0
Jun 3 10:24:10 ns326125 kernel: [ 480.345186] [<ffffffff8118f126>] __generic_file_write_iter+0x1a6/0x1f0
Jun 3 10:24:10 ns326125 kernel: [ 480.345188] [<ffffffff812297b3>] ? touch_atime+0x33/0xd0
Jun 3 10:24:10 ns326125 kernel: [ 480.345189] [<ffffffff8118f254>] generic_file_write_iter+0xe4/0x1e0
Jun 3 10:24:10 ns326125 kernel: [ 480.345193] [<ffffffffc08ec54a>] nfs_file_write+0x9a/0x160 [nfs]
Jun 3 10:24:10 ns326125 kernel: [ 480.345195] [<ffffffff8120c1bb>] new_sync_write+0x9b/0xe0
Jun 3 10:24:10 ns326125 kernel: [ 480.345197] [<ffffffff8120c226>] __vfs_write+0x26/0x40
Jun 3 10:24:10 ns326125 kernel: [ 480.345198] [<ffffffff8120c899>] vfs_write+0xa9/0x190
Jun 3 10:24:10 ns326125 kernel: [ 480.345199] [<ffffffff8120c7d4>] ? vfs_read+0x114/0x130
Jun 3 10:24:10 ns326125 kernel: [ 480.345201] [<ffffffff8120d675>] SyS_write+0x55/0xc0
Jun 3 10:24:10 ns326125 kernel: [ 480.345203] [<ffffffff818489b6>] entry_SYSCALL_64_fastpath+0x16/0x75
Jun 3 04:47:25 ns326125 kernel: [574672.937159] audit: type=1400 audit(1464922045.521:227): apparmor="DENIED" operation="sendmsg" profile="/usr/sbin/named" name="/run/systemd/journal/dev-log" pid=1567 comm="named" requested_mask="w" denied_mask="w" fsuid=109 ouid=0
Jun 3 04:47:26 ns326125 kernel: [574674.029074] audit: type=1400 audit(1464922046.613:228): apparmor="DENIED" operation="sendmsg" profile="/usr/sbin/named" name="/run/systemd/journal/dev-log" pid=1568 comm="named" requested_mask="w" denied_mask="w" fsuid=109 ouid=0
any idea??
thanks
Here is my configuration:
Dedicated server with proxmox 4.2
only 1 VM running inside, with Debian 8.4, that takes ~80% RAM and 6 of 8 physical cores.
At 3:00 AM a NFS backup starts; after that I can no longer reach the website hosted on the VM.
Symptoms are these:
-zabbix agent inside the VM reports low cpu usage.
-ram is not full
-proxmox server views 100% for each core (so 600% CPU usage)
-no KVM connection possible
-no TCP connection possible (network is natted and I used KVM port forwarding)
-after KVM stop and restart, everything is back to normal.
In my log I see lines like:
Jun 3 10:24:10 ns326125 kernel: [ 480.345124] lzop D ffff88082904fa58 0 1950 1926 0x00000000
Jun 3 10:24:10 ns326125 kernel: [ 480.345127] ffff88082904fa58 ffffffffffffffff ffff88085b9c1b80 ffff88084d15d280
Jun 3 10:24:10 ns326125 kernel: [ 480.345129] ffff880829050000 ffff88087fc97180 7fffffffffffffff ffffffff81845170
Jun 3 10:24:10 ns326125 kernel: [ 480.345130] ffff88082904fbb0 ffff88082904fa70 ffffffff81844885 0000000000000000
Jun 3 10:24:10 ns326125 kernel: [ 480.345131] Call Trace:
Jun 3 10:24:10 ns326125 kernel: [ 480.345137] [<ffffffff81845170>] ? bit_wait_timeout+0xa0/0xa0
Jun 3 10:24:10 ns326125 kernel: [ 480.345139] [<ffffffff81844885>] schedule+0x35/0x80
Jun 3 10:24:10 ns326125 kernel: [ 480.345141] [<ffffffff81847ac5>] schedule_timeout+0x235/0x2d0
Jun 3 10:24:10 ns326125 kernel: [ 480.345143] [<ffffffff811eadf1>] ? ___slab_alloc+0x341/0x430
Jun 3 10:24:10 ns326125 kernel: [ 480.345146] [<ffffffff810f568c>] ? ktime_get+0x3c/0xb0
Jun 3 10:24:10 ns326125 kernel: [ 480.345147] [<ffffffff81845170>] ? bit_wait_timeout+0xa0/0xa0
Jun 3 10:24:10 ns326125 kernel: [ 480.345149] [<ffffffff81843d9b>] io_schedule_timeout+0xbb/0x140
Jun 3 10:24:10 ns326125 kernel: [ 480.345150] [<ffffffff8184518b>] bit_wait_io+0x1b/0x70
Jun 3 10:24:10 ns326125 kernel: [ 480.345151] [<ffffffff81844c3f>] __wait_on_bit+0x5f/0x90
Jun 3 10:24:10 ns326125 kernel: [ 480.345153] [<ffffffff81845170>] ? bit_wait_timeout+0xa0/0xa0
Jun 3 10:24:10 ns326125 kernel: [ 480.345154] [<ffffffff81844cf1>] out_of_line_wait_on_bit+0x81/0xb0
Jun 3 10:24:10 ns326125 kernel: [ 480.345157] [<ffffffff810c3dc0>] ? autoremove_wake_function+0x40/0x40
Jun 3 10:24:10 ns326125 kernel: [ 480.345168] [<ffffffffc08f7c94>] nfs_wait_on_request+0x34/0x40 [nfs]
Jun 3 10:24:10 ns326125 kernel: [ 480.345174] [<ffffffffc08fc9ce>] nfs_updatepage+0x15e/0x920 [nfs]
Jun 3 10:24:10 ns326125 kernel: [ 480.345179] [<ffffffffc08ecdd4>] nfs_write_end+0x154/0x500 [nfs]
Jun 3 10:24:10 ns326125 kernel: [ 480.345181] [<ffffffff813fed9f>] ? iov_iter_copy_from_user_atomic+0x8f/0x230
Jun 3 10:24:10 ns326125 kernel: [ 480.345184] [<ffffffff8118cf94>] generic_perform_write+0x114/0x1c0
Jun 3 10:24:10 ns326125 kernel: [ 480.345186] [<ffffffff8118f126>] __generic_file_write_iter+0x1a6/0x1f0
Jun 3 10:24:10 ns326125 kernel: [ 480.345188] [<ffffffff812297b3>] ? touch_atime+0x33/0xd0
Jun 3 10:24:10 ns326125 kernel: [ 480.345189] [<ffffffff8118f254>] generic_file_write_iter+0xe4/0x1e0
Jun 3 10:24:10 ns326125 kernel: [ 480.345193] [<ffffffffc08ec54a>] nfs_file_write+0x9a/0x160 [nfs]
Jun 3 10:24:10 ns326125 kernel: [ 480.345195] [<ffffffff8120c1bb>] new_sync_write+0x9b/0xe0
Jun 3 10:24:10 ns326125 kernel: [ 480.345197] [<ffffffff8120c226>] __vfs_write+0x26/0x40
Jun 3 10:24:10 ns326125 kernel: [ 480.345198] [<ffffffff8120c899>] vfs_write+0xa9/0x190
Jun 3 10:24:10 ns326125 kernel: [ 480.345199] [<ffffffff8120c7d4>] ? vfs_read+0x114/0x130
Jun 3 10:24:10 ns326125 kernel: [ 480.345201] [<ffffffff8120d675>] SyS_write+0x55/0xc0
Jun 3 10:24:10 ns326125 kernel: [ 480.345203] [<ffffffff818489b6>] entry_SYSCALL_64_fastpath+0x16/0x75
Jun 3 04:47:25 ns326125 kernel: [574672.937159] audit: type=1400 audit(1464922045.521:227): apparmor="DENIED" operation="sendmsg" profile="/usr/sbin/named" name="/run/systemd/journal/dev-log" pid=1567 comm="named" requested_mask="w" denied_mask="w" fsuid=109 ouid=0
Jun 3 04:47:26 ns326125 kernel: [574674.029074] audit: type=1400 audit(1464922046.613:228): apparmor="DENIED" operation="sendmsg" profile="/usr/sbin/named" name="/run/systemd/journal/dev-log" pid=1568 comm="named" requested_mask="w" denied_mask="w" fsuid=109 ouid=0
any idea??
thanks