Kernel-Panic on KVM-Guest on Proxmox 3.4

scaa

Nov 20, 2015
Hello,

sorry for my bad English...

We have problems with two KVM machines on a Proxmox 3.4 host.
Both machines freeze every few days with what looks like a kernel panic (see below).
Apparently the guest can no longer access its hard drive?

I've found this:
Our values on the VM are:
vm.dirty_background_bytes = 0
vm.dirty_bytes = 0
vm.dirty_expire_centisecs = 3000
vm.dirty_writeback_centisecs = 500
vm.dirty_ratio = 20
vm.dirty_background_ratio = 10

Could this help?
vm.dirty_ratio = 10
vm.dirty_background_ratio = 5
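
Before changing anything, it may help to record what the guest currently uses. A minimal sketch in Python, assuming the tunables are readable under /proc/sys/vm as on a stock Linux guest; `read_dirty_settings` and `diff_settings` are hypothetical helper names, not part of any tool:

```python
from pathlib import Path

# Writeback tunables mentioned in this thread.
DIRTY_KEYS = (
    "dirty_background_bytes",
    "dirty_bytes",
    "dirty_expire_centisecs",
    "dirty_writeback_centisecs",
    "dirty_ratio",
    "dirty_background_ratio",
)


def read_dirty_settings(base="/proc/sys/vm"):
    """Return {"vm.<key>": int} for every tunable that exists under base."""
    settings = {}
    for key in DIRTY_KEYS:
        path = Path(base) / key
        if path.is_file():
            settings["vm." + key] = int(path.read_text().strip())
    return settings


def diff_settings(current, proposed):
    """Return {name: (current, proposed)} for values that would change."""
    return {
        name: (current.get(name), value)
        for name, value in proposed.items()
        if current.get(name) != value
    }


if __name__ == "__main__":
    # The values being considered in this post.
    proposed = {"vm.dirty_ratio": 10, "vm.dirty_background_ratio": 5}
    changes = diff_settings(read_dirty_settings(), proposed)
    for name, (old, new) in changes.items():
        print(f"{name}: {old} -> {new}")
```

This only reads and compares; it does not write any setting.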

Can anybody help?

Host:
2x Xeon E5-2630v3 2.4 GHz
Supermicro X10DRI
128 GB RAM
4x 960 GB SSD SM863 Raid-10 (System on /var/lib/vz)
2x 2000 GB SAS Ultrastar 7k4000 Raid-1 (Backup on /mnt/sdb1)


Adaptec ASR8805
Firmware 7.5-0 (32033)

Controller Cache for Raid-10 (SSD)
Read-Cache Status Off
Write-Cache Status Off (write-through)
Write-Cache Mode Disabled (write-through)

Controller Cache for Raid-1 (SAS)
Read-Cache Status On
Write-Cache Status Off (write-through)
Write-Cache Mode Disabled (write-through)

-------------------------------------------

# pveversion --verbose
proxmox-ve-2.6.32: 3.4-164 (running kernel: 2.6.32-41-pve)
pve-manager: 3.4-11 (running version: 3.4-11/6502936f)
pve-kernel-2.6.32-39-pve: 2.6.32-157
pve-kernel-2.6.32-41-pve: 2.6.32-164
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.7-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.10-3
pve-cluster: 3.0-19
qemu-server: 3.4-6
pve-firmware: 1.1-4
libpve-common-perl: 3.0-24
libpve-access-control: 3.0-16
libpve-storage-perl: 3.0-33
pve-libspice-server1: 0.12.4-3
vncterm: 1.1-8
vzctl: 4.0-1pve6
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 2.2-11
ksm-control-daemon: 1.1-1
glusterfs-client: 3.5.2-1

-------------------------------------------

One of the machines with the problem:
KVM with Debian Jessie 8.2

bootdisk: virtio0
cores: 8
ide2: backup:iso/debian-8.2.0-amd64-netinst.iso,media=cdrom,size=247M
memory: 49152
name: xxx
net0: e1000=7E:FF:6B:8D:0B:88,bridge=vmbr0
net1: e1000=2E:64:01:8D:E6:C2,bridge=vmbr1
numa: 0
onboot: 1
ostype: l26
smbios1: uuid=158680c0-7d14-4f4a-93fe-b3f0cebbb0cf
sockets: 1
virtio0: local:107/vm-107-disk-1.qcow2,format=qcow2,size=900G
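
Since the freeze looks storage-related, it can be useful to check which options the disk line actually sets, for example whether a cache mode is given. A small sketch in Python, assuming the comma-separated key=value layout shown above; `parse_drive` is a hypothetical helper, not a Proxmox API:

```python
def parse_drive(value):
    """Split a drive value into (volume, options dict)."""
    volume, *opts = value.split(",")
    options = {}
    for opt in opts:
        key, sep, val = opt.partition("=")
        # Options without "=" are stored as plain flags.
        options[key] = val if sep else True
    return volume, options


volume, options = parse_drive(
    "local:107/vm-107-disk-1.qcow2,format=qcow2,size=900G"
)
print(volume)
print(options.get("cache", "not set (hypervisor default applies)"))
```

For the line above, no `cache` key is present, so whatever default the hypervisor uses is in effect.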

-------------------------------------------

Nov 20 10:23:33 vm107 kernel: [294480.772112] INFO: task kworker/u16:0:6 blocked for more than 120 seconds.
Nov 20 10:23:33 vm107 kernel: [294480.773123] Not tainted 3.16.0-4-amd64 #1
Nov 20 10:23:33 vm107 kernel: [294480.773661] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 20 10:23:33 vm107 kernel: [294480.774672] kworker/u16:0 D ffff880c0eb6a4a8 0 6 2 0x00000000
Nov 20 10:23:33 vm107 kernel: [294480.775354] Workqueue: writeback bdi_writeback_workfn (flush-254:0)
Nov 20 10:23:34 vm107 kernel: [294480.775984] ffff880c0eb6a050 0000000000000046 0000000000012f00 ffff880c0eb83fd8
Nov 20 10:23:34 vm107 kernel: [294480.777026] 0000000000012f00 ffff880c0eb6a050 ffff880c3fc537b0 ffff880c3ff80060
Nov 20 10:23:34 vm107 kernel: [294480.778040] ffff880c0eb837f0 0000000000000002 ffffffff811d6b10 ffff880c09f07ab8
Nov 20 10:23:34 vm107 kernel: [294480.779052] Call Trace:
Nov 20 10:23:34 vm107 kernel: [294480.779496] [<ffffffff811d6b10>] ? generic_block_bmap+0x50/0x50
Nov 20 10:23:34 vm107 kernel: [294480.780113] [<ffffffff8150e159>] ? io_schedule+0x99/0x120
Nov 20 10:23:34 vm107 kernel: [294480.780696] [<ffffffff811d6b1a>] ? sleep_on_buffer+0xa/0x10
Nov 20 10:23:34 vm107 kernel: [294480.781282] [<ffffffff8150e5e1>] ? __wait_on_bit_lock+0x41/0xa0
Nov 20 10:23:34 vm107 kernel: [294480.781879] [<ffffffff811d6b10>] ? generic_block_bmap+0x50/0x50
Nov 20 10:23:34 vm107 kernel: [294480.782480] [<ffffffff8150e6b7>] ? out_of_line_wait_on_bit_lock+0x77/0x90
Nov 20 10:23:34 vm107 kernel: [294480.783123] [<ffffffff810a7a70>] ? autoremove_wake_function+0x30/0x30
Nov 20 10:23:34 vm107 kernel: [294480.783771] [<ffffffffa0141a20>] ? do_get_write_access+0x260/0x4e0 [jbd2]
Nov 20 10:23:34 vm107 kernel: [294480.784434] [<ffffffffa0141cc2>] ? jbd2_journal_get_write_access+0x22/0x40 [jbd2]
Nov 20 10:23:34 vm107 kernel: [294480.785471] [<ffffffffa018b066>] ? __ext4_journal_get_write_access+0x36/0x80 [ext4]
Nov 20 10:23:34 vm107 kernel: [294480.786478] [<ffffffffa0191a7a>] ? ext4_mb_mark_diskspace_used+0x6a/0x4c0 [ext4]
Nov 20 10:23:34 vm107 kernel: [294480.787472] [<ffffffffa018c986>] ? ext4_mb_use_preallocated+0x256/0x270 [ext4]
Nov 20 10:23:34 vm107 kernel: [294480.788479] [<ffffffffa018cf33>] ? ext4_mb_initialize_context+0x73/0x190 [ext4]
Nov 20 10:23:34 vm107 kernel: [294480.789474] [<ffffffffa01931d2>] ? ext4_mb_new_blocks+0x292/0x4f0 [ext4]
Nov 20 10:23:34 vm107 kernel: [294480.790112] [<ffffffffa0188923>] ? ext4_ext_map_blocks+0x653/0x10a0 [ext4]
Nov 20 10:23:34 vm107 kernel: [294480.790756] [<ffffffffa015e99c>] ? ext4_map_blocks+0x15c/0x530 [ext4]
Nov 20 10:23:34 vm107 kernel: [294480.791382] [<ffffffffa0161b86>] ? ext4_writepages+0x606/0xd00 [ext4]
Nov 20 10:23:34 vm107 kernel: [294480.792027] [<ffffffff811ce539>] ? __writeback_single_inode+0x39/0x220
Nov 20 10:23:34 vm107 kernel: [294480.792657] [<ffffffff811cf1a4>] ? writeback_sb_inodes+0x1a4/0x3e0
Nov 20 10:23:34 vm107 kernel: [294480.793268] [<ffffffff811cf476>] ? __writeback_inodes_wb+0x96/0xc0
Nov 20 10:23:34 vm107 kernel: [294480.793877] [<ffffffff811cf6e3>] ? wb_writeback+0x243/0x2d0
Nov 20 10:23:34 vm107 kernel: [294480.794463] [<ffffffff811d193c>] ? bdi_writeback_workfn+0x1bc/0x420
Nov 20 10:23:34 vm107 kernel: [294480.795099] [<ffffffff81081662>] ? process_one_work+0x172/0x420
Nov 20 10:23:34 vm107 kernel: [294480.795701] [<ffffffff81081cf3>] ? worker_thread+0x113/0x4f0
Nov 20 10:23:34 vm107 kernel: [294480.796305] [<ffffffff81081be0>] ? rescuer_thread+0x2d0/0x2d0
Nov 20 10:23:34 vm107 kernel: [294480.796900] [<ffffffff81087f7d>] ? kthread+0xbd/0xe0
Nov 20 10:23:34 vm107 kernel: [294480.797458] [<ffffffff81087ec0>] ? kthread_create_on_node+0x180/0x180
Nov 20 10:23:34 vm107 kernel: [294480.798079] [<ffffffff81511618>] ? ret_from_fork+0x58/0x90
Nov 20 10:23:34 vm107 kernel: [294480.798662] [<ffffffff81087ec0>] ? kthread_create_on_node+0x180/0x180
Nov 20 10:23:34 vm107 kernel: [294480.799301] INFO: task jbd2/vda1-8:156 blocked for more than 120 seconds.
Nov 20 10:23:34 vm107 kernel: [294480.799929] Not tainted 3.16.0-4-amd64 #1
Nov 20 10:23:34 vm107 kernel: [294480.800472] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 20 10:23:34 vm107 kernel: [294480.801480] jbd2/vda1-8 D ffff880c0a3cce38 0 156 2 0x00000000
Nov 20 10:23:34 vm107 kernel: [294480.802156] ffff880c0a3cc9e0 0000000000000046 0000000000012f00 ffff880c0e697fd8
Nov 20 10:23:34 vm107 kernel: [294480.803168] 0000000000012f00 ffff880c0a3cc9e0 ffff880c3fc137b0 ffff880c3ff861f8
Nov 20 10:23:34 vm107 kernel: [294480.804195] 0000000000000002 ffffffff811d6b10 ffff880c0e697c80 ffff880c0a1d2398
Nov 20 10:23:34 vm107 kernel: [294480.805219] Call Trace:
Nov 20 10:23:34 vm107 kernel: [294480.805657] [<ffffffff811d6b10>] ? generic_block_bmap+0x50/0x50
Nov 20 10:23:34 vm107 kernel: [294480.806260] [<ffffffff8150e159>] ? io_schedule+0x99/0x120
Nov 20 10:23:34 vm107 kernel: [294480.806828] [<ffffffff811d6b1a>] ? sleep_on_buffer+0xa/0x10
Nov 20 10:23:34 vm107 kernel: [294480.807409] [<ffffffff8150e4dc>] ? __wait_on_bit+0x5c/0x90
Nov 20 10:23:34 vm107 kernel: [294480.807980] [<ffffffff811d6b10>] ? generic_block_bmap+0x50/0x50
Nov 20 10:23:34 vm107 kernel: [294480.808597] [<ffffffff8150e587>] ? out_of_line_wait_on_bit+0x77/0x90
Nov 20 10:23:34 vm107 kernel: [294480.809222] [<ffffffff810a7a70>] ? autoremove_wake_function+0x30/0x30
Nov 20 10:23:34 vm107 kernel: [294480.809846] [<ffffffffa014450e>] ? jbd2_journal_commit_transaction+0x175e/0x1950 [jbd2]
Nov 20 10:23:34 vm107 kernel: [294480.810862] [<ffffffff810a2b01>] ? pick_next_task_fair+0x6e1/0x820
Nov 20 10:23:34 vm107 kernel: [294480.811477] [<ffffffffa0147bc2>] ? kjournald2+0xb2/0x240 [jbd2]
Nov 20 10:23:34 vm107 kernel: [294480.812087] [<ffffffff810a7a40>] ? prepare_to_wait_event+0xf0/0xf0
Nov 20 10:23:34 vm107 kernel: [294480.812700] [<ffffffffa0147b10>] ? commit_timeout+0x10/0x10 [jbd2]
Nov 20 10:23:34 vm107 kernel: [294480.813313] [<ffffffff81087f7d>] ? kthread+0xbd/0xe0
Nov 20 10:23:34 vm107 kernel: [294480.813867] [<ffffffff81087ec0>] ? kthread_create_on_node+0x180/0x180
Nov 20 10:23:34 vm107 kernel: [294480.814491] [<ffffffff81511618>] ? ret_from_fork+0x58/0x90
Nov 20 10:23:34 vm107 kernel: [294480.815082] [<ffffffff81087ec0>] ? kthread_create_on_node+0x180/0x180
Nov 20 10:23:34 vm107 kernel: [294480.815762] INFO: task mysqld:18992 blocked for more than 120 seconds.
Nov 20 10:23:34 vm107 kernel: [294480.816401] Not tainted 3.16.0-4-amd64 #1
Nov 20 10:23:34 vm107 kernel: [294480.816928] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 20 10:23:34 vm107 kernel: [294480.817919] mysqld D ffff880c0a8859c8 0 18992 691 0x00000000
Nov 20 10:23:34 vm107 kernel: [294480.818565] ffff880c0a885570 0000000000000086 0000000000012f00 ffff880a7d183fd8
Nov 20 10:23:34 vm107 kernel: [294480.819566] 0000000000012f00 ffff880c0a885570 ffff880c0a1d2000 0000000000128138
Nov 20 10:23:34 vm107 kernel: [294480.820592] ffff880c0a1d2088 ffff880c0a1d2024 ffff880a7d183ed0 ffff880c0a1d20a0
Nov 20 10:23:34 vm107 kernel: [294480.821587] Call Trace:
Nov 20 10:23:34 vm107 kernel: [294480.822021] [<ffffffffa0147605>] ? jbd2_log_wait_commit+0x95/0x100 [jbd2]
Nov 20 10:23:34 vm107 kernel: [294480.822651] [<ffffffff810a7a40>] ? prepare_to_wait_event+0xf0/0xf0
Nov 20 10:23:34 vm107 kernel: [294480.823264] [<ffffffffa0159770>] ? ext4_sync_file+0x280/0x310 [ext4]
Nov 20 10:23:34 vm107 kernel: [294480.823875] [<ffffffff811d53fb>] ? do_fsync+0x4b/0x70
Nov 20 10:23:34 vm107 kernel: [294480.824445] [<ffffffff811d566c>] ? SyS_fsync+0xc/0x10
Nov 20 10:23:34 vm107 kernel: [294480.825017] [<ffffffff815116cd>] ? system_call_fast_compare_end+0x10/0x15
Nov 20 10:23:34 vm107 kernel: [294480.825649] INFO: task master:1599 blocked for more than 120 seconds.
Nov 20 10:23:34 vm107 kernel: [294480.826261] Not tainted 3.16.0-4-amd64 #1
Nov 20 10:23:34 vm107 kernel: [294480.826778] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 20 10:23:34 vm107 kernel: [294480.827766] master D ffff880c0b51e7e8 0 1599 1 0x00000004
Nov 20 10:23:34 vm107 kernel: [294480.828433] ffff880c0b51e390 0000000000000082 0000000000012f00 ffff880c07c77fd8
Nov 20 10:23:34 vm107 kernel: [294480.829431] 0000000000012f00 ffff880c0b51e390 ffff880c3fd937b0 ffff880c3ff8f060
Nov 20 10:23:34 vm107 kernel: [294480.830425] ffff880c07c77be0 0000000000000002 ffffffff811d6b10 ffff880c09d2e9e8
Nov 20 10:23:34 vm107 kernel: [294480.831421] Call Trace:
Nov 20 10:23:34 vm107 kernel: [294480.831850] [<ffffffff811d6b10>] ? generic_block_bmap+0x50/0x50
Nov 20 10:23:34 vm107 kernel: [294480.832481] [<ffffffff8150e159>] ? io_schedule+0x99/0x120
Nov 20 10:23:34 vm107 kernel: [294480.833044] [<ffffffff811d6b1a>] ? sleep_on_buffer+0xa/0x10
Nov 20 10:23:34 vm107 kernel: [294480.833618] [<ffffffff8150e5e1>] ? __wait_on_bit_lock+0x41/0xa0
Nov 20 10:23:34 vm107 kernel: [294480.834210] [<ffffffff811d6b10>] ? generic_block_bmap+0x50/0x50
Nov 20 10:23:34 vm107 kernel: [294480.834814] [<ffffffff8150e6b7>] ? out_of_line_wait_on_bit_lock+0x77/0x90
Nov 20 10:23:34 vm107 kernel: [294480.835444] [<ffffffff810a7a70>] ? autoremove_wake_function+0x30/0x30
Nov 20 10:23:34 vm107 kernel: [294480.836070] [<ffffffffa0141a20>] ? do_get_write_access+0x260/0x4e0 [jbd2]
Nov 20 10:23:34 vm107 kernel: [294480.836701] [<ffffffff811d830a>] ? __getblk+0x2a/0x2d0
Nov 20 10:23:34 vm107 kernel: [294480.837256] [<ffffffffa0141cc2>] ? jbd2_journal_get_write_access+0x22/0x40 [jbd2]
Nov 20 10:23:34 vm107 kernel: [294480.842481] [<ffffffffa018b066>] ? __ext4_journal_get_write_access+0x36/0x80 [ext4]
Nov 20 10:23:34 vm107 kernel: [294480.843475] [<ffffffffa0161388>] ? ext4_reserve_inode_write+0x68/0x90 [ext4]
Nov 20 10:23:34 vm107 kernel: [294480.844449] [<ffffffffa01645cb>] ? ext4_dirty_inode+0x3b/0x60 [ext4]
Nov 20 10:23:34 vm107 kernel: [294480.845078] [<ffffffffa01613ef>] ? ext4_mark_inode_dirty+0x3f/0x1d0 [ext4]
Nov 20 10:23:34 vm107 kernel: [294480.845726] [<ffffffffa01645cb>] ? ext4_dirty_inode+0x3b/0x60 [ext4]
Nov 20 10:23:34 vm107 kernel: [294480.846341] [<ffffffff811cebc2>] ? __mark_inode_dirty+0x172/0x270
Nov 20 10:23:34 vm107 kernel: [294480.846939] [<ffffffff811c1771>] ? update_time+0x81/0xc0
Nov 20 10:23:34 vm107 kernel: [294480.847507] [<ffffffff8106b3a2>] ? current_fs_time+0x12/0x60
Nov 20 10:23:34 vm107 kernel: [294480.848099] [<ffffffff811c1970>] ? file_update_time+0x80/0xd0
Nov 20 10:23:34 vm107 kernel: [294480.848689] [<ffffffff811b0005>] ? pipe_write+0x3b5/0x460
Nov 20 10:23:34 vm107 kernel: [294480.849260] [<ffffffff811a7ad4>] ? new_sync_write+0x74/0xa0
Nov 20 10:23:34 vm107 kernel: [294480.849834] [<ffffffff811a8212>] ? vfs_write+0xb2/0x1f0
Nov 20 10:23:34 vm107 kernel: [294480.850394] [<ffffffff811a8d52>] ? SyS_write+0x42/0xa0
Nov 20 10:23:34 vm107 kernel: [294480.850952] [<ffffffff811bc68d>] ? SyS_poll+0x5d/0xf0
Nov 20 10:23:34 vm107 kernel: [294480.851507] [<ffffffff815116cd>] ? system_call_fast_compare_end+0x10/0x15
Nov 20 10:23:34 vm107 kernel: [294480.852161] INFO: task apache2:5693 blocked for more than 120 seconds.
Nov 20 10:23:34 vm107 kernel: [294480.852771] Not tainted 3.16.0-4-amd64 #1
Nov 20 10:23:34 vm107 kernel: [294480.853294] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 20 10:23:34 vm107 kernel: [294480.854277] apache2 D ffff880036c5cdb8 0 5693 1032 0x00000000
Nov 20 10:23:34 vm107 kernel: [294480.854943] ffff880036c5c960 0000000000000082 0000000000012f00 ffff880b58dd3fd8
Nov 20 10:23:34 vm107 kernel: [294480.855932] 0000000000012f00 ffff880036c5c960 ffff880c0a36d748 ffff880b58dd3f20
Nov 20 10:23:34 vm107 kernel: [294480.856937] ffff880c0a36d74c ffff880036c5c960 00000000ffffffff ffff880c0a36d750
Nov 20 10:23:34 vm107 kernel: [294480.857928] Call Trace:
Nov 20 10:23:34 vm107 kernel: [294480.858365] [<ffffffff8150e2a5>] ? schedule_preempt_disabled+0x25/0x70
Nov 20 10:23:34 vm107 kernel: [294480.858977] [<ffffffff8150fd53>] ? __mutex_lock_slowpath+0xd3/0x1c0
Nov 20 10:23:34 vm107 kernel: [294480.859591] [<ffffffff8150fe5b>] ? mutex_lock+0x1b/0x2a
Nov 20 10:23:34 vm107 kernel: [294480.860173] [<ffffffff811c43ed>] ? __fdget_pos+0x3d/0x50
Nov 20 10:23:34 vm107 kernel: [294480.860737] [<ffffffff811a8d2a>] ? SyS_write+0x1a/0xa0
Nov 20 10:23:34 vm107 kernel: [294480.861294] [<ffffffff8106b59e>] ? SyS_gettimeofday+0x2e/0x80
Nov 20 10:23:34 vm107 kernel: [294480.861871] [<ffffffff815116cd>] ? system_call_fast_compare_end+0x10/0x15
Nov 20 10:23:34 vm107 kernel: [294480.862525] INFO: task apache2:5707 blocked for more than 120 seconds.
Nov 20 10:23:34 vm107 kernel: [294480.863135] Not tainted 3.16.0-4-amd64 #1
Nov 20 10:23:34 vm107 kernel: [294480.863655] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 20 10:23:34 vm107 kernel: [294480.864689] apache2 D ffff880c0a1f5138 0 5707 1032 0x00000000
Nov 20 10:23:34 vm107 kernel: [294480.865360] ffff880c0a1f4ce0 0000000000000082 0000000000012f00 ffff880036e1ffd8
Nov 20 10:23:34 vm107 kernel: [294480.866350] 0000000000012f00 ffff880c0a1f4ce0 ffff880c0a36d748 ffff880036e1ff20
Nov 20 10:23:34 vm107 kernel: [294480.867340] ffff880c0a36d74c ffff880c0a1f4ce0 00000000ffffffff ffff880c0a36d750
Nov 20 10:23:34 vm107 kernel: [294480.868340] Call Trace:
Nov 20 10:23:34 vm107 kernel: [294480.868769] [<ffffffff8150e2a5>] ? schedule_preempt_disabled+0x25/0x70
Nov 20 10:23:34 vm107 kernel: [294480.869384] [<ffffffff8150fd53>] ? __mutex_lock_slowpath+0xd3/0x1c0
Nov 20 10:23:34 vm107 kernel: [294480.869990] [<ffffffff8150fe5b>] ? mutex_lock+0x1b/0x2a
Nov 20 10:23:34 vm107 kernel: [294480.870550] [<ffffffff811c43ed>] ? __fdget_pos+0x3d/0x50
Nov 20 10:23:34 vm107 kernel: [294480.871112] [<ffffffff811a8d2a>] ? SyS_write+0x1a/0xa0
Nov 20 10:23:34 vm107 kernel: [294480.871666] [<ffffffff8106b59e>] ? SyS_gettimeofday+0x2e/0x80
Nov 20 10:23:34 vm107 kernel: [294480.872291] [<ffffffff815116cd>] ? system_call_fast_compare_end+0x10/0x15
Nov 20 10:23:34 vm107 kernel: [294480.872916] INFO: task apache2:5711 blocked for more than 120 seconds.
Nov 20 10:23:34 vm107 kernel: [294480.873531] Not tainted 3.16.0-4-amd64 #1
Nov 20 10:23:34 vm107 kernel: [294480.874049] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 20 10:23:34 vm107 kernel: [294480.875055] apache2 D ffff880c0b15a768 0 5711 1032 0x00000000
Nov 20 10:23:34 vm107 kernel: [294480.875701] ffff880c0b15a310 0000000000000082 0000000000012f00 ffff880036cfffd8
Nov 20 10:23:34 vm107 kernel: [294480.876707] 0000000000012f00 ffff880c0b15a310 ffff880c0a36d748 ffff880036cfff20
Nov 20 10:23:34 vm107 kernel: [294480.877701] ffff880c0a36d74c ffff880c0b15a310 00000000ffffffff ffff880c0a36d750
Nov 20 10:23:34 vm107 kernel: [294480.878686] Call Trace:
Nov 20 10:23:34 vm107 kernel: [294480.879114] [<ffffffff8150e2a5>] ? schedule_preempt_disabled+0x25/0x70
Nov 20 10:23:34 vm107 kernel: [294480.879727] [<ffffffff8150fd53>] ? __mutex_lock_slowpath+0xd3/0x1c0
Nov 20 10:23:34 vm107 kernel: [294480.880344] [<ffffffff8150fe5b>] ? mutex_lock+0x1b/0x2a
Nov 20 10:23:34 vm107 kernel: [294480.880899] [<ffffffff811c43ed>] ? __fdget_pos+0x3d/0x50
Nov 20 10:23:34 vm107 kernel: [294480.881458] [<ffffffff811a8d2a>] ? SyS_write+0x1a/0xa0
Nov 20 10:23:34 vm107 kernel: [294480.882008] [<ffffffff8106b59e>] ? SyS_gettimeofday+0x2e/0x80
Nov 20 10:23:34 vm107 kernel: [294480.882587] [<ffffffff815116cd>] ? system_call_fast_compare_end+0x10/0x15
Nov 20 10:23:34 vm107 kernel: [294480.883214] INFO: task apache2:5723 blocked for more than 120 seconds.
Nov 20 10:23:34 vm107 kernel: [294480.883819] Not tainted 3.16.0-4-amd64 #1
Nov 20 10:23:34 vm107 kernel: [294480.884355] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 20 10:23:34 vm107 kernel: [294480.885354] apache2 D ffff880c0aa064a8 0 5723 1032 0x00000000
Nov 20 10:23:34 vm107 kernel: [294480.885998] ffff880c0aa06050 0000000000000082 0000000000012f00 ffff8800bba7ffd8
Nov 20 10:23:34 vm107 kernel: [294480.886985] 0000000000012f00 ffff880c0aa06050 ffff880c0a36d748 ffff8800bba7ff20
Nov 20 10:23:34 vm107 kernel: [294480.887971] ffff880c0a36d74c ffff880c0aa06050 00000000ffffffff ffff880c0a36d750
Nov 20 10:23:34 vm107 kernel: [294480.888975] Call Trace:
Nov 20 10:23:34 vm107 kernel: [294480.889404] [<ffffffff8150e2a5>] ? schedule_preempt_disabled+0x25/0x70
Nov 20 10:23:34 vm107 kernel: [294480.890015] [<ffffffff8150fd53>] ? __mutex_lock_slowpath+0xd3/0x1c0
Nov 20 10:23:34 vm107 kernel: [294480.890615] [<ffffffff8150fe5b>] ? mutex_lock+0x1b/0x2a
Nov 20 10:23:34 vm107 kernel: [294480.891171] [<ffffffff811c43ed>] ? __fdget_pos+0x3d/0x50
Nov 20 10:23:34 vm107 kernel: [294480.891727] [<ffffffff811a8d2a>] ? SyS_write+0x1a/0xa0
Nov 20 10:23:34 vm107 kernel: [294480.892296] [<ffffffff8106b59e>] ? SyS_gettimeofday+0x2e/0x80
Nov 20 10:23:34 vm107 kernel: [294480.892876] [<ffffffff815116cd>] ? system_call_fast_compare_end+0x10/0x15
Nov 20 10:23:34 vm107 kernel: [294480.893504] INFO: task apache2:5734 blocked for more than 120 seconds.
Nov 20 10:23:34 vm107 kernel: [294480.894327] Not tainted 3.16.0-4-amd64 #1
Nov 20 10:23:34 vm107 kernel: [294480.894903] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 20 10:23:34 vm107 kernel: [294480.895915] apache2 D ffff880c08007808 0 5734 1032 0x00000000
Nov 20 10:23:34 vm107 kernel: [294480.896593] ffff880c080073b0 0000000000000082 0000000000012f00 ffff8800bba8bfd8
Nov 20 10:23:34 vm107 kernel: [294480.897593] 0000000000012f00 ffff880c080073b0 ffff880c0a36d748 ffff8800bba8bf20
Nov 20 10:23:34 vm107 kernel: [294480.898593] ffff880c0a36d74c ffff880c080073b0 00000000ffffffff ffff880c0a36d750
Nov 20 10:23:34 vm107 kernel: [294480.899591] Call Trace:
Nov 20 10:23:34 vm107 kernel: [294480.900039] [<ffffffff8150e2a5>] ? schedule_preempt_disabled+0x25/0x70
Nov 20 10:23:34 vm107 kernel: [294480.900664] [<ffffffff8150fd53>] ? __mutex_lock_slowpath+0xd3/0x1c0
Nov 20 10:23:34 vm107 kernel: [294480.901281] [<ffffffff8150fe5b>] ? mutex_lock+0x1b/0x2a
Nov 20 10:23:34 vm107 kernel: [294480.901842] [<ffffffff811c43ed>] ? __fdget_pos+0x3d/0x50
Nov 20 10:23:34 vm107 kernel: [294480.902408] [<ffffffff811a8d2a>] ? SyS_write+0x1a/0xa0
Nov 20 10:23:34 vm107 kernel: [294480.902967] [<ffffffff8106b59e>] ? SyS_gettimeofday+0x2e/0x80
Nov 20 10:23:34 vm107 kernel: [294480.903551] [<ffffffff815116cd>] ? system_call_fast_compare_end+0x10/0x15
Nov 20 10:23:34 vm107 kernel: [294480.904203] INFO: task apache2:5704 blocked for more than 120 seconds.
Nov 20 10:23:34 vm107 kernel: [294480.904819] Not tainted 3.16.0-4-amd64 #1
Nov 20 10:23:34 vm107 kernel: [294480.905363] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 20 10:23:34 vm107 kernel: [294480.906356] apache2 D ffff880c0d77b078 0 5704 1032 0x00000000
Nov 20 10:23:34 vm107 kernel: [294480.907007] ffff880c0d77ac20 0000000000000086 0000000000012f00 ffff8800bb9fffd8
Nov 20 10:23:34 vm107 kernel: [294480.908022] 0000000000012f00 ffff880c0d77ac20 ffff880c0a36d748 ffff8800bb9fff20
Nov 20 10:23:34 vm107 kernel: [294480.909022] ffff880c0a36d74c ffff880c0d77ac20 00000000ffffffff ffff880c0a36d750
Nov 20 10:23:34 vm107 kernel: [294480.910022] Call Trace:
Nov 20 10:23:34 vm107 kernel: [294480.910457] [<ffffffff8150e2a5>] ? schedule_preempt_disabled+0x25/0x70
Nov 20 10:23:34 vm107 kernel: [294480.911076] [<ffffffff8150fd53>] ? __mutex_lock_slowpath+0xd3/0x1c0
Nov 20 10:23:34 vm107 kernel: [294480.911687] [<ffffffff8150fe5b>] ? mutex_lock+0x1b/0x2a
Nov 20 10:23:34 vm107 kernel: [294480.912267] [<ffffffff811c43ed>] ? __fdget_pos+0x3d/0x50
Nov 20 10:23:34 vm107 kernel: [294480.912833] [<ffffffff811a8d2a>] ? SyS_write+0x1a/0xa0
Nov 20 10:23:34 vm107 kernel: [294480.913393] [<ffffffff8106b59e>] ? SyS_gettimeofday+0x2e/0x80
Nov 20 10:23:34 vm107 kernel: [294480.913977] [<ffffffff815116cd>] ? system_call_fast_compare_end+0x10/0x15
 
Hi, this is not a real kernel panic. It is your VM's kernel logging a warning because the storage is not responding or is too slow. Maybe your storage is overloaded when this happens?
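
To see which tasks are stuck and how often, the hung-task warnings can be pulled out of the guest's kernel log. A rough sketch in Python, assuming the message format shown in the log above; `hung_tasks` and `summarize` are hypothetical helper names, and the sample lines are copied from that log:

```python
import re
from collections import Counter

# Matches e.g. "INFO: task jbd2/vda1-8:156 blocked for more than 120 seconds."
HUNG_RE = re.compile(
    r"INFO: task (?P<name>.+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def hung_tasks(lines):
    """Yield (task_name, pid, seconds) for each hung-task warning."""
    for line in lines:
        match = HUNG_RE.search(line)
        if match:
            yield (
                match.group("name"),
                int(match.group("pid")),
                int(match.group("secs")),
            )


def summarize(lines):
    """Count hung-task warnings per task name."""
    return Counter(name for name, _pid, _secs in hung_tasks(lines))


SAMPLE = [
    "Nov 20 10:23:33 vm107 kernel: [294480.772112] INFO: task kworker/u16:0:6 blocked for more than 120 seconds.",
    "Nov 20 10:23:34 vm107 kernel: [294480.799301] INFO: task jbd2/vda1-8:156 blocked for more than 120 seconds.",
    "Nov 20 10:23:34 vm107 kernel: [294480.815762] INFO: task mysqld:18992 blocked for more than 120 seconds.",
    "Nov 20 10:23:34 vm107 kernel: [294480.852161] INFO: task apache2:5693 blocked for more than 120 seconds.",
    "Nov 20 10:23:34 vm107 kernel: [294480.862525] INFO: task apache2:5707 blocked for more than 120 seconds.",
]

if __name__ == "__main__":
    for name, count in summarize(SAMPLE).most_common():
        print(f"{name}: {count}")
```

If the journal thread (jbd2) and the writeback worker show up first, as they do here, the other tasks are probably just waiting behind them.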
 
I don't think the storage is overloaded.
It's a local RAID-10 on an Adaptec ASR8805 with 4x 960 GB SSD SM863 (Samsung).
Cache settings on the controller:
Read-Cache Status Off
Write-Cache Status Off (write-through)
Write-Cache Mode Disabled (write-through)
I think this is correct for SSDs.

All other VMs keep running fine in this situation.
 
Hey scaa,

We have the same problem here.

Maybe I have some interesting facts for you:

Our Server-Hardware:
X10DRI
2x Intel® Xeon® Processor E5-2620 v3
64GB DDR4 RAM

Storage:
1x RAID-1 with 2x Intel SSDs
1x RAID-1 with 2x Seagate 3TB SAS
With Storage Controller Avago MegaRAID 9361-4i SAS3

Bought about 6 months ago.

We only see this problem on Debian guests, on both the SSD and the HDD RAID. A production system is affected too, which is very critical!

If I set the storage caching to directsync, the VM becomes even more unstable. First the load average on the VM climbs to a maximum of 24.0, then the VM becomes unresponsive with the same error.

Now we have the following configuration on one VM and it's working fine:

vm.dirty_ratio = 30
vm.dirty_background_ratio = 5

With these values, background writeback starts as soon as dirty pages reach the 5% mark, but the system won't force synchronous I/O until dirty pages reach 30%. It works for now, but it's not a real solution.
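
To get a feeling for what these percentages mean in absolute terms, here is a rough calculation in Python. It is only an approximation under stated assumptions: the kernel applies the ratios to free plus reclaimable memory, not total RAM, so the real thresholds are lower, and the ratios only apply while vm.dirty_bytes and vm.dirty_background_bytes are 0. The 49152 MiB figure is vm107's memory from the first post, used purely as an example; `dirty_thresholds_mib` is a hypothetical helper name.

```python
def dirty_thresholds_mib(dirtyable_mib, dirty_ratio, dirty_background_ratio):
    """Return (background_start_mib, forced_sync_mib) for the given ratios."""
    background = dirtyable_mib * dirty_background_ratio / 100
    forced = dirtyable_mib * dirty_ratio / 100
    return background, forced


if __name__ == "__main__":
    guest_ram_mib = 49152  # example value, upper bound for dirtyable memory
    for label, ratio, bg_ratio in (
        ("default", 20, 10),
        ("10/5", 10, 5),
        ("30/5", 30, 5),
    ):
        bg, forced = dirty_thresholds_mib(guest_ram_mib, ratio, bg_ratio)
        print(f"{label}: background from ~{bg:.0f} MiB, forced sync at ~{forced:.0f} MiB")
```

On a guest this large, even a few percent is several GiB of dirty data that must reach the disk before a blocked fsync can return.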

Maybe the devs have an idea!
 

We changed this 3 days ago:

Default:
vm.dirty_ratio = 20
vm.dirty_background_ratio = 10

New:
vm.dirty_ratio = 10
vm.dirty_background_ratio = 5

... now waiting and hoping :)

 
The kernel.log from another VM (KVM/Jessie) with the same problem:

Nov 28 00:08:00 vm110 kernel: [3826800.252143] INFO: task jbd2/vda1-8:121 blocked for more than 120 seconds.
Nov 28 00:08:00 vm110 kernel: [3826800.252544] Not tainted 3.16.0-4-amd64 #1
Nov 28 00:08:00 vm110 kernel: [3826800.252756] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 28 00:08:00 vm110 kernel: [3826800.253113] jbd2/vda1-8 D ffff880234512728 0 121 2 0x00000000
Nov 28 00:08:00 vm110 kernel: [3826800.253456] ffff8802345122d0 0000000000000046 0000000000012f00 ffff88023403bfd8
Nov 28 00:08:00 vm110 kernel: [3826800.253823] 0000000000012f00 ffff8802345122d0 ffff88023fc937b0 ffff88023ffc71b8
Nov 28 00:08:00 vm110 kernel: [3826800.254184] 0000000000000002 ffffffff811d6b10 ffff88023403bc80 ffff880234584398
Nov 28 00:08:00 vm110 kernel: [3826800.254551] Call Trace:
Nov 28 00:08:00 vm110 kernel: [3826800.254677] [<ffffffff811d6b10>] ? generic_block_bmap+0x50/0x50
Nov 28 00:08:00 vm110 kernel: [3826800.254955] [<ffffffff8150e139>] ? io_schedule+0x99/0x120
Nov 28 00:08:00 vm110 kernel: [3826800.255207] [<ffffffff811d6b1a>] ? sleep_on_buffer+0xa/0x10
Nov 28 00:08:00 vm110 kernel: [3826800.255464] [<ffffffff8150e4bc>] ? __wait_on_bit+0x5c/0x90
Nov 28 00:08:00 vm110 kernel: [3826800.255722] [<ffffffff811d6b10>] ? generic_block_bmap+0x50/0x50
Nov 28 00:08:00 vm110 kernel: [3826800.255999] [<ffffffff8150e567>] ? out_of_line_wait_on_bit+0x77/0x90
Nov 28 00:08:00 vm110 kernel: [3826800.256643] [<ffffffff810a7a70>] ? autoremove_wake_function+0x30/0x30
Nov 28 00:08:00 vm110 kernel: [3826800.257960] [<ffffffffa013650e>] ? jbd2_journal_commit_transaction+0x175e/0x1950 [jbd2]
Nov 28 00:08:00 vm110 kernel: [3826800.259237] [<ffffffff810a2b01>] ? pick_next_task_fair+0x6e1/0x820
Nov 28 00:08:00 vm110 kernel: [3826800.259842] [<ffffffffa0139bc2>] ? kjournald2+0xb2/0x240 [jbd2]
Nov 28 00:08:00 vm110 kernel: [3826800.260471] [<ffffffff810a7a40>] ? prepare_to_wait_event+0xf0/0xf0
Nov 28 00:08:00 vm110 kernel: [3826800.261158] [<ffffffffa0139b10>] ? commit_timeout+0x10/0x10 [jbd2]
Nov 28 00:08:00 vm110 kernel: [3826800.261771] [<ffffffff81087f7d>] ? kthread+0xbd/0xe0
Nov 28 00:08:00 vm110 kernel: [3826800.262323] [<ffffffff81087ec0>] ? kthread_create_on_node+0x180/0x180
Nov 28 00:08:00 vm110 kernel: [3826800.262927] [<ffffffff815115d8>] ? ret_from_fork+0x58/0x90
Nov 28 00:08:00 vm110 kernel: [3826800.263492] [<ffffffff81087ec0>] ? kthread_create_on_node+0x180/0x180
Nov 28 00:08:00 vm110 kernel: [3826800.264143] INFO: task mysqld:13038 blocked for more than 120 seconds.
Nov 28 00:08:00 vm110 kernel: [3826800.264879] Not tainted 3.16.0-4-amd64 #1
Nov 28 00:08:00 vm110 kernel: [3826800.265410] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 28 00:08:00 vm110 kernel: [3826800.266392] mysqld D ffff8802354464e8 0 13038 25023 0x00000000
Nov 28 00:08:00 vm110 kernel: [3826800.267353] ffff880235446090 0000000000000086 0000000000012f00 ffff880008b17fd8
Nov 28 00:08:00 vm110 kernel: [3826800.268452] 0000000000012f00 ffff880235446090 ffff880234584000 00000000001700d1
Nov 28 00:08:00 vm110 kernel: [3826800.269452] ffff880234584088 ffff880234584024 ffff880008b17ed0 ffff8802345840a0
Nov 28 00:08:00 vm110 kernel: [3826800.270444] Call Trace:
Nov 28 00:08:00 vm110 kernel: [3826800.270879] [<ffffffffa0139605>] ? jbd2_log_wait_commit+0x95/0x100 [jbd2]
Nov 28 00:08:00 vm110 kernel: [3826800.271509] [<ffffffff810a7a40>] ? prepare_to_wait_event+0xf0/0xf0
Nov 28 00:08:00 vm110 kernel: [3826800.272153] [<ffffffffa015d770>] ? ext4_sync_file+0x280/0x310 [ext4]
Nov 28 00:08:00 vm110 kernel: [3826800.272873] [<ffffffff811d53fb>] ? do_fsync+0x4b/0x70
Nov 28 00:08:00 vm110 kernel: [3826800.273422] [<ffffffff811d566c>] ? SyS_fsync+0xc/0x10
Nov 28 00:08:00 vm110 kernel: [3826800.273967] [<ffffffff8151168d>] ? system_call_fast_compare_end+0x10/0x15
Nov 28 00:08:00 vm110 kernel: [3826800.274642] INFO: task postdrop:21417 blocked for more than 120 seconds.
Nov 28 00:08:00 vm110 kernel: [3826800.275257] Not tainted 3.16.0-4-amd64 #1
Nov 28 00:08:00 vm110 kernel: [3826800.275772] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 28 00:08:00 vm110 kernel: [3826800.276836] postdrop D ffff88021eceb178 0 21417 21416 0x00000000
Nov 28 00:08:00 vm110 kernel: [3826800.277910] ffff88021ecead20 0000000000000086 0000000000012f00 ffff88000816bfd8
Nov 28 00:08:00 vm110 kernel: [3826800.278893] 0000000000012f00 ffff88021ecead20 ffff880234584000 00000000001700d2
Nov 28 00:08:00 vm110 kernel: [3826800.279876] ffff880234584088 ffff880234584024 ffff88000816bed0 ffff8802345840a0
Nov 28 00:08:00 vm110 kernel: [3826800.280951] Call Trace:
Nov 28 00:08:00 vm110 kernel: [3826800.281409] [<ffffffffa0139605>] ? jbd2_log_wait_commit+0x95/0x100 [jbd2]
Nov 28 00:08:00 vm110 kernel: [3826800.282044] [<ffffffff810a7a40>] ? prepare_to_wait_event+0xf0/0xf0
Nov 28 00:08:00 vm110 kernel: [3826800.282651] [<ffffffffa015d770>] ? ext4_sync_file+0x280/0x310 [ext4]
Nov 28 00:08:00 vm110 kernel: [3826800.283265] [<ffffffff811d53fb>] ? do_fsync+0x4b/0x70
Nov 28 00:08:00 vm110 kernel: [3826800.283810] [<ffffffff811d566c>] ? SyS_fsync+0xc/0x10
Nov 28 00:08:00 vm110 kernel: [3826800.284398] [<ffffffff8151168d>] ? system_call_fast_compare_end+0x10/0x15
Nov 28 00:10:00 vm110 kernel: [3826920.284128] INFO: task jbd2/vda1-8:121 blocked for more than 120 seconds.
Nov 28 00:10:00 vm110 kernel: [3826920.284831] Not tainted 3.16.0-4-amd64 #1
Nov 28 00:10:00 vm110 kernel: [3826920.285362] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 28 00:10:00 vm110 kernel: [3826920.286344] jbd2/vda1-8 D ffff880234512728 0 121 2 0x00000000
Nov 28 00:10:00 vm110 kernel: [3826920.287308] ffff8802345122d0 0000000000000046 0000000000012f00 ffff88023403bfd8
Nov 28 00:10:00 vm110 kernel: [3826920.288324] 0000000000012f00 ffff8802345122d0 ffff88023fc937b0 ffff88023ffc71b8
Nov 28 00:10:00 vm110 kernel: [3826920.289308] 0000000000000002 ffffffff811d6b10 ffff88023403bc80 ffff880234584398
Nov 28 00:10:00 vm110 kernel: [3826920.290313] Call Trace:
Nov 28 00:10:00 vm110 kernel: [3826920.290759] [<ffffffff811d6b10>] ? generic_block_bmap+0x50/0x50
Nov 28 00:10:00 vm110 kernel: [3826920.291357] [<ffffffff8150e139>] ? io_schedule+0x99/0x120
Nov 28 00:10:00 vm110 kernel: [3826920.291929] [<ffffffff811d6b1a>] ? sleep_on_buffer+0xa/0x10
Nov 28 00:10:00 vm110 kernel: [3826920.292526] [<ffffffff8150e4bc>] ? __wait_on_bit+0x5c/0x90
Nov 28 00:10:00 vm110 kernel: [3826920.293093] [<ffffffff811d6b10>] ? generic_block_bmap+0x50/0x50
Nov 28 00:10:00 vm110 kernel: [3826920.293682] [<ffffffff8150e567>] ? out_of_line_wait_on_bit+0x77/0x90
Nov 28 00:10:00 vm110 kernel: [3826920.294291] [<ffffffff810a7a70>] ? autoremove_wake_function+0x30/0x30
Nov 28 00:10:00 vm110 kernel: [3826920.294923] [<ffffffffa013650e>] ? jbd2_journal_commit_transaction+0x175e/0x1950 [jbd2]
Nov 28 00:10:00 vm110 kernel: [3826920.295948] [<ffffffff810a2b01>] ? pick_next_task_fair+0x6e1/0x820
Nov 28 00:10:00 vm110 kernel: [3826920.296566] [<ffffffffa0139bc2>] ? kjournald2+0xb2/0x240 [jbd2]
Nov 28 00:10:00 vm110 kernel: [3826920.297154] [<ffffffff810a7a40>] ? prepare_to_wait_event+0xf0/0xf0
Nov 28 00:10:00 vm110 kernel: [3826920.297767] [<ffffffffa0139b10>] ? commit_timeout+0x10/0x10 [jbd2]
Nov 28 00:10:00 vm110 kernel: [3826920.298374] [<ffffffff81087f7d>] ? kthread+0xbd/0xe0
Nov 28 00:10:00 vm110 kernel: [3826920.298920] [<ffffffff81087ec0>] ? kthread_create_on_node+0x180/0x180
Nov 28 00:10:00 vm110 kernel: [3826920.299536] [<ffffffff815115d8>] ? ret_from_fork+0x58/0x90
Nov 28 00:10:00 vm110 kernel: [3826920.300118] [<ffffffff81087ec0>] ? kthread_create_on_node+0x180/0x180
Nov 28 00:10:00 vm110 kernel: [3826920.300753] INFO: task mysqld:13038 blocked for more than 120 seconds.
Nov 28 00:10:00 vm110 kernel: [3826920.301362] Not tainted 3.16.0-4-amd64 #1
Nov 28 00:10:00 vm110 kernel: [3826920.301879] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 28 00:10:00 vm110 kernel: [3826920.302893] mysqld D ffff8802354464e8 0 13038 25023 0x00000000
Nov 28 00:10:00 vm110 kernel: [3826920.303857] ffff880235446090 0000000000000086 0000000000012f00 ffff880008b17fd8
Nov 28 00:10:00 vm110 kernel: [3826920.304869] 0000000000012f00 ffff880235446090 ffff880234584000 00000000001700d1
Nov 28 00:10:00 vm110 kernel: [3826920.305869] ffff880234584088 ffff880234584024 ffff880008b17ed0 ffff8802345840a0
Nov 28 00:10:00 vm110 kernel: [3826920.306859] Call Trace:
Nov 28 00:10:00 vm110 kernel: [3826920.307295] [<ffffffffa0139605>] ? jbd2_log_wait_commit+0x95/0x100 [jbd2]
Nov 28 00:10:00 vm110 kernel: [3826920.307917] [<ffffffff810a7a40>] ? prepare_to_wait_event+0xf0/0xf0
Nov 28 00:10:00 vm110 kernel: [3826920.308534] [<ffffffffa015d770>] ? ext4_sync_file+0x280/0x310 [ext4]
Nov 28 00:10:00 vm110 kernel: [3826920.309143] [<ffffffff811d53fb>] ? do_fsync+0x4b/0x70
Nov 28 00:10:00 vm110 kernel: [3826920.309696] [<ffffffff811d566c>] ? SyS_fsync+0xc/0x10
Nov 28 00:10:00 vm110 kernel: [3826920.310254] [<ffffffff8151168d>] ? system_call_fast_compare_end+0x10/0x15
Nov 28 00:10:00 vm110 kernel: [3826920.310919] INFO: task postdrop:21417 blocked for more than 120 seconds.
Nov 28 00:10:00 vm110 kernel: [3826920.311543] Not tainted 3.16.0-4-amd64 #1
Nov 28 00:10:00 vm110 kernel: [3826920.312083] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 28 00:10:00 vm110 kernel: [3826920.313060] postdrop D ffff88021eceb178 0 21417 21416 0x00000000
Nov 28 00:10:00 vm110 kernel: [3826920.314021] ffff88021ecead20 0000000000000086 0000000000012f00 ffff88000816bfd8
Nov 28 00:10:00 vm110 kernel: [3826920.315007] 0000000000012f00 ffff88021ecead20 ffff880234584000 00000000001700d2
Nov 28 00:10:00 vm110 kernel: [3826920.316017] ffff880234584088 ffff880234584024 ffff88000816bed0 ffff8802345840a0
Nov 28 00:10:00 vm110 kernel: [3826920.316999] Call Trace:
Nov 28 00:10:00 vm110 kernel: [3826920.317437] [<ffffffffa0139605>] ? jbd2_log_wait_commit+0x95/0x100 [jbd2]
Nov 28 00:10:00 vm110 kernel: [3826920.318064] [<ffffffff810a7a40>] ? prepare_to_wait_event+0xf0/0xf0
Nov 28 00:10:00 vm110 kernel: [3826920.318664] [<ffffffffa015d770>] ? ext4_sync_file+0x280/0x310 [ext4]
Nov 28 00:10:00 vm110 kernel: [3826920.319273] [<ffffffff811d53fb>] ? do_fsync+0x4b/0x70
Nov 28 00:10:00 vm110 kernel: [3826920.319822] [<ffffffff811d566c>] ? SyS_fsync+0xc/0x10
Nov 28 00:10:00 vm110 kernel: [3826920.320387] [<ffffffff8151168d>] ? system_call_fast_compare_end+0x10/0x15
Nov 28 00:10:00 vm110 kernel: [3826920.321009] INFO: task cleanup:21418 blocked for more than 120 seconds.
Nov 28 00:10:00 vm110 kernel: [3826920.321620] Not tainted 3.16.0-4-amd64 #1
Nov 28 00:10:00 vm110 kernel: [3826920.322135] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 28 00:10:00 vm110 kernel: [3826920.323157] cleanup D ffff8800079f6528 0 21418 1586 0x00000000
Nov 28 00:10:00 vm110 kernel: [3826920.324132] ffff8800079f60d0 0000000000000082 0000000000012f00 ffff880007b73fd8
Nov 28 00:10:00 vm110 kernel: [3826920.325115] 0000000000012f00 ffff8800079f60d0 ffff880234584000 00000000001700d2
Nov 28 00:10:00 vm110 kernel: [3826920.326110] ffff880234584088 ffff880234584024 ffff880007b73ed0 ffff8802345840a0
Nov 28 00:10:00 vm110 kernel: [3826920.327096] Call Trace:
Nov 28 00:10:00 vm110 kernel: [3826920.327532] [<ffffffffa0139605>] ? jbd2_log_wait_commit+0x95/0x100 [jbd2]
Nov 28 00:10:00 vm110 kernel: [3826920.328193] [<ffffffff810a7a40>] ? prepare_to_wait_event+0xf0/0xf0
Nov 28 00:10:00 vm110 kernel: [3826920.328792] [<ffffffffa015d770>] ? ext4_sync_file+0x280/0x310 [ext4]
Nov 28 00:10:00 vm110 kernel: [3826920.329410] [<ffffffff811d53fb>] ? do_fsync+0x4b/0x70
Nov 28 00:10:00 vm110 kernel: [3826920.329959] [<ffffffff811d566c>] ? SyS_fsync+0xc/0x10
Nov 28 00:10:00 vm110 kernel: [3826920.330506] [<ffffffff8151168d>] ? system_call_fast_compare_end+0x10/0x15
Nov 28 00:10:00 vm110 kernel: [3826920.331128] INFO: task cleanup:21456 blocked for more than 120 seconds.
Nov 28 00:10:00 vm110 kernel: [3826920.331740] Not tainted 3.16.0-4-amd64 #1
Nov 28 00:10:00 vm110 kernel: [3826920.332279] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 28 00:10:00 vm110 kernel: [3826920.333252] cleanup D ffff8801041910b8 0 21456 1586 0x00000000
Nov 28 00:10:00 vm110 kernel: [3826920.334203] ffff880104190c60 0000000000000082 0000000000012f00 ffff8800051fffd8
Nov 28 00:10:00 vm110 kernel: [3826920.335177] 0000000000012f00 ffff880104190c60 ffff880234584000 00000000001700d2
Nov 28 00:10:00 vm110 kernel: [3826920.336181] ffff880234584088 ffff880234584024 ffff8800051ffed0 ffff8802345840a0
Nov 28 00:10:00 vm110 kernel: [3826920.337155] Call Trace:
Nov 28 00:10:00 vm110 kernel: [3826920.337586] [<ffffffffa0139605>] ? jbd2_log_wait_commit+0x95/0x100 [jbd2]
Nov 28 00:10:00 vm110 kernel: [3826920.338216] [<ffffffff810a7a40>] ? prepare_to_wait_event+0xf0/0xf0
Nov 28 00:10:00 vm110 kernel: [3826920.338817] [<ffffffffa015d770>] ? ext4_sync_file+0x280/0x310 [ext4]
Nov 28 00:10:00 vm110 kernel: [3826920.339422] [<ffffffff811d53fb>] ? do_fsync+0x4b/0x70
Nov 28 00:10:00 vm110 kernel: [3826920.339965] [<ffffffff811d566c>] ? SyS_fsync+0xc/0x10
Nov 28 00:10:00 vm110 kernel: [3826920.340530] [<ffffffff8151168d>] ? system_call_fast_compare_end+0x10/0x15
Nov 28 00:12:00 vm110 kernel: [3827040.340122] INFO: task jbd2/vda1-8:121 blocked for more than 120 seconds.
Nov 28 00:12:00 vm110 kernel: [3827040.340855] Not tainted 3.16.0-4-amd64 #1
Nov 28 00:12:00 vm110 kernel: [3827040.341382] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 28 00:12:00 vm110 kernel: [3827040.342363] jbd2/vda1-8 D ffff880234512728 0 121 2 0x00000000
Nov 28 00:12:00 vm110 kernel: [3827040.343335] ffff8802345122d0 0000000000000046 0000000000012f00 ffff88023403bfd8
Nov 28 00:12:00 vm110 kernel: [3827040.344344] 0000000000012f00 ffff8802345122d0 ffff88023fc937b0 ffff88023ffc71b8
Nov 28 00:12:00 vm110 kernel: [3827040.345344] 0000000000000002 ffffffff811d6b10 ffff88023403bc80 ffff880234584398
Nov 28 00:12:00 vm110 kernel: [3827040.346350] Call Trace:
Nov 28 00:12:00 vm110 kernel: [3827040.346789] [<ffffffff811d6b10>] ? generic_block_bmap+0x50/0x50
Nov 28 00:12:00 vm110 kernel: [3827040.347381] [<ffffffff8150e139>] ? io_schedule+0x99/0x120
Nov 28 00:12:00 vm110 kernel: [3827040.347941] [<ffffffff811d6b1a>] ? sleep_on_buffer+0xa/0x10
Nov 28 00:12:00 vm110 kernel: [3827040.348537] [<ffffffff8150e4bc>] ? __wait_on_bit+0x5c/0x90
Nov 28 00:12:00 vm110 kernel: [3827040.349103] [<ffffffff811d6b10>] ? generic_block_bmap+0x50/0x50
Nov 28 00:12:00 vm110 kernel: [3827040.349691] [<ffffffff8150e567>] ? out_of_line_wait_on_bit+0x77/0x90
Nov 28 00:12:00 vm110 kernel: [3827040.350310] [<ffffffff810a7a70>] ? autoremove_wake_function+0x30/0x30
Nov 28 00:12:00 vm110 kernel: [3827040.350936] [<ffffffffa013650e>] ? jbd2_journal_commit_transaction+0x175e/0x1950 [jbd2]
Nov 28 00:12:00 vm110 kernel: [3827040.351949] [<ffffffff810a2b01>] ? pick_next_task_fair+0x6e1/0x820
Nov 28 00:12:00 vm110 kernel: [3827040.352570] [<ffffffffa0139bc2>] ? kjournald2+0xb2/0x240 [jbd2]
Nov 28 00:12:00 vm110 kernel: [3827040.357659] [<ffffffff810a7a40>] ? prepare_to_wait_event+0xf0/0xf0
Nov 28 00:12:00 vm110 kernel: [3827040.358298] [<ffffffffa0139b10>] ? commit_timeout+0x10/0x10 [jbd2]
Nov 28 00:12:00 vm110 kernel: [3827040.358927] [<ffffffff81087f7d>] ? kthread+0xbd/0xe0
Nov 28 00:12:00 vm110 kernel: [3827040.359495] [<ffffffff81087ec0>] ? kthread_create_on_node+0x180/0x180
Nov 28 00:12:00 vm110 kernel: [3827040.360144] [<ffffffff815115d8>] ? ret_from_fork+0x58/0x90
Nov 28 00:12:00 vm110 kernel: [3827040.360736] [<ffffffff81087ec0>] ? kthread_create_on_node+0x180/0x180
Nov 28 00:12:00 vm110 kernel: [3827040.361402] INFO: task mysqld:13038 blocked for more than 120 seconds.
Nov 28 00:12:00 vm110 kernel: [3827040.362048] Not tainted 3.16.0-4-amd64 #1
Nov 28 00:12:00 vm110 kernel: [3827040.362592] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 28 00:12:00 vm110 kernel: [3827040.363573] mysqld D ffff8802354464e8 0 13038 25023 0x00000000
Nov 28 00:12:00 vm110 kernel: [3827040.364546] ffff880235446090 0000000000000086 0000000000012f00 ffff880008b17fd8
Nov 28 00:12:00 vm110 kernel: [3827040.365538] 0000000000012f00 ffff880235446090 ffff880234584000 00000000001700d1
Nov 28 00:12:00 vm110 kernel: [3827040.366526] ffff880234584088 ffff880234584024 ffff880008b17ed0 ffff8802345840a0
Nov 28 00:12:00 vm110 kernel: [3827040.367512] Call Trace:
Nov 28 00:12:00 vm110 kernel: [3827040.367947] [<ffffffffa0139605>] ? jbd2_log_wait_commit+0x95/0x100 [jbd2]
Nov 28 00:12:00 vm110 kernel: [3827040.368582] [<ffffffff810a7a40>] ? prepare_to_wait_event+0xf0/0xf0
Nov 28 00:12:00 vm110 kernel: [3827040.369188] [<ffffffffa015d770>] ? ext4_sync_file+0x280/0x310 [ext4]
Nov 28 00:12:00 vm110 kernel: [3827040.369798] [<ffffffff811d53fb>] ? do_fsync+0x4b/0x70
Nov 28 00:12:00 vm110 kernel: [3827040.370349] [<ffffffff811d566c>] ? SyS_fsync+0xc/0x10
Nov 28 00:12:00 vm110 kernel: [3827040.370896] [<ffffffff8151168d>] ? system_call_fast_compare_end+0x10/0x15
 
Now, 39 days later, it has happened again :-(((

VM-Guest: KVM/Jessie-Server
bootdisk: virtio0
cores: 4
ide2: backup:iso/debian-8.2.0-amd64-netinst.iso,media=cdrom
memory: 12288
name: xxx
net0: e1000=xxx,bridge=vmbr0
net1: e1000=xxx,bridge=vmbr1
numa: 0
ostype: l26
smbios1: uuid=614f0f55-e1c5-4a02-8829-49507738929e
sockets: 1
virtio0: local:109/vm-109-disk-1.qcow2,format=qcow2,size=150G


These values are set:
vm.dirty_ratio = 10
vm.dirty_background_ratio = 5
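
To get a feeling for what these two ratios mean on this guest, here is a rough Python sketch. It assumes the ratios are applied to the full 12288 MB from the VM config; the kernel actually applies them to the dirtyable memory (free plus reclaimable pages), so the results are upper bounds only. The function name dirty_limits_mb is made up for this illustration.

```python
def dirty_limits_mb(mem_mb, dirty_ratio, dirty_background_ratio):
    """Return (background_limit_mb, hard_limit_mb) as upper bounds.

    Assumption: the ratios are applied to all of mem_mb. The kernel uses
    dirtyable memory (free + reclaimable), which is smaller than total RAM.
    """
    background = mem_mb * dirty_background_ratio / 100
    hard = mem_mb * dirty_ratio / 100
    return background, hard


# Values from this post: 12288 MB guest, old defaults vs. the new settings.
for label, ratio, bg_ratio in (("old 20/10", 20, 10), ("new 10/5", 10, 5)):
    bg, hard = dirty_limits_mb(12288, ratio, bg_ratio)
    print(f"{label}: background flush from ~{bg:.0f} MB, "
          f"writers throttled at ~{hard:.0f} MB")
```

Halving the ratios roughly halves the amount of dirty data that can pile up before writeback starts, but it does not change how fast the underlying storage answers.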

Kernel.log

Dec 15 17:53:15 vm109 kernel: [2176264.688056] Peer 0000:0000:0000:0000:0000:ffff:57a0:3f32:49483/80 unexpectedly shrunk window 2205238651:2205238887 (repaired)
Dec 15 18:10:10 vm109 kernel: [2177280.412138] INFO: task jbd2/vda1-8:122 blocked for more than 120 seconds.
Dec 15 18:10:10 vm109 kernel: [2177280.412890] Not tainted 3.16.0-4-amd64 #1
Dec 15 18:10:10 vm109 kernel: [2177280.413432] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec 15 18:10:10 vm109 kernel: [2177280.414441] jbd2/vda1-8 D ffff88032ff4ef38 0 122 2 0x00000000
Dec 15 18:10:10 vm109 kernel: [2177280.415443] ffff88032ff4eae0 0000000000000046 0000000000012f00 ffff88032fb3bfd8
Dec 15 18:10:10 vm109 kernel: [2177280.416487] 0000000000012f00 ffff88032ff4eae0 ffff88033fc937b0 ffff88033ffac778
Dec 15 18:10:10 vm109 kernel: [2177280.417498] 0000000000000002 ffffffff811d6b10 ffff88032fb3bc80 ffff88032fe78398
Dec 15 18:10:10 vm109 kernel: [2177280.418520] Call Trace:
Dec 15 18:10:10 vm109 kernel: [2177280.418972] [<ffffffff811d6b10>] ? generic_block_bmap+0x50/0x50
Dec 15 18:10:10 vm109 kernel: [2177280.419585] [<ffffffff8150e159>] ? io_schedule+0x99/0x120
Dec 15 18:10:10 vm109 kernel: [2177280.420175] [<ffffffff811d6b1a>] ? sleep_on_buffer+0xa/0x10
Dec 15 18:10:10 vm109 kernel: [2177280.420766] [<ffffffff8150e4dc>] ? __wait_on_bit+0x5c/0x90
Dec 15 18:10:10 vm109 kernel: [2177280.421348] [<ffffffff811d6b10>] ? generic_block_bmap+0x50/0x50
Dec 15 18:10:10 vm109 kernel: [2177280.421952] [<ffffffff8150e587>] ? out_of_line_wait_on_bit+0x77/0x90
Dec 15 18:10:10 vm109 kernel: [2177280.422591] [<ffffffff810a7a70>] ? autoremove_wake_function+0x30/0x30
Dec 15 18:10:10 vm109 kernel: [2177280.423248] [<ffffffffa00fe50e>] ? jbd2_journal_commit_transaction+0x175e/0x1950 [jbd2]
Dec 15 18:10:10 vm109 kernel: [2177280.424281] [<ffffffff810a2b01>] ? pick_next_task_fair+0x6e1/0x820
Dec 15 18:10:10 vm109 kernel: [2177280.424918] [<ffffffffa0101bc2>] ? kjournald2+0xb2/0x240 [jbd2]
Dec 15 18:10:10 vm109 kernel: [2177280.425523] [<ffffffff810a7a40>] ? prepare_to_wait_event+0xf0/0xf0
Dec 15 18:10:10 vm109 kernel: [2177280.426143] [<ffffffffa0101b10>] ? commit_timeout+0x10/0x10 [jbd2]
Dec 15 18:10:10 vm109 kernel: [2177280.426777] [<ffffffff81087f7d>] ? kthread+0xbd/0xe0
Dec 15 18:10:10 vm109 kernel: [2177280.427339] [<ffffffff81087ec0>] ? kthread_create_on_node+0x180/0x180
Dec 15 18:10:10 vm109 kernel: [2177280.427972] [<ffffffff81511618>] ? ret_from_fork+0x58/0x90
Dec 15 18:10:10 vm109 kernel: [2177280.428568] [<ffffffff81087ec0>] ? kthread_create_on_node+0x180/0x180
Dec 15 18:10:10 vm109 kernel: [2177280.429216] INFO: task mysqld:22898 blocked for more than 120 seconds.
Dec 15 18:10:10 vm109 kernel: [2177280.429843] Not tainted 3.16.0-4-amd64 #1
Dec 15 18:10:10 vm109 kernel: [2177280.430385] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec 15 18:10:10 vm109 kernel: [2177280.431396] mysqld D ffff8801d72cbac8 0 22898 695 0x00000000
Dec 15 18:10:10 vm109 kernel: [2177280.432392] ffff8801d72cb670 0000000000000086 0000000000012f00 ffff8801d71b3fd8
Dec 15 18:10:10 vm109 kernel: [2177280.433406] 0000000000012f00 ffff8801d72cb670 ffff88032fe78000 00000000003158ad
Dec 15 18:10:10 vm109 kernel: [2177280.434422] ffff88032fe78088 ffff88032fe78024 ffff8801d71b3ed0 ffff88032fe780a0
Dec 15 18:10:10 vm109 kernel: [2177280.435459] Call Trace:
Dec 15 18:10:10 vm109 kernel: [2177280.435905] [<ffffffffa0101605>] ? jbd2_log_wait_commit+0x95/0x100 [jbd2]
Dec 15 18:10:10 vm109 kernel: [2177280.436559] [<ffffffff810a7a40>] ? prepare_to_wait_event+0xf0/0xf0
Dec 15 18:10:10 vm109 kernel: [2177280.437181] [<ffffffffa0164770>] ? ext4_sync_file+0x280/0x310 [ext4]
Dec 15 18:10:10 vm109 kernel: [2177280.437811] [<ffffffff811d53fb>] ? do_fsync+0x4b/0x70
Dec 15 18:10:10 vm109 kernel: [2177280.438378] [<ffffffff811d566c>] ? SyS_fsync+0xc/0x10
Dec 15 18:10:10 vm109 kernel: [2177280.438969] [<ffffffff815116cd>] ? system_call_fast_compare_end+0x10/0x15
Dec 15 18:12:10 vm109 kernel: [2177400.436124] INFO: task jbd2/vda1-8:122 blocked for more than 120 seconds.
Dec 15 18:12:10 vm109 kernel: [2177400.436970] Not tainted 3.16.0-4-amd64 #1
Dec 15 18:12:10 vm109 kernel: [2177400.437509] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec 15 18:12:10 vm109 kernel: [2177400.438526] jbd2/vda1-8 D ffff88032ff4ef38 0 122 2 0x00000000
Dec 15 18:12:10 vm109 kernel: [2177400.439519] ffff88032ff4eae0 0000000000000046 0000000000012f00 ffff88032fb3bfd8
.....
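
Since these traces repeat every 120 seconds, it can help to condense the log to "which task was reported how often". The following is only a minimal Python sketch under that assumption; the function name summarize_hung_tasks and the regular expression are my own, and the sample lines are copied from the log above.

```python
import re
from collections import Counter

# Matches the hung-task header line, e.g.
# "INFO: task jbd2/vda1-8:122 blocked for more than 120 seconds."
HUNG_TASK = re.compile(
    r"INFO: task (?P<name>.+?):(?P<pid>\d+) blocked for more than \d+ seconds"
)


def summarize_hung_tasks(lines):
    """Count hung-task reports per (task name, pid)."""
    counts = Counter()
    for line in lines:
        match = HUNG_TASK.search(line)
        if match:
            counts[(match.group("name"), int(match.group("pid")))] += 1
    return counts


sample = [
    "Dec 15 18:10:10 vm109 kernel: [2177280.412138] INFO: task jbd2/vda1-8:122 blocked for more than 120 seconds.",
    "Dec 15 18:10:10 vm109 kernel: [2177280.429216] INFO: task mysqld:22898 blocked for more than 120 seconds.",
    "Dec 15 18:12:10 vm109 kernel: [2177400.436124] INFO: task jbd2/vda1-8:122 blocked for more than 120 seconds.",
]

for (name, pid), count in summarize_hung_tasks(sample).most_common():
    print(f"{name} (pid {pid}): reported {count} time(s)")
```

To run it on a whole file, pass an open file object instead of the sample list, for example summarize_hung_tasks(open("kern.log")), where kern.log is just a placeholder path.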
 
Is this a problem?

This VM has been running for 19 days:
# ps -ef | grep "ksoftirqd"
root 3 2 1 Nov28 ? 05:22:57 [ksoftirqd/0]
root 13 2 0 Nov28 ? 02:12:37 [ksoftirqd/1]
root 18 2 0 Nov28 ? 02:14:31 [ksoftirqd/2]
root 23 2 0 Nov28 ? 02:08:06 [ksoftirqd/3]

This is a server without Proxmox, which has been running for about 900 days:
root 4 2 0 2013 ? 00:22:55 [ksoftirqd/0]
root 7 2 0 2013 ? 00:13:44 [ksoftirqd/1]
root 10 2 0 2013 ? 00:06:05 [ksoftirqd/2]
root 13 2 0 2013 ? 00:04:41 [ksoftirqd/3]
root 29873 29827 0 16:09 pts/0 00:00:00 grep ksoftirqd

This one, also without Proxmox, has been running for 44 days:
root 3 2 0 Nov04 ? 00:01:37 [ksoftirqd/0]
root 10 2 0 Nov04 ? 00:00:38 [ksoftirqd/1]
root 15 2 0 Nov04 ? 00:00:20 [ksoftirqd/2]
root 19 2 0 Nov04 ? 00:00:15 [ksoftirqd/3]
root 334 32730 0 16:10 pts/0 00:00:00 grep ksoftirqd

What I found via Google:
Your computer communicates with the devices attached to it through IRQs (interrupt requests). When an interrupt comes from a device, the operating system pauses what it was doing and starts addressing that interrupt.

In some situations IRQs come very very fast one after the other and the operating system cannot finish servicing one before another one arrives. This can happen when a high speed network card receives a very large number of packets in a short time frame.

Because the operating system cannot handle IRQs as they arrive (because they arrive too fast one after the other), the operating system queues them for later processing by a special internal process named ksoftirqd.

If ksoftirqd is taking more than a tiny percentage of CPU time, this indicates the machine is under heavy interrupt load.

--> If ksoftirqd is taking more than a tiny percentage of CPU time, this indicates the machine is under heavy interrupt load.
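
To put numbers on "a tiny percentage", here is a small Python sketch that turns the ps TIME column into a share of uptime. It assumes the uptimes are exactly the rounded day counts given above (19, about 900, and 44 days) and only looks at ksoftirqd/0; the function names are made up for this illustration.

```python
def cpu_time_seconds(time_field):
    """Convert a ps TIME field ([DD-]HH:MM:SS) to seconds."""
    days = 0
    if "-" in time_field:
        day_part, time_field = time_field.split("-", 1)
        days = int(day_part)
    hours, minutes, seconds = (int(part) for part in time_field.split(":"))
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def cpu_share_percent(time_field, uptime_days):
    """CPU time of one thread as a percentage of the machine's uptime."""
    return 100.0 * cpu_time_seconds(time_field) / (uptime_days * 86400)


# ksoftirqd/0 TIME values and approximate uptimes taken from this post.
machines = [
    ("KVM guest on Proxmox", "05:22:57", 19),
    ("server without Proxmox", "00:22:55", 900),
    ("server without Proxmox", "00:01:37", 44),
]

for label, time_field, days in machines:
    share = cpu_share_percent(time_field, days)
    print(f"{label}: ksoftirqd/0 used {share:.4f}% of {days} days")
```

With these inputs the KVM guest comes out at a bit over 1% of one core for ksoftirqd/0, while both machines without Proxmox stay well below 0.01%.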
 
