Hello
Yesterday we swapped out an OSD, and most of our test KVM guests had disk issues.
From dmesg on 2 systems:
Code:
[61499.871239] sd 2:0:0:2: [sdb] abort
[61499.871297] sd 2:0:0:2: [sdb] abort
[63670.824251] sd 2:0:0:2: [sdb] abort
[63840.080115] INFO: task jbd2/sda1-8:133 blocked for more than 120 seconds.
[63840.080821] Tainted: P O 3.16.0-4-amd64 #1
[63840.081199] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[63840.081861] jbd2/sda1-8 D ffff880139834668 0 133 2 0x00000000
[63840.081867] ffff880139834210 0000000000000046 0000000000012f40 ffff880037267fd8
[63840.081869] 0000000000012f40 ffff880139834210 ffff88013fd937f0 ffff88013ffca8e0
[63840.081872] 0000000000000002 ffffffff811d9e70 ffff880037267c80 ffff880019fa32e0
[63840.081875] Call Trace:
[63840.081883] [<ffffffff811d9e70>] ? generic_block_bmap+0x50/0x50
[63840.081888] [<ffffffff81517959>] ? io_schedule+0x99/0x120
[63840.081891] [<ffffffff811d9e7a>] ? sleep_on_buffer+0xa/0x10
[63840.081893] [<ffffffff81517cdc>] ? __wait_on_bit+0x5c/0x90
[63840.081896] [<ffffffff811d9e70>] ? generic_block_bmap+0x50/0x50
[63840.081912] [<ffffffff81517d87>] ? out_of_line_wait_on_bit+0x77/0x90
[63840.081917] [<ffffffff810a95f0>] ? autoremove_wake_function+0x30/0x30
[63840.081932] [<ffffffffa0154be1>] ? jbd2_journal_commit_transaction+0xe91/0x1a30 [jbd2]
[63840.081938] [<ffffffff810a4323>] ? pick_next_task_fair+0x3e3/0x820
[63840.081943] [<ffffffffa0158d92>] ? kjournald2+0xb2/0x240 [jbd2]
[63840.081946] [<ffffffff810a95c0>] ? prepare_to_wait_event+0xf0/0xf0
[63840.081950] [<ffffffffa0158ce0>] ? commit_timeout+0x10/0x10 [jbd2]
[63840.081955] [<ffffffff810894fd>] ? kthread+0xbd/0xe0
[63840.081958] [<ffffffff81089440>] ? kthread_create_on_node+0x180/0x180
[63840.081961] [<ffffffff8151ad98>] ? ret_from_fork+0x58/0x90
[63840.081964] [<ffffffff81089440>] ? kthread_create_on_node+0x180/0x180
..
[63840.081983] INFO: task rs:main Q:Reg:629 blocked for more than 120 seconds.
[63840.082425] Tainted: P O 3.16.0-4-amd64 #1
[63840.082822] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[63840.083507] rs:main Q:Reg D ffff880139f7d848 0 629 1 0x00000000
[63840.083511] ffff880139f7d3f0 0000000000000086 0000000000012f40 ffff88013a30bfd8
[63840.083514] 0000000000012f40 ffff880139f7d3f0 ffff88013fc937f0 ffff88013ffb0d08
[63840.083516] 0000000000000002 ffffffffa01516c0 ffff88013a30bb30 ffff88010eb59d28
[63840.083518] Call Trace:
..
[63840.083653] INFO: task kworker/u8:2:6335 blocked for more than 120 seconds.
[63840.084130] Tainted: P O 3.16.0-4-amd64 #1
[63840.084530] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[63840.085276] kworker/u8:2 D ffff880139efe528 0 6335 2 0x00000000
[63840.085283] Workqueue: writeback bdi_writeback_workfn (flush-8:0)
[63840.085286] ffff880139efe0d0 0000000000000046 0000000000012f40 ffff8800ba3f7fd8
[63840.085288] 0000000000012f40 ffff880139efe0d0 ffff88013fd937f0 ffff88013ffa7f60
[63840.085291] 0000000000000002 ffffffffa01516c0 ffff8800ba3f7780 ffff88013b381d90
..
[63840.085447] INFO: task kworker/u8:0:14095 blocked for more than 120 seconds.
[63840.085921] Tainted: P O 3.16.0-4-amd64 #1
[63840.086303] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[63840.086836] kworker/u8:0 D ffff880036c56f38 0 14095 2 0x00000000
[63840.086849] Workqueue: scsi_tmf_2 scmd_eh_abort_handler [scsi_mod]
[63840.086850] ffff880036c56ae0 0000000000000046 0000000000012f40 ffff880065977fd8
[63840.086852] 0000000000012f40 ffff880036c56ae0 ffff880065977dc8 ffff880065977d60
[63840.086854] ffff880065977dc0 ffff880036c56ae0 0000000000002003 0000000000000100
And another system:
Code:
[ 3.473724] random: nonblocking pool is initialized
[ 51.880058] sd 2:0:0:0: [sda] abort
[ 240.032080] INFO: task kworker/u2:0:6 blocked for more than 120 seconds.
[ 240.032451] Not tainted 3.16.0-4-amd64 #1
[ 240.032728] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 240.033083] kworker/u2:0 D ffff88001e5be4a8 0 6 2 0x00000000
[ 240.033107] Workqueue: scsi_tmf_2 scmd_eh_abort_handler [scsi_mod]
[ 240.033109] ffff88001e5be050 0000000000000046 0000000000012f40 ffff88001e5d7fd8
[ 240.033112] 0000000000012f40 ffff88001e5be050 ffff88001e5d7dc8 ffff88001e5d7d60
[ 240.033115] ffff88001e5d7dc0 ffff88001e5be050 0000000000002003 0000000000000040
..
[ 240.033175] INFO: task kworker/u2:1:33 blocked for more than 120 seconds.
[ 240.033497] Not tainted 3.16.0-4-amd64 #1
[ 240.033744] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 240.034099] kworker/u2:1 D ffff88001a67a6e8 0 33 2 0x00000000
[ 240.034107] Workqueue: writeback bdi_writeback_workfn (flush-8:0)
[ 240.034110] ffff88001a67a290 0000000000000046 0000000000012f40 ffff88001a6a7fd8
[ 240.034112] 0000000000012f40 ffff88001a67a290 ffff88001fc137f0 ffff88001ffae728
[ 240.034114] 0000000000000002 ffffffff811d9e70 ffff88001a6a7670 0000000000000000
[ 240.034304] INFO: task jbd2/sda1-8:100 blocked for more than 120 seconds.
[ 240.034638] Not tainted 3.16.0-4-amd64 #1
[ 240.034883] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 240.035242] jbd2/sda1-8 D ffff88001f74aeb8 0 100 2 0x00000000
[ 240.035245] ffff88001f74aa60 0000000000000046 0000000000012f40 ffff88001a737fd8
[ 240.035247] 0000000000012f40 ffff88001f74aa60 ffff88001fc137f0 ffff88001ffa0490
[ 240.035250] 0000000000000002 ffffffff8113ee30 ffff88001a737bd0 ffff88001a737cb0
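To make the dumps above easier to compare across guests, here is a minimal sketch that pulls out only the hung-task reports. It assumes the dmesg output has been saved as plain text; the function name blocked_tasks and the sample string are made up for illustration and are not part of any tool.

```python
import re

# Matches kernel hung-task lines such as:
# [63840.080115] INFO: task jbd2/sda1-8:133 blocked for more than 120 seconds.
# The task name may itself contain ':' (e.g. "rs:main Q:Reg"), so the name
# group is greedy and the PID is taken as the last ':<digits>' before " blocked".
HUNG_TASK = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+INFO: task (?P<name>.+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def blocked_tasks(dmesg_text):
    """Return (timestamp, task name, pid) for each hung-task report found."""
    found = []
    for line in dmesg_text.splitlines():
        m = HUNG_TASK.search(line)
        if m:
            found.append((float(m.group("ts")), m.group("name"), int(m.group("pid"))))
    return found


# Sample lines copied from the dmesg excerpts in this post.
sample = """\
[63840.080115] INFO: task jbd2/sda1-8:133 blocked for more than 120 seconds.
[63840.081983] INFO: task rs:main Q:Reg:629 blocked for more than 120 seconds.
[ 240.032080] INFO: task kworker/u2:0:6 blocked for more than 120 seconds.
"""

for ts, name, pid in blocked_tasks(sample):
    print(f"{ts:>12.6f}  pid={pid:<6} {name}")
```

Running this over the full dmesg of each guest gives a short list of which tasks stalled and when, which can then be lined up against the time the OSD was swapped.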
pveversion -v
Code:
proxmox-ve: 4.4-82 (running kernel: 4.4.40-1-pve)
pve-manager: 4.4-12 (running version: 4.4-12/e71b7a74)
pve-kernel-4.4.35-1-pve: 4.4.35-77
pve-kernel-4.4.40-1-pve: 4.4.40-82
lvm2: 2.02.116-pve3
corosync-pve: 2.4.2-1
libqb0: 1.0-1
pve-cluster: 4.0-48
qemu-server: 4.0-109
pve-firmware: 1.1-10
libpve-common-perl: 4.0-92
libpve-access-control: 4.0-23
libpve-storage-perl: 4.0-76
pve-libspice-server1: 0.12.8-2
vncterm: 1.3-1
pve-docs: 4.4-3
pve-qemu-kvm: 2.7.1-4
pve-container: 1.0-94
pve-firewall: 2.0-33
pve-ha-manager: 1.0-40
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u3
lxc-pve: 2.0.7-3
lxcfs: 2.0.6-pve1
criu: 1.6.0-1
novnc-pve: 0.5-8
smartmontools: 6.5+svn4324-1~pve80
zfsutils: 0.6.5.9-pve15~bpo80
ceph: 10.2.5-1~bpo80+1
Today I got more of the same when I took a node offline for maintenance. I had set noout first.
Here are the settings for 2 of the VMs:
Code:
balloon: 1024
bootdisk: scsi0
cores: 2
memory: 2048
name: imap2
net0: virtio=62:65:36:65:30:38,bridge=vmbr0,tag=3
numa: 0
onboot: 1
ostype: l26
protection: 1
scsi0: ceph-kvm:vm-8110-disk-1,discard=on,size=50G
scsihw: virtio-scsi-pci
smbios1: uuid=195cf837-ebaa-49c2-95e9-5ba7a0869cb0
sockets: 1
Code:
boot: c
bootdisk: scsi0
cores: 1
memory: 4096
name: rsnapshot-2017
net0: virtio=32:ED:A6:09:95:B0,bridge=vmbr0,tag=3
numa: 0
onboot: 1
ostype: l26
scsi0: ceph-kvm:vm-150-disk-2,discard=on,size=4G
scsi2: ceph-kvm:vm-150-disk-3,discard=on,size=500G
scsihw: virtio-scsi-pci
smbios1: uuid=9035f5f9-60e8-42fa-b6ff-5ab38a160365
sockets: 4