We have 10 OSDs, all on the same drive model: Intel SSD DC S3520.
The OSDs run on PVE systems that host no VMs or mons.
The network switch is 10G; the NICs on the nodes are 1G.
I have just 2 test KVM guests running on Ceph.
From VMs outside of Ceph, we test by running rsync to one test VM and a dovecot backup to the other.
After approximately 30 minutes the systems hang.
From syslog:
Code:
Mar 30 14:55:45 ceph-test1 kernel: [ 2089.832107] sd 2:0:0:1: [sdb] abort
Mar 30 14:57:01 ceph-test1 kernel: [ 2165.630246] sd 2:0:0:1: [sdb] abort
Mar 30 14:58:55 ceph-test1 kernel: [ 2280.028104] INFO: task jbd2/sdb1-8:725 blocked for more than 120 seconds.
Mar 30 14:58:55 ceph-test1 kernel: [ 2280.028532] Not tainted 3.16.0-4-amd64 #1
Mar 30 14:58:55 ceph-test1 kernel: [ 2280.028742] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 30 14:58:55 ceph-test1 kernel: [ 2280.029143] jbd2/sdb1-8 D ffff88001a6c3a88 0 725 2 0x00000000
Mar 30 14:58:55 ceph-test1 kernel: [ 2280.029147] ffff88001a6c3630 0000000000000046 0000000000012f40 ffff88001004bfd8
Mar 30 14:58:55 ceph-test1 kernel: [ 2280.029149] 0000000000012f40 ffff88001a6c3630 ffff88001fc137f0 ffff88001ff9e3f0
Mar 30 14:58:55 ceph-test1 kernel: [ 2280.029150] 0000000000000002 ffffffff8113ee30 ffff88001004bbd0 ffff88001004bcb8
Mar 30 14:55:53 ceph-test2 kernel: [ 2090.856057] sd 2:0:0:1: [sdb] abort
Mar 30 14:59:02 ceph-test2 kernel: [ 2280.028096] INFO: task kworker/u2:0:6 blocked for more than 120 seconds.
Mar 30 14:59:02 ceph-test2 kernel: [ 2280.028799] Not tainted 3.16.0-4-amd64 #1
Mar 30 14:59:02 ceph-test2 kernel: [ 2280.029147] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 30 14:59:02 ceph-test2 kernel: [ 2280.029828] kworker/u2:0 D ffff88001e5be4a8 0 6 2 0x00000000
Mar 30 14:59:02 ceph-test2 kernel: [ 2280.029852] Workqueue: scsi_tmf_2 scmd_eh_abort_handler [scsi_mod]
Mar 30 14:59:02 ceph-test2 kernel: [ 2280.029855] ffff88001e5be050 0000000000000046 0000000000012f40 ffff88001e5d7fd8
Mar 30 14:59:02 ceph-test2 kernel: [ 2280.029857] 0000000000012f40 ffff88001e5be050 ffff88001e5d7dc8 ffff88001e5d7d60
Mar 30 14:59:02 ceph-test2 kernel: [ 2280.029860] ffff88001e5d7dc0 ffff88001e5be050 0000000000002003 0000000000000040
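For anyone comparing the two guests, the excerpt above can be summarized with a small script. This is only a sketch: the regular expressions are written against the exact message shapes shown in this post (SCSI `abort` lines and `blocked for more than N seconds` lines), and the names `summarize`, `ABORT_RE`, `HUNG_RE` and `SAMPLE` are made up for illustration. Other kernels may word these messages differently.

```python
import re
from collections import Counter

# Patterns written against the message shapes in the syslog excerpt above.
ABORT_RE = re.compile(
    r"(?P<host>\S+) kernel: \[\s*[\d.]+\] sd \S+ \[(?P<dev>\w+)\] abort"
)
HUNG_RE = re.compile(
    r"(?P<host>\S+) kernel: \[\s*[\d.]+\] INFO: task (?P<task>.+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def summarize(lines):
    """Count SCSI aborts per (host, device) and list hung tasks in order."""
    aborts = Counter()
    hung = []
    for line in lines:
        m = ABORT_RE.search(line)
        if m:
            aborts[(m["host"], m["dev"])] += 1
            continue
        m = HUNG_RE.search(line)
        if m:
            hung.append((m["host"], m["task"], int(m["pid"])))
    return aborts, hung


# A few lines copied from the excerpt above.
SAMPLE = [
    "Mar 30 14:55:45 ceph-test1 kernel: [ 2089.832107] sd 2:0:0:1: [sdb] abort",
    "Mar 30 14:57:01 ceph-test1 kernel: [ 2165.630246] sd 2:0:0:1: [sdb] abort",
    "Mar 30 14:58:55 ceph-test1 kernel: [ 2280.028104] INFO: task jbd2/sdb1-8:725 blocked for more than 120 seconds.",
    "Mar 30 14:55:53 ceph-test2 kernel: [ 2090.856057] sd 2:0:0:1: [sdb] abort",
    "Mar 30 14:59:02 ceph-test2 kernel: [ 2280.028096] INFO: task kworker/u2:0:6 blocked for more than 120 seconds.",
]

aborts, hung = summarize(SAMPLE)
for (host, dev), count in sorted(aborts.items()):
    print(f"{host} {dev}: {count} abort(s)")
for host, task, pid in hung:
    print(f"{host}: task {task} (pid {pid}) blocked")
```

In this excerpt, both guests log aborts on `sdb` (the second, larger disk) before the hung-task warnings appear.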
From the system sending data to the test2 system:
Code:
# doveadm backup -A remote:10.1.3.105
dsync-local(user1): Error: dsync(localhost.localdomain): I/O has stalled, no activity for 600 seconds
dsync-local(user1): Error: Timeout during state=sync_mails (send=mails recv=recv_last_common)
dsync-local(user1): Error: Remote command process isn't dying, killing it
KVM config for both test VMs:
Code:
boot: c
bootdisk: scsi0
cores: 1
memory: 512
name: ceph-test1
net0: virtio=1A:64:14:A6:16:3A,bridge=vmbr0,tag=3
numa: 0
ostype: l26
protection: 1
scsi0: ceph-kvm:vm-9001-disk-1,discard=on,size=4G
scsi1: ceph-kvm:vm-9001-disk-2,discard=on,size=200G
scsihw: virtio-scsi-pci
smbios1: uuid=9035f5f9-60e8-42fa-b6ff-5ab38a160365
sockets: 1

boot: c
bootdisk: scsi0
cores: 1
memory: 512
name: ceph-test2
net0: virtio=E2:20:3B:C0:72:F1,bridge=vmbr0,tag=3
numa: 0
ostype: l26
protection: 1
scsi0: ceph-kvm:vm-9002-disk-1,discard=on,size=4G
scsi1: ceph-kvm:vm-9002-disk-2,discard=on,size=50G
scsihw: virtio-scsi-pci
smbios1: uuid=9035f5f9-60e8-42fa-b6ff-5ab38a160365
sockets: 1
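To make the disk settings of the two guests easier to compare, the `scsiN:` lines can be split into their parts. This is a rough sketch that assumes the `<storage>:<volume>,key=value,...` layout seen in the configs above. The function name `parse_disk_line` is hypothetical and not part of any Proxmox tool.

```python
def parse_disk_line(line):
    """Split a disk line such as 'scsi1: storage:volume,opt=val,...' into parts."""
    bus, _, value = line.partition(": ")
    volume_spec, *opts = value.split(",")
    storage, _, volume = volume_spec.partition(":")
    # Remaining comma-separated items are assumed to be key=value options.
    options = dict(opt.split("=", 1) for opt in opts)
    return {"bus": bus, "storage": storage, "volume": volume, "options": options}


# Data disks of the two test VMs, copied from the configs above.
for line in (
    "scsi1: ceph-kvm:vm-9001-disk-2,discard=on,size=200G",
    "scsi1: ceph-kvm:vm-9002-disk-2,discard=on,size=50G",
):
    disk = parse_disk_line(line)
    print(disk["volume"], disk["options"])
```

Both data disks sit on the `ceph-kvm` storage with `discard=on`, and they differ only in size.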
Ceph version: 10.2.6-1~bpo80+1
The systems are 4-drive Supermicro X10SLM and X9SCi-LN4F boards with 32GB ECC RAM.
Does anyone have a suggestion for solving this issue?