Hi
I'm running a hyper converged cluster of 4 nodes, with each 3 disks for Ceph. In the Ceph logs, I'm getting these errors. Sometimes the errors disappears for a few hours and then start coming back again.
Does someone knows where these errors are coming from? I have looked in the OSD logs, but they don't have much more information. All disks are new SSDs.
I'm running a hyper converged cluster of 4 nodes, with each 3 disks for Ceph. In the Ceph logs, I'm getting these errors. Sometimes the errors disappears for a few hours and then start coming back again.
Does someone knows where these errors are coming from? I have looked in the OSD logs, but they don't have much more information. All disks are new SSDs.
Code:
2019-04-20 20:46:48.621869 osd.1 osd.1 10.0.15.1:6808/4634 45 : cluster [WRN] slow request 30.781274 seconds old, received at 2019-04-20 20:46:17.840538: osd_op(client.330079.0:2826 1.a5 1:a51d67d6:::rbd_data.cade6b8b4567.0000000000001181:head [write 1635328~524288] snapc 0=[] ondisk+write+known_if_redirected e129) currently sub_op_commit_rec from 6
2019-04-20 20:46:48.621872 osd.1 osd.1 10.0.15.1:6808/4634 46 : cluster [WRN] slow request 30.780455 seconds old, received at 2019-04-20 20:46:17.841356: osd_op(client.330079.0:2827 1.a5 1:a51d67d6:::rbd_data.cade6b8b4567.0000000000001181:head [write 2159616~516096] snapc 0=[] ondisk+write+known_if_redirected e129) currently sub_op_commit_rec from 6
2019-04-20 20:46:48.621874 osd.1 osd.1 10.0.15.1:6808/4634 47 : cluster [WRN] slow request 30.779906 seconds old, received at 2019-04-20 20:46:17.841905: osd_op(client.330079.0:2828 1.a5 1:a51d67d6:::rbd_data.cade6b8b4567.0000000000001181:head [write 2675712~524288] snapc 0=[] ondisk+write+known_if_redirected e129) currently sub_op_commit_rec from 6
2019-04-20 20:46:48.621877 osd.1 osd.1 10.0.15.1:6808/4634 48 : cluster [WRN] slow request 30.779369 seconds old, received at 2019-04-20 20:46:17.842442: osd_op(client.330079.0:2829 1.a5 1:a51d67d6:::rbd_data.cade6b8b4567.0000000000001181:head [write 3200000~524288] snapc 0=[] ondisk+write+known_if_redirected e129) currently sub_op_commit_rec from 6
2019-04-20 20:46:48.621879 osd.1 osd.1 10.0.15.1:6808/4634 49 : cluster [WRN] slow request 30.778854 seconds old, received at 2019-04-20 20:46:17.842957: osd_op(client.330079.0:2830 1.a5 1:a51d67d6:::rbd_data.cade6b8b4567.0000000000001181:head [write 3724288~8192] snapc 0=[] ondisk+write+known_if_redirected e129) currently sub_op_commit_rec from 6
2019-04-20 20:47:18.484178 osd.1 osd.1 10.0.15.1:6808/4634 50 : cluster [WRN] 8 slow requests, 5 included below; oldest blocked for > 2398.468450 secs
2019-04-20 20:47:18.484187 osd.1 osd.1 10.0.15.1:6808/4634 51 : cluster [WRN] slow request 60.643557 seconds old, received at 2019-04-20 20:46:17.840538: osd_op(client.330079.0:2826 1.a5 1:a51d67d6:::rbd_data.cade6b8b4567.0000000000001181:head [write 1635328~524288] snapc 0=[] ondisk+write+known_if_redirected e129) currently sub_op_commit_rec from 6
2019-04-20 20:47:18.484205 osd.1 osd.1 10.0.15.1:6808/4634 52 : cluster [WRN] slow request 60.642739 seconds old, received at 2019-04-20 20:46:17.841356: osd_op(client.330079.0:2827 1.a5 1:a51d67d6:::rbd_data.cade6b8b4567.0000000000001181:head [write 2159616~516096] snapc 0=[] ondisk+write+known_if_redirected e129) currently sub_op_commit_rec from 6
2019-04-20 20:47:18.484214 osd.1 osd.1 10.0.15.1:6808/4634 53 : cluster [WRN] slow request 60.642190 seconds old, received at 2019-04-20 20:46:17.841905: osd_op(client.330079.0:2828 1.a5 1:a51d67d6:::rbd_data.cade6b8b4567.0000000000001181:head [write 2675712~524288] snapc 0=[] ondisk+write+known_if_redirected e129) currently sub_op_commit_rec from 6
2019-04-20 20:47:18.484222 osd.1 osd.1 10.0.15.1:6808/4634 54 : cluster [WRN] slow request 60.641652 seconds old, received at 2019-04-20 20:46:17.842442: osd_op(client.330079.0:2829 1.a5 1:a51d67d6:::rbd_data.cade6b8b4567.0000000000001181:head [write 3200000~524288] snapc 0=[] ondisk+write+known_if_redirected e129) currently sub_op_commit_rec from 6
2019-04-20 20:47:18.484227 osd.1 osd.1 10.0.15.1:6808/4634 55 : cluster [WRN] slow request 60.641138 seconds old, received at 2019-04-20 20:46:17.842957: osd_op(client.330079.0:2830 1.a5 1:a51d67d6:::rbd_data.cade6b8b4567.0000000000001181:head [write 3724288~8192] snapc 0=[] ondisk+write+known_if_redirected e129) currently sub_op_commit_rec from 6
2019-04-20 20:47:44.131137 osd.0 osd.0 10.0.15.1:6804/4458 21 : cluster [WRN] 1 slow requests, 1 included below; oldest blocked for > 480.162172 secs
2019-04-20 20:47:44.131141 osd.0 osd.0 10.0.15.1:6804/4458 22 : cluster [WRN] slow request 480.162172 seconds old, received at 2019-04-20 20:39:43.968933: osd_op(client.317399.0:83947 1.13e 1:7ce8dce9:::rbd_data.cade6b8b4567.000000000000047d:head [write 3903488~8192] snapc 0=[] ondisk+write+known_if_redirected e128) currently sub_op_commit_rec from 8
2019-04-20 20:47:38.969191 osd.5 osd.5 10.0.15.2:6800/4117 47 : cluster [WRN] 3 slow requests, 1 included below; oldest blocked for > 2418.973605 secs
2019-04-20 20:47:38.969206 osd.5 osd.5 10.0.15.2:6800/4117 48 : cluster [WRN] slow request 480.606888 seconds old, received at 2019-04-20 20:39:38.362245: osd_op(client.317399.0:83856 1.b5 1:ad3befa2:::rbd_data.cade6b8b4567.0000000000000aff:head [write 3215360~8192] snapc 0=[] ondisk+write+known_if_redirected e128) currently sub_op_commit_rec from 7
2019-04-20 20:48:10.564216 osd.11 osd.11 10.0.15.4:6800/3880 17 : cluster [WRN] 1 slow requests, 1 included below; oldest blocked for > 480.697435 secs
2019-04-20 20:48:10.564223 osd.11 osd.11 10.0.15.4:6800/3880 18 : cluster [WRN] slow request 480.697435 seconds old, received at 2019-04-20 20:40:09.866755: osd_op(client.317399.0:83959 1.167 1:e68d1179:::rbd_data.cade6b8b4567.0000000000000b03:head [write 1552384~4096] snapc 0=[] ondisk+write+known_if_redirected e128) currently sub_op_commit_rec from 6
2019-04-20 20:48:18.209207 osd.1 osd.1 10.0.15.1:6808/4634 56 : cluster [WRN] 8 slow requests, 5 included below; oldest blocked for > 2458.193526 secs
2019-04-20 20:48:18.209210 osd.1 osd.1 10.0.15.1:6808/4634 57 : cluster [WRN] slow request 120.368633 seconds old, received at 2019-04-20 20:46:17.840538: osd_op(client.330079.0:2826 1.a5 1:a51d67d6:::rbd_data.cade6b8b4567.0000000000001181:head [write 1635328~524288] snapc 0=[] ondisk+write+known_if_redirected e129) currently sub_op_commit_rec from 6
2019-04-20 20:48:18.209213 osd.1 osd.1 10.0.15.1:6808/4634 58 : cluster [WRN] slow request 120.367814 seconds old, received at 2019-04-20 20:46:17.841356: osd_op(client.330079.0:2827 1.a5 1:a51d67d6:::rbd_data.cade6b8b4567.0000000000001181:head [write 2159616~516096] snapc 0=[] ondisk+write+known_if_redirected e129) currently sub_op_commit_rec from 6
2019-04-20 20:48:18.209215 osd.1 osd.1 10.0.15.1:6808/4634 59 : cluster [WRN] slow request 120.367266 seconds old, received at 2019-04-20 20:46:17.841905: osd_op(client.330079.0:2828 1.a5 1:a51d67d6:::rbd_data.cade6b8b4567.0000000000001181:head [write 2675712~524288] snapc 0=[] ondisk+write+known_if_redirected e129) currently sub_op_commit_rec from 6
2019-04-20 20:48:18.209219 osd.1 osd.1 10.0.15.1:6808/4634 60 : cluster [WRN] slow request 120.366728 seconds old, received at 2019-04-20 20:46:17.842442: osd_op(client.330079.0:2829 1.a5 1:a51d67d6:::rbd_data.cade6b8b4567.0000000000001181:head [write 3200000~524288] snapc 0=[] ondisk+write+known_if_redirected e129) currently sub_op_commit_rec from 6
2019-04-20 20:48:18.209222 osd.1 osd.1 10.0.15.1:6808/4634 61 : cluster [WRN] slow request 120.366214 seconds old, received at 2019-04-20 20:46:17.842957: osd_op(client.330079.0:2830 1.a5 1:a51d67d6:::rbd_data.cade6b8b4567.0000000000001181:head [write 3724288~8192] snapc 0=[] ondisk+write+known_if_redirected e129) currently sub_op_commit_rec from 6
2019-04-20 20:48:31.307597 osd.7 osd.7 10.0.15.3:6800/4070 46 : cluster [WRN] 1 slow requests, 1 included below; oldest blocked for > 480.182633 secs
2019-04-20 20:48:31.307617 osd.7 osd.7 10.0.15.3:6800/4070 47 : cluster [WRN] slow request 480.182633 seconds old, received at 2019-04-20 20:40:31.124905: osd_op(client.316949.0:27762 1.20 1:0478ad06:::rbd_data.a41f6b8b4567.000000000000105c:head [write 3428352~4096] snapc 0=[] ondisk+write+known_if_redirected e128) currently sub_op_commit_rec from 2
Code:
proxmox-ve: 5.4-1 (running kernel: 4.15.18-12-pve)
pve-manager: 5.4-3 (running version: 5.4-3/0a6eaa62)
pve-kernel-4.15: 5.3-3
pve-kernel-4.15.18-12-pve: 4.15.18-35
ceph: 12.2.11-pve1
Code:
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 20.95789 root default
-3 5.23947 host anr1-a-hcz01s01
0 ssd 1.74649 osd.0 up 1.00000 1.00000
1 ssd 1.74649 osd.1 up 1.00000 1.00000
2 ssd 1.74649 osd.2 up 1.00000 1.00000
-5 5.23947 host anr1-a-hcz01s02
3 ssd 1.74649 osd.3 up 1.00000 1.00000
4 ssd 1.74649 osd.4 up 1.00000 1.00000
5 ssd 1.74649 osd.5 up 1.00000 1.00000
-7 5.23947 host anr1-a-hcz01s03
6 ssd 1.74649 osd.6 up 1.00000 1.00000
7 ssd 1.74649 osd.7 up 1.00000 1.00000
8 ssd 1.74649 osd.8 up 1.00000 1.00000
-9 5.23947 host anr1-a-hcz01s04
9 ssd 1.74649 osd.9 up 1.00000 1.00000
10 ssd 1.74649 osd.10 up 1.00000 1.00000
11 ssd 1.74649 osd.11 up 1.00000 1.00000
Code:
HEALTH_ERR 10 slow requests are blocked > 32 sec. Implicated osds 0,7,11; 6 stuck requests are blocked > 4096 sec. Implicated osds 1,5,8
REQUEST_SLOW 10 slow requests are blocked > 32 sec. Implicated osds 0,7,11
9 ops are blocked > 2097.15 sec
1 ops are blocked > 262.144 sec
osds 0,7,11 have blocked requests > 2097.15 sec
REQUEST_STUCK 6 stuck requests are blocked > 4096 sec. Implicated osds 1,5,8
6 ops are blocked > 4194.3 sec
osds 1,5,8 have stuck requests > 4194.3 sec
Code:
[global]
auth client required = cephx
auth cluster required = cephx
auth service required = cephx
cluster network = 10.0.15.0/24
fsid = 7d745d6c-5523-4c5e-89dd-a19a4306f8f0
keyring = /etc/pve/priv/$cluster.$name.keyring
mon allow pool delete = true
osd journal size = 5120
osd pool default min size = 2
osd pool default size = 3
public network = 10.0.15.0/24
[osd]
keyring = /var/lib/ceph/osd/ceph-$id/keyring
[mon.anr1-a-hcz01s02]
host = anr1-a-hcz01s02
mon addr = 10.0.15.2:6789
[mon.anr1-a-hcz01s03]
host = anr1-a-hcz01s03
mon addr = 10.0.15.3:6789
[mon.anr1-a-hcz01s01]
host = anr1-a-hcz01s01
mon addr = 10.0.15.1:6789
Last edited: