OSD disk problem? Hmm...

WEGA

Hi,
I have 3 servers (2x DELL R540 and 1x DELL R510) in a cluster with Ceph.
Everything works fine, but...
I replaced one old machine with a new one, and syslog now shows errors like these:

Nov 29 05:35:22 pve05 ceph-osd[3957]: 2019-11-29 05:35:22.754 7fc5324f2700 -1 bdev(0x55a4dd670000 /var/lib/ceph/osd/ceph-19/block) read stalled read 0x175d7f10000~80000 (direct) since 67661.6s, timeout is 5s
Nov 29 05:35:22 pve05 ceph-osd[3955]: 2019-11-29 05:35:22.758 7f16a64ec700 -1 bdev(0x55808b7c8000 /var/lib/ceph/osd/ceph-18/block) read stalled read 0x4a4165000~1000 (direct) since 67663.1s, timeout is 5s
Nov 29 05:35:22 pve05 ceph-osd[3955]: 2019-11-29 05:35:22.758 7f16a6ced700 -1 bdev(0x55808b7c8000 /var/lib/ceph/osd/ceph-18/block) read stalled read 0x3a68ef8000~1000 (direct) since 67661.7s, timeout is 5s
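
To see which OSDs are affected and how often, lines like the ones above can be parsed and counted. This is only a minimal sketch: the regular expression is written against the three sample lines in this post, the function names are made up for illustration, and it assumes that both the offset and the length in `0x...~...` are hexadecimal. The `since` value is kept as a plain number without interpreting it.

```python
import re
from collections import Counter

# Pattern derived from the sample syslog lines in this thread (an assumption,
# not an official log format definition).
STALL_RE = re.compile(
    r"bdev\(\S+ /var/lib/ceph/osd/ceph-(?P<osd>\d+)/block\) "
    r"read stalled read 0x(?P<offset>[0-9a-f]+)~(?P<length>[0-9a-f]+) "
    r"\((?P<mode>\w+)\) since (?P<since>[\d.]+)s, timeout is (?P<timeout>\d+)s"
)


def parse_stall(line):
    """Return a dict describing one 'read stalled' line, or None if no match."""
    m = STALL_RE.search(line)
    if m is None:
        return None
    return {
        "osd": int(m["osd"]),
        "offset": int(m["offset"], 16),   # assumed hexadecimal
        "length": int(m["length"], 16),   # assumed hexadecimal
        "mode": m["mode"],
        "since": float(m["since"]),       # kept as-is, not interpreted
        "timeout": int(m["timeout"]),
    }


def count_stalls_per_osd(lines):
    """Count matching 'read stalled' lines per OSD id."""
    counts = Counter()
    for line in lines:
        record = parse_stall(line)
        if record is not None:
            counts[record["osd"]] += 1
    return counts
```

Feeding the syslog through `count_stalls_per_osd` (for example, `count_stalls_per_osd(open("/var/log/syslog"))`) shows whether the stalls are limited to the OSDs on one host.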


All machines run the same Proxmox version.
All machines have been updated to the latest firmware.
All virtual machines work without any problems, but the errors keep appearing.
SMART shows no errors on the disks.
The network connections that Ceph uses have been tested (I swapped the 10G cards for others and replaced the switch).

I have no idea what to do next ;-(
 
Hi,
are there any other errors? What do ceph -s and ceph -w show? Just to make sure, the OSDs in question (ceph-18, ceph-19) are on the new machine?
 
No other errors ;-( ... Yes, ceph-18 and ceph-19 are on the new machine (DELL R540).

root@pve05:~# ceph -s
  cluster:
    id:     68ed1284-ff0b-4ac0-9de9-3e7c2ab6fe9a
    health: HEALTH_OK

  services:
    mon: 2 daemons, quorum pve03,pve04 (age 11m)
    mgr: pve03(active, since 8d), standbys: pve05, pve04
    mds: backup:1 {0=pve05=up:active} 2 up:standby
    osd: 15 osds: 15 up (since 8d), 15 in (since 2w)

  data:
    pools:   5 pools, 640 pgs
    objects: 778.90k objects, 2.9 TiB
    usage:   9.0 TiB used, 19 TiB / 28 TiB avail
    pgs:     640 active+clean

  io:
    client: 2.3 KiB/s rd, 6.2 MiB/s wr, 0 op/s rd, 74 op/s wr

-------------
root@pve05:~# ceph -w
  cluster:
    id:     68ed1284-ff0b-4ac0-9de9-3e7c2ab6fe9a
    health: HEALTH_OK

  services:
    mon: 2 daemons, quorum pve03,pve04 (age 12m)
    mgr: pve03(active, since 8d), standbys: pve05, pve04
    mds: backup:1 {0=pve05=up:active} 2 up:standby
    osd: 15 osds: 15 up (since 8d), 15 in (since 2w)

  data:
    pools:   5 pools, 640 pgs
    objects: 778.90k objects, 2.9 TiB
    usage:   9.0 TiB used, 19 TiB / 28 TiB avail
    pgs:     640 active+clean

  io:
    client: 8.3 KiB/s rd, 485 KiB/s wr, 0 op/s rd, 49 op/s wr
 
1. Is it possible that the new machine is configured to use a RAID controller? It is highly recommended to use an HBA instead of RAID, since Ceph is designed to handle disks directly (see here, section "Avoid RAID").

2. This thread on the Ceph user mailing list contains a few pointers, especially if you have a non-trivial/modified network configuration.

3. Could you check the logs in /var/log/ceph from around the time the stalls happen, and from shortly before?
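
For point 3, a small script can pull the relevant lines out of the OSD logs instead of reading them by hand. The following is a rough sketch only: it assumes the OSD logs are named `ceph-osd.<id>.log` inside `/var/log/ceph`, that they are not compressed, and that the word `stalled` is a good enough search term. The function name `find_stall_lines` is hypothetical.

```python
from pathlib import Path


def find_stall_lines(log_dir, needle="stalled", pattern="ceph-osd.*.log"):
    """Map each log file name to the lines in it that contain `needle`.

    Only plain-text files matching `pattern` directly inside `log_dir`
    are read; rotated .gz files are not handled in this sketch.
    """
    hits = {}
    for path in sorted(Path(log_dir).glob(pattern)):
        with path.open(errors="replace") as handle:
            matched = [line.rstrip("\n") for line in handle if needle in line]
        if matched:
            hits[path.name] = matched
    return hits


def print_stall_report(log_dir="/var/log/ceph"):
    """Print every matching line, grouped by log file."""
    for name, lines in find_stall_lines(log_dir).items():
        print(f"{name}: {len(lines)} matching line(s)")
        for line in lines:
            print("  " + line)
```

Calling `print_stall_report()` on the affected node lists the matches per OSD log; the timestamps on those lines show where to look for what happened just before the stalls.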
 
Hello :)

1) I removed the RAID
2) I set up the HBA
3) I reinstalled everything
4) I joined the machine to the cluster
5) I set up Ceph
6) I created the OSDs ...
7) 14 hours of synchronization

0 Errors !!
Everything works perfectly !!
Thank you very much :)
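
After a rebuild like the one in the steps above, the cluster state can also be checked from a script rather than by eye. This is a hedged sketch, not an official tool: it relies on `ceph status --format json`, and the JSON keys it reads (`health.status`, `pgmap.num_pgs`, `pgmap.pgs_by_state`) are assumptions about the output layout that may differ between Ceph releases. The function names are made up for illustration.

```python
import json
import subprocess


def summarize_status(status):
    """Reduce a parsed `ceph status` JSON document to a few health facts.

    The key names used here are assumptions about the JSON layout; missing
    keys fall back to neutral defaults instead of raising.
    """
    health = status.get("health", {}).get("status", "UNKNOWN")
    pgmap = status.get("pgmap", {})
    total = pgmap.get("num_pgs", 0)
    clean = sum(
        state.get("count", 0)
        for state in pgmap.get("pgs_by_state", [])
        if state.get("state_name") == "active+clean"
    )
    return {
        "health": health,
        "pgs_total": total,
        "pgs_active_clean": clean,
        "all_clean": total > 0 and clean == total,
    }


def read_cluster_status():
    """Run `ceph status --format json` and return the parsed document."""
    result = subprocess.run(
        ["ceph", "status", "--format", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)
```

On a cluster node, `summarize_status(read_cluster_status())` returns the summary; the synchronization is finished once `all_clean` is true and `health` reports `HEALTH_OK`.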
 
