Hi,
I have three servers (2x Dell R540 and 1x Dell R510) in a cluster with Ceph.
Everything works fine, but...
I replaced one old machine with a new one, and since then syslog shows errors like these:
Nov 29 05:35:22 pve05 ceph-osd[3957]: 2019-11-29 05:35:22.754 7fc5324f2700 -1 bdev(0x55a4dd670000 /var/lib/ceph/osd/ceph-19/block) read stalled read 0x175d7f10000~80000 (direct) since 67661.6s, timeout is 5s
Nov 29 05:35:22 pve05 ceph-osd[3955]: 2019-11-29 05:35:22.758 7f16a64ec700 -1 bdev(0x55808b7c8000 /var/lib/ceph/osd/ceph-18/block) read stalled read 0x4a4165000~1000 (direct) since 67663.1s, timeout is 5s
Nov 29 05:35:22 pve05 ceph-osd[3955]: 2019-11-29 05:35:22.758 7f16a6ced700 -1 bdev(0x55808b7c8000 /var/lib/ceph/osd/ceph-18/block) read stalled read 0x3a68ef8000~1000 (direct) since 67661.7s, timeout is 5s
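To see which OSDs are affected and how often, the lines above could be summarized with a small script. This is only a minimal sketch: the regular expression assumes the exact syslog format shown above, the length after `~` is assumed to be hexadecimal like the offset, and the function names `parse_stall` and `count_stalls_per_osd` are made up for illustration.

```python
import re
from collections import Counter

# Assumption: matches the "read stalled read" lines exactly as they appear above.
STALL_RE = re.compile(
    r"/var/lib/ceph/osd/ceph-(?P<osd>\d+)/block\) read stalled read "
    r"0x(?P<offset>[0-9a-f]+)~(?P<length>[0-9a-f]+) \(direct\) "
    r"since (?P<since>[\d.]+)s, timeout is (?P<timeout>[\d.]+)s"
)


def parse_stall(line):
    """Return the fields of one stalled-read line, or None if it does not match."""
    m = STALL_RE.search(line)
    if m is None:
        return None
    return {
        "osd": int(m.group("osd")),
        "offset": int(m.group("offset"), 16),
        # Assumption: the length is printed in hex, like the offset.
        "length": int(m.group("length"), 16),
        # Kept as the raw number from the log; no interpretation is attempted.
        "since": float(m.group("since")),
        "timeout": float(m.group("timeout")),
    }


def count_stalls_per_osd(lines):
    """Count stalled-read messages per OSD id over an iterable of log lines."""
    counts = Counter()
    for line in lines:
        stall = parse_stall(line)
        if stall is not None:
            counts[stall["osd"]] += 1
    return counts
```

For example, `count_stalls_per_osd(open("/var/log/syslog"))` would give a per-OSD count, which helps tell whether the stalls are spread across all disks in the new machine or concentrated on a few of them.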
All machines run the same Proxmox version.
All machines have been updated to the latest firmware.
All virtual machines work without any problems, but the errors keep appearing.
SMART does not show any errors on the disks.
The network connections that Ceph runs on have been tested (I swapped the 10G cards for others and replaced the switch).
I have no idea what to do next ;-(