OSD has spurious read errors

Ting

Member
Oct 19, 2021
I have a 4-node Proxmox 7.1-5 cluster with Ceph 16.2.6. Today Ceph reported an error:

1 OSD(s) have spurious read errors
osd.3 reads with retries: 1

Not sure how to deal with this error. I checked SMART for this SSD, and no errors show.

Could anybody possibly give me some suggestions?
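For anyone investigating the same warning, a minimal sketch of the checks involved might look like this. The device path `/dev/sdX` is a placeholder for the disk backing the OSD, and the OSD id 3 is taken from the warning above; adjust both for your own setup.

```shell
# Show the full health message, including which OSD is affected
ceph health detail

# Check SMART data for the device backing osd.3 (adjust the device path)
smartctl -a /dev/sdX

# Inspect the OSD's performance counters on the node that hosts it;
# the warning reflects reads that BlueFS/BlueStore had to retry
ceph daemon osd.3 perf dump
```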
 
I guess there is no action needed, because Ceph has self-recovered. But I still do not know how to reset this warning message; it has been over 24 hours now and ceph health is still showing HEALTH_WARN.

Could anybody possibly offer some suggestions?
 
I am trying to follow up on my last issue, the "1 OSD(s) have spurious read errors" message.

Now, I have done a scrub and a deep scrub twice, and there are no more error messages, but this old warning is still there. Can anybody let me know how to clear or archive this old warning? I could disable this warning, but I am not sure that is the correct way to go, because I do not want to disable it forever in case new errors occur on this OSD in the future. Any ideas?
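A sketch of the middle ground between ignoring and permanently disabling the warning: Ceph health checks can be muted for a limited time instead of turned off. This assumes the health code for this warning is `BLUEFS_SPURIOUS_READ_ERRORS`; confirm the exact code with `ceph health detail` first.

```shell
# Mute the warning temporarily (here for one week) rather than
# disabling it forever; the mute expires on its own
ceph health mute BLUEFS_SPURIOUS_READ_ERRORS 1w

# Un-mute it again at any time
ceph health unmute BLUEFS_SPURIOUS_READ_ERRORS

# Disabling the check entirely is possible, but as noted above it
# would also hide any future read errors on this OSD
ceph config set osd bluestore_warn_on_spurious_read_errors false
```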
 
I got the same error message as the OP. After checking the OSD/SSD, I was able to clear the error by restarting the OSD via the GUI. I got the suggestion to restart the OSD from ChatGPT.
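For reference, restarting the OSD from the GUI should be equivalent to restarting its systemd unit on the node that hosts it; this is a sketch assuming a standard (non-cephadm) Proxmox/Ceph install and the osd.3 id from the warning above.

```shell
# Restart the OSD daemon on its node
systemctl restart ceph-osd@3.service

# Verify the OSD came back up and check whether the warning clears
ceph osd tree
ceph health detail
```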
 
Be careful with suggestions made by ChatGPT; in many cases it gives advice from the manuals of outdated versions.
In this particular case the advice proved useful for me too: restarting the OSD cleared the error message.
The read error was caused when a cluster node fell out of the Ceph cluster.
After you have made sure the OSD's node is restored and the status of the Ceph disks and redundancy is OK, you can restart the OSD.
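The pre-restart checks described above can be sketched as follows; `osd.3` is again taken from the original warning, and whether a restart is safe depends on your own pool redundancy.

```shell
# Confirm overall cluster health and that recovery has finished
# (ideally all PGs active+clean) before taking the OSD down
ceph -s

# Ask Ceph whether stopping osd.3 would reduce data availability
ceph osd ok-to-stop osd.3

# If both look fine, restart the OSD on its node
systemctl restart ceph-osd@3.service
```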
 