Hyperconverged cluster logging seemingly random crc errors

lifeboy

We have 4 nodes (dual Xeon CPUs, 256 GB RAM, 4 NVMe SSDs, 4 HDDs and dual Mellanox 25 Gb/s SFPs) in a cluster. I have started noticing seemingly random CRC errors in the OSD logs.

Node B, osd.6
2025-10-23T10:32:59.808+0200 7f22a75bf700 0 bad crc in data 3330350463 != exp 677417498 from v1:192.168.131.4:0/3121668685
192.168.131.4 is node D

Node B, osd.7
2025-10-23T09:35:12.995+0200 7fbcdbcd7700 0 bad crc in data 3922083958 != exp 3479198006 from v1:192.168.131.2:0/2732728486
192.168.131.2 is node B, which is the node osd.7 is on.

There are similar errors on other nodes and OSDs. From what I understand, this means that data copied from some other OSD to the one that logs the error fails the CRC check. However, as a test I took one of these SSDs out of the cluster and it tested just fine. I put it back, and no CRC errors have been logged for it since.

Question: Can something else be causing this? A network connector? It seems pretty random to me, so how can I trace the source of this?
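
One way to start narrowing this down might be to tally the errors per sending address across all OSD logs. If a single sender dominates, that could point towards one node's NIC, cable or switch port rather than the disks. Below is a minimal sketch in Python, assuming the log lines look like the two excerpts above. The function names are made up for illustration, and the script name and log path in the usage line are assumptions, not something taken from my setup.

```python
import re
import sys
from collections import Counter

# Matches the "bad crc in data" message as it appears in the excerpts above.
# The "v1:" prefix is treated as optional in case the format differs elsewhere.
BAD_CRC = re.compile(
    r"bad crc in data (?P<got>\d+) != exp (?P<exp>\d+) "
    r"from (?:v\d+:)?(?P<ip>\d+\.\d+\.\d+\.\d+):\d+/\d+"
)


def parse_bad_crc(lines):
    """Yield (timestamp, sender_ip) for every line reporting a bad data CRC."""
    for line in lines:
        match = BAD_CRC.search(line)
        if match:
            # The timestamp is assumed to be the first whitespace-separated field.
            timestamp = line.split(None, 1)[0]
            yield timestamp, match.group("ip")


def count_by_sender(lines):
    """Count bad-CRC messages per sending IP address."""
    return Counter(ip for _, ip in parse_bad_crc(lines))


if __name__ == "__main__":
    totals = Counter()
    for path in sys.argv[1:]:
        # errors="replace" keeps a stray non-UTF-8 byte from aborting the run.
        with open(path, errors="replace") as handle:
            totals.update(count_by_sender(handle))
    for ip, count in totals.most_common():
        print(f"{ip}\t{count}")
```

It could be run on each node as something like `python3 crc_by_sender.py /var/log/ceph/ceph-osd.*.log`, and the per-sender counts then compared between nodes.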