Hello.
I have a Proxmox cluster running 6.2-10 with three PVE instances. I've noticed in dmesg on a few of them the following:
[4567952.775198] blk_update_request: critical medium error, dev sde, sector 447414733 op 0x0:(READ) flags 0x1000 phys_seg 8 prio class 0
[4567952.814029] XFS (sde1): metadata I/O error in "xlog_recover_do..(read#2)" at daddr 0x1a0af988 len 128 error 61
[4829504.226740] blk_update_request: critical medium error, dev sdh, sector 897316535 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
ceph health detail previously reported some inconsistencies, but they were repaired with ceph pg repair and the problem hasn't returned.
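For reference, what I ran at the time was roughly the following (the PG id below is a placeholder, not the actual one from my cluster):

```shell
# Run on one of the PVE nodes:
ceph health detail      # listed the inconsistent placement groups
ceph pg repair 2.1a     # repaired each reported PG (placeholder PG id)
```

After that, ceph health went back to HEALTH_OK.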
I have validated in iLO that all drives show OK on all three servers, and smartctl reports the SMART health status as OK.
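This is roughly how I checked each disk (sde shown as an example; I repeated it for every drive):

```shell
# Overall SMART self-assessment for one drive:
smartctl -H /dev/sde

# Full attribute dump, to look for reallocated or pending sectors:
smartctl -a /dev/sde
```

Both the health check and the attributes looked clean to me, which is why the medium errors in dmesg surprised me.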
I actually found this while running tar on a guest which returned the following in dmesg:
[68491.300238] XFS (dm-2): Metadata CRC error detected at xfs_dir3_block_read_verify+0x5e/0x110 [xfs], xfs_dir3_block block 0x707d300
[68491.305294] XFS (dm-2): Unmount and run xfs_repair
[68491.308450] XFS (dm-2): First 64 bytes of corrupted metadata buffer:
[68491.320711] XFS (dm-2): metadata I/O error: block 0x707d300 ("xfs_trans_read_buf_map") error 74 numblks
tar itself returned:
/bin/tar: path-to-file: Cannot open: Structure needs cleaning
There is clearly a filesystem inconsistency. I am not sure where to start, but I think I need to try to repair the affected filesystems with xfs_repair. Is there maybe another way to go about this? Right now I am stuck and don't know what the next steps should be, as Ceph isn't my strongest area. I've googled a lot and haven't really found a solution on how to proceed. I really appreciate any advice I can get.
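If xfs_repair is the right path, my understanding of the procedure inside the affected guest would be something like this (device name taken from the dmesg output above; please correct me if this is wrong):

```shell
# Inside the guest that showed the XFS CRC errors.
# The filesystem must be unmounted first:
umount /dev/dm-2

# Dry run: report problems without modifying anything:
xfs_repair -n /dev/dm-2

# Actual repair, only after taking a backup of the disk:
xfs_repair /dev/dm-2
```

What I'm unsure about is whether repairing inside the guest even makes sense while the underlying Ceph OSDs are sitting on drives that threw medium errors.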
Thanks