Hi,
I’m facing CRC errors in a 3-node Proxmox VE (6.4-13) cluster running Ceph 14.2.22 (Nautilus).
Problem
All nodes report:
bad crc/signature
libceph: data crc != expected
kernel: libceph: osd xxx.xxx.xxx:6811 socket closed (con state OPEN)
Key points:
- Happens on all nodes
- Also occurs with local traffic (same host IP)
- Ceph health shows BlueStore compression broken
- Restarting a node temporarily removes CRC errors
Impact
VMs running on the cluster freeze or hang. Inside the VMs we see kernel messages like:
INFO: task blocked for more than 120 seconds
Environment
- Proxmox VE cluster (3 nodes, 12 OSDs)
- MTU 9000 (jumbo frames)
- Open vSwitch (ovs-system)
- Interfaces: tap, fwbr, fwln
- Errors occur on all 3 nodes
- Errors also appear when source IP is the same node
- This suggests the issue is not limited to inter-node communication
- Ceph health shows:
- HEALTH_WARN
- BlueStore compression broken on 4 OSD(s)
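Given the MTU 9000 setup, one low-risk check is to verify that jumbo frames actually survive the whole path between the nodes' Ceph network endpoints. This is a minimal sketch; the peer IP is a placeholder for another node's Ceph address:

```shell
MTU=9000
PAYLOAD=$((MTU - 28))   # 8972: MTU minus 20-byte IP + 8-byte ICMP headers
PEER=10.0.0.2           # placeholder: a peer node's Ceph network IP

# A Don't-Fragment ping (-M do) fails loudly if any hop along the
# path (bridge, OVS port, tap, physical NIC) has a smaller MTU:
echo "test with: ping -c 3 -M do -s $PAYLOAD $PEER"
```

If this ping fails with "Message too long" while ordinary pings work, some segment of the OVS/tap/fwbr chain is silently dropping or fragmenting jumbo frames, which can produce exactly these kinds of stalls and mismatches in Ceph traffic.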
Compression config
ceph config get osd.* bluestore_compression_algorithm → snappy
ceph config get osd.* bluestore_compression_mode → none
So compression is disabled, yet Ceph still reports it as broken.
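To see exactly which OSDs are behind the compression warning (and whether it clears after restarting just those OSDs, as your node reboots suggest), the health detail output names them. A sketch, assuming a standard Ceph CLI; `osd.3` is a placeholder ID you would take from the health output:

```shell
# List the OSDs raising the BlueStore compression warning
ceph health detail

# Confirm the effective settings on one affected daemon
# (osd.3 is a placeholder; substitute an OSD from the output above)
ceph config show osd.3 | grep bluestore_compression
```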
Checks done
- ethtool -S → no NIC errors
- CRC issues still present
- Errors appear even on same-node traffic
- Suggests issue not at physical NIC level
- CRC errors occur even for local traffic (same host)
- No hardware-level NIC errors detected
- Environment uses:
- jumbo frames (MTU 9000)
- virtual networking stack (OVS + tap + fwbr)
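Since segmentation/receive offloads sit between the NIC hardware counters and the Ceph messenger, a quick experiment is to disable them temporarily and watch whether the CRC messages stop. A sketch; `eth0` is a placeholder for the physical interface carrying Ceph traffic:

```shell
IFACE=eth0   # placeholder: physical NIC under the OVS bridge

# Show the current offload state
ethtool -k "$IFACE" | grep -E 'tcp-segmentation|generic-(segmentation|receive)'

# Temporarily disable TSO/GSO/GRO (not persistent across reboot;
# re-enable with "on" once the test is done)
ethtool -K "$IFACE" tso off gso off gro off
```

If the errors disappear with offloads off, that points at the offload path (or its interaction with OVS/tap devices) rather than Ceph itself.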
1. How can I fix this?
2. Are there known issues with checksum offloading (TSO/GSO/GRO) causing Ceph CRC mismatches?
3. Has anyone seen CRC errors on local Ceph traffic (same node)?
4. Could this be related to Proxmox firewall bridges (fwbr/fwln)?
5. Any known bugs in Nautilus (14.2.22) related to this?