Ceph CRC errors (bad crc/signature) on all nodes + VM freezes

Mcarius

Apr 20, 2026
Hi,

I’m seeing CRC errors in a 3-node Proxmox VE (6.4-13) cluster running Ceph (14.2.22 Nautilus).

Problem

All nodes report:

bad crc/signature
libceph: data crc != expected

kernel: libceph: osd xxx.xxx.xxx:6811 socket closed (con state OPEN)
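To get a feel for how often each message type shows up, the kernel log can be tallied. This is only a rough sketch: the function name count_libceph_errors is made up for illustration, and the patterns are based on the message fragments quoted above, so they may need adjusting to the exact wording in your log.

```shell
# Count libceph CRC and socket messages in kernel log text read from stdin.
# Prints a single summary line with one counter per message type.
count_libceph_errors() {
    awk '
        /bad crc\/signature/        { crc++ }
        /libceph:.*data crc/        { data++ }
        /libceph:.*socket closed/   { closed++ }
        END {
            printf "bad_crc=%d data_crc=%d socket_closed=%d\n", crc, data, closed
        }'
}

# Example usage on a node (not run automatically):
#   journalctl -k --since "1 hour ago" | count_libceph_errors
#   dmesg | count_libceph_errors
```

Comparing the counts per node and over time can show whether the errors really build up again after a reboot.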

Key points:
  • Happens on all nodes
  • Also occurs with local traffic (same host IP)
  • Ceph health shows BlueStore compression broken
  • Restarting a node temporarily removes CRC errors

Impact

VMs running on the cluster freeze / hang.

Inside VMs we see kernel messages like:
INFO: task blocked for more than 120 seconds

Environment

  • Proxmox VE cluster (3 nodes, 12 OSDs)
  • MTU 9000 (jumbo frames)
  • Open vSwitch (ovs-system)
  • Interfaces: tap, fwbr, fwln
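
With MTU 9000, one quick sanity check is whether full-size frames actually pass unfragmented between the nodes. A minimal sketch, assuming IPv4 and ICMP (20-byte IP header plus 8-byte ICMP header); the helper name icmp_payload_for_mtu is hypothetical and the peer address is a placeholder:

```shell
# Largest ICMP payload that fits in one unfragmented IPv4 packet
# for a given MTU: MTU - 20 (IPv4 header) - 8 (ICMP header).
icmp_payload_for_mtu() {
    echo $(( $1 - 28 ))
}

# Example usage (not run automatically), with fragmentation prohibited:
#   ping -M do -c 3 -s "$(icmp_payload_for_mtu 9000)" <peer-ip>
```

If this ping fails while a smaller size works, some hop (bridge, OVS port, switch) is not passing jumbo frames. Since the errors also appear on same-host traffic, a clean result here would not rule much out, but it is cheap to check.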

Important observations:
  • Errors occur on all 3 nodes
  • Errors also appear when source IP is the same node
  • This suggests the issue is not limited to inter-node communication
  • Ceph health shows:
    • HEALTH_WARN
    • BlueStore compression broken on 4 OSD(s)

Compression config


ceph config get osd.* bluestore_compression_algorithm → snappy
ceph config get osd.* bluestore_compression_mode → none

Note: compression is disabled (mode is none), but Ceph health still reports it as broken.
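
To see what each running OSD actually has in effect (rather than the value stored in the config database), something like the following could be used. It is a sketch that assumes the ceph CLI is available on the node with admin credentials; the function name show_compression_settings is made up for illustration.

```shell
# Print the compression mode and algorithm reported by each running OSD.
# Uses "ceph osd ls" to enumerate OSD ids and "ceph config show" to read
# the value the daemon itself reports.
show_compression_settings() {
    for id in $(ceph osd ls); do
        mode=$(ceph config show "osd.$id" bluestore_compression_mode)
        algo=$(ceph config show "osd.$id" bluestore_compression_algorithm)
        printf 'osd.%s mode=%s algorithm=%s\n' "$id" "$mode" "$algo"
    done
}

# Example usage (not run automatically):
#   show_compression_settings
#   ceph health detail    # lists which OSDs raise the compression warning
```

Comparing the four OSDs named in ceph health detail against the rest may show whether they share a host or differ in some setting.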

Checks done


  • ethtool -S → no NIC errors
  • CRC issues still present
  • Errors appear even on same-node traffic
  • Suggests issue not at physical NIC level
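
For the ethtool -S check, the statistics list is long and counter names differ per driver, so a small filter helps to avoid missing a non-zero counter. This is a sketch: the function name nonzero_error_counters is hypothetical, and the keyword list (err, drop, crc, discard) is an assumption about how drivers name their counters.

```shell
# Read "ethtool -S <iface>" output on stdin and print only counters whose
# name looks error-related and whose value is not zero, as name=value.
nonzero_error_counters() {
    awk -F': *' '
        tolower($1) ~ /err|drop|crc|discard/ && $2 + 0 != 0 {
            gsub(/^[ \t]+/, "", $1)   # strip the leading indentation
            print $1 "=" $2
        }'
}

# Example usage (not run automatically), once per physical interface:
#   ethtool -S <iface> | nonzero_error_counters
```

No output means none of the matching counters is non-zero on that interface.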

Observations
  • CRC errors occur even for local traffic (same host)
  • No hardware-level NIC errors detected
  • Environment uses:
    • jumbo frames (MTU 9000)
    • virtual networking stack (OVS + tap + fwbr)
Questions

1. How to fix it?
2. Are there known issues with checksum or segmentation offloading (TX/RX checksumming, TSO/GSO/GRO) causing Ceph CRC mismatches?
3. Has anyone seen CRC errors on local Ceph traffic (same node)?
4. Could this be related to Proxmox firewall bridges (fwbr/fwln)?
5. Any known bugs in Nautilus (14.2.22) related to this?
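
Regarding question 2, the current offload state can be read with ethtool -k. The filter below is a sketch that keeps only the main checksum and segmentation features; the function name offload_state is made up for illustration, and the feature names are the ones ethtool normally prints.

```shell
# Read "ethtool -k <iface>" output on stdin and keep only the main
# checksum and segmentation offload lines.
offload_state() {
    grep -E '^[[:space:]]*(rx-checksumming|tx-checksumming|tcp-segmentation-offload|generic-segmentation-offload|generic-receive-offload):'
}

# Example usage (not run automatically):
#   ethtool -k <iface> | offload_state
# To test temporarily with offloads disabled (reverts on reboot):
#   ethtool -K <iface> tso off gso off gro off
```

Turning the offloads off is meant as a diagnostic step only, to see whether the CRC messages change. It is not a confirmed fix for this problem.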
 
Hi,

Your PVE and Ceph versions are long EOL; you should upgrade as soon as possible.

As for the CRC errors: in the past these were only warnings, and I never saw any VM freezes because of them, so the problem could be somewhere else.
You can read more about it here:

https://bugzilla.proxmox.com/show_bug.cgi?id=5779

The option discussed there is set by default as of PVE 9.1.

Greetz