CEPH bad checksum

James Crook

Well-Known Member
Jul 28, 2017
So when trying to run a backup of containers on Ceph storage, the vzdump container backup just hangs until we kill it from the host.

After some digging, I changed the dump location to NFS (from CIFS) and applied the patch discussed here:
https://forum.proxmox.com/threads/lxc-backups-hang-via-nfs-and-cifs.46669/#post-224815
and bug logged here
https://bugzilla.proxmox.com/show_bug.cgi?id=1911
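For reference, switching the dump target boils down to pointing vzdump at an NFS-backed storage entry in /etc/pve/storage.cfg. A minimal sketch — the storage ID, server address, and export path below are placeholders, not my actual values:

```
nfs: backup-nfs
        server 192.168.1.50
        export /export/pve-dumps
        path /mnt/pve/backup-nfs
        content backup
        maxfiles 3
```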

I'm now getting "_verify_csum bad crc32c/0x1000 checksum" errors, but only while a backup is running.

24th Host syslog.1
Oct 23 22:05:16 PM1 ceph-osd[7119]: 2018-10-23 22:05:16.764978 7f0c9a798700 -1 bluestore(/var/lib/ceph/osd/ceph-5) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x69000, got 0x6706be76, expected 0x8bb9d12a, device location [0xa181a29000~1000], logical extent 0x3e9000~1000, object #4:3f53234e:::rbd_data.3ab6c374b0dc51.0000000000000373:head#
Oct 23 22:05:16 PM1 ceph-osd[7119]: 2018-10-23 22:05:16.765098 7f0c9a798700 -1 log_channel(cluster) log [ERR] : 4.fc missing primary copy of 4:3f53234e:::rbd_data.3ab6c374b0dc51.0000000000000373:head, will try copies on 4

24th Host osd-ceph-5
2018-10-23 22:05:16.764978 7f0c9a798700 -1 bluestore(/var/lib/ceph/osd/ceph-5) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x69000, got 0x6706be76, expected 0x8bb9d12a, device location [0xa181a29000~1000], logical extent 0x3e9000~1000, object #4:3f53234e:::rbd_data.3ab6c374b0dc51.0000000000000373:head#
2018-10-23 22:05:16.765098 7f0c9a798700 -1 log_channel(cluster) log [ERR] : 4.fc missing primary copy of 4:3f53234e:::rbd_data.3ab6c374b0dc51.0000000000000373:head, will try copies on 4
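For anyone following along, the usual first steps when an OSD reports a bad checksum like this use the standard ceph/rados CLI. A sketch of what I'm running to investigate (the PG id 4.fc is taken from the log above; repair rewrites the bad copy from a healthy replica, so only run it when the other replicas are clean):

```shell
# Overall health and which PGs are flagged
ceph health detail

# List the objects a scrub found inconsistent in that PG
rados list-inconsistent-obj 4.fc --format=json-pretty

# Deep-scrub the PG to confirm, then ask Ceph to repair it
ceph pg deep-scrub 4.fc
ceph pg repair 4.fc
```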

26th Host syslog.1
Oct 26 04:01:23 PM1 ceph-osd[7903]: 2018-10-26 04:01:23.663112 7f5c972c8700 -1 bluestore(/var/lib/ceph/osd/ceph-0) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x1000, got 0x6706be76, expected 0xcd6157f, device location [0x7d422a1000~1000], logical extent 0x1000~1000, object #4:4b97b22c:::rbd_data.3ab7c374b0dc51.00000000000046ad:head#
Oct 26 04:01:23 PM1 ceph-osd[7903]: 2018-10-26 04:01:23.722940 7f5c972c8700 -1 log_channel(cluster) log [ERR] : 4.1d2 missing primary copy of 4:4b97b22c:::rbd_data.3ab7c374b0dc51.00000000000046ad:head, will try copies on 2

26th Host osd-ceph-0
2018-10-26 03:52:53.716689 7f5c97ac9700 0 log_channel(cluster) log [DBG] : 4.1e scrub starts
2018-10-26 03:52:54.115551 7f5c97ac9700 0 log_channel(cluster) log [DBG] : 4.1e scrub ok
2018-10-26 04:01:23.663112 7f5c972c8700 -1 bluestore(/var/lib/ceph/osd/ceph-0) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x1000, got 0x6706be76, expected 0xcd6157f, device location [0x7d422a1000~1000], logical extent 0x1000~1000, object #4:4b97b22c:::rbd_data.3ab7c374b0dc51.00000000000046ad:head#
2018-10-26 04:01:23.722940 7f5c972c8700 -1 log_channel(cluster) log [ERR] : 4.1d2 missing primary copy of 4:4b97b22c:::rbd_data.3ab7c374b0dc51.00000000000046ad:head, will try copies on 2
2018-10-26 04:05:56.797317 7f5c97ac9700 0 log_channel(cluster) log [DBG] : 4.140 scrub starts
2018-10-26 04:05:57.092745 7f5c97ac9700 0 log_channel(cluster) log [DBG] : 4.140 scrub ok
2018-10-26 04:06:57.659367 7f5c9f2d8700 4 rocksdb: [/home/builder/source/ceph-12.2.4/src/rocksdb/db/db_impl_write.cc:684] reusing log 8443 from recycle list

Any ideas?

I tried talking to the powers above me about using ZFS, but they decided Ceph sounded better to them.
 
This might have an impact. Please also update to Ceph 12.2.8, as there is a known bug that can lead to CRC errors and was fixed in 12.2.8.

It seems 12.2.4 is the latest available to me. I'll see if I can upgrade, as I found a slightly confusing wiki page about upstream Ceph vs. Proxmox's own Ceph packages...
 
B>I>N>G>O
It seems I was missing /etc/apt/sources.list.d/ceph.list, so I've created it with the above info.
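For anyone else hitting this: on Proxmox VE 5 (Debian stretch), the Proxmox-built Luminous packages come from a repo file along these lines — my assumption of what the missing file should contain, so verify it against the Proxmox Ceph wiki before using it:

```
# /etc/apt/sources.list.d/ceph.list
deb http://download.proxmox.com/debian/ceph-luminous stretch main
```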

I'll back up the host and containers before upgrading, and will report back.
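The upgrade itself should be the usual sequence, sketched below — restart daemons one host at a time and wait for HEALTH_OK between steps (standard apt and ceph commands; run on each node):

```shell
apt update
apt full-upgrade              # pulls in the ceph 12.2.8 packages

# Restart daemons on this host, monitors before OSDs
systemctl restart ceph-mon.target
systemctl restart ceph-osd.target

# Confirm every daemon reports the new version, and cluster health
ceph versions
ceph -s
```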
 
