When trying to run a backup of containers on Ceph storage, the vzdump container backup just hangs until we kill it from the host.
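(For context, "kill it from the host" is nothing clever, just finding the stuck vzdump process on the node and terminating it; the lines below are only a rough example, and the PID is whatever ps actually reports:)

# find the hung backup task on the Proxmox node
ps aux | grep vzdump
# try a normal terminate first, SIGKILL only if it refuses to die
kill <vzdump-pid>
kill -9 <vzdump-pid>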
After some digging, I changed the dump location from CIFS to NFS and applied the patch discussed in this thread:
https://forum.proxmox.com/threads/lxc-backups-hang-via-nfs-and-cifs.46669/#post-224815
and the bug is logged here:
https://bugzilla.proxmox.com/show_bug.cgi?id=1911
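For reference, the dump-location change was just pointing vzdump at an NFS mount instead of the CIFS share, roughly as sketched below. The export path and mount point are placeholders rather than our real ones, and this sketch assumes the dump directory is set globally in /etc/vzdump.conf rather than through a Proxmox storage entry.

# /etc/fstab entry for the NFS backup target (placeholder server and export)
nfs-server:/export/pve-dumps  /mnt/pve/nfs-dump  nfs  defaults  0  0

# global dump target in /etc/vzdump.conf
dumpdir: /mnt/pve/nfs-dump

# one-off test backup of a single container (CTID 101 is just an example)
vzdump 101 --dumpdir /mnt/pve/nfs-dump --mode snapshot --compress lzo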
I'm now getting "_verify_csum bad crc32c/0x1000 checksum" errors, but only while a backup is running.
24th Host syslog.1
Oct 23 22:05:16 PM1 ceph-osd[7119]: 2018-10-23 22:05:16.764978 7f0c9a798700 -1 bluestore(/var/lib/ceph/osd/ceph-5) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x69000, got 0x6706be76, expected 0x8bb9d12a, device location [0xa181a29000~1000], logical extent 0x3e9000~1000, object #4:3f53234e:::rbd_data.3ab6c374b0dc51.0000000000000373:head#
Oct 23 22:05:16 PM1 ceph-osd[7119]: 2018-10-23 22:05:16.765098 7f0c9a798700 -1 log_channel(cluster) log [ERR] : 4.fc missing primary copy of 4:3f53234e:::rbd_data.3ab6c374b0dc51.0000000000000373:head, will try copies on 4
24th Host osd-ceph-5
2018-10-23 22:05:16.764978 7f0c9a798700 -1 bluestore(/var/lib/ceph/osd/ceph-5) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x69000, got 0x6706be76, expected 0x8bb9d12a, device location [0xa181a29000~1000], logical extent 0x3e9000~1000, object #4:3f53234e:::rbd_data.3ab6c374b0dc51.0000000000000373:head#
2018-10-23 22:05:16.765098 7f0c9a798700 -1 log_channel(cluster) log [ERR] : 4.fc missing primary copy of 4:3f53234e:::rbd_data.3ab6c374b0dc51.0000000000000373:head, will try copies on 4
26th Host syslog.1
Oct 26 04:01:23 PM1 ceph-osd[7903]: 2018-10-26 04:01:23.663112 7f5c972c8700 -1 bluestore(/var/lib/ceph/osd/ceph-0) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x1000, got 0x6706be76, expected 0xcd6157f, device location [0x7d422a1000~1000], logical extent 0x1000~1000, object #4:4b97b22c:::rbd_data.3ab7c374b0dc51.00000000000046ad:head#
Oct 26 04:01:23 PM1 ceph-osd[7903]: 2018-10-26 04:01:23.722940 7f5c972c8700 -1 log_channel(cluster) log [ERR] : 4.1d2 missing primary copy of 4:4b97b22c:::rbd_data.3ab7c374b0dc51.00000000000046ad:head, will try copies on 2
26th Host osd-ceph-0
2018-10-26 03:52:53.716689 7f5c97ac9700 0 log_channel(cluster) log [DBG] : 4.1e scrub starts
2018-10-26 03:52:54.115551 7f5c97ac9700 0 log_channel(cluster) log [DBG] : 4.1e scrub ok
2018-10-26 04:01:23.663112 7f5c972c8700 -1 bluestore(/var/lib/ceph/osd/ceph-0) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x1000, got 0x6706be76, expected 0xcd6157f, device location [0x7d422a1000~1000], logical extent 0x1000~1000, object #4:4b97b22c:::rbd_data.3ab7c374b0dc51.00000000000046ad:head#
2018-10-26 04:01:23.722940 7f5c972c8700 -1 log_channel(cluster) log [ERR] : 4.1d2 missing primary copy of 4:4b97b22c:::rbd_data.3ab7c374b0dc51.00000000000046ad:head, will try copies on 2
2018-10-26 04:05:56.797317 7f5c97ac9700 0 log_channel(cluster) log [DBG] : 4.140 scrub starts
2018-10-26 04:05:57.092745 7f5c97ac9700 0 log_channel(cluster) log [DBG] : 4.140 scrub ok
2018-10-26 04:06:57.659367 7f5c9f2d8700 4 rocksdb: [/home/builder/source/ceph-12.2.4/src/rocksdb/db/db_impl_write.cc:684] reusing log 8443 from recycle list
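For what it's worth, this is roughly how the PGs named in those errors can be checked (4.fc and 4.1d2 are taken straight from the logs above; the commands are standard Luminous tooling, nothing custom):

# overall cluster state and any PGs flagged inconsistent
ceph health detail
ceph pg dump pgs_brief | grep -i inconsistent

# deep-scrub the PGs from the errors, then list what the scrub found
ceph pg deep-scrub 4.fc
ceph pg deep-scrub 4.1d2
rados list-inconsistent-obj 4.fc --format=json-pretty

# only once an inconsistency is confirmed: let Ceph repair from the surviving copies
ceph pg repair 4.fc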
Any ideas?
I tried talking to the powers above me about using ZFS, but they decided Ceph sounded better to them.