ceph bad crc/signature

Harald Treis

New Member
Jun 13, 2018
Hi,
I am seeing a lot of bad crc/signature entries in dmesg/kern.log on our PVE servers.
Based on one of these entries (kernel: [91709.672202] libceph: osd3 192.168.16.31:6800 bad crc/signature) I figured out that PG 2.6c was involved, but no VM/LXC is using PG 2.6c. Based on some information from other forums I have already set "bluestore retry disk reads = 3" in the global section of ceph.conf and restarted all servers.
A "ceph pg repair 2.6c" did a deep scrub without any errors.

Should I be worried?

Thank you for help,
Harry

We are running pve-manager 5.2-12 and ceph 12.2.8-pve1 on 4 servers.
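
The version numbers above come from something like:
Code:
pveversion -v | grep -E 'pve-manager|ceph'   # package versions on the node
ceph versions                                # ceph release of every running daemon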

ceph osd tree
Code:
ID CLASS WEIGHT   TYPE NAME       STATUS REWEIGHT PRI-AFF
-1       14.55432 root default                           
-7        3.63858     host ariel1                         
 2   ssd  1.81929         osd.2       up  1.00000 1.00000
 3   ssd  1.81929         osd.3       up  1.00000 1.00000
-3        3.63858     host ariel2                         
 0   ssd  1.81929         osd.0       up  1.00000 1.00000
 4   ssd  1.81929         osd.4       up  1.00000 1.00000
-9        3.63858     host ariel3                         
 6   ssd  1.81929         osd.6       up  1.00000 1.00000
 7   ssd  1.81929         osd.7       up  1.00000 1.00000
-5        3.63858     host ariel4                         
 1   ssd  1.81929         osd.1       up  1.00000 1.00000
 5   ssd  1.81929         osd.5       up  1.00000 1.00000


/etc/pve/ceph.conf
Code:
[global]
    auth client required = cephx
    auth cluster required = cephx
    auth service required = cephx
         bluestore retry disk reads = 3
    cluster network = 192.168.17.0/24
    fsid = 5070e036-8f6c-4795-a34d-9035472a628d
    keyring = /etc/pve/priv/$cluster.$name.keyring
    mon allow pool delete = true
    osd journal size = 5120
    osd pool default min size = 2
    osd pool default size = 3
    public network = 192.168.16.0/24

[osd]
    keyring = /var/lib/ceph/osd/ceph-$id/keyring

[mon.ariel2]
    host = ariel2
    mon addr = 192.168.16.32:6789

[mon.ariel1]
    host = ariel1
    mon addr = 192.168.16.31:6789

[mon.ariel4]
    host = ariel4
    mon addr = 192.168.16.34:6789

[osd.0]
    public addr = 192.168.16.32
    cluster addr = 192.168.17.32

[osd.1]
    public addr = 192.168.16.34
    cluster addr = 192.168.17.34

[osd.2]
    public addr = 192.168.16.31
    cluster addr = 192.168.17.31

[osd.3]
    public addr = 192.168.16.31
    cluster addr = 192.168.17.31

[osd.4]
    public addr = 192.168.16.32
    cluster addr = 192.168.17.32

[osd.5]
    public addr = 192.168.16.34
    cluster addr = 192.168.17.34

[osd.6]
    public addr = 192.168.16.33
    cluster addr = 192.168.17.33

[osd.7]
    public addr = 192.168.16.33
    cluster addr = 192.168.17.33
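
One way to check whether the restarted OSDs even know the new option would be the admin socket on the node hosting the OSD (no output means the daemon does not recognize it), e.g.:
Code:
ceph daemon osd.3 config show | grep bluestore_retry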

cat /etc/network/interfaces
Code:
auto lo
iface lo inet loopback
iface eth0 inet manual
iface eth1 inet manual
iface eth2 inet manual
iface eth3 inet manual
iface eth4 inet manual
iface eth5 inet manual
auto bond0
iface bond0 inet manual
    bond-slaves eth0 eth1
    bond-miimon 100
    bond-mode active-backup
#frontside

auto bond1
iface bond1 inet static
    address  192.168.16.32
    netmask  255.255.255.0
    bond-slaves eth2 eth3
    bond-miimon 100
    bond-mode active-backup
#corosync

auto bond2
iface bond2 inet static
    address  192.168.17.32
    netmask  255.255.255.0
    bond-slaves eth4 eth5
    bond-miimon 100
    bond-mode active-backup
#ceph

auto vmbr0
iface vmbr0 inet static
    address  192.168.19.32
    netmask  255.255.255.0
    gateway  192.168.19.1
    bridge-ports bond0
    bridge-stp off
    bridge-fd 0
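
Since bad crc/signature messages can also come from a flaky link, a quick sanity check of the ceph bond would be something like this (bond2 and the 192.168.17.x addresses are the ones from the config above):
Code:
cat /proc/net/bonding/bond2    # active slave and link state of the ceph bond
ping -c 3 192.168.17.31        # reachability of ariel1 on the cluster network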
 
"bluestore retry disk reads = 3" in the global section of ceph.conf and restartet all servers.
This was merged after the release of Ceph 12.2.10, so it will not work with 12.2.8.

If you think that your Ceph cluster hits this bug (e.g. CRC 0x6706be76), then try lowering bluestore_cache_size and/or disabling swap (swapoff). https://tracker.ceph.com/issues/22464
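
A minimal sketch of what that could look like (the 1 GiB cache value is only an example and needs to be adjusted to the node's RAM; the OSDs have to be restarted afterwards):
Code:
# /etc/pve/ceph.conf, [osd] section -- cap the BlueStore cache per OSD
# (the SSD default in Luminous is 3 GiB)
bluestore cache size = 1073741824

# on every node: turn swap off now; comment the swap entry out of /etc/fstab to keep it off
swapoff -a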
 

Hi Alwin,
thank you for the quick reply.
I do not think we really have a problem with our cluster. I found your reference (CRC 0x6706be76) too and have turned off swap. Memory should be sufficient, but I will check again. Otherwise I will wait until Ceph 12.2.10 is released.
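
For the check, something like this should do:
Code:
swapon --show    # no output means swap is off
free -h          # memory and swap usage on the node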
Thank you,
Harry
 
