ceph bad crc/signature

Harald Treis

New Member
Jun 13, 2018
Hi,
I am seeing a lot of bad crc/signature entries in dmesg/kern.log on our PVE servers.
Based on one of these entries (kernel: [91709.672202] libceph: osd3 192.168.16.31:6800 bad crc/signature) I figured out that PG 2.6c was involved, but no VM/LXC is using PG 2.6c. Based on some information from other forums I have already set "bluestore retry disk reads = 3" in the global section of ceph.conf and restarted all servers.
A "ceph pg repair 2.6c" did a deep scrub without any errors.

Should I be worried?

Thank you for help,
Harry

We are running pve-manager 5.2-12 and ceph 12.2.8-pve1 on 4 servers.
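
The version numbers above come from something like:
Code:
pveversion -v | grep -E 'pve-manager|ceph'   # package versions on the node
ceph versions                                # ceph release of every running daemon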

ceph osd tree
Code:
ID CLASS WEIGHT   TYPE NAME       STATUS REWEIGHT PRI-AFF
-1       14.55432 root default                           
-7        3.63858     host ariel1                         
 2   ssd  1.81929         osd.2       up  1.00000 1.00000
 3   ssd  1.81929         osd.3       up  1.00000 1.00000
-3        3.63858     host ariel2                         
 0   ssd  1.81929         osd.0       up  1.00000 1.00000
 4   ssd  1.81929         osd.4       up  1.00000 1.00000
-9        3.63858     host ariel3                         
 6   ssd  1.81929         osd.6       up  1.00000 1.00000
 7   ssd  1.81929         osd.7       up  1.00000 1.00000
-5        3.63858     host ariel4                         
 1   ssd  1.81929         osd.1       up  1.00000 1.00000
 5   ssd  1.81929         osd.5       up  1.00000 1.00000


/etc/pve/ceph.conf
Code:
[global]
    auth client required = cephx
    auth cluster required = cephx
    auth service required = cephx
         bluestore retry disk reads = 3
    cluster network = 192.168.17.0/24
    fsid = 5070e036-8f6c-4795-a34d-9035472a628d
    keyring = /etc/pve/priv/$cluster.$name.keyring
    mon allow pool delete = true
    osd journal size = 5120
    osd pool default min size = 2
    osd pool default size = 3
    public network = 192.168.16.0/24

[osd]
    keyring = /var/lib/ceph/osd/ceph-$id/keyring

[mon.ariel2]
    host = ariel2
    mon addr = 192.168.16.32:6789

[mon.ariel1]
    host = ariel1
    mon addr = 192.168.16.31:6789

[mon.ariel4]
    host = ariel4
    mon addr = 192.168.16.34:6789

[osd.0]
    public addr = 192.168.16.32
    cluster addr = 192.168.17.32

[osd.1]
    public addr = 192.168.16.34
    cluster addr = 192.168.17.34

[osd.2]
    public addr = 192.168.16.31
    cluster addr = 192.168.17.31

[osd.3]
    public addr = 192.168.16.31
    cluster addr = 192.168.17.31

[osd.4]
    public addr = 192.168.16.32
    cluster addr = 192.168.17.32

[osd.5]
    public addr = 192.168.16.34
    cluster addr = 192.168.17.34

[osd.6]
    public addr = 192.168.16.33
    cluster addr = 192.168.17.33

[osd.7]
    public addr = 192.168.16.33
    cluster addr = 192.168.17.33
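
One way to check whether the restarted OSDs even know the new option would be the admin socket on the node hosting the OSD (no output means the daemon does not recognize it), e.g.:
Code:
ceph daemon osd.3 config show | grep bluestore_retry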

cat /etc/network/interfaces
Code:
auto lo
iface lo inet loopback
iface eth0 inet manual
iface eth1 inet manual
iface eth2 inet manual
iface eth3 inet manual
iface eth4 inet manual
iface eth5 inet manual
auto bond0
iface bond0 inet manual
    bond-slaves eth0 eth1
    bond-miimon 100
    bond-mode active-backup
#frontside

auto bond1
iface bond1 inet static
    address  192.168.16.32
    netmask  255.255.255.0
    bond-slaves eth2 eth3
    bond-miimon 100
    bond-mode active-backup
#corosync

auto bond2
iface bond2 inet static
    address  192.168.17.32
    netmask  255.255.255.0
    bond-slaves eth4 eth5
    bond-miimon 100
    bond-mode active-backup
#ceph

auto vmbr0
iface vmbr0 inet static
    address  192.168.19.32
    netmask  255.255.255.0
    gateway  192.168.19.1
    bridge-ports bond0
    bridge-stp off
    bridge-fd 0
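
Since bad crc/signature messages can also come from a flaky link, a quick sanity check of the ceph bond would be something like this (bond2 and the 192.168.17.x addresses are the ones from the config above):
Code:
cat /proc/net/bonding/bond2    # active slave and link state of the ceph bond
ping -c 3 192.168.17.31        # reachability of ariel1 on the cluster network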
 
"bluestore retry disk reads = 3" in the global section of ceph.conf and restartet all servers.
This was merged after the release of Ceph 12.2.10, so it will not work with 12.2.8.

If you think that your Ceph cluster hits this bug (e.g. CRC 0x6706be76), then try lowering bluestore_cache_size and/or disabling swap (swapoff). https://tracker.ceph.com/issues/22464
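
A minimal sketch of what that could look like (the 1 GiB cache value is only an example and needs to be adjusted to the node's RAM; the OSDs have to be restarted afterwards):
Code:
# /etc/pve/ceph.conf, [osd] section -- cap the BlueStore cache per OSD
# (the SSD default in Luminous is 3 GiB)
bluestore cache size = 1073741824

# on every node: turn swap off now; comment the swap entry out of /etc/fstab to keep it off
swapoff -a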
 

Hi Alwin,
thank you for the quick reply.
I do not think we really have a problem with our cluster. I found your reference (CRC 0x6706be76) too and have turned off swap. Memory should be sufficient, but I will check again. Otherwise I will wait until Ceph 12.2.10 is released.
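
For the check, something like this should do:
Code:
swapon --show    # no output means swap is off
free -h          # memory and swap usage on the node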
Thank you,
Harry
 
