Proxmox + Ceph - kernel: libceph: osd3 (1)192.168.1.212:6811 bad crc/signature

cableguy84

May 9, 2023

One of my hosts keeps going grey with a '?' in the GUI, and backups are not running (the backup job freezes indefinitely on an LXC container). A reboot solves the issue for a while, possibly until the next backup job, but I haven't confirmed that. I made some changes to Ceph on x.x.x.212 a couple of days ago, but all the other nodes are fine. The cluster and Ceph are both healthy except for this one node (status and logs below).


Log of errors:
journalctl -xe
Sep 08 16:58:31 acemagic-1 kernel: libceph: osd3 (1)192.168.1.212:6811 bad crc/signature
Sep 08 16:58:31 acemagic-1 kernel: libceph: read_partial_message 00000000ef285663 signature check failed
Sep 08 16:58:31 acemagic-1 kernel: libceph: osd4 (1)192.168.1.212:6819 bad crc/signature
Sep 08 16:58:31 acemagic-1 kernel: libceph: read_partial_message 00000000f2a8c138 signature check failed
(The same four lines repeat three more times, all with the 16:58:31 timestamp.)
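The kernel messages name two OSD endpoints on 192.168.1.212. One quick way to see which OSDs and addresses the kernel Ceph client is complaining about, and how often, is to tally the `bad crc/signature` lines. This is a minimal POSIX shell sketch; `count_bad_crc` is a hypothetical helper, and it assumes GNU grep (for `-o`) and IPv4 addresses as in the log.

```shell
#!/bin/sh
# Summarize libceph "bad crc/signature" kernel messages per OSD endpoint.
# Input: kernel log text on stdin.
# Output: "<count> <osd> <ip:port>", most frequent first.
count_bad_crc() {
  # Keep only the "osdN (1)ip:port bad crc/signature" fragment of each matching line.
  grep -o 'libceph: osd[0-9]* ([0-9]*)[0-9.]*:[0-9]* bad crc/signature' |
    awk '{
      sub(/^\([0-9]+\)/, "", $3)   # drop the "(1)" prefix in front of the address
      n[$2 " " $3]++               # count per "osdN ip:port"
    }
    END { for (k in n) print n[k], k }' |
    sort -k1,1nr -k2,2
}
```

Typical use would be `journalctl -k --since today | count_bad_crc`; each output line is a count followed by the OSD name and its address.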





Code:
root@acemagic-1:~# systemctl status pve-cluster
● pve-cluster.service - The Proxmox VE cluster filesystem
     Loaded: loaded (/lib/systemd/system/pve-cluster.service; enabled; preset: enabled)
     Active: active (running) since Sat 2024-09-07 11:11:22 EDT; 1 day 5h ago
   Main PID: 1172 (pmxcfs)
      Tasks: 8 (limit: 38096)
     Memory: 82.7M
        CPU: 2min 58.255s
     CGroup: /system.slice/pve-cluster.service
             └─1172 /usr/bin/pmxcfs

Sep 08 16:51:08 acemagic-1 pmxcfs[1172]: [status] notice: received log
Sep 08 16:51:24 acemagic-1 pmxcfs[1172]: [status] notice: received log
Sep 08 16:51:28 acemagic-1 pmxcfs[1172]: [status] notice: received log
Sep 08 16:51:28 acemagic-1 pmxcfs[1172]: [status] notice: received log
Sep 08 16:51:42 acemagic-1 pmxcfs[1172]: [status] notice: received log
Sep 08 16:51:50 acemagic-1 pmxcfs[1172]: [status] notice: received log
Sep 08 16:51:50 acemagic-1 pmxcfs[1172]: [status] notice: received log
Sep 08 16:51:52 acemagic-1 pmxcfs[1172]: [status] notice: received log
Sep 08 16:54:16 acemagic-1 pmxcfs[1172]: [status] notice: received log
Sep 08 16:54:16 acemagic-1 pmxcfs[1172]: [status] notice: received log
root@acemagic-1:~# systemctl status pvedaemon
● pvedaemon.service - PVE API Daemon
     Loaded: loaded (/lib/systemd/system/pvedaemon.service; enabled; preset: enabled)
     Active: active (running) since Sat 2024-09-07 11:11:24 EDT; 1 day 5h ago
   Main PID: 1407 (pvedaemon)
      Tasks: 9 (limit: 38096)
     Memory: 207.6M
        CPU: 50.662s
     CGroup: /system.slice/pvedaemon.service
             ├─  1407 pvedaemon
             ├─531828 "pvedaemon worker"
             ├─641523 "pvedaemon worker"
             ├─649924 "pvedaemon worker"
             ├─689458 "task UPID:acemagic-1:000A8532:00895FB8:66DDCCBA:vzstart:102:root@pam:"
             ├─689463 lxc-info -n 102 -p
             ├─689468 lxc-info -n 102 -p
             ├─689692 "task UPID:acemagic-1:000A861C:0089858D:66DDCD1B:vzstart:110:root@pam:"
             └─689738 lxc-info -n 110 -p

Notice: journal has been rotated since unit was started, output may be incomplete.
root@acemagic-1:~# systemctl status pvestatd
● pvestatd.service - PVE Status Daemon
     Loaded: loaded (/lib/systemd/system/pvestatd.service; enabled; preset: enabled)
     Active: active (running) since Sat 2024-09-07 11:11:23 EDT; 1 day 5h ago
    Process: 691073 ExecReload=/usr/bin/pvestatd restart (code=exited, status=0/SUCCESS)
   Main PID: 1362 (pvestatd)
      Tasks: 2 (limit: 38096)
     Memory: 157.2M
        CPU: 1h 45min 6.245s
     CGroup: /system.slice/pvestatd.service
             ├─  1362 pvestatd
             └─689488 lxc-info -n 102 -p

Notice: journal has been rotated since unit was started, output may be incomplete.
root@acemagic-1:~# pvecm status
Cluster information
-------------------
Name:             prmx-cluster-1
Config Version:   15
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Sun Sep  8 16:57:23 2024
Quorum provider:  corosync_votequorum
Nodes:            5
Node ID:          0x00000001
Ring ID:          1.1c2
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   5
Highest expected: 5
Total votes:      5
Quorum:           3
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 192.168.1.210 (local)
0x00000002          1 192.168.1.211
0x00000003          1 192.168.1.212
0x00000004          1 192.168.1.213
0x00000005          1 192.168.1.214
root@acemagic-1:~# ceph osd tree
ID  CLASS  WEIGHT    TYPE NAME            STATUS  REWEIGHT  PRI-AFF
-1         15.54070  root default
-5          1.25078      host acemagic-1
 1    ssd   0.31929          osd.1            up   1.00000  1.00000
 2    ssd   0.93149          osd.2            up   1.00000  1.00000
-7          2.18228      host acemagic-2
 5    ssd   1.71649          osd.5            up   1.00000  1.00000
 6    ssd   0.46579          osd.6            up   1.00000  1.00000
-3          3.16257      host minif-1
 0   nvme   0.36809          osd.0            up   1.00000  1.00000
 3    ssd   0.93149          osd.3            up   1.00000  1.00000
 4    ssd   1.86299          osd.4            up   1.00000  1.00000
-9          8.94507      host pmox-5700g
 7    ssd   3.63869          osd.7            up   1.00000  1.00000
 8    ssd   3.63869          osd.8            up   1.00000  1.00000
10    ssd   1.66769          osd.10           up   1.00000  1.00000
root@acemagic-1:~# ceph osd status
ID  HOST         USED  AVAIL  WR OPS  WR DATA  RD OPS  RD DATA  STATE
 0  minif-1     68.3G   308G      0      819       0        0   exists,up
 1  acemagic-1  51.0G   275G      0        0       0        0   exists,up
 2  acemagic-1   203G   749G      1     10.3k      0        0   exists,up
 3  minif-1      144G   809G      0      819       0        0   exists,up
 4  minif-1      262G  1644G      2     12.0k    291     1164k  exists,up
 5  acemagic-2   338G  1419G      1     8192       0        0   exists,up
 6  acemagic-2  92.5G   384G      0     1638       0        0   exists,up
 7  pmox-5700g   261G  3465G      3     71.1k      1        0   exists,up
 8  pmox-5700g   204G  3521G      3     15.1k      1        0   exists,up
10  pmox-5700g   114G  1593G      0     5734       0        0   exists,up
root@acemagic-1:~# ceph health detail
HEALTH_OK
root@acemagic-1:~#
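The errors name osd3 and osd4 at 192.168.1.212. In the `ceph osd tree` output above, osd.3 and osd.4 both sit under host minif-1, together with osd.0, which does not appear in the errors. A small POSIX shell sketch that does this lookup from the tree text follows; `osd_host` is a hypothetical name, and it assumes the plain-text layout shown above, where each `host <name>` bucket line precedes that host's OSD rows.

```shell
#!/bin/sh
# Print the CRUSH host that holds a given OSD, reading `ceph osd tree` text on stdin.
# Usage: ceph osd tree | osd_host 3
osd_host() {
  awk -v want="osd.$1" '
    # Remember the most recent "host <name>" bucket seen in the tree.
    { for (i = 1; i < NF; i++) if ($i == "host") host = $(i + 1) }
    # When the requested OSD row appears, print its host and stop.
    { for (i = 1; i <= NF; i++) if ($i == want) { print host; exit } }
  '
}
```

Because the match is on the exact token, `osd_host 1` does not accidentally match `osd.10`. `ceph osd find 3` reports the OSD's CRUSH location as JSON and is a sturdier source if the plain-text layout ever changes.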
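The `pvedaemon` status above still lists two `vzstart` tasks (for CT 102 and CT 110) with `lxc-info` children. Proxmox task IDs (UPIDs) encode the node, PID, process start ticks, task start time, task type, guest ID and user, with the numeric fields in hexadecimal. Below is a sketch of a decoder assuming that field order; `decode_upid` is a hypothetical helper, not a Proxmox tool.

```shell
#!/bin/sh
# Decode a Proxmox VE task ID of the form
#   UPID:<node>:<pid>:<pstart>:<starttime>:<type>:<id>:<user>:
# where pid, pstart and starttime are hexadecimal.
decode_upid() {
  printf '%s\n' "$1" | {
    IFS=: read -r prefix node pid pstart start type id user rest
    if [ "$prefix" != UPID ]; then
      echo "not a UPID: $1" >&2
      exit 1   # leaves only this pipeline subshell; the function returns 1
    fi
    # $((0x...)) turns the hex fields into decimal (POSIX arithmetic).
    printf 'node=%s pid=%d type=%s id=%s user=%s start=%d\n' \
      "$node" "$((0x$pid))" "$type" "$id" "$user" "$((0x$start))"
  }
}
```

For the first task it prints `node=acemagic-1 pid=689458 type=vzstart id=102 user=root@pam start=1725811898`. The PID matches the one systemd lists for that task, and 1725811898 (convert with `date -d @1725811898`) is 2024-09-08 16:11:38 UTC, i.e. 12:11:38 EDT. So the start task for CT 102 had still not finished more than four hours later, when the status output was captured (the `pvecm status` date is 16:57:23).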
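Both `pvedaemon` and `pvestatd` above show `lxc-info -n 102 -p` children that have not exited. If kernel Ceph I/O is stalling (an assumption here; the status output alone does not prove it), such processes would typically be stuck in uninterruptible sleep, shown as `D` in the STAT column of `ps`. A minimal filter for that, using the hypothetical name `blocked_procs`:

```shell
#!/bin/sh
# Keep the header plus any process whose STAT column starts with "D"
# (uninterruptible sleep, usually waiting on I/O inside the kernel).
# Input: output of `ps -eo pid,stat,wchan:32,args` on stdin.
blocked_procs() {
  awk 'NR == 1 || $2 ~ /^D/'
}
```

`ps -eo pid,stat,wchan:32,args | blocked_procs` keeps the header line and any `D`-state processes; the WCHAN column shows the kernel function each one is sleeping in.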
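In the `pvecm status` output above, each of the five nodes has one vote, and `Quorum: 3` follows from corosync votequorum's default rule of needing more than half of the expected votes. The arithmetic as a sketch, ignoring special votequorum options such as `two_node` or `last_man_standing`; the function names are hypothetical.

```shell
#!/bin/sh
# Votes needed for quorum under votequorum's default rule:
# more than half of the expected votes, i.e. expected / 2 + 1 (integer division).
quorum_needed() {
  echo $(( $1 / 2 + 1 ))
}

# How many one-vote nodes can be lost before quorum is gone.
tolerable_failures() {
  echo $(( $1 - ($1 / 2 + 1) ))
}
```

With 5 expected votes, `quorum_needed 5` gives 3 and `tolerable_failures 5` gives 2, so the cluster stays quorate with up to two one-vote nodes down.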
 
