Backup - libceph: read_partial_message 00000000df61d3e0 signature check failed

itNGO

We have a new "problem".
We switched a 3-node Proxmox cluster with Ceph to KRBD. That works normally and performs well.

However, on at least one cluster, the syslog is flooded with the following messages as soon as the backup (Proxmox Backup Server) starts.

Code:
Jan  9 22:30:07 RZB-MPVE2 pvescheduler[1414132]: <root@pam> starting task UPID:RZB-MPVE2:001593F6:0222F38F:61DB53DF:vzdump::root@pam:
Jan  9 22:30:07 RZB-MPVE2 pvescheduler[1414134]: INFO: starting new backup job: vzdump --mailnotification failure --quiet 1 --mode snapshot --all 1 --mailto support@it-ngo.com --storage PBS-RZB-BPVE
Jan  9 22:30:07 RZB-MPVE2 pvescheduler[1414134]: INFO: Starting Backup of VM 255034005 (qemu)
Jan  9 22:30:08 RZB-MPVE2 kernel: [358449.324906] Key type ceph registered
Jan  9 22:30:08 RZB-MPVE2 kernel: [358449.330725] libceph: loaded (mon/osd proto 15/24)
Jan  9 22:30:08 RZB-MPVE2 kernel: [358449.333728] rbd: loaded (major 251)
Jan  9 22:30:08 RZB-MPVE2 kernel: [358449.344165] libceph: mon1 (1)10.255.179.11:6789 session established
Jan  9 22:30:08 RZB-MPVE2 kernel: [358449.347415] libceph: client9958223 fsid 378dde03-3f1b-42e5-962d-76b9ddb0f990
Jan  9 22:30:08 RZB-MPVE2 kernel: [358449.363781] libceph: read_partial_message 00000000df61d3e0 signature check failed
Jan  9 22:30:08 RZB-MPVE2 kernel: [358449.364935] libceph: osd0 (1)10.255.179.12:6809 bad crc/signature
Jan  9 22:30:08 RZB-MPVE2 kernel: [358449.366181] libceph: read_partial_message 00000000df61d3e0 signature check failed
Jan  9 22:30:08 RZB-MPVE2 kernel: [358449.366842] libceph: osd0 (1)10.255.179.12:6809 bad crc/signature
Jan  9 22:30:08 RZB-MPVE2 kernel: [358449.367829] libceph: read_partial_message 00000000df61d3e0 signature check failed
Jan  9 22:30:08 RZB-MPVE2 kernel: [358449.368421] libceph: osd0 (1)10.255.179.12:6809 bad crc/signature
Jan  9 22:30:08 RZB-MPVE2 kernel: [358449.369330] libceph: read_partial_message 00000000df61d3e0 signature check failed
Jan  9 22:30:08 RZB-MPVE2 kernel: [358449.369938] libceph: osd0 (1)10.255.179.12:6809 bad crc/signature
Jan  9 22:30:08 RZB-MPVE2 kernel: [358449.370943] libceph: read_partial_message 00000000df61d3e0 signature check failed
Jan  9 22:30:08 RZB-MPVE2 kernel: [358449.371676] libceph: osd0 (1)10.255.179.12:6809 bad crc/signature
Jan  9 22:30:08 RZB-MPVE2 kernel: [358449.372737] libceph: read_partial_message 00000000df61d3e0 signature check failed
Jan  9 22:30:08 RZB-MPVE2 kernel: [358449.373675] libceph: osd0 (1)10.255.179.12:6809 bad crc/signature

The backup then does not run, and the messages only stop once we cancel the job.
Otherwise the VMs and access to them work normally. For now we have disabled KRBD again on the affected cluster.

Does anyone have an idea what is going on?
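
To see whether the errors come from a single OSD or from several, the log lines can be tallied per OSD. The following is only a minimal sketch; `count_bad_crc` is a hypothetical helper name, and it assumes the kernel messages keep the `libceph: osdN ... bad crc/signature` shape shown in the log above.

```shell
#!/bin/sh
# Hypothetical helper: count "bad crc/signature" kernel messages per OSD.
# Reads syslog-style lines on stdin and prints "<osd> <count>" per line.
count_bad_crc() {
    grep 'bad crc/signature' \
        | sed -n 's/.*libceph: \(osd[0-9][0-9]*\) .*/\1/p' \
        | sort | uniq -c | awk '{print $2, $1}'
}
```

It could be fed from the kernel log, for example `journalctl -k | count_bad_crc` or `count_bad_crc < /var/log/syslog`.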
 
Did you find the cause of this?
The latest PBS and PVE versions fixed this somehow. We have been back on KRBD for several months now.
 
For me, the latest and greatest did nothing.
It seems that this is a known bug in kernel 5.13 and above, so I'm back on 5.11 to see whether it persists.
 
Did you upgrade to 5.15? We have the same problems and are running 5.15.
 
Which version did you pin? I would like to downgrade to that version.
My cluster starts showing slow OSDs and the CRC errors.
 
I have Debian with PVE on top, so your config may differ.

Install the kernel:
apt install pve-kernel-5.11.22-7-pve

Update the GRUB_DEFAULT line in /etc/default/grub:
GRUB_DEFAULT="Advanced options for Proxmox VE GNU/Linux>Proxmox VE GNU/Linux, with Linux 5.11.22-7-pve"

Then run update-grub and reboot.
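
The GRUB edit above can also be scripted. This is a rough sketch, not a tested procedure: `grub_default_for` and `set_grub_default` are hypothetical helper names, and the menu entry titles are simply the ones quoted in this post, so they should be checked against your own /boot/grub/grub.cfg first. The file path is an argument so the change can be tried on a copy.

```shell
#!/bin/sh
# Hypothetical helper: build the GRUB_DEFAULT value for a Proxmox kernel version.
# The submenu and entry titles are assumed to match the ones quoted in this thread.
grub_default_for() {
    printf 'Advanced options for Proxmox VE GNU/Linux>Proxmox VE GNU/Linux, with Linux %s' "$1"
}

# Hypothetical helper: rewrite the GRUB_DEFAULT line in the given file.
# A backup of the file is kept next to it with a .bak suffix.
set_grub_default() {
    file=$1
    version=$2
    entry=$(grub_default_for "$version")
    sed -i.bak "s|^GRUB_DEFAULT=.*|GRUB_DEFAULT=\"$entry\"|" "$file"
}
```

For example, `set_grub_default /etc/default/grub 5.11.22-7-pve`, followed by update-grub and a reboot as described above.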


Also, don't forget to disable the write cache on all HDDs:
for i in $(lsblk -dn -o NAME,TYPE | awk '$2 == "disk" && $1 ~ /^sd/ {print $1}'); do hdparm -W0 "/dev/$i"; done
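
Before changing anything, it may help to see which devices that loop would touch. A small sketch, assuming the disks show up as sdX in `lsblk`; `sd_disks` is a hypothetical helper name that only filters text and does not call hdparm itself.

```shell
#!/bin/sh
# Hypothetical helper: pick whole-disk sdX devices from
# `lsblk -dn -o NAME,TYPE` output on stdin and print their /dev paths.
sd_disks() {
    awk '$2 == "disk" && $1 ~ /^sd[a-z]+$/ { print "/dev/" $1 }'
}
```

Running `lsblk -dn -o NAME,TYPE | sd_disks` lists the candidates. Piping that list into `while read -r dev; do hdparm -W "$dev"; done` should print the current write-caching setting of each drive, since `hdparm -W` without a value only queries it.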
 
I'm running Proxmox 7.4 and it looks like I cannot pin kernel 5.11, only 5.13 and higher.
 
Don't you have the package available in the repo?
I can install it, but when I run
proxmox-boot-tool kernel pin pve-kernel-5.11.22-7-pve

I get:
Possible Proxmox kernel versions are:
5.13.19-6-pve
5.15.102-1-pve
5.15.83-1-pve

And I think it has to do with the fact that I'm running Proxmox 7.4.
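
One thing worth ruling out: the versions listed in that output are bare version strings without the pve-kernel- prefix, so the pin subcommand probably expects that form and not the package name. This is an assumption based only on the output above, and the thread does not confirm that 5.11 is offered on Proxmox 7.4 either way. A tiny sketch with a hypothetical helper name, `kernel_version_from_pkg`:

```shell
#!/bin/sh
# Hypothetical helper: turn a package name such as pve-kernel-5.11.22-7-pve
# into the bare version string (5.11.22-7-pve). Other input is printed unchanged.
kernel_version_from_pkg() {
    printf '%s\n' "${1#pve-kernel-}"
}
```

After checking what `proxmox-boot-tool kernel list` reports, one could then try `proxmox-boot-tool kernel pin "$(kernel_version_from_pkg pve-kernel-5.11.22-7-pve)"`.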
 
Ah, by "pinning" I mean forcing the GRUB menu to load that specific version.

First, install the package:
apt install pve-kernel-5.11.22-7-pve
Then set GRUB_DEFAULT in /etc/default/grub so that the menu loads it:
GRUB_DEFAULT="Advanced options for Proxmox VE GNU/Linux>Proxmox VE GNU/Linux, with Linux 5.11.22-7-pve"

After running update-grub, the next reboot should boot the 5.11 kernel.
 
