Ceph - RBD Mirror unlink error

TecScott

Since we added a node to the primary cluster, many of our mirror snapshots have been failing to unlink the peer.

RBD mirroring has been configured and running without issue for almost a year, but lately more and more VMs are throwing this error:

Snapshot ID: 8306
2024-05-20T04:10:03.439+0100 7f60ea10a700 -1 librbd::mirror::snapshot::CreatePrimaryRequest: 0x7f60c80056f0 handle_unlink_peer: failed to unlink peer: (2) No such file or directory
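
To see how many jobs are affected each night, the error lines can be pulled out of the job output with a small script. This is only a rough sketch: the regular expression is modelled on the single log line quoted above, and the names `parse_unlink_error` and `count_by_errno` are made up for illustration, so other librbd messages may need a different pattern.

```python
import re
from collections import Counter

# Pattern modelled on the one log line quoted in this post (an assumption;
# other librbd log lines may be laid out differently).
UNLINK_ERROR = re.compile(
    r"^(?P<timestamp>\S+)\s+(?P<thread>[0-9a-f]+)\s+-1\s+"
    r"librbd::mirror::snapshot::CreatePrimaryRequest:\s+\S+\s+"
    r"handle_unlink_peer: failed to unlink peer: "
    r"\((?P<errno>\d+)\)\s+(?P<message>.+)$"
)


def parse_unlink_error(line):
    """Return timestamp, errno and message for an unlink error line, else None."""
    match = UNLINK_ERROR.match(line.strip())
    if match is None:
        return None
    return {
        "timestamp": match["timestamp"],
        "errno": int(match["errno"]),
        "message": match["message"],
    }


def count_by_errno(lines):
    """Count unlink errors per errno across an iterable of log lines."""
    counts = Counter()
    for line in lines:
        parsed = parse_unlink_error(line)
        if parsed is not None:
            counts[parsed["errno"]] += 1
    return counts


# The log line from this post, used as sample input.
SAMPLE = (
    "2024-05-20T04:10:03.439+0100 7f60ea10a700 -1 "
    "librbd::mirror::snapshot::CreatePrimaryRequest: 0x7f60c80056f0 "
    "handle_unlink_peer: failed to unlink peer: (2) No such file or directory"
)
```

Feeding each night's job log through `count_by_errno` would give a per-night tally, which makes it easier to see how quickly the error is spreading.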


However, when I check the remote site I can see that the snapshot ID has been mirrored.

I can't see any errors in any logs.
 
Just to add - this seemed to start with about 10 of the 150 jobs and seems to have spread.

I've tried disabling and re-enabling mirroring on the disk, but it doesn't seem to make any difference. There is no error when mirroring is first enabled on the image or on the first snapshot, but the second snapshot throws the same error, as does every snapshot after that.

The snapshot IDs on the remote end match the snapshot taken on the local side, and the snapshot time also matches the most recent snapshot, which suggests the snapshot is actually being mirrored.
 
Looking back over the history of when this started:

We introduced a new node to PVE, installed Ceph on it the following day, and started creating OSDs on the node the day after that.

The issue only appeared after the OSDs were created; while the node was merely joined to the PVE cluster with Ceph installed, everything still ran without any issues.

This has progressively got worse over time. On the first night, 2 of 150 snapshots had the error; now it's 145 of 150.




On the source cluster, rbd snap ls shows 2 snapshots for each VM with the correct ID and timestamp. On the remote cluster, rbd snap ls shows 1 snapshot for each VM, with the correct snapshot number and time for the one that was copied. So the RBD mirror appears to be working as expected.
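
For anyone wanting to repeat this check across many images, here is a rough sketch that collects the listing from each cluster. It shells out to `rbd snap ls --all --format json`, since mirror snapshots live in their own namespace and `--all` lists every namespace. The helper names, the `cluster` argument and the JSON layout (a `namespace` object with a `type` field) are assumptions and should be checked against the output of the installed Ceph release.

```python
import json
import subprocess


def list_snapshots(image_spec, cluster="ceph"):
    """Return the parsed JSON snapshot list for an image such as 'pool/image'.

    Requires the rbd CLI and a config for the named cluster on this host.
    """
    result = subprocess.run(
        ["rbd", "--cluster", cluster, "snap", "ls", "--all",
         "--format", "json", image_spec],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)


def mirror_snapshots(entries):
    """Keep only entries in the mirror namespace.

    Assumes each entry may carry a 'namespace' dict with a 'type' key;
    entries without one are treated as ordinary user snapshots.
    """
    selected = []
    for entry in entries:
        namespace = entry.get("namespace")
        if isinstance(namespace, dict) and namespace.get("type") == "mirror":
            selected.append(entry)
    return selected
```

Calling `mirror_snapshots(list_snapshots("pool/image"))` once per cluster gives two lists that can be compared side by side, here expected to hold 2 entries on the source and 1 on the remote.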

So I'm not sure exactly which file or directory isn't being found during the unlink, as the snapshot appears to be taken anyway.
 