[SOLVED] Ceph cluster rebuild - importing BlueStore OSDs from the old cluster (bad fsid) - OSDs don't start, they only stay in the down state

vb.asta

Member
Feb 12, 2020
ceph version 18.2.2 (e9fe820e7fffd1b7cde143a9f77653b73fcec748) reef (stable)

Hello everyone, we need your help importing the BlueStore OSDs into the new cluster: after the import attempt, the OSDs do not start. They only stay in a "down" state and are shown as filestore instead of bluestore.
screenshot: https://share.cleanshot.com/NWbS7bN3

In the Proxmox interface, when I select an OSD and click the "Details" button, this error appears: "error with 'osd metadata': mon_cmd failed - (500)".
screenshot: https://share.cleanshot.com/7CxVbPRZ

We reinstalled Ceph on 3 nodes (pxm1, pxm2 and pxm3). The 4 OSDs sit on only 2 of them, pxm2 and pxm3; pxm1 only acts as mon and mgr to complete the cluster.

New cluster information:

Bash:
root@pxm2:~# cat /etc/ceph/ceph.conf

[global]
    auth_client_required = cephx
    auth_cluster_required = cephx
    auth_service_required = cephx
    cluster_network = 192.168.0.2/24
    fsid = f4466e33-b57d-4d68-9909-3468afd9e5c2
    mon_allow_pool_delete = true
    mon_host = 192.168.0.2 192.168.0.3 192.168.0.1
    ms_bind_ipv4 = true
    ms_bind_ipv6 = false
    osd_pool_default_min_size = 2
    osd_pool_default_size = 3
    public_network = 192.168.0.0/24

[client]
 #   keyring = /etc/pve/priv/$cluster.$name.keyring
      keyring = /etc/pve/priv/ceph.client.admin.keyring

#[client.crash]
#    keyring = /etc/pve/ceph/$cluster.$name.keyring

[client.crash]
    key = AQAl95NmlvL0HRAAovpivsfHqqokmO0vqIR5Lg==

[client.admin]
    key = AQAk95NmSjMdORAAiAHkTSSMquKkBAGpALjwQA==
    caps mds = "allow *"
    caps mgr = "allow *"
    caps mon = "allow *"
    caps osd = "allow *"

[mon.pxm1]
    public_addr = 192.168.0.1

[mon.pxm2]
    public_addr = 192.168.0.2

[mon.pxm3]
    public_addr = 192.168.0.3

Bash:
root@pxm3:~# ceph fsid
f4466e33-b57d-4d68-9909-3468afd9e5c2

Bash:
root@pxm2:~# ceph health
HEALTH_WARN mon pxm1 is low on available space; 4 osds down; 2 hosts (4 osds) down; 1 root (4 osds) down; 173 daemons have recently crashed

Bash:
root@pxm3:~# ceph -s
  cluster:
    id:     f4466e33-b57d-4d68-9909-3468afd9e5c2
    health: HEALTH_WARN
            mon pxm1 is low on available space
            4 osds down
            2 hosts (4 osds) down
            1 root (4 osds) down
            170 daemons have recently crashed
 
  services:
    mon: 3 daemons, quorum pxm2,pxm3,pxm1 (age 3h)
    mgr: pxm2(active, since 3h), standbys: pxm1, pxm3
    osd: 4 osds: 0 up, 4 in (since 2h)
 
  data:
    pools:   0 pools, 0 pgs
    objects: 0 objects, 0 B
    usage:   0 B used, 0 B / 0 B avail
    pgs:

Bash:
root@pxm2:~# ceph-crash
INFO:ceph-crash:pinging cluster to exercise our key
  cluster:
    id:     f4466e33-b57d-4d68-9909-3468afd9e5c2
    health: HEALTH_WARN
            mon pxm1 is low on available space
            4 osds down
            2 hosts (4 osds) down
            1 root (4 osds) down
            173 daemons have recently crashed
 
  services:
    mon: 3 daemons, quorum pxm2,pxm3,pxm1 (age 75s)
    mgr: pxm2(active, since 3h), standbys: pxm1, pxm3
    osd: 4 osds: 0 up, 4 in (since 2h)
 
  data:
    pools:   0 pools, 0 pgs
    objects: 0 objects, 0 B
    usage:   0 B used, 0 B / 0 B avail
    pgs:  
 
INFO:ceph-crash:monitoring path /var/lib/ceph/crash, delay 600s

Furthermore, we saw that the OSDs' ceph_fsid ("fsid" in the output below) remains the old cluster's fsid even after the import:
Bash:
root@pxm2:~# cephadm ls
[
    {
        "style": "legacy",
        "name": "osd.0",
        "fsid": "5514a69a-46ba-4a44-bb56-8d3109c6c9e0",
        "systemd_unit": "ceph-osd@0",
        "enabled": true,
        "state": "error",
        "host_version": "18.2.2"
    },
    {
        "style": "legacy",
        "name": "osd.1",
        "fsid": "5514a69a-46ba-4a44-bb56-8d3109c6c9e0",
        "systemd_unit": "ceph-osd@1",
        "enabled": true,
        "state": "error",
        "host_version": "18.2.2"
    },
    {
        "style": "legacy",
        "name": "mon.pxm2",
        "fsid": "f4466e33-b57d-4d68-9909-3468afd9e5c2",
        "systemd_unit": "ceph-mon@pxm2",
        "enabled": true,
        "state": "running",
        "host_version": "18.2.2"
    },
    {
        "style": "legacy",
        "name": "mgr.pxm2",
        "fsid": "f4466e33-b57d-4d68-9909-3468afd9e5c2",
        "systemd_unit": "ceph-mgr@pxm2",
        "enabled": true,
        "state": "running",
        "host_version": "18.2.2"
    }
]


root@pxm3:~# cephadm ls
[
    {
        "style": "legacy",
        "name": "osd.3",
        "fsid": "5514a69a-46ba-4a44-bb56-8d3109c6c9e0",
        "systemd_unit": "ceph-osd@3",
        "enabled": true,
        "state": "error",
        "host_version": "18.2.2"
    },
    {
        "style": "legacy",
        "name": "osd.2",
        "fsid": "5514a69a-46ba-4a44-bb56-8d3109c6c9e0",
        "systemd_unit": "ceph-osd@2",
        "enabled": true,
        "state": "error",
        "host_version": "18.2.2"
    },
    {
        "style": "legacy",
        "name": "mon.pxm3",
        "fsid": "f4466e33-b57d-4d68-9909-3468afd9e5c2",
        "systemd_unit": "ceph-mon@pxm3",
        "enabled": true,
        "state": "running",
        "host_version": "18.2.2"
    },
    {
        "style": "legacy",
        "name": "mgr.pxm3",
        "fsid": "f4466e33-b57d-4d68-9909-3468afd9e5c2",
        "systemd_unit": "ceph-mgr@pxm3",
        "enabled": true,
        "state": "running",
        "host_version": "18.2.2"
    }
]

To import the OSDs, we first increased the osdmap epoch of the new cluster, repeating the commands below until the new cluster's epoch was greater than the epoch stored in the OSDs:
Bash:
ceph osd set  noin
ceph osd set noout
ceph osd set noup
ceph osd set nodown
ceph osd set norebalance
ceph osd set nobackfill
ceph osd unset  noin
ceph osd unset noout
ceph osd unset noup
ceph osd unset nodown
ceph osd unset norebalance
ceph osd unset nobackfill
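
For reference, a minimal sketch of that step is below. It is not the exact loop we ran: the target epoch is a placeholder you have to determine yourself (e.g. from the old cluster / OSD metadata), and each set/unset pair simply bumps the osdmap epoch.
Bash:
# Hedged sketch: toggle a harmless flag until the new cluster's osdmap epoch
# passes the epoch the OSDs last saw. TARGET_EPOCH is a placeholder.
TARGET_EPOCH=200
while [ "$(ceph osd dump | awk '/^epoch/ {print $2}')" -le "$TARGET_EPOCH" ]; do
    ceph osd set noout
    ceph osd unset noout
done
ceph osd dump | head -n 1    # prints the current epoch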

OSD volumes:
Code:
pxm2:
-> OSD.0: /dev/ceph-1740d41a-2ae7-4c4d-820f-ec3702e3ba90/osd-block-39f9b32f-c6e7-4b3f-b7f0-9b11a5832aaa
-> OSD.1: /dev/ceph-ad425d70-4aa3-419a-997f-f3a4082c9904/osd-block-bb4df480-2b9b-4604-a44d-6151d5c0cb33

pxm3:
-> OSD.2: /dev/ceph-94682b88-d09c-4eab-9170-c6d31eac79e6/osd-block-3f6756d6-e64b-4c60-9ac2-305c0e71cc51
-> OSD.3: /dev/ceph-d5ffd027-8289-4a1c-9378-6687d9f950ad/osd-block-eece9fc9-44d6-460b-aced-572c79a98be8

ceph-bluestore-tool show-label:

-> osd.0:
Bash:
root@pxm2:~# ceph-bluestore-tool show-label --path /var/lib/ceph/osd/ceph-0
inferring bluefs devices from bluestore path
{
    "/var/lib/ceph/osd/ceph-0/block": {
        "osd_uuid": "39f9b32f-c6e7-4b3f-b7f0-9b11a5832aaa",
        "size": 2000397795328,
        "btime": "2024-07-07T21:32:30.861509-0300",
        "description": "main",
        "bfm_blocks": "488378368",
        "bfm_blocks_per_key": "128",
        "bfm_bytes_per_block": "4096",
        "bfm_size": "2000397795328",
        "bluefs": "1",
        "ceph_fsid": "5514a69a-46ba-4a44-bb56-8d3109c6c9e0",
        "ceph_version_when_created": "ceph version 18.2.2 (e9fe820e7fffd1b7cde143a9f77653b73fcec748) reef (stable)",
        "created_at": "2024-07-08T00:32:32.459102Z",
        "kv_backend": "rocksdb",
        "magic": "ceph osd volume v026",
        "mkfs_done": "yes",
        "osd_key": "AQCdM4tmHR5tLRAA5AikqvQyMqOoH5MnL8Qdtg==",
        "ready": "ready",
        "require_osd_release": "18",
        "whoami": "0"
    }
}

-> osd.1:
Bash:
root@pxm2:~# ceph-bluestore-tool show-label --path /var/lib/ceph/osd/ceph-1
inferring bluefs devices from bluestore path
{
    "/var/lib/ceph/osd/ceph-1/block": {
        "osd_uuid": "bb4df480-2b9b-4604-a44d-6151d5c0cb33",
        "size": 2000397795328,
        "btime": "2024-07-07T21:32:43.729638-0300",
        "description": "main",
        "bfm_blocks": "488378368",
        "bfm_blocks_per_key": "128",
        "bfm_bytes_per_block": "4096",
        "bfm_size": "2000397795328",
        "bluefs": "1",
        "ceph_fsid": "5514a69a-46ba-4a44-bb56-8d3109c6c9e0",
        "ceph_version_when_created": "ceph version 18.2.2 (e9fe820e7fffd1b7cde143a9f77653b73fcec748) reef (stable)",
        "created_at": "2024-07-08T00:32:45.577456Z",
        "kv_backend": "rocksdb",
        "magic": "ceph osd volume v026",
        "mkfs_done": "yes",
        "osd_key": "AQCqM4tmTy87JxAAJVK1NokBDjdKSe+Z8OjwMA==",
        "ready": "ready",
        "require_osd_release": "18",
        "whoami": "1"
    }
}

-> osd.2
Bash:
root@pxm3:~# ceph-bluestore-tool show-label --path /var/lib/ceph/osd/ceph-2
inferring bluefs devices from bluestore path
{
    "/var/lib/ceph/osd/ceph-2/block": {
        "osd_uuid": "3f6756d6-e64b-4c60-9ac2-305c0e71cc51",
        "size": 2000397795328,
        "btime": "2024-07-07T21:33:07.812888-0300",
        "description": "main",
        "bfm_blocks": "488378368",
        "bfm_blocks_per_key": "128",
        "bfm_bytes_per_block": "4096",
        "bfm_size": "2000397795328",
        "bluefs": "1",
        "ceph_fsid": "5514a69a-46ba-4a44-bb56-8d3109c6c9e0",
        "ceph_version_when_created": "ceph version 18.2.2 (e9fe820e7fffd1b7cde143a9f77653b73fcec748) reef (stable)",
        "created_at": "2024-07-08T00:33:09.404317Z",
        "kv_backend": "rocksdb",
        "magic": "ceph osd volume v026",
        "mkfs_done": "yes",
        "osd_key": "AQDCM4tmu++vLRAAuJOXjTuEHR9VsKz7ShVEPg==",
        "ready": "ready",
        "require_osd_release": "18",
        "whoami": "2"
    }
}

-> osd.3
Bash:
root@pxm3:~# ceph-bluestore-tool show-label --path /var/lib/ceph/osd/ceph-3
inferring bluefs devices from bluestore path
{
    "/var/lib/ceph/osd/ceph-3/block": {
        "osd_uuid": "eece9fc9-44d6-460b-aced-572c79a98be8",
        "size": 2000397795328,
        "btime": "2024-07-07T21:33:25.725294-0300",
        "description": "main",
        "bfm_blocks": "488378368",
        "bfm_blocks_per_key": "128",
        "bfm_bytes_per_block": "4096",
        "bfm_size": "2000397795328",
        "bluefs": "1",
        "ceph_fsid": "5514a69a-46ba-4a44-bb56-8d3109c6c9e0",
        "ceph_version_when_created": "ceph version 18.2.2 (e9fe820e7fffd1b7cde143a9f77653b73fcec748) reef (stable)",
        "created_at": "2024-07-08T00:33:27.323085Z",
        "kv_backend": "rocksdb",
        "magic": "ceph osd volume v026",
        "mkfs_done": "yes",
        "osd_key": "AQDUM4tmagOEKBAAeAfZXcyU1naRkqIE5iVOfw==",
        "ready": "ready",
        "require_osd_release": "18",
        "whoami": "3"
    }
}

Bash:
root@pxm2:~# cat /var/lib/ceph/osd/ceph-0/ceph_fsid
5514a69a-46ba-4a44-bb56-8d3109c6c9e0

root@pxm2:~# cat /var/lib/ceph/osd/ceph-1/ceph_fsid
5514a69a-46ba-4a44-bb56-8d3109c6c9e0

root@pxm3:~# cat /var/lib/ceph/osd/ceph-2/ceph_fsid
5514a69a-46ba-4a44-bb56-8d3109c6c9e0

root@pxm3:~# cat /var/lib/ceph/osd/ceph-3/ceph_fsid
5514a69a-46ba-4a44-bb56-8d3109c6c9e0

Code:
root@pxm2:~# ceph daemon osd.0 status
no valid command found; 10 closest matches:
0
1
2
abort
assert
bluefs debug_inject_read_zeros
bluefs files list
bluefs stats
bluestore allocator dump block
bluestore allocator fragmentation block
admin_socket: invalid command

root@pxm2:~# ceph daemon osd.1 status
no valid command found; 10 closest matches:
0
1
2
abort
assert
bluefs debug_inject_read_zeros
bluefs files list
bluefs stats
bluestore allocator dump block
bluestore allocator fragmentation block
admin_socket: invalid command

root@pxm3:~# ceph daemon osd.2  status
no valid command found; 10 closest matches:
0
1
2
abort
assert
bluefs debug_inject_read_zeros
bluefs files list
bluefs stats
bluestore allocator dump block
bluestore allocator fragmentation block
admin_socket: invalid command

root@pxm3:~# ceph daemon osd.3 status
no valid command found; 10 closest matches:
0
1
2
abort
assert
config diff
config diff get <var>
config get <var>
config help [<var>]
config set <var> <val>...
admin_socket: invalid command

Code:
root@pxm2:~# ceph osd info osd.0
osd.0 down in  weight 1 up_from 0 up_thru 0 down_at 0 last_clean_interval [0,0)   exists 39f9b32f-c6e7-4b3f-b7f0-9b11a5832aaa

root@pxm2:~# ceph osd info osd.1
osd.1 down in  weight 1 up_from 0 up_thru 0 down_at 0 last_clean_interval [0,0)   exists bb4df480-2b9b-4604-a44d-6151d5c0cb33

root@pxm3:~# ceph osd info osd.2
osd.2 down in  weight 1 up_from 0 up_thru 0 down_at 0 last_clean_interval [0,0)   exists 3f6756d6-e64b-4c60-9ac2-305c0e71cc51

root@pxm3:~# ceph osd info osd.3
osd.3 down in  weight 1 up_from 0 up_thru 0 down_at 0 last_clean_interval [0,0)   exists eece9fc9-44d6-460b-aced-572c79a98be8

Bash:
root@pxm3:~# ceph osd status
ID  HOST   USED  AVAIL  WR OPS  WR DATA  RD OPS  RD DATA  STATE
 0           0      0       0        0       0        0   exists
 1           0      0       0        0       0        0   exists
 2           0      0       0        0       0        0   exists
 3           0      0       0        0       0        0   exists
root@pxm3:~# ceph osd tree
ID  CLASS  WEIGHT   TYPE NAME      STATUS  REWEIGHT  PRI-AFF
-1         7.27759  root default                        
-2         3.63879      host pxm2                        
 0         1.81940          osd.0    down   1.00000  1.00000
 1         1.81940          osd.1    down   1.00000  1.00000
-5         3.63879      host pxm3                        
 3         1.81940          osd.3    down   1.00000  1.00000
 2    ssd  1.81940          osd.2    down   1.00000  1.00000


Bash:
root@pxm2:~# ceph osd df
ID  CLASS  WEIGHT   REWEIGHT  SIZE  RAW USE  DATA  OMAP  META  AVAIL  %USE  VAR   PGS  STATUS
 0    ssd  1.81940   1.00000   0 B      0 B   0 B   0 B   0 B    0 B     0  1.00    0    down
 1    ssd  1.81940   1.00000   0 B      0 B   0 B   0 B   0 B    0 B     0  1.00    0    down
 2    ssd  1.81940   1.00000   0 B      0 B   0 B   0 B   0 B    0 B     0  1.00    0    down
 3    ssd  1.81940   1.00000   0 B      0 B   0 B   0 B   0 B    0 B     0  1.00    0    down
                       TOTAL   0 B      0 B   0 B   0 B   0 B    0 B     0                  
MIN/MAX VAR: 1.00/1.00  STDDEV: 0

NOTE: Due to the character limit, I am posting the OSD log in the next message.
 
Starting and checking the OSD service:
Bash:
root@pxm3:~# systemctl reset-failed ceph-osd@2.service ; systemctl stop ceph-osd@2.service ; systemctl start ceph-osd@2.service ; systemctl status ceph-osd@2.service
● ceph-osd@2.service - Ceph object storage daemon osd.2
     Loaded: loaded (/lib/systemd/system/ceph-osd@.service; enabled; preset: enabled)
    Drop-In: /usr/lib/systemd/system/ceph-osd@.service.d
             └─ceph-after-pve-cluster.conf
     Active: active (running) since Wed 2024-07-17 20:23:15 -03; 9ms ago
    Process: 655968 ExecStartPre=/usr/libexec/ceph/ceph-osd-prestart.sh --cluster ${CLUSTER} --id 2 (code=exited, status=0/SUCCESS)
   Main PID: 655973 ((ceph-osd))
      Tasks: 1
     Memory: 1008.0K
        CPU: 35ms
     CGroup: /system.slice/system-ceph\x2dosd.slice/ceph-osd@2.service
             └─655973 "(ceph-osd)"

Jul 17 20:23:15 pxm3 systemd[1]: Starting ceph-osd@2.service - Ceph object storage daemon osd.2...
Jul 17 20:23:15 pxm3 systemd[1]: Started ceph-osd@2.service - Ceph object storage daemon osd.2.

Below, in the OSD daemon log, we see the "bad fsid" error, which indicates that the OSDs are trying to join a different Ceph cluster than the one they were originally created for. Each Ceph cluster has a unique identifier, the fsid, and if an OSD sees a different fsid it cannot join.
Note: I believe this is the root cause of the problem. Problem: the OSDs carry the fsid of the old cluster.
Solution: change the ceph_fsid of the OSDs to the fsid of the new cluster, or recreate the cluster using the fsid recorded in the OSD metadata, i.e. rebuild the new cluster with the old cluster's fsid. Is that possible?
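
To see the mismatch directly from the shell (using osd.2 on pxm3 as the example; the same check applies to the other OSDs):
Bash:
# fsid recorded in the OSD's BlueStore label (the old cluster)
ceph-bluestore-tool show-label --path /var/lib/ceph/osd/ceph-2 | grep ceph_fsid
# fsid of the cluster the monitors are currently serving (the new cluster)
ceph fsid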

journalctl:
Bash:
journalctl -u ceph-osd@2:
Jul 17 20:09:40 pxm3 systemd[1]: Starting ceph-osd@2.service - Ceph object storage daemon osd.2...
Jul 17 20:09:40 pxm3 systemd[1]: Started ceph-osd@2.service - Ceph object storage daemon osd.2.
Jul 17 20:09:54 pxm3 ceph-osd[647313]: 2024-07-17T20:09:54.845-0300 702356d616c0 -1 osd.2 71 log_to_monitors true
Jul 17 20:09:55 pxm3 ceph-osd[647313]: 2024-07-17T20:09:55.362-0300 702342c006c0 -1 osd.2 71 ERROR: bad fsid?  i have 5514a69a-46ba-4a44-bb56-8d3109c6c9e0 and inc has f4466e33-b57d-4d68-9909-3468afd9e5c2
Jul 17 20:09:55 pxm3 ceph-osd[647313]: ./src/osd/OSD.cc: In function 'void OSD::handle_osd_map(MOSDMap*)' thread 702342c006c0 time 2024-07-17T20:09:55.363121-0300
Jul 17 20:09:55 pxm3 ceph-osd[647313]: ./src/osd/OSD.cc: 8098: ceph_abort_msg("bad fsid")
Jul 17 20:09:55 pxm3 ceph-osd[647313]:  ceph version 18.2.2 (e9fe820e7fffd1b7cde143a9f77653b73fcec748) reef (stable)
Jul 17 20:09:55 pxm3 ceph-osd[647313]:  1: (ceph::__ceph_abort(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0xd4) [0x62edaeab46e>
Jul 17 20:09:55 pxm3 ceph-osd[647313]:  2: (OSD::handle_osd_map(MOSDMap*)+0x384a) [0x62edaec0aeca]
Jul 17 20:09:55 pxm3 ceph-osd[647313]:  3: (OSD::ms_dispatch(Message*)+0x62) [0x62edaec0b332]
Jul 17 20:09:55 pxm3 ceph-osd[647313]:  4: (Messenger::ms_deliver_dispatch(boost::intrusive_ptr<Message> const&)+0xc1) [0x62edaf5eea51]
Jul 17 20:09:55 pxm3 ceph-osd[647313]:  5: (DispatchQueue::entry()+0x6cf) [0x62edaf5ed53f]
Jul 17 20:09:55 pxm3 ceph-osd[647313]:  6: (DispatchQueue::DispatchThread::entry()+0xd) [0x62edaf40fd5d]
Jul 17 20:09:55 pxm3 ceph-osd[647313]:  7: /lib/x86_64-linux-gnu/libc.so.6(+0x89134) [0x7023578a8134]
Jul 17 20:09:55 pxm3 ceph-osd[647313]:  8: /lib/x86_64-linux-gnu/libc.so.6(+0x1097dc) [0x7023579287dc]
Jul 17 20:09:55 pxm3 ceph-osd[647313]: *** Caught signal (Aborted) **
Jul 17 20:09:55 pxm3 ceph-osd[647313]:  in thread 702342c006c0 thread_name:ms_dispatch
Jul 17 20:09:55 pxm3 ceph-osd[647313]: 2024-07-17T20:09:55.366-0300 702342c006c0 -1 ./src/osd/OSD.cc: In function 'void OSD::handle_osd_map(MOSDMap*)' thread 702342c006c0 time 2024-07-17T20:09:55.363121>
Jul 17 20:09:55 pxm3 ceph-osd[647313]: ./src/osd/OSD.cc: 8098: ceph_abort_msg("bad fsid")
Jul 17 20:09:55 pxm3 ceph-osd[647313]:  ceph version 18.2.2 (e9fe820e7fffd1b7cde143a9f77653b73fcec748) reef (stable)
Jul 17 20:09:55 pxm3 ceph-osd[647313]:  1: (ceph::__ceph_abort(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0xd4) [0x62edaeab46e>
Jul 17 20:09:55 pxm3 ceph-osd[647313]:  2: (OSD::handle_osd_map(MOSDMap*)+0x384a) [0x62edaec0aeca]
Jul 17 20:09:55 pxm3 ceph-osd[647313]:  3: (OSD::ms_dispatch(Message*)+0x62) [0x62edaec0b332]
Jul 17 20:09:55 pxm3 ceph-osd[647313]:  4: (Messenger::ms_deliver_dispatch(boost::intrusive_ptr<Message> const&)+0xc1) [0x62edaf5eea51]
Jul 17 20:09:55 pxm3 ceph-osd[647313]:  5: (DispatchQueue::entry()+0x6cf) [0x62edaf5ed53f]
Jul 17 20:09:55 pxm3 ceph-osd[647313]:  6: (DispatchQueue::DispatchThread::entry()+0xd) [0x62edaf40fd5d]
Jul 17 20:09:55 pxm3 ceph-osd[647313]:  7: /lib/x86_64-linux-gnu/libc.so.6(+0x89134) [0x7023578a8134]
Jul 17 20:09:55 pxm3 ceph-osd[647313]:  8: /lib/x86_64-linux-gnu/libc.so.6(+0x1097dc) [0x7023579287dc]
Jul 17 20:09:55 pxm3 ceph-osd[647313]:  ceph version 18.2.2 (e9fe820e7fffd1b7cde143a9f77653b73fcec748) reef (stable)
Jul 17 20:09:55 pxm3 ceph-osd[647313]:  1: /lib/x86_64-linux-gnu/libc.so.6(+0x3c050) [0x70235785b050]
Jul 17 20:09:55 pxm3 ceph-osd[647313]:  2: /lib/x86_64-linux-gnu/libc.so.6(+0x8ae2c) [0x7023578a9e2c]
Jul 17 20:09:55 pxm3 ceph-osd[647313]:  3: gsignal()
Jul 17 20:09:55 pxm3 ceph-osd[647313]:  4: abort()
Jul 17 20:09:55 pxm3 ceph-osd[647313]:  5: (ceph::__ceph_abort(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x18a) [0x62edaeab47>
Jul 17 20:09:55 pxm3 ceph-osd[647313]:  6: (OSD::handle_osd_map(MOSDMap*)+0x384a) [0x62edaec0aeca]
Jul 17 20:09:55 pxm3 ceph-osd[647313]:  7: (OSD::ms_dispatch(Message*)+0x62) [0x62edaec0b332]
Jul 17 20:09:55 pxm3 ceph-osd[647313]:  8: (Messenger::ms_deliver_dispatch(boost::intrusive_ptr<Message> const&)+0xc1) [0x62edaf5eea51]
Jul 17 20:09:55 pxm3 ceph-osd[647313]:  9: (DispatchQueue::entry()+0x6cf) [0x62edaf5ed53f]
Jul 17 20:09:55 pxm3 ceph-osd[647313]:  10: (DispatchQueue::DispatchThread::entry()+0xd) [0x62edaf40fd5d]
Jul 17 20:09:55 pxm3 ceph-osd[647313]:  11: /lib/x86_64-linux-gnu/libc.so.6(+0x89134) [0x7023578a8134]
Jul 17 20:09:55 pxm3 ceph-osd[647313]:  12: /lib/x86_64-linux-gnu/libc.so.6(+0x1097dc) [0x7023579287dc]
Jul 17 20:09:55 pxm3 ceph-osd[647313]: 2024-07-17T20:09:55.369-0300 702342c006c0 -1 *** Caught signal (Aborted) **
Jul 17 20:09:55 pxm3 ceph-osd[647313]:  in thread 702342c006c0 thread_name:ms_dispatch
Jul 17 20:09:55 pxm3 ceph-osd[647313]:  ceph version 18.2.2 (e9fe820e7fffd1b7cde143a9f77653b73fcec748) reef (stable)
Jul 17 20:09:55 pxm3 ceph-osd[647313]:  1: /lib/x86_64-linux-gnu/libc.so.6(+0x3c050) [0x70235785b050]
Jul 17 20:09:55 pxm3 ceph-osd[647313]:  2: /lib/x86_64-linux-gnu/libc.so.6(+0x8ae2c) [0x7023578a9e2c]
Jul 17 20:09:55 pxm3 ceph-osd[647313]:  3: gsignal()
Jul 17 20:09:55 pxm3 ceph-osd[647313]:  4: abort()
Jul 17 20:09:55 pxm3 ceph-osd[647313]:  5: (ceph::__ceph_abort(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x18a) [0x62edaeab47>
Jul 17 20:09:55 pxm3 ceph-osd[647313]:  6: (OSD::handle_osd_map(MOSDMap*)+0x384a) [0x62edaec0aeca]
Jul 17 20:09:55 pxm3 ceph-osd[647313]:  7: (OSD::ms_dispatch(Message*)+0x62) [0x62edaec0b332]
Jul 17 20:09:55 pxm3 ceph-osd[647313]:  8: (Messenger::ms_deliver_dispatch(boost::intrusive_ptr<Message> const&)+0xc1) [0x62edaf5eea51]
Jul 17 20:09:55 pxm3 ceph-osd[647313]:  9: (DispatchQueue::entry()+0x6cf) [0x62edaf5ed53f]
Jul 17 20:09:55 pxm3 ceph-osd[647313]:  10: (DispatchQueue::DispatchThread::entry()+0xd) [0x62edaf40fd5d]
Jul 17 20:09:55 pxm3 ceph-osd[647313]:  11: /lib/x86_64-linux-gnu/libc.so.6(+0x89134) [0x7023578a8134]
Jul 17 20:09:55 pxm3 ceph-osd[647313]:  12: /lib/x86_64-linux-gnu/libc.so.6(+0x1097dc) [0x7023579287dc]
Jul 17 20:09:55 pxm3 ceph-osd[647313]:  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Jul 17 20:09:55 pxm3 ceph-osd[647313]:   -142> 2024-07-17T20:09:54.845-0300 702356d616c0 -1 osd.2 71 log_to_monitors true
Jul 17 20:09:55 pxm3 ceph-osd[647313]:     -2> 2024-07-17T20:09:55.362-0300 702342c006c0 -1 osd.2 71 ERROR: bad fsid?  i have 5514a69a-46ba-4a44-bb56-8d3109c6c9e0 and inc has f4466e33-b57d-4d68-9909-346>
Jul 17 20:09:55 pxm3 ceph-osd[647313]:     -1> 2024-07-17T20:09:55.366-0300 702342c006c0 -1 ./src/osd/OSD.cc: In function 'void OSD::handle_osd_map(MOSDMap*)' thread 702342c006c0 time 2024-07-17T20:09:5>
Jul 17 20:09:55 pxm3 ceph-osd[647313]: ./src/osd/OSD.cc: 8098: ceph_abort_msg("bad fsid")
Jul 17 20:09:55 pxm3 ceph-osd[647313]:  ceph version 18.2.2 (e9fe820e7fffd1b7cde143a9f77653b73fcec748) reef (stable)
Jul 17 20:09:55 pxm3 ceph-osd[647313]:  1: (ceph::__ceph_abort(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0xd4) [0x62edaeab46e>
Jul 17 20:09:55 pxm3 ceph-osd[647313]:  2: (OSD::handle_osd_map(MOSDMap*)+0x384a) [0x62edaec0aeca]
Jul 17 20:09:55 pxm3 ceph-osd[647313]:  3: (OSD::ms_dispatch(Message*)+0x62) [0x62edaec0b332]
Jul 17 20:09:55 pxm3 ceph-osd[647313]:  4: (Messenger::ms_deliver_dispatch(boost::intrusive_ptr<Message> const&)+0xc1) [0x62edaf5eea51]
Jul 17 20:09:55 pxm3 ceph-osd[647313]:  5: (DispatchQueue::entry()+0x6cf) [0x62edaf5ed53f]
Jul 17 20:09:55 pxm3 ceph-osd[647313]:  6: (DispatchQueue::DispatchThread::entry()+0xd) [0x62edaf40fd5d]
Jul 17 20:09:55 pxm3 ceph-osd[647313]:  7: /lib/x86_64-linux-gnu/libc.so.6(+0x89134) [0x7023578a8134]
Jul 17 20:09:55 pxm3 ceph-osd[647313]:  8: /lib/x86_64-linux-gnu/libc.so.6(+0x1097dc) [0x7023579287dc]
Jul 17 20:09:55 pxm3 ceph-osd[647313]:      0> 2024-07-17T20:09:55.369-0300 702342c006c0 -1 *** Caught signal (Aborted) **
Jul 17 20:09:55 pxm3 ceph-osd[647313]:  in thread 702342c006c0 thread_name:ms_dispatch
Jul 17 20:09:55 pxm3 ceph-osd[647313]:  ceph version 18.2.2 (e9fe820e7fffd1b7cde143a9f77653b73fcec748) reef (stable)
Jul 17 20:09:55 pxm3 ceph-osd[647313]:  1: /lib/x86_64-linux-gnu/libc.so.6(+0x3c050) [0x70235785b050]
Jul 17 20:09:55 pxm3 ceph-osd[647313]:  2: /lib/x86_64-linux-gnu/libc.so.6(+0x8ae2c) [0x7023578a9e2c]
Jul 17 20:09:55 pxm3 ceph-osd[647313]:  3: gsignal()
Jul 17 20:09:55 pxm3 ceph-osd[647313]:  4: abort()
Jul 17 20:09:55 pxm3 ceph-osd[647313]:  5: (ceph::__ceph_abort(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x18a) [0x62edaeab47>
Jul 17 20:09:55 pxm3 ceph-osd[647313]:  6: (OSD::handle_osd_map(MOSDMap*)+0x384a) [0x62edaec0aeca]
Jul 17 20:09:55 pxm3 ceph-osd[647313]:  7: (OSD::ms_dispatch(Message*)+0x62) [0x62edaec0b332]
Jul 17 20:09:55 pxm3 ceph-osd[647313]:  8: (Messenger::ms_deliver_dispatch(boost::intrusive_ptr<Message> const&)+0xc1) [0x62edaf5eea51]
Jul 17 20:09:55 pxm3 ceph-osd[647313]:  9: (DispatchQueue::entry()+0x6cf) [0x62edaf5ed53f]
Jul 17 20:09:55 pxm3 ceph-osd[647313]:  10: (DispatchQueue::DispatchThread::entry()+0xd) [0x62edaf40fd5d]
Jul 17 20:09:55 pxm3 ceph-osd[647313]:  11: /lib/x86_64-linux-gnu/libc.so.6(+0x89134) [0x7023578a8134]
Jul 17 20:09:55 pxm3 ceph-osd[647313]:  12: /lib/x86_64-linux-gnu/libc.so.6(+0x1097dc) [0x7023579287dc]
Jul 17 20:09:55 pxm3 ceph-osd[647313]:  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Jul 17 20:09:55 pxm3 ceph-osd[647313]:   -142> 2024-07-17T20:09:54.845-0300 702356d616c0 -1 osd.2 71 log_to_monitors true
Jul 17 20:09:55 pxm3 ceph-osd[647313]:     -2> 2024-07-17T20:09:55.362-0300 702342c006c0 -1 osd.2 71 ERROR: bad fsid?  i have 5514a69a-46ba-4a44-bb56-8d3109c6c9e0 and inc has f4466e33-b57d-4d68-9909-346>
Jul 17 20:09:55 pxm3 ceph-osd[647313]:     -1> 2024-07-17T20:09:55.366-0300 702342c006c0 -1 ./src/osd/OSD.cc: In function 'void OSD::handle_osd_map(MOSDMap*)' thread 702342c006c0 time 2024-07-17T20:09:5>
Jul 17 20:09:55 pxm3 ceph-osd[647313]: ./src/osd/OSD.cc: 8098: ceph_abort_msg("bad fsid")
Jul 17 20:09:55 pxm3 ceph-osd[647313]:  ceph version 18.2.2 (e9fe820e7fffd1b7cde143a9f77653b73fcec748) reef (stable)
Jul 17 20:09:55 pxm3 ceph-osd[647313]:  1: (ceph::__ceph_abort(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0xd4) [0x62edaeab46e>
Jul 17 20:09:55 pxm3 ceph-osd[647313]:  2: (OSD::handle_osd_map(MOSDMap*)+0x384a) [0x62edaec0aeca]
Jul 17 20:09:55 pxm3 ceph-osd[647313]:  3: (OSD::ms_dispatch(Message*)+0x62) [0x62edaec0b332]
Jul 17 20:09:55 pxm3 ceph-osd[647313]:  4: (Messenger::ms_deliver_dispatch(boost::intrusive_ptr<Message> const&)+0xc1) [0x62edaf5eea51]
Jul 17 20:09:55 pxm3 ceph-osd[647313]:  5: (DispatchQueue::entry()+0x6cf) [0x62edaf5ed53f]
Jul 17 20:09:55 pxm3 ceph-osd[647313]:  6: (DispatchQueue::DispatchThread::entry()+0xd) [0x62edaf40fd5d]
Jul 17 20:09:55 pxm3 ceph-osd[647313]:  7: /lib/x86_64-linux-gnu/libc.so.6(+0x89134) [0x7023578a8134]
Jul 17 20:09:55 pxm3 ceph-osd[647313]:  8: /lib/x86_64-linux-gnu/libc.so.6(+0x1097dc) [0x7023579287dc]
Jul 17 20:09:55 pxm3 ceph-osd[647313]:      0> 2024-07-17T20:09:55.369-0300 702342c006c0 -1 *** Caught signal (Aborted) **
Jul 17 20:09:55 pxm3 ceph-osd[647313]:  in thread 702342c006c0 thread_name:ms_dispatch
Jul 17 20:09:55 pxm3 ceph-osd[647313]:  ceph version 18.2.2 (e9fe820e7fffd1b7cde143a9f77653b73fcec748) reef (stable)
Jul 17 20:09:55 pxm3 ceph-osd[647313]:  1: /lib/x86_64-linux-gnu/libc.so.6(+0x3c050) [0x70235785b050]
Jul 17 20:09:55 pxm3 ceph-osd[647313]:  2: /lib/x86_64-linux-gnu/libc.so.6(+0x8ae2c) [0x7023578a9e2c]
Jul 17 20:09:55 pxm3 ceph-osd[647313]:  3: gsignal()
Jul 17 20:09:55 pxm3 ceph-osd[647313]:  4: abort()
Jul 17 20:09:55 pxm3 ceph-osd[647313]:  5: (ceph::__ceph_abort(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x18a) [0x62edaeab47>
Jul 17 20:09:55 pxm3 ceph-osd[647313]:  6: (OSD::handle_osd_map(MOSDMap*)+0x384a) [0x62edaec0aeca]
Jul 17 20:09:55 pxm3 ceph-osd[647313]:  7: (OSD::ms_dispatch(Message*)+0x62) [0x62edaec0b332]
Jul 17 20:09:55 pxm3 ceph-osd[647313]:  8: (Messenger::ms_deliver_dispatch(boost::intrusive_ptr<Message> const&)+0xc1) [0x62edaf5eea51]
Jul 17 20:09:55 pxm3 ceph-osd[647313]:  9: (DispatchQueue::entry()+0x6cf) [0x62edaf5ed53f]
Jul 17 20:09:55 pxm3 ceph-osd[647313]:  10: (DispatchQueue::DispatchThread::entry()+0xd) [0x62edaf40fd5d]
Jul 17 20:09:55 pxm3 ceph-osd[647313]:  11: /lib/x86_64-linux-gnu/libc.so.6(+0x89134) [0x7023578a8134]
Jul 17 20:09:55 pxm3 ceph-osd[647313]:  12: /lib/x86_64-linux-gnu/libc.so.6(+0x1097dc) [0x7023579287dc]
Jul 17 20:09:55 pxm3 ceph-osd[647313]:  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Jul 17 20:09:55 pxm3 systemd[1]: ceph-osd@2.service: Main process exited, code=killed, status=6/ABRT
Jul 17 20:09:55 pxm3 systemd[1]: ceph-osd@2.service: Failed with result 'signal'.
Jul 17 20:09:55 pxm3 systemd[1]: ceph-osd@2.service: Consumed 10.901s CPU time.
 
Bash:
root@pxm2:~# ceph osd dump
epoch 164
fsid f4466e33-b57d-4d68-9909-3468afd9e5c2
created 2024-07-14T13:04:53.354056-0300
modified 2024-07-17T20:24:17.550123-0300
flags sortbitwise,recovery_deletes,purged_snapdirs,pglog_hardlimit
crush_version 63
full_ratio 0.95
backfillfull_ratio 0.9
nearfull_ratio 0.85
require_min_compat_client luminous
min_compat_client jewel
require_osd_release reef
stretch_mode_enabled false
max_osd 4
osd.0 down in  weight 1 up_from 0 up_thru 0 down_at 0 last_clean_interval [0,0)   exists 39f9b32f-c6e7-4b3f-b7f0-9b11a5832aaa
osd.1 down in  weight 1 up_from 0 up_thru 0 down_at 0 last_clean_interval [0,0)   exists bb4df480-2b9b-4604-a44d-6151d5c0cb33
osd.2 down in  weight 1 up_from 0 up_thru 0 down_at 0 last_clean_interval [0,0)   exists 3f6756d6-e64b-4c60-9ac2-305c0e71cc51
osd.3 down in  weight 1 up_from 0 up_thru 0 down_at 0 last_clean_interval [0,0)   exists eece9fc9-44d6-460b-aced-572c79a98be8
blocklist 192.168.0.1:0/1377959162 expires 2024-07-18T17:16:21.635340-0300
blocklist 192.168.0.1:6800/49511 expires 2024-07-18T17:16:21.635340-0300
blocklist 192.168.0.1:0/3150227222 expires 2024-07-18T17:16:21.635340-0300
blocklist 192.168.0.2:0/1759809784 expires 2024-07-18T00:56:24.333969-0300
blocklist 192.168.0.1:0/3747372149 expires 2024-07-18T17:16:21.635340-0300
blocklist 192.168.0.1:6801/49511 expires 2024-07-18T17:16:21.635340-0300
blocklist 192.168.0.2:0/1472036822 expires 2024-07-18T00:56:24.333969-0300
blocklist 192.168.0.2:0/571633622 expires 2024-07-18T00:56:24.333969-0300
blocklist 192.168.0.2:6801/10543 expires 2024-07-18T00:56:24.333969-0300
blocklist 192.168.0.2:6800/10543 expires 2024-07-18T00:56:24.333969-0300

Bash:
root@pxm2:~# ceph osd metadata
[
    {
        "id": 0
    },
    {
        "id": 1
    },
    {
        "id": 2
    },
    {
        "id": 3
    }
]

I tried the procedure below to force the fsid of the new cluster directly into the OSD metadata, but after restarting the service, or after a repair (ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-3), the file still shows the old cluster's fsid:
Code:
root@pxm3:~# echo "f4466e33-b57d-4d68-9909-3468afd9e5c2" /var/lib/ceph/osd/ceph-3/ceph_fsid
f4466e33-b57d-4d68-9909-3468afd9e5c2 /var/lib/ceph/osd/ceph-3/ceph_fsid
root@pxm3:~# cat /var/lib/ceph/osd/ceph-3/ceph_fsid
5514a69a-46ba-4a44-bb56-8d3109c6c9e0
root@pxm3:~# systemctl reset-failed ceph-osd@3.service ; systemctl stop ceph-osd@3.service ; systemctl start ceph-osd@3.service ; systemctl status ceph-osd@3.service
● ceph-osd@3.service - Ceph object storage daemon osd.3
     Loaded: loaded (/lib/systemd/system/ceph-osd@.service; enabled; preset: enabled)
    Drop-In: /usr/lib/systemd/system/ceph-osd@.service.d
             └─ceph-after-pve-cluster.conf
     Active: active (running) since Wed 2024-07-17 21:08:04 -03; 9ms ago
    Process: 683018 ExecStartPre=/usr/libexec/ceph/ceph-osd-prestart.sh --cluster ${CLUSTER} --id 3 (code=exited, status=0/SUCCESS)
   Main PID: 683022 ((ceph-osd))
      Tasks: 1
     Memory: 864.0K
        CPU: 34ms
     CGroup: /system.slice/system-ceph\x2dosd.slice/ceph-osd@3.service
             └─683022 "(ceph-osd)"

Jul 17 21:08:04 pxm3 systemd[1]: Starting ceph-osd@3.service - Ceph object storage daemon osd.3...
Jul 17 21:08:04 pxm3 systemd[1]: Started ceph-osd@3.service - Ceph object storage daemon osd.3.
root@pxm3:~# cat /var/lib/ceph/osd/ceph-3/ceph_fsid
5514a69a-46ba-4a44-bb56-8d3109c6c9e0
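
Two notes on the attempt above: the echo as typed is missing a > redirect, so the file was never actually overwritten, and for ceph-volume LVM OSDs /var/lib/ceph/osd/ceph-N is typically a tmpfs repopulated from the BlueStore label on activation, so an edited file would not survive anyway. If one wanted to rewrite the label key itself, a hedged, untested sketch would be the following (device path taken from the ceph-volume lvm list output further down):
Bash:
# Sketch only: rewrite the ceph_fsid key in the BlueStore label of osd.3.
systemctl stop ceph-osd@3.service
ceph-bluestore-tool set-label-key \
    --dev /dev/ceph-d5ffd027-8289-4a1c-9378-6687d9f950ad/osd-block-eece9fc9-44d6-460b-aced-572c79a98be8 \
    -k ceph_fsid -v f4466e33-b57d-4d68-9909-3468afd9e5c2
ceph-bluestore-tool show-label \
    --dev /dev/ceph-d5ffd027-8289-4a1c-9378-6687d9f950ad/osd-block-eece9fc9-44d6-460b-aced-572c79a98be8
Even then, the "bad fsid" abort in handle_osd_map appears to compare against the cluster fsid stored in the OSD's own superblock, so rewriting the label alone is probably not enough; in the end we went the other way and rebuilt the cluster with the old fsid (see the solution post below).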

Bash:
root@pxm2:~# cat /var/lib/ceph/osd/ceph-0/keyring
[osd.0]
key = AQCdM4tmHR5tLRAA5AikqvQyMqOoH5MnL8Qdtg==

root@pxm2:~# cat /var/lib/ceph/osd/ceph-1/keyring
[osd.1]
key = AQCqM4tmTy87JxAAJVK1NokBDjdKSe+Z8OjwMA==

root@pxm3:~# cat /var/lib/ceph/osd/ceph-2/keyring
[osd.2]
key = AQDCM4tmu++vLRAAuJOXjTuEHR9VsKz7ShVEPg==

root@pxm3:~# cat /var/lib/ceph/osd/ceph-3/keyring
[osd.3]
key = AQDUM4tmagOEKBAAeAfZXcyU1naRkqIE5iVOfw==

Bash:
root@pxm2:~# ceph auth list
osd.0
    key: AQCKQJhmMYWeEhAAosrE8Ff+1kZbKcroi22TvQ==
    caps: [mon] allow profile osd
    caps: [osd] allow *
osd.1
    key: AQCTQJhm2uLvAxAAC3uIcRk9d0sxLJgIxcivtw==
    caps: [mon] allow profile osd
    caps: [osd] allow *
osd.2
    key: AQDCM4tmu++vLRAAuJOXjTuEHR9VsKz7ShVEPg==
    caps: [mon] allow profile osd
    caps: [osd] allow *
osd.3
    key: AQBYMZhmmlaFDRAAtoy4//XweaI94OvrPV1aiQ==
    caps: [mon] allow profile osd
    caps: [osd] allow *
client.admin
    key: AQAk95NmSjMdORAAiAHkTSSMquKkBAGpALjwQA==
    caps: [mds] allow *
    caps: [mgr] allow *
    caps: [mon] allow *
    caps: [osd] allow *
client.bootstrap-mds
    key: AQAl95NmWuofFRAAhV7/f1PykW/KlcB8Rede9w==
    caps: [mon] allow profile bootstrap-mds
client.bootstrap-mgr
    key: AQAl95NmmPQfFRAAG2cQNiWQgdx0Rvr7ZFmuhg==
    caps: [mon] allow profile bootstrap-mgr
client.bootstrap-osd
    key: AQAl95Nmr/0fFRAA4TahG5PCbZIsltFgkRNEgA==
    caps: [mon] allow profile bootstrap-osd
client.bootstrap-rbd
    key: AQAl95Nm0gYgFRAAWBvqfphKk62InqY9x0ijHg==
    caps: [mon] allow profile bootstrap-rbd
client.bootstrap-rbd-mirror
    key: AQAl95Nmxg8gFRAAomFkp299Ca04NwGpfbSZRg==
    caps: [mon] allow profile bootstrap-rbd-mirror
client.bootstrap-rgw
    key: AQAl95Nm3RwgFRAA/QoIemXPFv5Gs1/PWfJuYw==
    caps: [mon] allow profile bootstrap-rgw
client.crash
    key: AQAl95NmlvL0HRAAovpivsfHqqokmO0vqIR5Lg==
    caps: [mgr] profile crash
    caps: [mon] profile crash
mgr.pxm1
    key: AQDm+JNmtJ3CCRAAl/wZaT6z12LCfAghvm6s4w==
    caps: [mds] allow *
    caps: [mon] allow profile mgr
    caps: [osd] allow *
mgr.pxm2
    key: AQAm95Nm6+dmABAA6405D8ROtNpJf6iVhMegQA==
    caps: [mds] allow *
    caps: [mon] allow profile mgr
    caps: [osd] allow *
mgr.pxm3
    key: AQDp95Nm3ZbLDxAAWc1zLaVOhE0wMxLJCL5IGg==
    caps: [mds] allow *
    caps: [mon] allow profile mgr
    caps: [osd] allow *
 
root@pxm2:~# ceph-volume lvm list


====== osd.0 =======

[block] /dev/ceph-1740d41a-2ae7-4c4d-820f-ec3702e3ba90/osd-block-39f9b32f-c6e7-4b3f-b7f0-9b11a5832aaa

block device /dev/ceph-1740d41a-2ae7-4c4d-820f-ec3702e3ba90/osd-block-39f9b32f-c6e7-4b3f-b7f0-9b11a5832aaa
block uuid 4QeVWT-pjxc-i1ZX-Vomv-t305-JRaI-5TJFf3
cephx lockbox secret
cluster fsid 5514a69a-46ba-4a44-bb56-8d3109c6c9e0
cluster name ceph
crush device class nvme
encrypted 0
osd fsid 39f9b32f-c6e7-4b3f-b7f0-9b11a5832aaa
osd id 0
osdspec affinity
type block
vdo 0
devices /dev/nvme1n1

====== osd.1 =======

[block] /dev/ceph-ad425d70-4aa3-419a-997f-f3a4082c9904/osd-block-bb4df480-2b9b-4604-a44d-6151d5c0cb33

block device /dev/ceph-ad425d70-4aa3-419a-997f-f3a4082c9904/osd-block-bb4df480-2b9b-4604-a44d-6151d5c0cb33
block uuid JCdfjp-odd6-agXg-7qnn-2Dye-UIgp-2QTJUj
cephx lockbox secret
cluster fsid 5514a69a-46ba-4a44-bb56-8d3109c6c9e0
cluster name ceph
crush device class nvme
encrypted 0
osd fsid bb4df480-2b9b-4604-a44d-6151d5c0cb33
osd id 1
osdspec affinity
type block
vdo 0
devices /dev/nvme0n1

root@pxm3:~# ceph-volume lvm list


====== osd.2 =======

[block] /dev/ceph-94682b88-d09c-4eab-9170-c6d31eac79e6/osd-block-3f6756d6-e64b-4c60-9ac2-305c0e71cc51

block device /dev/ceph-94682b88-d09c-4eab-9170-c6d31eac79e6/osd-block-3f6756d6-e64b-4c60-9ac2-305c0e71cc51
block uuid m5UJhh-Uvj4-KZNG-yJQM-32J5-fO4W-MWX7sC
cephx lockbox secret
cluster fsid 5514a69a-46ba-4a44-bb56-8d3109c6c9e0
cluster name ceph
crush device class nvme
encrypted 0
osd fsid 3f6756d6-e64b-4c60-9ac2-305c0e71cc51
osd id 2
osdspec affinity
type block
vdo 0
devices /dev/nvme0n1

====== osd.3 =======

[block] /dev/ceph-d5ffd027-8289-4a1c-9378-6687d9f950ad/osd-block-eece9fc9-44d6-460b-aced-572c79a98be8

block device /dev/ceph-d5ffd027-8289-4a1c-9378-6687d9f950ad/osd-block-eece9fc9-44d6-460b-aced-572c79a98be8
block uuid Ygjuhw-CFSV-sBKo-5tK3-uPlg-P5iJ-3C2GHy
cephx lockbox secret
cluster fsid 5514a69a-46ba-4a44-bb56-8d3109c6c9e0
cluster name ceph
crush device class nvme
encrypted 0
osd fsid eece9fc9-44d6-460b-aced-572c79a98be8
osd id 3
osdspec affinity
type block
vdo 0
devices /dev/nvme1n1

---

root@pxm2:~# ceph-volume simple scan -f /var/lib/ceph/osd/ceph-0
stderr: Unknown device "/var/lib/ceph/osd/ceph-0": No such device
Running command: /usr/sbin/cryptsetup status tmpfs
--> Ignoring /var/lib/ceph/osd/ceph-0 because it's not a ceph-disk created osd.

root@pxm2:~# ceph-volume simple scan -f /var/lib/ceph/osd/ceph-1
stderr: Unknown device "/var/lib/ceph/osd/ceph-1": No such device
Running command: /usr/sbin/cryptsetup status tmpfs
--> Ignoring /var/lib/ceph/osd/ceph-1 because it's not a ceph-disk created osd.

root@pxm3:~# ceph-volume simple scan /var/lib/ceph/osd/ceph-3
stderr: Unknown device "/var/lib/ceph/osd/ceph-3": No such device
Running command: /usr/sbin/cryptsetup status tmpfs
--> Ignoring /var/lib/ceph/osd/ceph-3 because it's not a ceph-disk created osd.
root@pxm3:~# ceph-volume simple scan /var/lib/ceph/osd/ceph-2
stderr: Unknown device "/var/lib/ceph/osd/ceph-2": No such device
Running command: /usr/sbin/cryptsetup status tmpfs
--> Ignoring /var/lib/ceph/osd/ceph-2 because it's not a ceph-disk created osd.
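
Note: ceph-volume simple scan only handles legacy ceph-disk OSDs; these are ceph-volume LVM OSDs (see the ceph-volume lvm list output above), so (re)activation goes through the lvm subcommand instead, for example:
Bash:
# activate every LVM OSD found on this host
ceph-volume lvm activate --all
# or a single OSD by id and OSD fsid (values from the listing above)
ceph-volume lvm activate 0 39f9b32f-c6e7-4b3f-b7f0-9b11a5832aaa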
 
Hello, I wanted to let you know that we managed to resolve the issue.
As the OSDs were healthy, the solution was to rebuild the monitor database of the old cluster from the OSDs (more details below and in the links at the end), because without the monitor database the Ceph cluster cannot recognize the data in the OSDs nor the pools contained in them.
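
For reference, the core of that procedure, following the "Recovery using OSDs" section of the Ceph docs linked at the end, looks roughly like the sketch below. Paths, the keyring location and the mon names are assumptions for this cluster, not the exact commands we ran, and the keyring must contain the mon. and client.admin keys with full caps.
Bash:
# 1) With the OSD services stopped but the OSDs activated/mounted, collect the
#    cluster map from every OSD on each host (repeat per host and merge the stores)
ms=/root/mon-store
mkdir -p $ms
for osd in /var/lib/ceph/osd/ceph-*; do
    ceph-objectstore-tool --data-path "$osd" --no-mon-config \
        --op update-mon-db --mon-store-path $ms
done

# 2) Rebuild the monitor store from the collected maps
ceph-monstore-tool $ms rebuild -- \
    --keyring /etc/pve/priv/ceph.client.admin.keyring \
    --mon-ids pxm2 pxm3 pxm1

# 3) Back up the monitor's current store.db and move the rebuilt one into place
mv /var/lib/ceph/mon/ceph-pxm2/store.db /var/lib/ceph/mon/ceph-pxm2/store.db.bak
cp -a $ms/store.db /var/lib/ceph/mon/ceph-pxm2/store.db
chown -R ceph:ceph /var/lib/ceph/mon/ceph-pxm2/store.db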

It was also necessary to create a very basic ceph.conf in /etc/pve/ (with a symbolic link at /etc/ceph/ceph.conf) containing the fsid of the old cluster (you can read the fsid recorded in each OSD with the ceph-volume lvm list command), so that when Ceph is reinstalled the cluster gets the same fsid as the OSDs. We did not find a way to modify the fsid inside the OSDs, and if the OSDs are added to a cluster with a different fsid, the OSD service logs the "bad fsid" error and refuses to start.
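
A minimal sketch of that ceph.conf, assuming the old cluster's fsid as read from ceph-volume lvm list (the remaining options mirror the file shown earlier; this is an illustration, not our exact file):
Code:
# /etc/pve/ceph.conf
[global]
    # cluster fsid recorded in the OSDs (the old cluster)
    fsid = 5514a69a-46ba-4a44-bb56-8d3109c6c9e0
    mon_host = 192.168.0.2 192.168.0.3 192.168.0.1
    public_network = 192.168.0.0/24
    cluster_network = 192.168.0.2/24
    auth_client_required = cephx
    auth_cluster_required = cephx
    auth_service_required = cephx
If the symlink does not already exist: ln -sf /etc/pve/ceph.conf /etc/ceph/ceph.conf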

I hope this helps someone avoid the terror and despair I went through; it took hundreds of tests across various guides (links below) before I pieced this information together.

As a suggestion to the Proxmox team: add a warning or a permission check against the manual deletion of important Ceph cluster files (for example, the monitor database files).

Thank you

In summary:
I reached these conclusions after studying the Ceph architecture in more depth; there are small nuances regarding Proxmox.

The solution was to restore the monitor data from the OSDs, but you must have all the OSDs intact and available, contrary to documentation suggesting that a single OSD is enough: before importing into the new monitor, the data extracted from the old monitors, which lives on the OSDs, must contain all the *.sst files (RocksDB table files) of the monitor database.

I also believe this probably does not work with the "Encrypt OSD" option enabled.

If you use separate devices for the WAL and DB, you also need to follow the store.db / "slow" device part of the procedure that Red Hat and IBM mention in the links below:

I used several guides together, as it was necessary to test different approaches, mainly because the cluster kept breaking in the middle of the guides' instructions. We therefore spun up VMs and passed the OSD disks through to them directly, precisely so that restoring a VM snapshot to start a new attempt was fast: the OSD data itself is never touched, only the monitor/mgr keys and the Ceph cluster data.

Some commands in most guides should not be used because they break the Ceph cluster, such as regenerating the master key, which is not necessary.

Bringing Ceph up with the same fsid as the OSDs is also necessary.

I used documentation from Red Hat, IBM, SUSE, Ceph and Proxmox, and more or less pieced the puzzle together across different tests.

After recovering the monitor data from the OSDs, you have to extract the monmap from it, remove a monitor that is automatically added during extraction (mon "a"), and then re-add the monitors you actually want, manually and directly in the data, as the monitor service will be stopped. Then just re-inject the monmap into the monitor store using monmaptool.
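
A hedged sketch of that monmap surgery (monitor names, addresses and the temporary file path are taken from this thread but are only an illustration; the mon service must be stopped):
Bash:
# extract the current monmap from the rebuilt monitor store
ceph-mon -i pxm2 --extract-monmap /tmp/monmap
monmaptool --print /tmp/monmap
# drop the monitor "a" that the rebuild adds automatically, then re-add the real ones
monmaptool --rm a /tmp/monmap
monmaptool --add pxm2 192.168.0.2 --add pxm3 192.168.0.3 --add pxm1 192.168.0.1 /tmp/monmap
# inject the fixed monmap back into the monitor store
ceph-mon -i pxm2 --inject-monmap /tmp/monmap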

About the documentation: what one guide lacks another has, and some include extra steps that actually break the Ceph cluster.

I also had to edit the log_external_backlog file which is inside the monitor directory.

The links are these:

- https://www.ibm.com/docs/en/storage-ceph/7?topic=osds-activating (activate the BlueStore OSDs so that the data to be extracted becomes available)

- https://docs.ceph.com/en/latest/rados/troubleshooting/troubleshooting-mon/#recovery-using-osds

- https://documentation.suse.com/ses/7.1/html/ses-all/bp-troubleshooting-monitors.html

- https://docs.redhat.com/en/document...-ceph-monitor-for-bare-metal-deployments_diag

- https://docs.redhat.com/en/document...-ceph-monitor-store-when-using-bluestore_diag

- https://forum.proxmox.com/threads/recover-ceph-from-osds-only.113699/

- https://forum.proxmox.com/threads/recover-data-from-ceph-osds.138259/

- https://www.ibm.com/docs/en/storage-ceph/6?topic=store-recovering-ceph-monitor-when-using-bluestore

- https://docs.ceph.com/en/latest/rados/operations/add-or-rm-mons/

- https://lists.ceph.io/hyperkitty/
 
