Ceph - Most OSDs down and all PGs unknown after P2V migration

opodopolopolous

New Member
Jan 11, 2025
I run a small single-node Ceph cluster (not via Proxmox) for home file storage, deployed with cephadm. It was running bare-metal, and I attempted a physical-to-virtual migration to a Proxmox VM (the PCIe HBA that all the disks are connected to is passed through to the VM). After doing so, all of my PGs appear to be "unknown". Initially after a boot the OSDs appear to be up, but after a while they go down; I assume there is some sort of timeout in the OSD start process. The systemd units (and podman containers) are still running and appear to be healthy, and I don't see anything unusual in their logs. I'm relatively new to Ceph, so I don't really know where to go from here. Can anyone provide any guidance?
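For reference, these are roughly the checks I ran to confirm the daemons and containers are still running (the fsid and daemon names match the systemctl output further down; exact invocations are reproduced from memory, so treat them as approximate):

Code:
# check one of the OSD units managed by cephadm
systemctl status ceph-768819b0-a83f-11ee-81d6-74563c5bfc7b@osd.3.service

# list the cephadm-managed podman containers
podman ps --filter name=ceph

# recent log output for that OSD's unit
journalctl -u ceph-768819b0-a83f-11ee-81d6-74563c5bfc7b@osd.3.service --since "1 hour ago" --no-pager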

ceph -s
Code:
 cluster:
    id:     768819b0-a83f-11ee-81d6-74563c5bfc7b
    health: HEALTH_WARN
            Reduced data availability: 545 pgs inactive
            139 pgs not deep-scrubbed in time
            17 slow ops, oldest one blocked for 1668 sec, mon.fileserver has slow ops

  services:
    mon: 1 daemons, quorum fileserver (age 28m)
    mgr: fileserver.rgtdvr(active, since 28m), standbys: fileserver.gikddq
    osd: 17 osds: 5 up (since 116m), 5 in (since 10m)

  data:
    pools:   3 pools, 545 pgs
    objects: 1.97M objects, 7.5 TiB
    usage:   7.7 TiB used, 1.4 TiB / 9.1 TiB avail
    pgs:     100.000% pgs unknown
             545 unknown

ceph osd df
Code:
ID  CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE  DATA     OMAP    META     AVAIL    %USE   VAR   PGS  STATUS
 0    hdd  1.81940         0      0 B      0 B      0 B     0 B      0 B      0 B      0     0    0    down
 1    hdd  3.63869         0      0 B      0 B      0 B     0 B      0 B      0 B      0     0    0    down
 3    hdd  1.81940         0      0 B      0 B      0 B     0 B      0 B      0 B      0     0  112    down
 4    hdd  1.81940         0      0 B      0 B      0 B     0 B      0 B      0 B      0     0  117    down
 5    hdd  3.63869         0      0 B      0 B      0 B     0 B      0 B      0 B      0     0    0    down
 6    hdd  3.63869         0      0 B      0 B      0 B     0 B      0 B      0 B      0     0    0    down
 7    hdd  1.81940         0      0 B      0 B      0 B     0 B      0 B      0 B      0     0    0    down
 8    hdd  1.81940         0      0 B      0 B      0 B     0 B      0 B      0 B      0     0  106    down
20    hdd  1.81940         0      0 B      0 B      0 B     0 B      0 B      0 B      0     0  115    down
21    hdd  1.81940         0      0 B      0 B      0 B     0 B      0 B      0 B      0     0   94    down
22    hdd  1.81940         0      0 B      0 B      0 B     0 B      0 B      0 B      0     0   98    down
23    hdd  1.81940         0      0 B      0 B      0 B     0 B      0 B      0 B      0     0  109    down
24    hdd  1.81940   1.00000  1.8 TiB  1.6 TiB  1.6 TiB   4 KiB  3.0 GiB  186 GiB  90.00  1.06  117      up
25    hdd  1.81940   1.00000  1.8 TiB  1.6 TiB  1.6 TiB  10 KiB  2.8 GiB  220 GiB  88.18  1.04  114      up
26    hdd  1.81940   1.00000  1.8 TiB  1.5 TiB  1.5 TiB   9 KiB  2.8 GiB  297 GiB  84.07  0.99  109      up
27    hdd  1.81940   1.00000  1.8 TiB  1.4 TiB  1.4 TiB   7 KiB  2.5 GiB  474 GiB  74.58  0.88   98      up
28    hdd  1.81940   1.00000  1.8 TiB  1.6 TiB  1.6 TiB  10 KiB  3.0 GiB  206 GiB  88.93  1.04  115      up
                       TOTAL  9.1 TiB  7.7 TiB  7.7 TiB  42 KiB   14 GiB  1.4 TiB  85.15
MIN/MAX VAR: 0.88/1.06  STDDEV: 5.65

ceph pg stat
Code:
545 pgs: 545 unknown; 7.5 TiB data, 7.7 TiB used, 1.4 TiB / 9.1 TiB avail

systemctl | grep ceph-768819b0-a83f-11ee-81d6-74563c5bfc7b
Code:
ceph-768819b0-a83f-11ee-81d6-74563c5bfc7b@alertmanager.fileserver.service         loaded active     running   Ceph alertmanager.fileserver for 768819b0-a83f-11ee-81d6-74563c5bfc7b
ceph-768819b0-a83f-11ee-81d6-74563c5bfc7b@ceph-exporter.fileserver.service        loaded active     running   Ceph ceph-exporter.fileserver for 768819b0-a83f-11ee-81d6-74563c5bfc7b
ceph-768819b0-a83f-11ee-81d6-74563c5bfc7b@crash.fileserver.service                loaded active     running   Ceph crash.fileserver for 768819b0-a83f-11ee-81d6-74563c5bfc7b
ceph-768819b0-a83f-11ee-81d6-74563c5bfc7b@grafana.fileserver.service              loaded active     running   Ceph grafana.fileserver for 768819b0-a83f-11ee-81d6-74563c5bfc7b
ceph-768819b0-a83f-11ee-81d6-74563c5bfc7b@mgr.fileserver.gikddq.service           loaded active     running   Ceph mgr.fileserver.gikddq for 768819b0-a83f-11ee-81d6-74563c5bfc7b
ceph-768819b0-a83f-11ee-81d6-74563c5bfc7b@mgr.fileserver.rgtdvr.service           loaded active     running   Ceph mgr.fileserver.rgtdvr for 768819b0-a83f-11ee-81d6-74563c5bfc7b
ceph-768819b0-a83f-11ee-81d6-74563c5bfc7b@mon.fileserver.service                  loaded active     running   Ceph mon.fileserver for 768819b0-a83f-11ee-81d6-74563c5bfc7b
ceph-768819b0-a83f-11ee-81d6-74563c5bfc7b@osd.0.service                           loaded active     running   Ceph osd.0 for 768819b0-a83f-11ee-81d6-74563c5bfc7b
ceph-768819b0-a83f-11ee-81d6-74563c5bfc7b@osd.1.service                           loaded active     running   Ceph osd.1 for 768819b0-a83f-11ee-81d6-74563c5bfc7b
ceph-768819b0-a83f-11ee-81d6-74563c5bfc7b@osd.20.service                          loaded active     running   Ceph osd.20 for 768819b0-a83f-11ee-81d6-74563c5bfc7b
ceph-768819b0-a83f-11ee-81d6-74563c5bfc7b@osd.21.service                          loaded active     running   Ceph osd.21 for 768819b0-a83f-11ee-81d6-74563c5bfc7b
ceph-768819b0-a83f-11ee-81d6-74563c5bfc7b@osd.22.service                          loaded active     running   Ceph osd.22 for 768819b0-a83f-11ee-81d6-74563c5bfc7b
ceph-768819b0-a83f-11ee-81d6-74563c5bfc7b@osd.23.service                          loaded active     running   Ceph osd.23 for 768819b0-a83f-11ee-81d6-74563c5bfc7b
ceph-768819b0-a83f-11ee-81d6-74563c5bfc7b@osd.24.service                          loaded active     running   Ceph osd.24 for 768819b0-a83f-11ee-81d6-74563c5bfc7b
ceph-768819b0-a83f-11ee-81d6-74563c5bfc7b@osd.25.service                          loaded active     running   Ceph osd.25 for 768819b0-a83f-11ee-81d6-74563c5bfc7b
ceph-768819b0-a83f-11ee-81d6-74563c5bfc7b@osd.26.service                          loaded active     running   Ceph osd.26 for 768819b0-a83f-11ee-81d6-74563c5bfc7b
ceph-768819b0-a83f-11ee-81d6-74563c5bfc7b@osd.27.service                          loaded active     running   Ceph osd.27 for 768819b0-a83f-11ee-81d6-74563c5bfc7b
ceph-768819b0-a83f-11ee-81d6-74563c5bfc7b@osd.28.service                          loaded active     running   Ceph osd.28 for 768819b0-a83f-11ee-81d6-74563c5bfc7b
ceph-768819b0-a83f-11ee-81d6-74563c5bfc7b@osd.3.service                           loaded active     running   Ceph osd.3 for 768819b0-a83f-11ee-81d6-74563c5bfc7b
ceph-768819b0-a83f-11ee-81d6-74563c5bfc7b@osd.4.service                           loaded active     running   Ceph osd.4 for 768819b0-a83f-11ee-81d6-74563c5bfc7b
ceph-768819b0-a83f-11ee-81d6-74563c5bfc7b@osd.5.service                           loaded active     running   Ceph osd.5 for 768819b0-a83f-11ee-81d6-74563c5bfc7b
ceph-768819b0-a83f-11ee-81d6-74563c5bfc7b@osd.6.service                           loaded active     running   Ceph osd.6 for 768819b0-a83f-11ee-81d6-74563c5bfc7b
ceph-768819b0-a83f-11ee-81d6-74563c5bfc7b@osd.7.service                           loaded active     running   Ceph osd.7 for 768819b0-a83f-11ee-81d6-74563c5bfc7b
ceph-768819b0-a83f-11ee-81d6-74563c5bfc7b@osd.8.service                           loaded active     running   Ceph osd.8 for 768819b0-a83f-11ee-81d6-74563c5bfc7b
ceph-768819b0-a83f-11ee-81d6-74563c5bfc7b@prometheus.fileserver.service           loaded active     running   Ceph prometheus.fileserver for 768819b0-a83f-11ee-81d6-74563c5bfc7b
system-ceph\x2d768819b0\x2da83f\x2d11ee\x2d81d6\x2d74563c5bfc7b.slice             loaded active     active    Slice /system/ceph-768819b0-a83f-11ee-81d6-74563c5bfc7b
ceph-768819b0-a83f-11ee-81d6-74563c5bfc7b.target                                  loaded active     active    Ceph cluster 768819b0-a83f-11ee-81d6-74563c5bfc7b

I've attached the mon and osd.3 logs.
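In case it matters, this is roughly how I pulled those logs (daemon names are taken from the unit list above; exact flags from memory):

Code:
# dump per-daemon logs via cephadm (wraps journalctl for the daemon's unit)
cephadm logs --name mon.fileserver > mon.fileserver.log
cephadm logs --name osd.3 > osd.3.log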
 

