I have a three-node cluster running PVE 8 with Ceph installed. The nodes are named pfsense-1, pfsense-2 and r730. I have been running PVE for about a year and recently installed Ceph on these nodes. It worked well, but when I rebooted the r730 node, it would not boot (I waited for 15 hours). I reinstalled the r730 node, which seemed to fix it. But when I then rebooted the pfsense-2 node, the same problem occurred (I waited for 4 hours). The stuck screen was identical to the one I saw on r730 previously. Here are the "ceph -s" output, the stuck screen and the "ceph osd tree" output.

ceph -s:

  cluster:
    id:     4823fd4b-a059-4c88-b287-e26ae916d3fb
    health: HEALTH_WARN
            1/3 mons down, quorum pfsense-1,r730
            Degraded data redundancy: 202217/606651 objects degraded (33.333%), 176 pgs degraded, 193 pgs undersized

  services:
    mon: 3 daemons, quorum pfsense-1,r730 (age 4h), out of quorum: pfsense-2
    mgr: pfsense-1(active, since 4d), standbys: r730
    mds: 1/1 daemons up, 1 standby
    osd: 21 osds: 16 up (since 4h), 16 in (since 4h)

  data:
    volumes: 1/1 healthy
    pools:   4 pools, 193 pgs
    objects: 202.22k objects, 788 GiB
    usage:   2.6 TiB used, 17 TiB / 20 TiB avail
    pgs:     202217/606651 objects degraded (33.333%)
             176 active+undersized+degraded
             17  active+undersized

ceph osd tree:

 ID  CLASS  WEIGHT    TYPE NAME           STATUS  REWEIGHT  PRI-AFF
 -1         27.01759  root default
 -7          7.27745      host pfsense-1
  1    ssd   1.45549          osd.1           up   1.00000  1.00000
  3    ssd   1.45549          osd.3           up   1.00000  1.00000
  5    ssd   1.45549          osd.5           up   1.00000  1.00000
  7    ssd   1.45549          osd.7           up   1.00000  1.00000
  8    ssd   1.45549          osd.8           up   1.00000  1.00000
-10          7.27745      host pfsense-2
 10    ssd   1.45549          osd.10        down         0  1.00000
 12    ssd   1.45549          osd.12        down         0  1.00000
 13    ssd   1.45549          osd.13        down         0  1.00000
 15    ssd   1.45549          osd.15        down         0  1.00000
 17    ssd   1.45549          osd.17        down         0  1.00000
 -3         12.46269      host r730
  0    hdd   1.20079          osd.0           up   1.00000  1.00000
  2    hdd   1.20079          osd.2           up   1.00000  1.00000
  4    hdd   1.20079          osd.4           up   1.00000  1.00000
  6    hdd   1.20079          osd.6           up   1.00000  1.00000
  9    hdd   1.20079          osd.9           up   1.00000  1.00000
 11    hdd   1.20079          osd.11          up   1.00000  1.00000
 14    hdd   1.20079          osd.14          up   1.00000  1.00000
 16    hdd   1.20079          osd.16          up   1.00000  1.00000
 18    hdd   1.20079          osd.18          up   1.00000  1.00000
 19    hdd   1.20079          osd.19          up   1.00000  1.00000
 20    ssd   0.45479          osd.20          up   1.00000  1.00000
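
As a sanity check on the degraded figure reported above, here is a minimal sketch in Python. It assumes the cluster holds 202217 objects (the "202.22k objects" line is rounded, so the exact count is my assumption) and that all pools are replicated with the same size. Under those assumptions the numbers are consistent with one of three replicas being unavailable for every object, which would match a single host being down.

```python
# Numbers copied from the "ceph -s" output in this post.
objects = 202217          # assumed exact object count ("202.22k" is rounded)
object_copies = 606651    # total object copies Ceph expects to exist
degraded_copies = 202217  # object copies currently missing

# Assumption: every pool is replicated with the same size.
replicas = object_copies / objects
degraded_pct = 100 * degraded_copies / object_copies

print(f"replicas per object: {replicas:g}")
print(f"degraded: {degraded_pct:.3f}%")
```

This only checks the arithmetic; it does not explain why the node hangs during boot.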