# Ceph Debugging Documentation — ga-node-08c / ga-node-01c
**Date:** 2026-02-25
**Cluster:** Proxmox VE 8 (Debian Bookworm)
**Issue:** OSDs unable to rejoin cluster after node reboot
---
## 1. Environment
### Cluster Topology
| Node | Public IP (vmbr0) | Cluster IP (vmbr1078) | Role |
|---|---|---|---|
| ga-node-08c | 10.77.204.117 | 10.78.204.117 | OSD (problematic) |
| ga-node-10c | 10.77.204.122 | 10.78.204.122 | OSD + MGR |
| ga-node-11c | 10.77.204.123 | 10.78.204.123 | OSD + MON |
| ga-node-13cr | 10.77.204.150 | 10.78.204.150 | OSD + MON |
| ga-node-14c | 10.77.204.116 | 10.78.204.116 | OSD + MON (leader) |
| ga-node-01c | — | — | OSD (problematic, OSDs deleted) |
### Initial OSDs on ga-node-08c
- osd.11, osd.13, osd.16 (all failing after reboot)
- osd.12 → confirmed on ga-node-10c (not on 08c)
### Network Configuration
- `vmbr0`: public network, standard MTU
- `vmbr1078`: cluster network, **MTU 9000** (jumbo frames)
- NICs: Mellanox ConnectX
### `/etc/ceph/ceph.conf` (identical on all nodes)
```ini
[global]
cluster_network = 10.78.204.122/24
public_network = 10.77.204.122/24
ms_bind_ipv4 = true
ms_bind_ipv6 = false
osd_pool_default_min_size = 2
osd_pool_default_size = 3
```
> **Note:** `public_network` and `cluster_network` use a host IP with /24 — Ceph correctly interprets the subnet.
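One way to confirm how those directives resolved on a running node is to check which addresses the OSD daemons actually bound to (a sketch; the `awk` field positions assume standard iproute2 `ss -tlnp` output):

```shell
# List listening sockets of ceph-osd processes and keep only addresses on
# the public (10.77.204.0/24) or cluster (10.78.204.0/24) subnet
ss -tlnp | awk '/ceph-osd/ && ($4 ~ /^10\.7[78]\./) {print $4}' | sort -u
```

Each OSD should show sockets on both subnets; an OSD bound only to 10.77.x would explain heartbeat failures on the cluster network.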
---
## 2. Initial Symptoms
After rebooting ga-node-08c (OSDs were pre-marked `out`):
- osd.11 starts, briefly registers with the monitor, then immediately goes down
- OSD logs show repeated entries of the form: `heartbeat_check: no reply from 10.77.204.122:6816 osd.12 ever on either front or back, first ping sent 2026-02-25T16:36:56 (oldest deadline 2026-02-25T16:37:16)`
- Repeating cycle: start → heartbeat failure → stop → restart
- Same behavior observed on ga-node-01c (OSDs eventually deleted)
---
## 3. Tests Performed
### 3.1 Network Connectivity (nc)
**Result: OK in both directions**
```bash
# From another node toward ga-node-08c
nc -zv 10.77.204.117 6800 # public OSD
nc -zv 10.77.204.117 6802 # public heartbeat
nc -zv 10.78.204.117 6800 # cluster OSD
nc -zv 10.78.204.117 6802 # cluster heartbeat
# Ports tested: 6800, 6801, 6802, 6803, 6804, 6805
```
### 3.2 Firewall
**Result: No blocking**
```bash
iptables -L -n # empty
nft list ruleset # empty
# Proxmox firewall disabled
```
### 3.3 NTP / Clock Synchronization
**Result: Synchronized**
```bash
chronyc tracking
timedatectl
```
All nodes synchronized, negligible clock drift.
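The per-node clock checks can be run in one pass from any node (a sketch; assumes the short hostnames are resolvable over SSH and chrony runs on every node):

```shell
# Print the system-time offset reported by chrony on each node
for h in ga-node-10c ga-node-11c ga-node-13cr ga-node-14c ga-node-08c; do
    printf '%s: ' "$h"
    ssh "$h" chronyc tracking | awk -F' : ' '/^System time/ {print $2}'
done
```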
### 3.4 OSD Keyring
**Result: Match confirmed**
```bash
# On ga-node-08c
cat /var/lib/ceph/osd/ceph-11/keyring
# Key: AQDCRHBpozx9IxAAEK327zjGBVKv5kSyP0zwlw==
# Compared with cluster
ceph auth get osd.11
# Identical
```
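The visual comparison can be made mechanical (a sketch; `ceph auth get-key` prints only the key, and the `awk` pattern assumes the standard one-key keyring layout):

```shell
# Extract the key from the on-disk keyring and compare it with the key the
# monitors hold for osd.11
disk_key=$(awk -F' = ' '/key = / {print $2}' /var/lib/ceph/osd/ceph-11/keyring)
mon_key=$(ceph auth get-key osd.11)
[ "$disk_key" = "$mon_key" ] && echo "keys match" || echo "KEY MISMATCH"
```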
### 3.5 Blocklist
**Result: No entries**
```bash
ceph osd blocklist ls
```
### 3.6 Kernel Version
**Result: Inconclusive (not the root cause)**
- Tested with `6.8.12-18-pve` (installed version) → failure
- Tested with `6.8.12-16-pve` → same failure
- Conclusion: kernel version is **not** the cause
### 3.7 NIC Offloading Disabled
**Result: Did not resolve the issue**
```bash
ethtool -K <interface> tx off rx off gso off gro off tso off
```
### 3.8 Jumbo Frames (MTU 9000)
**Result: Working** (tested from ga-node-11c)
```bash
ping -M do -s 8972 10.78.204.150
# 8980 bytes from 10.78.204.150: icmp_seq=1 ttl=64 time=0.063 ms
```
> **TODO:** Test specifically **from ga-node-08c**
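The open TODO can be closed with a short sweep run from ga-node-08c itself (a sketch; peer cluster IPs are taken from the topology table, and 8972 = 9000 minus 20 bytes of IP header and 8 bytes of ICMP header):

```shell
# Verify that full-size jumbo frames pass unfragmented from this node to
# every peer's cluster IP
for ip in 10.78.204.122 10.78.204.123 10.78.204.150 10.78.204.116; do
    if ping -M do -c 3 -W 2 -s 8972 "$ip" >/dev/null 2>&1; then
        echo "$ip: jumbo frames OK"
    else
        echo "$ip: FAILED (path MTU below 9000, or host unreachable)"
    fi
done
```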
### 3.9 Heartbeat Debug (debug_ms=5)
**Result: Revealing**
Applied via admin socket (runtime):
```bash
ceph daemon /var/run/ceph/ceph-osd.11.asok config set debug_ms 5
```
Then persistently:
```bash
ceph config set osd.11 debug_ms 5
```
Output is written to `/var/log/ceph/ceph-osd.11.log`.
**Key finding:** Heartbeat **works in both directions** at the network level:
- osd.11 sends pings (`-->`) to all peers
- All peers reply with `ping_reply` (`<==`)
- All connections in `s=READY` state
- BUT `up_from 0` in osd.11's pings → deadlock (see section 5)
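To reproduce the finding from the raw log, filtering the `debug_ms=5` output down to heartbeat traffic is usually enough (a sketch; the exact message text varies between Ceph releases, so treat the patterns as a starting point):

```shell
# Show recent heartbeat messages; osd_ping lines carry the sender's up_from
# epoch, which is where the stuck value 0 appears
grep -E 'osd_ping|ping_reply|heartbeat_check' /var/log/ceph/ceph-osd.11.log | tail -n 20
```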
---
## 4. Configurations Applied
### 4.1 mClock Profile (beginning of session)
```bash
ceph config set osd osd_mclock_profile high_client_ops
```
### 4.2 osd_heartbeat_grace (temporary — removed)
```bash
# Applied only to osd.11 for debugging
ceph config set osd.11 osd_heartbeat_grace 300
# Then removed
ceph config rm osd.11 osd_heartbeat_grace
```
> **Unintended side effect:** Other OSDs also waited 300s before reporting osd.11 as dead, causing the OSD boot cycle to repeat every ~300 seconds instead of 20 seconds.
### 4.3 debug_ms (remove when done)
```bash
ceph config set osd.11 debug_ms 5
# Remove when debugging is complete:
ceph config rm osd.11 debug_ms
```
### 4.4 Recommended Permanent Fix
```bash
ceph config set osd osd_heartbeat_grace 60
```
> Increases the grace period before an OSD is reported as dead (20s → 60s). Gives ga-node-08c OSDs enough time to initialize heartbeat connections on startup.
---
## 5. Root Cause Analysis
### The Identified Deadlock
```
osd.11 starts
↓
Monitor briefly marks it UP
↓
osd.11 sends pings with up_from=0
↓
Peer OSDs ignore these pings (osd.11 is marked DOWN in the OSD map)
↓
After ~300s (grace), peer OSDs report osd.11 as failed to the monitor
↓
Monitor marks osd.11 DOWN
↓
osd.11 receives "wrongly marked me down" → kills itself
↓
Restart → cycle repeats
```
### Evidence in Monitor Logs (`ceph.log` on ga-node-14c)
```
16:41:36 - osd.11 boot
16:46:38 - osd.7, osd.12, osd.14, osd.6, osd.5 report osd.11 failed
"after 300.061138 >= grace 69.944756"
16:46:39 - osd.11 marked itself dead as of e4082
"Monitor daemon marked osd.11 down, but it is still running"
16:56:40 - osd.11 boot (2nd attempt)
→ IMMEDIATELY reports all other OSDs as failed
17:01:42 - same cycle, killed again
"after 300.249378 >= grace 119.815915"
```
### Why `up_from` Stays at 0
When an OSD boots, it sets `up_from` to the epoch at which the monitor officially marks it `up`. If the monitor keeps marking it `down` faster than the OSD can stabilize, `up_from` never gets set. Peer OSDs receiving pings with `up_from=0` treat them as invalid and do not update their heartbeat timers — which causes them to report the OSD as failed, completing the deadlock.
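The stuck value can also be read straight out of the OSD map, reusing this document's `python3` filtering pattern (`-f json` is required, since the default dump is plain text; `up_from` and `up_thru` are standard fields of the JSON osd dump):

```shell
# Print the epoch fields for osd.11; up_from stuck at 0 while the up/down
# state flips confirms the deadlock described above
ceph osd dump -f json | python3 -c '
import sys, json
for o in json.load(sys.stdin)["osds"]:
    if o["osd"] == 11:
        print("up_from:", o["up_from"], "up_thru:", o["up_thru"])
'
```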
### Impact on VMs
Every attempt to add osd.11 causes:
- 27 PGs entering `remapped+peering` state (I/O blocked)
- Slow ops blocked for 60-70+ seconds
- **VMs paused** for the entire duration of peering
- Once osd.11 dies and the cycle repeats, the pauses repeat
---
## 6. Actions Taken
### nodown Attempt
```bash
ceph osd set nodown # prevents monitor from acting on failure reports
ceph config rm osd.11 osd_heartbeat_grace
systemctl restart ceph-osd@11
# OSD came up and the cluster started rebalancing,
# but the attempt was aborted after only ~30 s; allow at least 2-3 minutes
ceph osd unset nodown
```
### ga-node-08c Cleanup
```bash
# OSDs 11, 13, 16 purged from cluster (already absent from OSD tree)
ceph osd purge 11 --yes-i-really-mean-it
ceph osd purge 13 --yes-i-really-mean-it
ceph osd purge 16 --yes-i-really-mean-it
# Host removed from crush map
ceph osd crush rm ga-node-08c
# Ceph services stopped on ga-node-08c (only ceph-crash.service remained)
systemctl stop ceph-crash.service
systemctl disable ceph-crash.service
```
> **Warning:** `pveceph purge` was attempted but aborted — this command destroys the **entire** cluster, not just the local node. Use manual cleanup instead.
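After the manual cleanup, it is worth confirming that no trace of the host or its OSDs remains (a sketch; both commands are standard Ceph CLI, and `ceph osd purge` should already have removed the auth entries):

```shell
# The host should appear in neither the CRUSH tree nor the auth database
ceph osd crush tree | grep -q ga-node-08c && echo "still in CRUSH map" || echo "CRUSH clean"
ceph auth ls 2>/dev/null | grep -q 'osd\.1[136]$' && echo "stale auth entries remain" || echo "auth clean"
```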
---
## 7. Current Cluster State
```
cluster health: HEALTH_OK
osd: 14 osds: 14 up, 12 in
pools: 2 pools, 129 pgs → active+clean
data: 4.1 TiB, 13 TiB used, 8.3 TiB avail
```
### Current OSD Tree
| Node | OSDs | Status |
|---|---|---|
| ga-node-10c | 0, 4, 7, 12 | up, in |
| ga-node-11c | 14, 15 | up, **reweight=0** (slow disks, intentional) |
| ga-node-13cr | 2, 5, 8, 9 | up, in |
| ga-node-14c | 1, 3, 6, 10 | up, in |
| ga-node-08c | — | **removed** |
| ga-node-01c | — | empty crush entry |
---
## 8. Unresolved Issues
1. **Root cause unknown:** Why does ga-node-08c (and ga-node-01c) take more than 20s to establish heartbeat after reboot? Untested lead: jumbo frame ping **from** ga-node-08c specifically.
2. **ga-node-01c** has an empty crush entry to clean up:
```bash
ceph osd crush rm ga-node-01c
```
3. **debug_ms=5** may still be set in the config store for osd.11 (now deleted — verify):
```bash
ceph config rm osd.11 debug_ms
```
---
## 9. Recommended Procedure to Reintegrate ga-node-08c
When ready to recreate OSDs on ga-node-08c:
```bash
# 1. Apply global grace (if not already done)
ceph config set osd osd_heartbeat_grace 60
# 2. Prevent monitor from marking OSDs down during startup
ceph osd set nodown
# 3. Create new OSD via Proxmox UI or ceph-volume
# 4. Monitor — wait at least 2-3 MINUTES before taking any action
watch -n 2 'ceph osd tree | grep ga-node-08c'
# Terminal 2:
watch -n 2 'ceph status'
# Terminal 3 (live monitor log):
ssh 10.77.204.116 "tail -f /var/log/ceph/ceph.log | grep osd"
# 5. Once OSD is stable (up_from != 0, status up)
ceph osd unset nodown
# 6. Mark the OSD "in" only after the cluster is HEALTH_OK
ceph osd in <id>
```
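Step 5's "up_from != 0" condition can be watched automatically instead of eyeballed (a sketch; osd id 11 stands in for whatever id the newly created OSD receives):

```shell
# Poll the OSD map until the new OSD's up_from is non-zero, then it is
# safe to unset nodown
while :; do
    up_from=$(ceph osd dump -f json | python3 -c '
import sys, json
print(next((o["up_from"] for o in json.load(sys.stdin)["osds"] if o["osd"] == 11), 0))
')
    [ "$up_from" -ne 0 ] && break
    echo "waiting: up_from still $up_from"
    sleep 10
done
echo "osd.11 registered at epoch $up_from; safe to run: ceph osd unset nodown"
```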
---
## 10. Reference Commands
```bash
# General cluster status
ceph status
ceph osd tree
ceph health detail
# Live monitor log
ssh 10.77.204.116 "tail -f /var/log/ceph/ceph.log"
# Persistent config management
ceph config dump
ceph config get osd.X <parameter>
ceph config set osd.X <parameter> <value>
ceph config rm osd.X <parameter>
# Admin socket (runtime only, not persistent)
ceph daemon /var/run/ceph/ceph-osd.11.asok config set <param> <value>
# Cluster flags
ceph osd set nodown
ceph osd unset nodown
# Heartbeat debug
ceph config set osd.11 debug_ms 5
tail -f /var/log/ceph/ceph-osd.11.log | grep -E "ping|heartbeat"
# OSD map inspection (-f json is required; the default dump is plain text)
ceph osd dump -f json | python3 -c "
import sys, json
d = json.load(sys.stdin)
for o in d['osds']:
    if o['osd'] == 11:
        print(o)
"
```
---
## 11. Key Lessons Learned
| Finding | Detail |
|---|---|
| Heartbeat network works | Both directions confirmed via debug_ms=5 logs |
| Root cause is a deadlock | `up_from=0` → peers ignore pings → failure reports → monitor marks down → repeat |
| `osd_heartbeat_grace` on a single OSD | Affects how long peer OSDs wait before reporting that specific OSD as failed |
| `nodown` flag | Breaks the deadlock by preventing the monitor from acting on failure reports |
| 30s is not enough | After adding an OSD, wait 2-3 minutes before concluding it failed |
| VM pauses during OSD ops | Caused by PGs entering `peering` state — normal but needs OSD stability to resolve |