[SOLVED] Ceph issue after replacing OSD

fl0wfr

New Member
Oct 18, 2021
Hi,

I have a 3-node cluster running Proxmox 6.4-8 with Ceph. Two of the three nodes have 1.2TB disks for Ceph (each node has one 1.2TB disk for the OSD and one 1.2TB disk for the DB); the third node has the same layout but with 900GB disks. I decided to stop, out, and destroy the 900GB OSD and replace its disks with 1.2TB drives. I created the new OSD and DB and the rebuild started, but I noticed that it has now stopped.
The managers are not running on any of the 3 nodes and I cannot start them. Ceph status is:
HEALTH_WARN:
no active mgr
Degraded data redundancy: 103597/612570 objects degraded (16.912%), 66 pgs degraded, 66 pgs undersized
pg 1.0 is stuck undersized for 23563.092456, current state active+undersized+degraded+remapped+backfill_wait, last acting [0,2]
pg 1.3 is stuck undersized for 23563.098549, current state active+undersized+degraded+remapped+backfill_wait, last acting [0,2]
pg 1.7 is stuck undersized for 23563.090415, current state active+undersized+degraded+remapped+backfill_wait, last acting [0,2]
pg 1.8 is stuck undersized for 23563.079996, current state active+undersized+degraded+remapped+backfill_wait, last acting [0,2]
pg 1.9 is stuck undersized for 23563.121097, current state active+undersized+degraded+remapped+backfill_wait, last acting [0,2]
pg 1.d is stuck undersized for 23563.074253, current state active+undersized+degraded+remapped+backfill_wait, last acting [0,2]
pg 1.10 is stuck undersized for 23563.102468, current state active+undersized+degraded+remapped+backfill_wait, last acting [0,2]
pg 1.12 is stuck undersized for 23563.104936, current state active+undersized+degraded+remapped+backfill_wait, last acting [0,2]
pg 1.13 is stuck undersized for 23563.085147, current state active+undersized+degraded+remapped+backfill_wait, last acting [0,2]
pg 1.15 is stuck undersized for 23563.080369, current state active+undersized+degraded+remapped+backfill_wait, last acting [0,2]
pg 1.17 is stuck undersized for 23563.073643, current state active+undersized+degraded+remapped+backfill_wait, last acting [2,0]
pg 1.19 is stuck undersized for 23563.078630, current state active+undersized+degraded+remapped+backfill_wait, last acting [2,0]
pg 1.1b is stuck undersized for 23563.076137, current state active+undersized+degraded+remapped+backfill_wait, last acting [2,0]
pg 1.1d is stuck undersized for 23563.107707, current state active+undersized+degraded+remapped+backfill_wait, last acting [0,2]
pg 1.1e is stuck undersized for 23563.096508, current state active+undersized+degraded+remapped+backfilling, last acting [2,0]
pg 1.1f is stuck undersized for 23563.091719, current state active+undersized+degraded+remapped+backfill_wait, last acting [0,2]
pg 1.21 is stuck undersized for 23563.092385, current state active+undersized+degraded+remapped+backfill_wait, last acting [0,2]
pg 1.22 is stuck undersized for 23563.125558, current state active+undersized+degraded+remapped+backfill_wait, last acting [0,2]
pg 1.23 is stuck undersized for 23563.087485, current state active+undersized+degraded+remapped+backfill_wait, last acting [2,0]
pg 1.25 is stuck undersized for 23563.112149, current state active+undersized+degraded+remapped+backfill_wait, last acting [0,2]
pg 1.27 is stuck undersized for 23563.089720, current state active+undersized+degraded+remapped+backfill_wait, last acting [0,2]
pg 1.28 is stuck undersized for 23563.117410, current state active+undersized+degraded+remapped+backfill_wait, last acting [0,2]
pg 1.2a is stuck undersized for 23563.086187, current state active+undersized+degraded+remapped+backfill_wait, last acting [0,2]
pg 1.2c is stuck undersized for 23563.116624, current state active+undersized+degraded+remapped+backfill_wait, last acting [0,2]
pg 1.2d is stuck undersized for 23563.086257, current state active+undersized+degraded+remapped+backfill_wait, last acting [0,2]
pg 1.2e is stuck undersized for 23563.085613, current state active+undersized+degraded+remapped+backfill_wait, last acting [0,2]
pg 1.2f is stuck undersized for 23563.107829, current state active+undersized+degraded+remapped+backfill_wait, last acting [0,2]
pg 1.30 is stuck undersized for 23563.068805, current state active+undersized+degraded+remapped+backfill_wait, last acting [0,2]
pg 1.31 is stuck undersized for 23563.076131, current state active+undersized+degraded+remapped+backfill_wait, last acting [0,2]
pg 1.36 is active+undersized+degraded+remapped+backfill_wait, acting [0,2]
pg 1.50 is stuck undersized for 23563.080087, current state active+undersized+degraded+remapped+backfill_wait, last acting [0,2]
pg 1.51 is stuck undersized for 23563.124589, current state active+undersized+degraded+remapped+backfill_wait, last acting [0,2]
pg 1.52 is stuck undersized for 23563.093492, current state active+undersized+degraded+remapped+backfill_wait, last acting [2,0]
pg 1.57 is stuck undersized for 23563.103923, current state active+undersized+degraded+remapped+backfill_wait, last acting [0,2]
pg 1.5a is stuck undersized for 23563.098697, current state active+undersized+degraded+remapped+backfill_wait, last acting [0,2]
pg 1.5b is stuck undersized for 23563.089283, current state active+undersized+degraded+remapped+backfill_wait, last acting [2,0]
pg 1.5e is stuck undersized for 23563.098997, current state active+undersized+degraded+remapped+backfill_wait, last acting [0,2]
pg 1.60 is stuck undersized for 23563.133911, current state active+undersized+degraded+remapped+backfill_wait, last acting [0,2]
pg 1.61 is stuck undersized for 23563.121044, current state active+undersized+degraded+remapped+backfill_wait, last acting [0,2]
pg 1.65 is stuck undersized for 23563.118895, current state active+undersized+degraded+remapped+backfill_wait, last acting [0,2]
pg 1.67 is stuck undersized for 23563.123614, current state active+undersized+degraded+remapped+backfill_wait, last acting [0,2]
pg 1.6b is stuck undersized for 23563.094855, current state active+undersized+degraded+remapped+backfill_wait, last acting [0,2]
pg 1.6c is stuck undersized for 23563.111168, current state active+undersized+degraded+remapped+backfill_wait, last acting [0,2]
pg 1.6f is stuck undersized for 23563.113243, current state active+undersized+degraded+remapped+backfill_wait, last acting [0,2]
pg 1.73 is stuck undersized for 23563.129722, current state active+undersized+degraded+remapped+backfill_wait, last acting [0,2]
pg 1.74 is stuck undersized for 23563.129205, current state active+undersized+degraded+remapped+backfill_wait, last acting [0,2]
pg 1.77 is stuck undersized for 23563.129954, current state active+undersized+degraded+remapped+backfill_wait, last acting [0,2]
pg 1.79 is stuck undersized for 23563.108296, current state active+undersized+degraded+remapped+backfill_wait, last acting [0,2]
pg 1.7a is stuck undersized for 23563.099325, current state active+undersized+degraded+remapped+backfill_wait, last acting [0,2]
pg 1.7d is stuck undersized for 23563.123545, current state active+undersized+degraded+remapped+backfill_wait, last acting [0,2]
pg 1.7e is stuck undersized for 23563.075480, current state active+undersized+degraded+remapped+backfill_wait, last acting [0,2]
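
A list like this is easier to read once it is summarized. The snippet below is only a rough sketch, assuming the health output has been pasted into a string in the same line format as above; the function name summarize_pgs is made up for illustration and is not part of any Ceph tooling. It counts PGs per state and counts how often each OSD appears in an acting set.

```python
import re
from collections import Counter

# Matches both "pg X is stuck ... current state S, last acting [a,b]"
# and the shorter "pg X is S, acting [a,b]" form seen in the output above.
PG_LINE = re.compile(
    r"^pg (?P<pgid>\S+) is "
    r"(?:stuck \w+ for [\d.]+, current state )?"
    r"(?P<state>[\w+]+), (?:last )?acting \[(?P<acting>[\d,]*)\]"
)


def summarize_pgs(text):
    """Hypothetical helper: count PG states and OSD appearances in acting sets."""
    states = Counter()
    osds = Counter()
    for line in text.splitlines():
        match = PG_LINE.match(line.strip())
        if match is None:
            continue  # skip header lines such as "HEALTH_WARN:"
        states[match["state"]] += 1
        for osd in match["acting"].split(","):
            if osd:
                osds[int(osd)] += 1
    return states, osds


if __name__ == "__main__":
    sample = """\
pg 1.0 is stuck undersized for 23563.092456, current state active+undersized+degraded+remapped+backfill_wait, last acting [0,2]
pg 1.1e is stuck undersized for 23563.096508, current state active+undersized+degraded+remapped+backfilling, last acting [2,0]
pg 1.36 is active+undersized+degraded+remapped+backfill_wait, acting [0,2]
"""
    states, osds = summarize_pgs(sample)
    for state, count in states.most_common():
        print(count, state)
    print("OSDs in acting sets:", sorted(osds))
```

For the output above, every listed PG has only OSDs 0 and 2 in its acting set, and all but one are in backfill_wait, so the data is still served by two replicas while the copy for the new OSD is pending.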

Any idea how to finish the rebuild and start the managers?

Best regards,
 
Found it: there was an error in the Zabbix configuration of the Ceph manager. After disabling the zabbix module in Ceph, I was able to start the managers, and the rebuild is working again.
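
For anyone landing here with the same symptom, the steps can be sketched roughly as below. This is not a transcript of what was run in this thread. It assumes the manager's systemd unit is named ceph-mgr@<node name>.service, and "pve1" is a placeholder node name, so adjust both to your setup. The helper names are made up for illustration. By default the script only prints the commands and executes nothing.

```python
import shlex
import subprocess


def zabbix_recovery_commands(mgr_id):
    """Hypothetical helper: commands to disable the zabbix mgr module and restart one manager.

    mgr_id is assumed to be the manager's ID, which is often the node name.
    """
    return [
        ["ceph", "mgr", "module", "disable", "zabbix"],
        ["systemctl", "restart", f"ceph-mgr@{mgr_id}.service"],
        ["ceph", "-s"],  # check that a manager is active and backfill resumes
    ]


def run_commands(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    printed = []
    for argv in commands:
        line = shlex.join(argv)
        printed.append(line)
        print(line)
        if not dry_run:
            subprocess.run(argv, check=True)
    return printed


if __name__ == "__main__":
    # "pve1" is a placeholder; replace it with the ID of your own manager.
    run_commands(zabbix_recovery_commands("pve1"))
```

Calling run_commands(..., dry_run=False) as root on a node would actually execute the commands; the restart step has to be repeated on each node that runs a manager.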
 
