We successfully migrated our existing FileStore OSDs to BlueStore on 2 out of 5 hosts before deciding to change the size of the SSD partitions that the BlueStore DB and WAL sit on. We had previously sized them at 10 GB, back when they served as journals for the HDD FileStore OSDs, and wanted to grow them to 60 GB partitions to ensure the BlueStore DBs never spill over onto the HDDs.
Our RBD pools are configured with 3 replicas, and we naturally checked that Ceph was healthy before marking all OSDs on a host as out, destroying them, and re-creating them after repartitioning the SSDs.
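Roughly, that per-OSD cycle looks like the sketch below; the OSD ID and device paths are placeholders and the exact ceph-volume call depends on how the DB partition is laid out on the SSD:
Code:
# (the cluster already has the noout flag set, see the status output below)
ceph osd out 12
systemctl stop ceph-osd@12
ceph osd destroy 12 --yes-i-really-mean-it
# repartition the SSD so the DB partition is 60 GB, then re-create the OSD as BlueStore:
ceph-volume lvm create --bluestore --data /dev/sdb --block.db /dev/sda2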
This worked perfectly on the first host and everything replicated again. When we started on the second host, after again confirming that Ceph was healthy, we observed that one object was unfound. The Ceph CRUSH map should avoid placing multiple copies of the same object on a single OSD (with the default rule, even on a single host), and should only confirm a write once it has landed on all replicas, correct?
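For what it's worth, the replica settings and the failure domain can be double-checked like this (replicated_rule is the default rule name and may differ on other clusters):
Code:
ceph osd pool get rbd size                 # 3 on our pools
ceph osd pool get rbd min_size             # 1 on our pools
ceph osd pool get rbd crush_rule           # which CRUSH rule the pool uses
ceph osd crush rule dump replicated_rule   # the chooseleaf step should show "type": "host"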
Ceph healthy before we started:
Code:
[admin@kvm5c ~]# ceph -s
  cluster:
    id:     a3f1c21f-f883-48e0-9bd2-4f869c72b17d
    health: HEALTH_WARN
            noout flag(s) set

  services:
    mon: 3 daemons, quorum 1,2,3
    mgr: kvm5b(active), standbys: kvm5c, kvm5d
    mds: cephfs-1/1/1 up {0=kvm5b=up:active}, 2 up:standby
    osd: 20 osds: 20 up, 20 in
         flags noout

  data:
    pools:   3 pools, 592 pgs
    objects: 1202k objects, 4609 GB
    usage:   14377 GB used, 23350 GB / 37728 GB avail
    pgs:     589 active+clean
             3   active+clean+scrubbing+deep

  io:
    client: 414 kB/s rd, 8770 kB/s wr, 117 op/s rd, 971 op/s wr
We took 4 OSDs on host kvm5c offline and destroyed them. All pools run with size 3 / min_size 1, so we assumed all data would be replicated three times across different hosts. Health, however, reports one object as unfound:
Code:
[admin@kvm5c ~]# ceph -s
  cluster:
    id:     a3f1c21f-f883-48e0-9bd2-4f869c72b17d
    health: HEALTH_WARN
            noout flag(s) set
            1/1231025 objects unfound (0.000%)
            Degraded data redundancy: 718772/3689399 objects degraded (19.482%), 336 pgs unclean, 336 pgs degraded, 319 pgs undersized

  services:
    mon: 3 daemons, quorum 1,2,3
    mgr: kvm5b(active), standbys: kvm5c, kvm5d
    mds: cephfs-1/1/1 up {0=kvm5b=up:active}, 2 up:standby
    osd: 20 osds: 16 up, 16 in; 319 remapped pgs
         flags noout

  data:
    pools:   3 pools, 592 pgs
    objects: 1202k objects, 4610 GB
    usage:   11678 GB used, 18601 GB / 30280 GB avail
    pgs:     718772/3689399 objects degraded (19.482%)
             1/1231025 objects unfound (0.000%)
             287 active+undersized+degraded+remapped+backfill_wait
             256 active+clean
             23  active+undersized+degraded+remapped+backfilling
             17  active+recovery_wait+degraded
             9   active+recovery_wait+undersized+degraded+remapped

  io:
    client:   1130 kB/s rd, 12092 kB/s wr, 177 op/s rd, 835 op/s wr
    recovery: 232 MB/s, 60 objects/s
We listed the missing object in the affected placement group:
Code:
[admin@kvm5c ~]# ceph health detail
pg 0.177 has 1 unfound objects
[admin@kvm5c ~]# ceph pg 0.177 list_missing
{
    "offset": {
        "oid": "",
        "key": "",
        "snapid": 0,
        "hash": 0,
        "max": 0,
        "pool": -9223372036854775808,
        "namespace": ""
    },
    "num_missing": 1,
    "num_unfound": 1,
    "objects": [
        {
            "oid": {
                "oid": "rbd_data.3338a3238e1f29.00000000000006f6",
                "key": "",
                "snapid": -2,
                "hash": 3281629559,
                "max": 0,
                "pool": 0,
                "namespace": ""
            },
            "need": "8993'34050777",
            "have": "8993'34050774",
            "flags": "none",
            "locations": []
        }
    ],
    "more": false
}
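Querying the PG can also help before giving up on the data: the recovery_state section of the output contains a might_have_unfound list of OSDs that Ceph would still like to ask about the object:
Code:
ceph pg 0.177 query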
Next we looked up which RBD image this affected:
Code:
[admin@kvm5c ~]# rbd --pool rbd ls | while read image ; do rbd --pool rbd info $image; done | grep -C 5 3338a3238e1f29
rbd image 'vm-142-disk-1':
        size 102400 MB in 25600 objects
        order 22 (4096 kB objects)
        block_name_prefix: rbd_data.3338a3238e1f29
        format: 2
        features: layering
        flags:
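With order 22 (4 MiB objects), the hex suffix of the missing object name gives its index within the image, so the affected 4 MiB chunk can be located with plain shell arithmetic:
Code:
# index taken from rbd_data.3338a3238e1f29.00000000000006f6
printf '%d\n' 0x6f6          # -> 1782
echo "$((1782 * 4)) MiB"     # -> 7128 MiB, i.e. roughly 7 GiB into the 100 GB image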
VM 142 had locked up, so we shut it down and told Ceph to delete the missing object:
Code:
ceph pg 0.177 mark_unfound_lost delete
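For completeness, mark_unfound_lost also accepts revert, which rolls the object back to the most recent version the cluster still has instead of discarding it outright:
Code:
ceph pg 0.177 mark_unfound_lost revert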
We finally booted the VM (Linux) using a rescue ISO image and ran file system integrity tests, which luckily didn't yield any errors...
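For anyone repeating this, the integrity check from the rescue environment boils down to something like the following; the device name is a placeholder for the VM's root filesystem:
Code:
fsck -f /dev/vda1     # force a full check even if the filesystem is flagged clean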