Hi,
I'm in the process of setting up a new cluster with 3 nodes. Each node has a pair of 3.5 TiB NVMe disks that I use for Ceph. I've split each disk into four logical OSDs with
ceph-volume lvm batch --bluestore --osds-per-device=4 /dev/nvme0n1 /dev/nvme1n1
so I have eight OSDs per server. So far, so good.
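As a quick sanity check of that layout, the OSD count and raw capacity can be worked out from the numbers stated above. This is only a sketch of the arithmetic (the variable names are made up); the results line up with the `24 total` OSDs in the osdmap lines and the `21 TiB` in the pgmap lines of the log excerpt further down.

```shell
#!/usr/bin/env bash
# Layout as described in the post.
nodes=3
devices_per_node=2   # /dev/nvme0n1 and /dev/nvme1n1
osds_per_device=4    # --osds-per-device=4
tib_per_device=3.5   # nominal size per NVMe disk

total_osds=$(( nodes * devices_per_node * osds_per_device ))
# Bash arithmetic is integer-only, so awk handles the fractional capacity.
raw_tib=$(awk -v n="$nodes" -v d="$devices_per_node" -v s="$tib_per_device" \
  'BEGIN { printf "%.1f", n * d * s }')

echo "OSDs: ${total_osds}"            # → OSDs: 24
echo "raw capacity: ${raw_tib} TiB"   # → raw capacity: 21.0 TiB
```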
The servers have a dual-port 10 Gbit NIC. One 10G link was intended to carry both the Ceph public network and the VM LAN bridge, but right now there are no VMs and no traffic other than Ceph. The other 10G link is configured as the Ceph cluster network.
Next, I set up a pool (size 3, min_size 2) for running benchmarks, mostly comparing results with the disks run as single OSDs, 2 per device, 4 per device and even 8 per device; I got my best results at 4 OSDs per device. However, no matter what I set as the number of PGs, I sporadically get Ceph "degraded" health warnings, when writing or even when just reading.
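For reference (the post doesn't list which PG counts were tried; the log excerpt shows the pool at 64 PGs), the classic rule of thumb from Ceph's placement-group documentation targets roughly 100 PGs per OSD, divided by the pool's replica count and rounded up to a power of two. A sketch of that arithmetic for this cluster, assuming a target of 100 PGs per OSD; it is context for choosing pg_num, not a claimed fix for the degraded warnings.

```shell
#!/usr/bin/env bash
# Rule-of-thumb PG count; target_per_osd=100 is an assumption, not a measured value.
osds=24
pool_size=3          # replica count ("size") of the benchmark pool
target_per_osd=100

raw=$(( osds * target_per_osd / pool_size ))   # 24 * 100 / 3 = 800

# Round up to the next power of two.
pgs=1
while [ "$pgs" -lt "$raw" ]; do
  pgs=$(( pgs * 2 ))
done

echo "suggested pg_num: ${pgs}"   # → suggested pg_num: 1024
```

Applying such a value would be `ceph osd pool set <pool> pg_num 1024`, with `<pool>` as a placeholder for the pool name. Where the pg_autoscaler is enabled for the pool, `ceph osd pool autoscale-status` shows its mode and the pg_num it would target.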
I have no idea why this happens; an example log excerpt is at the end of this post. I have dd-written over all of the drives several times and dd-read them back, never with a single error, and the SMART values are squeaky clean:
Code:
SMART/Health Information (NVMe Log 0x02)
Critical Warning: 0x00
Temperature: 34 Celsius
Available Spare: 100%
Available Spare Threshold: 10%
Percentage Used: 0%
Data Units Read: 1,585,713 [811 GB]
Data Units Written: 55,039,679 [28.1 TB]
Host Read Commands: 10,258,506
Host Write Commands: 98,255,852
Controller Busy Time: 131
Power Cycles: 5
Power On Hours: 218
Unsafe Shutdowns: 0
Media and Data Integrity Errors: 0
Error Information Log Entries: 0
Warning Comp. Temperature Time: 0
Critical Comp. Temperature Time: 0
Temperature Sensor 1: 34 Celsius
Temperature Sensor 2: 43 Celsius
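The SMART page above is from a single disk. To repeat the check on all six NVMe disks without reading every page by eye, a small filter can flag non-zero error counters. This is a sketch assuming smartmontools' `smartctl -a` output in the layout shown above; the function name `check_nvme_health` is made up, and the choice of which three counters to treat as errors is an assumption.

```shell
#!/usr/bin/env bash
# Read smartctl NVMe output on stdin; print "OK" if the selected error
# counters are all zero, otherwise print each non-zero one and return 1.
check_nvme_health() {
  awk -F': *' '
    /^(Critical Warning|Media and Data Integrity Errors|Error Information Log Entries):/ {
      val = $2
      gsub(/[ ,]/, "", val)   # drop padding and thousands separators
      if (val != "0" && val != "0x00") { print "NON-ZERO " $1 ": " $2; bad = 1 }
    }
    END { if (!bad) print "OK"; exit (bad ? 1 : 0) }
  '
}

# Demo on the three counters from the SMART output above.
printf '%s\n' 'Critical Warning: 0x00' \
  'Media and Data Integrity Errors: 0' \
  'Error Information Log Entries: 0' | check_nvme_health   # → OK
```

On each node it could be run as `for d in /dev/nvme0n1 /dev/nvme1n1; do smartctl -a "$d" | check_nvme_health; done`.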
Why is this happening, what can I check, and how do I get my Ceph cluster to run stably? Sample log excerpt:
Code:
2021-08-29T15:27:21.932925+0200 mgr.galactica (mgr.14113) 5932 : cluster [DBG] pgmap v5986: 64 pgs: 1 remapped+peering, 63 active+clean; 1.8 GiB data, 34 GiB used, 21 TiB / 21 TiB avail; 84 MiB/s wr, 21.46k op/s
2021-08-29T15:27:22.071589+0200 mon.galactica (mon.0) 4177 : cluster [DBG] osdmap e358: 24 total, 24 up, 24 in
2021-08-29T15:27:23.933412+0200 mgr.galactica (mgr.14113) 5934 : cluster [DBG] pgmap v5988: 64 pgs: 1 active+recovering+undersized+remapped, 1 active+recovery_wait+undersized+degraded+remapped, 1 remapped+peering, 61 active+clean; 1.9 GiB data, 34 GiB used, 21 TiB / 21 TiB avail; 70 MiB/s wr, 17.88k op/s; 7707/1505487 objects degraded (0.512%); 7745/1505487 objects misplaced (0.514%); 59 KiB/s, 14 objects/s recovering
2021-08-29T15:27:24.612333+0200 mon.galactica (mon.0) 4186 : cluster [WRN] Health check failed: Degraded data redundancy: 7707/1505487 objects degraded (0.512%), 1 pg degraded (PG_DEGRADED)
2021-08-29T15:27:25.933867+0200 mgr.galactica (mgr.14113) 5935 : cluster [DBG] pgmap v5989: 64 pgs: 2 active+recovering+undersized+remapped, 1 active+recovery_wait+undersized+degraded+remapped, 1 remapped+peering, 60 active+clean; 2.0 GiB data, 34 GiB used, 21 TiB / 21 TiB avail; 32 MiB/s wr, 8.18k op/s; 7707/1544034 objects degraded (0.499%); 15346/1544034 objects misplaced (0.994%); 196 KiB/s, 48 objects/s recovering
2021-08-29T15:27:27.934322+0200 mgr.galactica (mgr.14113) 5937 : cluster [DBG] pgmap v5990: 64 pgs: 1 active+recovery_wait+undersized+degraded+remapped, 3 active+recovering+undersized+remapped, 60 active+clean; 2.0 GiB data, 34 GiB used, 21 TiB / 21 TiB avail; 31 MiB/s wr, 8.04k op/s; 7707/1586793 objects degraded (0.486%); 22771/1586793 objects misplaced (1.435%); 435 KiB/s, 108 objects/s recovering
2021-08-29T15:27:29.934807+0200 mgr.galactica (mgr.14113) 5939 : cluster [DBG] pgmap v5991: 64 pgs: 1 active+recovery_wait+undersized+degraded+remapped, 3 active+recovering+undersized+remapped, 60 active+clean; 2.1 GiB data, 34 GiB used, 21 TiB / 21 TiB avail; 39 MiB/s wr, 10.05k op/s; 7707/1683261 objects degraded (0.458%); 21793/1683261 objects misplaced (1.295%); 827 KiB/s, 206 objects/s recovering
2021-08-29T15:27:30.085010+0200 mon.galactica (mon.0) 4199 : cluster [WRN] Health check update: Degraded data redundancy: 7707/1683261 objects degraded (0.458%), 1 pg degraded (PG_DEGRADED)
2021-08-29T15:27:31.935238+0200 mgr.galactica (mgr.14113) 5940 : cluster [DBG] pgmap v5992: 64 pgs: 1 active+recovery_wait+undersized+degraded+remapped, 3 active+recovering+undersized+remapped, 60 active+clean; 2.2 GiB data, 35 GiB used, 21 TiB / 21 TiB avail; 35 MiB/s wr, 8.91k op/s; 7707/1717110 objects degraded (0.449%); 21053/1717110 objects misplaced (1.226%); 1.0 MiB/s, 257 objects/s recovering
2021-08-29T15:27:33.935723+0200 mgr.galactica (mgr.14113) 5942 : cluster [DBG] pgmap v5993: 64 pgs: 1 active+recovery_wait+undersized+degraded+remapped, 3 active+recovering+undersized+remapped, 60 active+clean; 2.3 GiB data, 35 GiB used, 21 TiB / 21 TiB avail; 37 MiB/s wr, 9.37k op/s; 7707/1783098 objects degraded (0.432%); 20528/1783098 objects misplaced (1.151%); 1.0 MiB/s, 261 objects/s recovering
2021-08-29T15:27:35.936166+0200 mgr.galactica (mgr.14113) 5944 : cluster [DBG] pgmap v5994: 64 pgs: 1 active+recovery_wait+undersized+degraded+remapped, 3 active+recovering+undersized+remapped, 60 active+clean; 2.4 GiB data, 35 GiB used, 21 TiB / 21 TiB avail; 37 MiB/s wr, 9.60k op/s; 7707/1851117 objects degraded (0.416%); 20032/1851117 objects misplaced (1.082%); 1.1 MiB/s, 292 objects/s recovering
2021-08-29T15:27:36.100152+0200 mon.galactica (mon.0) 4212 : cluster [WRN] Health check update: Degraded data redundancy: 7707/1851117 objects degraded (0.416%), 1 pg degraded (PG_DEGRADED)
2021-08-29T15:27:37.936585+0200 mgr.galactica (mgr.14113) 5945 : cluster [DBG] pgmap v5995: 64 pgs: 1 active+recovery_wait+undersized+degraded+remapped, 3 active+recovering+undersized+remapped, 60 active+clean; 2.4 GiB data, 35 GiB used, 21 TiB / 21 TiB avail; 39 MiB/s wr, 9.94k op/s; 7707/1901892 objects degraded (0.405%); 19225/1901892 objects misplaced (1.011%); 1.3 MiB/s, 342 objects/s recovering
2021-08-29T15:27:39.937115+0200 mgr.galactica (mgr.14113) 5947 : cluster [DBG] pgmap v5996: 64 pgs: 1 active+recovery_wait+undersized+degraded+remapped, 3 active+recovering+undersized+remapped, 60 active+clean; 2.5 GiB data, 35 GiB used, 21 TiB / 21 TiB avail; 45 MiB/s wr, 11.53k op/s; 7707/2001930 objects degraded (0.385%); 18054/2001930 objects misplaced (0.902%); 1.5 MiB/s, 392 objects/s recovering
2021-08-29T15:27:41.937539+0200 mgr.galactica (mgr.14113) 5948 : cluster [DBG] pgmap v5997: 64 pgs: 1 active+recovery_wait+undersized+degraded+remapped, 3 active+recovering+undersized+remapped, 60 active+clean; 2.6 GiB data, 35 GiB used, 21 TiB / 21 TiB avail; 38 MiB/s wr, 9.82k op/s; 7707/2036769 objects degraded (0.378%); 17055/2036769 objects misplaced (0.837%); 1.5 MiB/s, 394 objects/s recovering
2021-08-29T15:27:42.114866+0200 mon.galactica (mon.0) 4217 : cluster [WRN] Health check update: Degraded data redundancy: 7707/2036769 objects degraded (0.378%), 1 pg degraded (PG_DEGRADED)
2021-08-29T15:27:43.938018+0200 mgr.galactica (mgr.14113) 5950 : cluster [DBG] pgmap v5998: 64 pgs: 1 active+recovery_wait+undersized+degraded+remapped, 3 active+recovering+undersized+remapped, 60 active+clean; 2.7 GiB data, 36 GiB used, 21 TiB / 21 TiB avail; 42 MiB/s wr, 10.75k op/s; 7707/2104119 objects degraded (0.366%); 16332/2104119 objects misplaced (0.776%); 1.5 MiB/s, 393 objects/s recovering
2021-08-29T15:27:45.938504+0200 mgr.galactica (mgr.14113) 5952 : cluster [DBG] pgmap v5999: 64 pgs: 1 active+recovery_wait+undersized+degraded+remapped, 3 active+recovering+undersized+remapped, 60 active+clean; 2.8 GiB data, 36 GiB used, 21 TiB / 21 TiB avail; 42 MiB/s wr, 10.84k op/s; 7707/2173458 objects degraded (0.355%); 15663/2173458 objects misplaced (0.721%); 1.6 MiB/s, 405 objects/s recovering
2021-08-29T15:27:47.989334+0200 mon.galactica (mon.0) 4222 : cluster [WRN] Health check update: Degraded data redundancy: 7707/2226720 objects degraded (0.346%), 1 pg degraded (PG_DEGRADED)
2021-08-29T15:27:47.938929+0200 mgr.galactica (mgr.14113) 5953 : cluster [DBG] pgmap v6000: 64 pgs: 1 active+recovery_wait+undersized+degraded+remapped, 3 active+recovering+undersized+remapped, 60 active+clean; 2.8 GiB data, 36 GiB used, 21 TiB / 21 TiB avail; 41 MiB/s wr, 10.43k op/s; 7707/2226720 objects degraded (0.346%); 14347/2226720 objects misplaced (0.644%); 1.9 MiB/s, 473 objects/s recovering
2021-08-29T15:27:49.939396+0200 mgr.galactica (mgr.14113) 5955 : cluster [DBG] pgmap v6001: 64 pgs: 1 active+recovery_wait+undersized+degraded+remapped, 3 active+recovering+undersized+remapped, 60 active+clean; 3.0 GiB data, 37 GiB used, 21 TiB / 21 TiB avail; 47 MiB/s wr, 12.10k op/s; 7707/2337432 objects degraded (0.330%); 12628/2337432 objects misplaced (0.540%); 2.1 MiB/s, 549 objects/s recovering
2021-08-29T15:27:51.939783+0200 mgr.galactica (mgr.14113) 5957 : cluster [DBG] pgmap v6002: 64 pgs: 1 active+recovery_wait+undersized+degraded+remapped, 3 active+recovering+undersized+remapped, 60 active+clean; 3.0 GiB data, 37 GiB used, 21 TiB / 21 TiB avail; 41 MiB/s wr, 10.42k op/s; 7707/2377251 objects degraded (0.324%); 10626/2377251 objects misplaced (0.447%); 2.4 MiB/s, 618 objects/s recovering
2021-08-29T15:27:53.616399+0200 mon.galactica (mon.0) 4231 : cluster [WRN] Health check update: Degraded data redundancy: 7707/2377251 objects degraded (0.324%), 1 pg degraded (PG_DEGRADED)
2021-08-29T15:27:53.940259+0200 mgr.galactica (mgr.14113) 5958 : cluster [DBG] pgmap v6003: 64 pgs: 1 active+recovery_wait+undersized+degraded+remapped, 3 active+recovering+undersized+remapped, 60 active+clean; 3.1 GiB data, 37 GiB used, 21 TiB / 21 TiB avail; 46 MiB/s wr, 11.65k op/s; 7707/2456202 objects degraded (0.314%); 9405/2456202 objects misplaced (0.383%); 2.5 MiB/s, 637 objects/s recovering
2021-08-29T15:27:54.625745+0200 mon.galactica (mon.0) 4232 : cluster [DBG] osdmap e359: 24 total, 24 up, 24 in
2021-08-29T15:27:55.627295+0200 mon.galactica (mon.0) 4233 : cluster [DBG] osdmap e360: 24 total, 24 up, 24 in
2021-08-29T15:27:55.940627+0200 mgr.galactica (mgr.14113) 5960 : cluster [DBG] pgmap v6006: 64 pgs: 1 active+recovery_wait+undersized+degraded+remapped, 3 active+recovering+undersized+remapped, 60 active+clean; 3.2 GiB data, 37 GiB used, 21 TiB / 21 TiB avail; 50 MiB/s wr, 12.83k op/s; 7707/2534712 objects degraded (0.304%); 8298/2534712 objects misplaced (0.327%); 3.0 MiB/s, 755 objects/s recovering
2021-08-29T15:27:57.941096+0200 mgr.galactica (mgr.14113) 5962 : cluster [DBG] pgmap v6007: 64 pgs: 1 active+recovery_wait+undersized+degraded+remapped, 2 active+recovering+undersized+remapped, 61 active+clean; 3.3 GiB data, 38 GiB used, 21 TiB / 21 TiB avail; 40 MiB/s wr, 10.20k op/s; 7707/2582295 objects degraded (0.298%); 6737/2582295 objects misplaced (0.261%); 2.9 MiB/s, 736 objects/s recovering
2021-08-29T15:27:58.617479+0200 mon.galactica (mon.0) 4238 : cluster [WRN] Health check update: Degraded data redundancy: 7707/2582295 objects degraded (0.298%), 1 pg degraded (PG_DEGRADED)
2021-08-29T15:27:59.941653+0200 mgr.galactica (mgr.14113) 5963 : cluster [DBG] pgmap v6008: 64 pgs: 1 active+recovery_wait+undersized+degraded+remapped, 2 active+recovering+undersized+remapped, 61 active+clean; 3.4 GiB data, 38 GiB used, 21 TiB / 21 TiB avail; 43 MiB/s wr, 10.92k op/s; 7707/2639313 objects degraded (0.292%); 1130/2639313 objects misplaced (0.043%); 4.6 MiB/s, 1.19k objects/s recovering
2021-08-29T15:28:00.100446+0200 mon.galactica (mon.0) 4239 : cluster [DBG] osdmap e361: 24 total, 24 up, 24 in
2021-08-29T15:28:01.106141+0200 mon.galactica (mon.0) 4244 : cluster [DBG] osdmap e362: 24 total, 24 up, 24 in
2021-08-29T15:28:01.942002+0200 mgr.galactica (mgr.14113) 5965 : cluster [DBG] pgmap v6011: 64 pgs: 1 active+recovery_wait+undersized+degraded+remapped, 2 active+recovering+undersized+remapped, 61 active+clean; 3.4 GiB data, 38 GiB used, 21 TiB / 21 TiB avail; 24 MiB/s wr, 6.17k op/s; 7707/2646573 objects degraded (0.291%); 1130/2646573 objects misplaced (0.043%); 4.4 MiB/s, 1.13k objects/s recovering
2021-08-29T15:28:02.108974+0200 mon.galactica (mon.0) 4245 : cluster [DBG] osdmap e363: 24 total, 24 up, 24 in
2021-08-29T15:28:03.618456+0200 mon.galactica (mon.0) 4246 : cluster [WRN] Health check update: Degraded data redundancy: 7707/2646573 objects degraded (0.291%), 1 pg degraded (PG_DEGRADED)
2021-08-29T15:28:03.942377+0200 mgr.galactica (mgr.14113) 5966 : cluster [DBG] pgmap v6013: 64 pgs: 1 active+clean+wait, 2 active+recovering+undersized+remapped, 61 active+clean; 3.4 GiB data, 37 GiB used, 21 TiB / 21 TiB avail; 14 MiB/s wr, 3.57k op/s; 7123/2646573 objects misplaced (0.269%); 4.8 MiB/s, 1.22k objects/s recovering
2021-08-29T15:28:04.620975+0200 mon.galactica (mon.0) 4251 : cluster [INF] Health check cleared: PG_DEGRADED (was: Degraded data redundancy: 7707/2646573 objects degraded (0.291%), 1 pg degraded)
2021-08-29T15:28:04.620990+0200 mon.galactica (mon.0) 4252 : cluster [INF] Cluster is now healthy
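To see at a glance how the degraded count and the osdmap epoch move over time, the cluster log can be condensed to one line per event. In the excerpt above, for instance, the osdmap epoch advances from e358 to e363 while all 24 OSDs stay up and in. This is a sketch assuming the log format shown above (timestamp as the first field); the function name `summarise` is made up for illustration.

```shell
#!/usr/bin/env bash
# Condense a Ceph cluster log (read from stdin) to one line per event:
# the degraded-object count from each pgmap line and every osdmap epoch.
summarise() {
  awk '
    /pgmap v[0-9]+:/ {
      deg = "0"
      if (match($0, /[0-9]+\/[0-9]+ objects degraded/)) {
        deg = substr($0, RSTART, RLENGTH)
        sub(/ objects degraded/, "", deg)
      }
      print $1, "degraded=" deg
    }
    /osdmap e[0-9]+:/ {
      match($0, /osdmap e[0-9]+: .*/)
      print $1, substr($0, RSTART, RLENGTH)
    }
  '
}

# Demo on two lines copied from the excerpt above.
# → 2021-08-29T15:27:22.071589+0200 osdmap e358: 24 total, 24 up, 24 in
# → 2021-08-29T15:27:23.933412+0200 degraded=7707/1505487
summarise <<'EOF'
2021-08-29T15:27:22.071589+0200 mon.galactica (mon.0) 4177 : cluster [DBG] osdmap e358: 24 total, 24 up, 24 in
2021-08-29T15:27:23.933412+0200 mgr.galactica (mgr.14113) 5934 : cluster [DBG] pgmap v5988: 64 pgs: 1 active+recovering+undersized+remapped, 1 active+recovery_wait+undersized+degraded+remapped, 1 remapped+peering, 61 active+clean; 1.9 GiB data, 34 GiB used, 21 TiB / 21 TiB avail; 70 MiB/s wr, 17.88k op/s; 7707/1505487 objects degraded (0.512%); 7745/1505487 objects misplaced (0.514%); 59 KiB/s, 14 objects/s recovering
EOF
```

It can be fed live output with `ceph -w | summarise`, or a saved cluster log; where that log lives (for example `/var/log/ceph/ceph.log` on a monitor host) depends on how logging is configured.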