ceph-osd OOM

Fug1

I have a 3-node PVE 7.4-18 cluster running Ceph 15.2.17. There is one OSD per node, so pretty simple. I'm using 3 replicas, so the data should basically be mirrored across all OSDs in the cluster.

Everything has been running fine for months, but I've suddenly lost the ability to get my OSDs up and running.

The ceph-osd on each node keeps crashing on startup, and it looks like it's being killed by the Linux OOM killer:

[ 4530.421204] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=system-ceph\x2dosd.slice,mems_allowed=0,global_oom,task_memcg=/system.slice/system-ceph\x2dosd.slice/ceph-osd@4.service,task=ceph-osd,pid=37704,uid=64045
[ 4530.421315] Out of memory: Killed process 37704 (ceph-osd) total-vm:39459496kB, anon-rss:31373092kB, file-rss:756kB, shmem-rss:0kB, UID:64045 pgtables:76112kB oom_score_adj:0

In the ceph-osd log, the last message before the crash is like this:

2025-01-07T14:35:42.066-0500 7f3bf2418d80 0 osd.4 36581 load_pgs

This seemed to come on suddenly and it's affecting all OSDs in the cluster. So I guess it must be something to do with the data, and I wonder if there's a way to recover it.

I found this article and wonder if I should go through this process, but wanted to find out if anyone had experienced anything similar.

https://www.croit.io/blog/how-to-solve-the-oom-killer-process-from-killing-your-osds

Happy to provide any other detail, but I'm not sure what else would be helpful.

TIA!
 
A couple of additional data points:

Two of the nodes have 32GB of memory, the other has 64GB of memory. All three nodes are experiencing the ceph-osd OOM issue.

osd_memory_target for the OSDs appears to be the default 4 GiB:

ceph config get osd osd_memory_target
4294967296
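
For what it's worth, my understanding is that osd_memory_target mainly steers BlueStore cache autotuning, so it may not cap memory used while replaying pg logs during load_pgs. Lowering it temporarily is easy to try, though; a sketch using the standard ceph config commands (revert afterwards):

# Quick test (sketch only): lower the OSD memory target cluster-wide to 2 GiB.
# Note: this only steers BlueStore cache autotuning; it may not help if the
# memory is being consumed by pg_log replay during load_pgs.
ceph config set osd osd_memory_target 2147483648

# Confirm the stored value
ceph config get osd osd_memory_target

# Revert to the default once things are healthy again
ceph config set osd osd_memory_target 4294967296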
 
Ceph version 15 is already quite a few years old. I do remember that there used to be an occasional issue with OOM, but it has been too long for me to remember any details.

I found this article and wonder if I should go through this process, but wanted to find out if anyone had experienced anything similar.
Doesn't hurt to test it on one of the OSDs.
 
Yes, I really need to upgrade but can't do that while the cluster is unhealthy.

I tried to go through the process documented on that webpage, but it's lacking some detail.

The command:

while read pg; do echo $pg; ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-{OSD-ID} --op log --pgid $pg > pglog.json; jq '(.pg_log_t.log|length),(.pg_log_t.dups|length)' < pglog.json; done < /root/osd.{OSD-ID}.pgs.txt 2>&1 | tee dups.log
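
For reference, the /root/osd.{OSD-ID}.pgs.txt file that the loop reads has to be generated first, with the OSD stopped; something like this should produce it (list-pgs is a standard ceph-objectstore-tool op):

# Stop the OSD first; ceph-objectstore-tool needs exclusive access to the data path.
systemctl stop ceph-osd@{OSD-ID}

# Dump the list of PGs stored on this OSD into the file the loop reads.
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-{OSD-ID} \
    --op list-pgs > /root/osd.{OSD-ID}.pgs.txt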

creates a file called dups.log that contains, for each pg, the pg id followed by two numbers: the first appears to be the number of pg log entries and the second the number of duplicate log entries.

In my case, it output the below information for pg_id 18.10:
18.10
2363
9486541

That suggests pg 18.10 has 2363 log entries and 9486541 duplicate log entries. Other pgs also have high duplicate counts, but this one is the highest.
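
Since dups.log ends up with three lines per pg (the pg id, the log length, and the dups length), a quick way to rank the worst offenders is something like:

# Collapse each three-line group (pgid, log entries, dup entries) onto one
# tab-separated line, then sort numerically by the dup count, highest first.
paste - - - < dups.log | sort -k3,3nr | head -20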

The next step in the process is to trim the duplicate log entries with the command:

ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-{OSD-ID} --op trim-pg-log-dups --pgid {PG-ID} --osd_max_pg_log_entries=100 --osd_pg_log_dups_tracked=100 --osd_pg_log_trim_max=500000
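
For example, substituting my values (osd.4, pg 18.10), the command would presumably look like this, again with the OSD stopped:

# Hypothetical substitution of my OSD id and the worst pg into the article's command.
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-4 \
    --op trim-pg-log-dups --pgid 18.10 \
    --osd_max_pg_log_entries=100 --osd_pg_log_dups_tracked=100 \
    --osd_pg_log_trim_max=500000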

But it doesn't really indicate what an acceptable number of duplicates would be. Should I just run it on one pg at a time, starting with the one with the most duplicates, and see if the OSD is able to start after trimming each pg? Or should there normally be 0 duplicates and I need to trim any pg with any duplicates?
 
Apparently Ceph Quincy has a log entry that suggests running this command if the number of duplicates exceeds 6,000.

 
I ran the trim on all PGs in all OSDs where the duplicate entries were greater than 6,000. That seems to have done the trick: my OSDs can now start and my cluster is healthy.
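
In case it helps anyone else, the per-OSD trimming can be scripted along these lines (a rough sketch only; it assumes the three-line dups.log produced above, a stopped OSD, and the hypothetical OSD_ID variable below):

# Rough sketch: trim every pg on this OSD whose dup count exceeds 6000.
# Assumes dups.log was produced by the earlier loop and the OSD is stopped.
OSD_ID=4
paste - - - < dups.log | while read -r pg entries dups; do
    if [ "$dups" -gt 6000 ]; then
        echo "trimming $pg ($dups dups)"
        ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-$OSD_ID \
            --op trim-pg-log-dups --pgid "$pg" \
            --osd_max_pg_log_entries=100 --osd_pg_log_dups_tracked=100 \
            --osd_pg_log_trim_max=500000
    fi
done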
 
