I am having issues with journalctl consuming far too much CPU on one of my PVE nodes. First, I want to say that I am not certain whether this is normal. I will provide some technical information in the hope that someone can help me resolve this issue. I have 2 clusters, each with 4 nodes, and this is not the first time the issue has come up. The other servers have 1-2 journalctl processes at most, but every now and then journalctl seems to get out of control.
Basically, these processes are querying the journal for the Ceph OSD logs:
#ps -o pid,cmd -p 145925
PID CMD
145925 journalctl -u ceph-osd@* --since 11 minutes ago
#strace -p 147260
mmap(NULL, 8388608, PROT_READ, MAP_SHARED, 26, 0x25a0000) = 0x7ffa3e1b8000
munmap(0x7ffa3f1b8000, 8388608) = 0
mmap(NULL, 8388608, PROT_READ, MAP_SHARED, 5, 0x878000) = 0x7ffa3f1b8000
munmap(0x7ffa3b411000, 8388608) = 0
mmap(NULL, 8388608, PROT_READ, MAP_SHARED, 6, 0x32a1000) = 0x7ffa3b411000
munmap(0x7ffa391d0000, 8388608) = 0
mmap(NULL, 8388608, PROT_READ, MAP_SHARED, 10, 0x602000) = 0x7ffa391d0000
munmap(0x7ffa3ac11000, 8388608) = 0
mmap(NULL, 8388608, PROT_READ, MAP_SHARED, 12, 0x2f3a000) = 0x7ffa3ac11000
munmap(0x7ffa381d0000, 8388608) = 0
mmap(NULL, 8388608, PROT_READ, MAP_SHARED, 12, 0x41bd000) = 0x7ffa381d0000
munmap(0x7ffa3c411000, 8388608) = 0
mmap(NULL, 8388608, PROT_READ, MAP_SHARED, 15, 0x1c5c000) = 0x7ffa3c411000
munmap(0x7ffa3cc11000, 8388608) = 0
mmap(NULL, 8388608, PROT_READ, MAP_SHARED, 16, 0x5513000) = 0x7ffa3cc11000
munmap(0x7ffa399d0000, 8388608) = 0
mmap(NULL, 4972544, PROT_READ, MAP_SHARED, 16, 0x7b42000) = 0x7ffa3dcfa000
munmap(0x7ffa3bc11000, 8388608) = 0
mmap(NULL, 8388608, PROT_READ, MAP_SHARED, 17, 0x2c33000) = 0x7ffa3bc11000
munmap(0x7ffa3e1b8000, 8388608) = 0
From what I have gathered, journalctl is mapping 8 MiB windows of the journal files (which hold the Ceph OSD log entries) into memory so it can read and filter them efficiently. This should be normal behaviour, except that something seems to be wrong with it, since the CPU is under very heavy load on a cluster that isn't heavily used.
I reduced the OSD debug log level to 0/0:
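To make the strace output above easier to read, here is a minimal sketch that counts the mmap calls per file descriptor. The function name summarize_mmaps is my own, and it assumes strace lines in exactly the format pasted above.

```shell
#!/bin/sh
# Summarise strace output read on stdin: for every file descriptor that
# appears in an mmap() call, print how many times it was mapped and the
# total number of bytes mapped. munmap() lines are ignored.
# summarize_mmaps is a hypothetical helper name, not an existing tool.
summarize_mmaps() {
    awk -F', *' '
        /^mmap\(/ { count[$5]++; bytes[$5] += $2 }   # $5 = fd, $2 = length
        END {
            for (fd in count)
                printf "fd=%s calls=%d bytes=%d\n", fd, count[fd], bytes[fd]
        }' | sort -t= -k2,2n
}

# Possible use on a live node (PID is a placeholder):
#   timeout 5 strace -e trace=mmap,munmap -p PID 2>&1 | summarize_mmaps
# The fd numbers can then be resolved to file names with:
#   ls -l /proc/PID/fd
```

If the file descriptors turn out to point at many different journal files, that would at least show how much data each journalctl query has to walk through.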
#ceph config get osd debug_osd
0/0
That didn't help at all.
I also tried to limit the CPUs that journalctl can use by adding 'CPUAffinity=0' to /etc/systemd/journald.conf and to /lib/systemd/system/systemd-journald.service, but the effect was either small or nonexistent (I can't notice a difference). Ceph seems to be ignoring these rules and just running additional journalctl processes.
Can anyone let me know whether this is normal and whether I should be concerned? What bothers me is that at one point there were 15+ journalctl processes, which surely can't be normal, and stopping or killing them does not seem to resolve the issue.
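For reference, unit-level settings such as CPUAffinity are normally applied through a drop-in file rather than by editing the packaged unit under /lib. Below is a rough sketch of that. The function name and the override.conf file name are my own choices, and as far as I understand this only pins the systemd-journald daemon, not separately started journalctl client processes, which may be why I saw no difference.

```shell
#!/bin/sh
# Write a systemd drop-in that sets CPUAffinity for a service.
# $1 = drop-in directory, $2 = CPU list (for example "0").
# write_affinity_dropin and override.conf are hypothetical names.
write_affinity_dropin() {
    dir=$1
    cpus=$2
    mkdir -p "$dir" || return 1
    printf '[Service]\nCPUAffinity=%s\n' "$cpus" > "$dir/override.conf"
}

# Possible use on a node (needs root), followed by a reload and restart:
#   write_affinity_dropin /etc/systemd/system/systemd-journald.service.d 0
#   systemctl daemon-reload
#   systemctl restart systemd-journald
```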
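Since killing the processes does not help, it would probably be useful to see what keeps starting them. A small sketch that lists journalctl processes together with their parent PID. The helper name journalctl_procs is made up, and it expects "PID PPID ARGS" lines such as those printed by ps -eo pid=,ppid=,args=.

```shell
#!/bin/sh
# Read "PID PPID ARGS..." lines on stdin and print "PID PPID" for every
# process whose command is journalctl (with or without a leading path).
# journalctl_procs is a hypothetical helper name.
journalctl_procs() {
    awk '$3 == "journalctl" || $3 ~ /\/journalctl$/ { print $1, $2 }'
}

# Possible use on a live node: show the command line of each parent.
#   ps -eo pid=,ppid=,args= | journalctl_procs | while read -r pid ppid; do
#       printf '%s spawned by: ' "$pid"
#       ps -o args= -p "$ppid"
#   done
```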
Thank you for your help in advance.