Hi,
Name's Erik and I'm a longtime lurker, but since this is the first time I couldn't resolve an issue using the documentation or this forum, I thought I'd register and ask.
I have a hyperconverged 3-node cluster with Ceph that was recently upgraded from PVE 4.x with Ceph Hammer to PVE 6.3 with Ceph Nautilus, following the PVE upgrade guides. I have to say that PVE has been nothing but impressive when it comes to upgrades. Compliments to the software and to the docs!
The snag that has hit me this time is that, while everything still works OK, the /tmp directory has started to fill up with Ceph admin socket files that seem to belong to processes that no longer exist. I caught on to the problem because monitoring alerted that / was running out of inodes.
After some digging, it seems that the pvestatd process forks a subprocess (renamed to "pverados") which creates these files and calls bind() and listen() on the socket, but does not delete them when the child process exits. As far as I can tell, strace shows no attempt to unlink() them.
One such file is created every two seconds or so, and scripted cleanup is a bit risky because there are legitimate Ceph admin sockets with the same naming structure, such as the ones used by running VMs.
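For reference, inode exhaustion like this can be checked directly with `df -i /`. A minimal Python sketch of the same figure, assuming a filesystem that reports inode counts through statvfs (`inode_usage` is a hypothetical helper name):

```python
import os


def inode_usage(path: str = "/") -> float:
    """Percentage of inodes in use on the filesystem holding `path`,
    roughly what `df -i` shows as IUse%."""
    st = os.statvfs(path)
    if st.f_files == 0:
        # Some filesystems (e.g. btrfs) report no fixed inode count.
        return 0.0
    used = st.f_files - st.f_ffree  # total inodes minus free inodes
    return 100.0 * used / st.f_files


if __name__ == "__main__":
    print("%.1f%% of inodes used on /" % inode_usage("/"))
```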
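One way to make scripted cleanup less risky is to require two independent signs of staleness before deleting anything: the PID embedded in the filename no longer exists, and connecting to the socket is refused. A running VM's admin socket fails both checks, so it is kept. This is a sketch under assumptions: the filename pattern `ceph-client.<id>.<pid>.<cctid>.asok` is inferred from the listing in this post, `find_stale_asoks` and `cleanup` are hypothetical helpers (not part of PVE or Ceph), and it only reports by default (`dry_run=True`).

```python
import os
import re
import socket
import stat
from typing import List

# Assumed naming scheme, inferred from the listing in this post:
#   ceph-client.<id>.<pid>.<cctid>.asok
ASOK_RE = re.compile(r"^ceph-client\..+\.(?P<pid>\d+)\.\d+\.asok$")


def pid_alive(pid: int) -> bool:
    """Return True if a process with this PID currently exists."""
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but owned by another user
    return True


def nobody_listening(path: str) -> bool:
    """Return True only if connecting to the socket is refused."""
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    s.settimeout(1.0)
    try:
        s.connect(path)
        return False
    except ConnectionRefusedError:
        return True
    except OSError:
        return False  # timeouts or other errors: be conservative, keep it
    finally:
        s.close()


def find_stale_asoks(directory: str = "/tmp") -> List[str]:
    """List sockets whose owning PID is gone AND that nobody listens on."""
    stale = []
    with os.scandir(directory) as entries:
        for entry in entries:
            m = ASOK_RE.match(entry.name)
            if not m:
                continue
            try:
                if not stat.S_ISSOCK(entry.stat(follow_symlinks=False).st_mode):
                    continue
            except FileNotFoundError:
                continue  # removed while we were scanning
            if pid_alive(int(m.group("pid"))):
                continue  # e.g. a running VM's admin socket
            if nobody_listening(entry.path):
                stale.append(entry.path)
    return sorted(stale)


def cleanup(directory: str = "/tmp", dry_run: bool = True) -> List[str]:
    """Remove (or, with dry_run, only report) stale admin sockets."""
    stale = find_stale_asoks(directory)
    for path in stale:
        if dry_run:
            print("would remove", path)
        else:
            os.unlink(path)
    return stale
```

Requiring both checks means a leftover file whose PID has since been reused by an unrelated process is skipped rather than deleted, which errs on the side of keeping files.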
Versions are:
pve-manager/6.3-3/eee5f901 (running kernel: 5.4.78-2-pve)
ceph 14.2.16-pve1
Does anyone have an idea what might be wrong?
Best regards,
/Erik
root@pve2:~# ls -ltr /tmp/ | grep ceph-client | tail -10
srwxr-xr-x 1 root root 0 Feb 18 18:03 ceph-client.admin.1299363.93951616497296.asok
srwxr-xr-x 1 root root 0 Feb 18 18:03 ceph-client.admin.1299383.93951616497296.asok
srwxr-xr-x 1 root root 0 Feb 18 18:03 ceph-client.admin.1299410.93951687626256.asok
srwxr-xr-x 1 root root 0 Feb 18 18:03 ceph-client.admin.1299440.93932732736024.asok
srwxr-xr-x 1 root root 0 Feb 18 18:04 ceph-client.admin.1299474.93951687628944.asok
srwxr-xr-x 1 root root 0 Feb 18 18:04 ceph-client.admin.1299518.93951687626256.asok
srwxr-xr-x 1 root root 0 Feb 18 18:04 ceph-client.admin.1299538.93951688944176.asok
srwxr-xr-x 1 root root 0 Feb 18 18:04 ceph-client.admin.1299570.93932732730872.asok
srwxr-xr-x 1 root root 0 Feb 18 18:04 ceph-client.admin.1299597.93951616503840.asok
srwxr-xr-x 1 root root 0 Feb 18 18:04 ceph-client.admin.1299618.93951616503600.asok
root@pve2:~# ls -ltr /tmp/ | grep ceph-client | tail -10
srwxr-xr-x 1 root root 0 Feb 18 18:04 ceph-client.admin.1299518.93951687626256.asok
srwxr-xr-x 1 root root 0 Feb 18 18:04 ceph-client.admin.1299538.93951688944176.asok
srwxr-xr-x 1 root root 0 Feb 18 18:04 ceph-client.admin.1299570.93932732730872.asok
srwxr-xr-x 1 root root 0 Feb 18 18:04 ceph-client.admin.1299597.93951616503840.asok
srwxr-xr-x 1 root root 0 Feb 18 18:04 ceph-client.admin.1299618.93951616503600.asok
srwxr-xr-x 1 root root 0 Feb 18 18:04 ceph-client.admin.1299637.93951687626784.asok
srwxr-xr-x 1 root root 0 Feb 18 18:04 ceph-client.admin.1299671.93932732737768.asok
srwxr-xr-x 1 root root 0 Feb 18 18:04 ceph-client.admin.1299696.93951616503120.asok
srwxr-xr-x 1 root root 0 Feb 18 18:04 ceph-client.admin.1299717.93951616503088.asok
srwxr-xr-x 1 root root 0 Feb 18 18:04 ceph-client.admin.1299736.93951688951504.asok
root@pve2:~# ls -ltr /tmp/ | grep ceph-client | wc -l
349152