Ceph-OSD syslog messages "... reset not still connected to ..."

Comcrypto

Member
Aug 4, 2021
Hello everyone,

We are currently running 7 Proxmox servers, all using Supermicro mainboards, inside one cluster.
3 of those servers are used as the Ceph backend that stores the VM images, so ceph-osd is installed and running on 3 of the 7 servers.

The problem we are currently facing is that the syslogs of those 3 servers are filled up with the following messages from ceph:

/var/log/syslog:
Aug 9 04:25:31 scci-hv11 ceph-osd[921008]: 2021-08-09T04:25:31.202+0200 7f5a6e6d3700 -1 reset not still connected to 0x5615cbcc5790
Aug 9 04:25:31 scci-hv11 ceph-osd[921008]: 2021-08-09T04:25:31.202+0200 7f5a6e6d3700 -1 reset not still connected to 0x5615cbdb6b60
Aug 9 04:25:31 scci-hv11 ceph-osd[921008]: 2021-08-09T04:25:31.202+0200 7f5a6e6d3700 -1 reset not still connected to 0x5615cc33edd0
...
Aug 9 04:25:31 scci-hv11 ceph-osd[921008]: 2021-08-09T04:25:31.202+0200 7f5a6e6d3700 -1 reset not still connected to 0x56161314a750
Aug 9 04:25:31 scci-hv11 ceph-osd[921008]: 2021-08-09T04:25:31.202+0200 7f5a6e6d3700 -1 reset not still connected to 0x56161314aa90
Aug 9 04:25:31 scci-hv11 ceph-osd[921008]: 2021-08-09T04:25:31.202+0200 7f5a6e6d3700 -1 reset not still connected to 0x56161b84e000

The timing of the messages doesn't seem to follow a specific pattern. One mailing list thread I read suggested the messages should occur at most every 15 minutes; this is not the case here. We observe them every 2-10 minutes, seemingly at random. According to the Ceph source code, this message is printed when Ceph attempts to close a socket that is no longer connected.
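To back up the 2-10 minute observation, the gaps between consecutive messages can be measured from the syslog timestamps. This is just a sketch; the two sample lines below are a made-up excerpt standing in for the real log, which you would read from /var/log/syslog instead:

```python
from datetime import datetime

# Hypothetical excerpt standing in for the real /var/log/syslog contents.
lines = [
    "Aug 9 04:25:31 scci-hv11 ceph-osd[921008]: ... reset not still connected to 0x5615cbcc5790",
    "Aug 9 04:31:05 scci-hv11 ceph-osd[921008]: ... reset not still connected to 0x5615cbdb6b60",
]

def parse_ts(line, year=2021):
    # Syslog timestamps carry no year, so it has to be supplied explicitly.
    month, day, clock = line.split()[:3]
    return datetime.strptime(f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S")

stamps = [parse_ts(l) for l in lines if "reset not still connected" in l]
gaps = [(b - a).total_seconds() / 60 for a, b in zip(stamps, stamps[1:])]
print(gaps)  # gaps in minutes between consecutive messages
```

With the two sample lines this prints a single gap of roughly 5.6 minutes; on the full log the list would show whether the intervals really cluster in the 2-10 minute range.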

So my question is: are these messages a symptom of a bigger underlying problem? And if they are just expected debug output, how do I turn them off?
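The only related knob I could find so far is the messenger debug level. I am assuming, based on the source code pointing at the messenger, that `ms` is the right debug channel, and I have not tested whether this actually suppresses this particular line:

```ini
[osd]
# Assumption: the message comes from the messenger ("ms") subsystem.
# Default is 0/5; 0/0 also disables the in-memory log for that channel.
debug ms = 0/0
```

On Octopus the same setting can be applied at runtime with `ceph config set osd debug_ms 0/0`, without editing ceph.conf.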

Thank you and kind regards!
 
Could you provide the complete syslog?
Is it only this one process, or another one as well?

Please also provide the output of pveversion -v
 
Hello mira,

thank you for the fast reply.
It seems to be only one process of ceph-osd.

The output for pveversion -v is the following. It is identical on all nodes of the cluster:

Code:
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
    LANGUAGE = (unset),
    LC_ALL = (unset),
    LC_ADDRESS = "de_DE.UTF-8",
    LC_NAME = "de_DE.UTF-8",
    LC_MONETARY = "de_DE.UTF-8",
    LC_PAPER = "de_DE.UTF-8",
    LC_IDENTIFICATION = "de_DE.UTF-8",
    LC_TELEPHONE = "de_DE.UTF-8",
    LC_MEASUREMENT = "de_DE.UTF-8",
    LC_TIME = "de_DE.UTF-8",
    LC_NUMERIC = "de_DE.UTF-8",
    LANG = "en_US.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to a fallback locale ("en_US.UTF-8").
proxmox-ve: 6.4-1 (running kernel: 5.4.119-1-pve)
pve-manager: 6.4-8 (running version: 6.4-8/185e14db)
pve-kernel-5.4: 6.4-3
pve-kernel-helper: 6.4-3
pve-kernel-5.4.119-1-pve: 5.4.119-1
pve-kernel-5.4.106-1-pve: 5.4.106-1
pve-kernel-5.4.34-1-pve: 5.4.34-2
ceph: 15.2.13-pve1~bpo10
ceph-fuse: 15.2.13-pve1~bpo10
corosync: 3.1.2-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: residual config
ifupdown2: 3.0.0-1+pve3
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.20-pve1
libproxmox-acme-perl: 1.1.0
libproxmox-backup-qemu0: 1.0.3-1
libpve-access-control: 6.4-3
libpve-apiclient-perl: 3.1-3
libpve-common-perl: 6.4-3
libpve-guest-common-perl: 3.1-5
libpve-http-server-perl: 3.2-3
libpve-storage-perl: 6.4-1
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.6-2
lxcfs: 4.0.6-pve1
novnc-pve: 1.1.0-1
proxmox-backup-client: 1.1.10-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.5-6
pve-cluster: 6.4-1
pve-container: 3.3-5
pve-docs: 6.4-2
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-4
pve-firmware: 3.2-4
pve-ha-manager: 3.1-1
pve-i18n: 2.3-1
pve-qemu-kvm: 5.2.0-6
pve-xtermjs: 4.7.0-3
qemu-server: 6.4-2
smartmontools: 7.2-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 2.0.4-pve1


I attached the logfile from one of the servers that are used as the Ceph backend and host the VM images.
The log looks OK until around 4:00 am and again from around 7:20 am. At first I suspected the messages might be connected to our nightly backups to the PBS, as those finish between around 3:00 am and 4:30 am, but the messages aren't limited to the backup window or the time shortly after it.

Furthermore, I attached the first hour (for size reasons) of the syslog of the server that runs the ceph-monitor, which looks far worse.

Thanks for your time, I am looking forward to your insights.

Kind regards.
 

Attachments

  • syslog_ceph_backend.txt (421.8 KB)
  • syslog_ceph_monitor.txt (349.2 KB)
Code:
Aug 13 00:22:57 scci-hv22 kernel: [4773221.731923] libceph: osd0 (1)192.168.70.11:6801 socket closed (con state OPEN)
Aug 13 00:22:57 scci-hv22 kernel: [4773222.516912] libceph: osd2 (1)192.168.70.12:6801 socket closed (con state OPEN)
Aug 13 00:22:57 scci-hv22 kernel: [4773222.542061] libceph: osd1 (1)192.168.70.22:6803 socket closed (con state OPEN)

These seem to be common when KRBD is used and are typically nothing to worry about. Perhaps disable it and see if anything changes.
Is there anything in the ceph-osd log for that one? Does the process correspond to osd1?
Code:
Aug 13 00:07:01 scci-hv22 kernel: [4772266.458772] libceph: osd1 (1)192.168.70.22:6803 socket error on write
Or which OSD is it?
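One way to check is via the process command line: the `--id` argument of ceph-osd names the OSD. The command line below is a made-up example; on the node itself you would run `ps -p 921008 -o args=` with the PID from the syslog instead:

```shell
# Hypothetical ceph-osd command line; obtain the real one with:
#   ps -p 921008 -o args=
CMDLINE='/usr/bin/ceph-osd -f --cluster ceph --id 1 --setuser ceph --setgroup ceph'

# The value after --id is the OSD number (requires GNU grep for -P).
OSD_ID=$(echo "$CMDLINE" | grep -oP '(?<=--id )[0-9]+')
echo "osd.$OSD_ID"
```

For the sample command line this prints `osd.1`, i.e. the process would indeed correspond to osd1.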
 
