Ceph-OSD syslog messages “… reset not still connected to ...”

Comcrypto

Member
Aug 4, 2021
2
0
6
24
Hello everyone,

We are currently running 7 Proxmox servers, all with Supermicro mainboards, in one cluster.
2 of the servers are used as a Ceph backend to store the VM images, so ceph-osd is installed and running on 3 of the 7 servers.

The problem we are currently facing is that the syslogs of those 3 servers are filling up with the following Ceph messages:

/var/log/syslog:
Aug 9 04:25:31 scci-hv11 ceph-osd[921008]: 2021-08-09T04:25:31.202+0200 7f5a6e6d3700 -1 reset not still connected to 0x5615cbcc5790
Aug 9 04:25:31 scci-hv11 ceph-osd[921008]: 2021-08-09T04:25:31.202+0200 7f5a6e6d3700 -1 reset not still connected to 0x5615cbdb6b60
Aug 9 04:25:31 scci-hv11 ceph-osd[921008]: 2021-08-09T04:25:31.202+0200 7f5a6e6d3700 -1 reset not still connected to 0x5615cc33edd0
...
Aug 9 04:25:31 scci-hv11 ceph-osd[921008]: 2021-08-09T04:25:31.202+0200 7f5a6e6d3700 -1 reset not still connected to 0x56161314a750
Aug 9 04:25:31 scci-hv11 ceph-osd[921008]: 2021-08-09T04:25:31.202+0200 7f5a6e6d3700 -1 reset not still connected to 0x56161314aa90
Aug 9 04:25:31 scci-hv11 ceph-osd[921008]: 2021-08-09T04:25:31.202+0200 7f5a6e6d3700 -1 reset not still connected to 0x56161b84e000

The timing of the messages doesn't seem to follow a specific pattern. On one mailing list I read that the messages should occur at most every 15 minutes; this is not the case here. We see the messages every 2-10 minutes, seemingly at random. According to the Ceph source code, this message is printed when Ceph attempts to close a socket that is no longer connected.
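For reference, this is how we located the message in a local checkout of the Ceph source (the checkout path below is just an example, and the exact file may differ between releases):

Code:
# hypothetical local clone of https://github.com/ceph/ceph, checked out at our running release
cd ~/ceph
git checkout v15.2.13
grep -rn "reset not still connected" src/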

So the question I have is: are these messages cause for concern, i.e. a sign of a bigger underlying problem? And if they are just expected debug messages, how do I turn them off?
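For completeness, this is the kind of check we would run to see the current messenger debug level; osd.0 is just an example ID, and we are not sure this option controls these lines at all:

Code:
# query one OSD's debug level via its admin socket (run on the host carrying osd.0)
ceph daemon osd.0 config get debug_ms
# or via the cluster configuration database
ceph config get osd debug_ms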

Thank you and kind regards!
 
Could you provide the complete syslog?
Is it only this one process, or another one as well?

Please also provide the output of pveversion -v
 
Hello mira,

thank you for the fast reply.
It seems to be only this one ceph-osd process.

The output of pveversion -v is the following; it is identical on all nodes of the cluster:

Code:
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
    LANGUAGE = (unset),
    LC_ALL = (unset),
    LC_ADDRESS = "de_DE.UTF-8",
    LC_NAME = "de_DE.UTF-8",
    LC_MONETARY = "de_DE.UTF-8",
    LC_PAPER = "de_DE.UTF-8",
    LC_IDENTIFICATION = "de_DE.UTF-8",
    LC_TELEPHONE = "de_DE.UTF-8",
    LC_MEASUREMENT = "de_DE.UTF-8",
    LC_TIME = "de_DE.UTF-8",
    LC_NUMERIC = "de_DE.UTF-8",
    LANG = "en_US.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to a fallback locale ("en_US.UTF-8").
proxmox-ve: 6.4-1 (running kernel: 5.4.119-1-pve)
pve-manager: 6.4-8 (running version: 6.4-8/185e14db)
pve-kernel-5.4: 6.4-3
pve-kernel-helper: 6.4-3
pve-kernel-5.4.119-1-pve: 5.4.119-1
pve-kernel-5.4.106-1-pve: 5.4.106-1
pve-kernel-5.4.34-1-pve: 5.4.34-2
ceph: 15.2.13-pve1~bpo10
ceph-fuse: 15.2.13-pve1~bpo10
corosync: 3.1.2-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: residual config
ifupdown2: 3.0.0-1+pve3
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.20-pve1
libproxmox-acme-perl: 1.1.0
libproxmox-backup-qemu0: 1.0.3-1
libpve-access-control: 6.4-3
libpve-apiclient-perl: 3.1-3
libpve-common-perl: 6.4-3
libpve-guest-common-perl: 3.1-5
libpve-http-server-perl: 3.2-3
libpve-storage-perl: 6.4-1
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.6-2
lxcfs: 4.0.6-pve1
novnc-pve: 1.1.0-1
proxmox-backup-client: 1.1.10-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.5-6
pve-cluster: 6.4-1
pve-container: 3.3-5
pve-docs: 6.4-2
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-4
pve-firmware: 3.2-4
pve-ha-manager: 3.1-1
pve-i18n: 2.3-1
pve-qemu-kvm: 5.2.0-6
pve-xtermjs: 4.7.0-3
qemu-server: 6.4-2
smartmontools: 7.2-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 2.0.4-pve1


I attached the logfile from one of the servers that are used as the Ceph backend and host the VM images.
The log looks OK up until around 4:00 am, and the messages show up again at around 7:20 am. At first I suspected they might be connected to our nightly backups to the PBS, as those finish between roughly 3:00 am and 4:30 am, but the messages aren't limited to the backup window or the time shortly after it.
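For reference, this is roughly how we checked how the messages are distributed over time, to compare against the backup window (grepping the same syslog as in the excerpt above):

Code:
# count occurrences of the message per minute
grep 'reset not still connected' /var/log/syslog \
  | awk '{ print $1, $2, substr($3, 1, 5) }' \
  | uniq -c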

Furthermore, I attached the first hour (for size reasons) of the syslog of the server that runs the Ceph monitor, which looks far worse.

Thanks for your time; I am looking forward to your insights.

Kind regards.
 

Code:
Aug 13 00:22:57 scci-hv22 kernel: [4773221.731923] libceph: osd0 (1)192.168.70.11:6801 socket closed (con state OPEN)
Aug 13 00:22:57 scci-hv22 kernel: [4773222.516912] libceph: osd2 (1)192.168.70.12:6801 socket closed (con state OPEN)
Aug 13 00:22:57 scci-hv22 kernel: [4773222.542061] libceph: osd1 (1)192.168.70.22:6803 socket closed (con state OPEN)

These seem to be common when KRBD is used and are typically nothing to worry about. Perhaps disable it and see if anything changes.
Is there anything in the ceph-osd log for that one? Does the process correspond to osd1?
Code:
Aug 13 00:07:01 scci-hv22 kernel: [4772266.458772] libceph: osd1 (1)192.168.70.22:6803 socket error on write
Or which OSD is it?
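If it is not obvious which OSD the logging process belongs to, something like the following should show it; the PID and OSD ID are just taken from your excerpts as examples:

Code:
# the --id argument in the process command line is the OSD number
ps -o cmd= -p 921008
# then check that OSD's service status and its own log, e.g. for osd.1
systemctl status ceph-osd@1
less /var/log/ceph/ceph-osd.1.log
# and confirm which host carries that OSD
ceph osd find 1

KRBD itself can be toggled per storage via the KRBD checkbox (Datacenter -> Storage -> your RBD storage) or the krbd flag in /etc/pve/storage.cfg.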
 
