Ceph OSD Map

psionic

Member
May 23, 2019
I have a 4-node Proxmox cluster with Ceph, 4 OSDs per node.

When I run 'cat /sys/kernel/debug/ceph/*/osdmap' on each node, I get the following on 3 of the 4 nodes.

epoch 7125 barrier 0 flags 0x588000
pool 1 'Ceph-CT-VM' type 1 size 3 min_size 2 pg_num 256 pg_num_mask 255 flags 0x1 lfor 0 read_tier -1 write_tier -1
pool 4 'test' type 1 size 3 min_size 2 pg_num 64 pg_num_mask 63 flags 0x1 lfor 0 read_tier -1 write_tier -1
osd0 (1)10.10.3.11:6810 100% (exists, up) 100%
osd1 (1)10.10.3.11:6805 100% (exists, up) 100%
osd2 (1)10.10.3.11:6811 100% (exists, up) 100%
osd3 (1)10.10.3.11:6801 100% (exists, up) 100%
osd4 (1)10.10.3.12:6806 100% (exists, up) 100%
osd5 (1)10.10.3.12:6805 100% (exists, up) 100%
osd6 (1)10.10.3.12:6807 100% (exists, up) 100%
osd7 (1)10.10.3.12:6803 100% (exists, up) 100%
osd8 (1)10.10.3.13:6801 100% (exists, up) 100%
osd9 (1)10.10.3.13:6809 100% (exists, up) 100%
osd10 (1)10.10.3.13:6813 100% (exists, up) 100%
osd11 (1)10.10.3.13:6805 100% (exists, up) 100%
osd12 (1)10.10.3.14:6813 100% (exists, up) 100%
osd13 (1)10.10.3.14:6803 100% (exists, up) 100%
osd14 (1)10.10.3.14:6801 100% (exists, up) 100%
osd15 (1)10.10.3.14:6808 100% (exists, up) 100%

On 1 of the 4 nodes I get:
cat: '/sys/kernel/debug/ceph/*/osdmap': No such file or directory

Shouldn't all 4 nodes have the same info? If so, any info on fixing the odd node?

Package Versions:
proxmox-ve: 6.1-2 (running kernel: 5.3.13-1-pve)
pve-manager: 6.1-5 (running version: 6.1-5/9bf06119)
pve-kernel-5.3: 6.1-1
pve-kernel-helper: 6.1-1
pve-kernel-5.0: 6.0-11
pve-kernel-5.3.13-1-pve: 5.3.13-1
pve-kernel-5.3.10-1-pve: 5.3.10-1
pve-kernel-5.0.21-5-pve: 5.0.21-10
pve-kernel-5.0.15-1-pve: 5.0.15-1
ceph: 14.2.5-pve1
ceph-fuse: 14.2.5-pve1
corosync: 3.0.2-pve4
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: not correctly installed
ifupdown2: 1.2.8-1+pve4
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.13-pve1
libpve-access-control: 6.0-5
libpve-apiclient-perl: 3.0-2
libpve-common-perl: 6.0-9
libpve-guest-common-perl: 3.0-3
libpve-http-server-perl: 3.0-3
libpve-storage-perl: 6.1-3
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve3
lxc-pve: 3.2.1-1
lxcfs: 3.0.3-pve60
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.1-1
pve-cluster: 6.1-2
pve-container: 3.0-15
pve-docs: 6.1-3
pve-edk2-firmware: 2.20191127-1
pve-firewall: 4.0-9
pve-firmware: 3.0-4
pve-ha-manager: 3.0-8
pve-i18n: 2.0-3
pve-qemu-kvm: 4.1.1-2
pve-xtermjs: 3.13.2-1
qemu-server: 6.1-4
smartmontools: 7.0-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.2-pve2
 
Shouldn't all 4 nodes have the same info? If so, any info on fixing the odd node?
That depends on what the fourth node is doing. Does it run any Ceph services? Has it mounted any CephFS or mapped any rbd images?
 
Then either the kernel client isn't used, debugfs is not activated, or there are no slow ops on that node.
https://docs.ceph.com/docs/nautilus/cephfs/troubleshooting/#id1
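
A quick way to narrow down which of the three it is (just a sketch using standard tools; module names as on a stock Debian/Proxmox kernel):

lsmod | grep -E '^(libceph|ceph|rbd)'   # kernel client modules, only loaded when CephFS or krbd is actually in use
mount | grep debugfs                    # shows whether debugfs is mounted at all
cat /sys/kernel/debug/ceph/*/osdc       # in-flight requests per kernel client instance (empty when there are no slow/pending ops)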

Thanks Alwin. Using this article:
https://www.linuxtopia.org/online_books/linux_kernel/kernel_configuration/ch09s07.html

Debug Filesystem
A RAM-based filesystem can be used to output a lot of different debugging information. This filesystem is called debugfs and can be enabled by:

Kernel hacking
[*] Debug filesystem

After you enable this option and boot the rebuilt kernel, it creates the directory /sys/kernel/debug as a location for the user to mount the debugfs filesystem. Do this manually by:

$ mount -t debugfs none /sys/kernel/debug

or have the filesystem mounted automatically at boot time by adding the following line to the /etc/fstab file:

debugfs /sys/kernel/debug debugfs defaults 0 0



After you mount debugfs, a large number of different directories and files will turn up in the /sys/kernel/debug/ directory. These are all virtual and dynamically generated by the kernel, like the files in procfs or sysfs. The files can be used to help debug different kernel subsystems, or just perused to see what is happening to the system as it runs.
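
For reference, findmnt gives a quick yes/no on whether debugfs is already mounted on a node (standard util-linux tool; the output shown is just an example of what to expect):

findmnt /sys/kernel/debug
# TARGET            SOURCE  FSTYPE  OPTIONS
# /sys/kernel/debug debugfs debugfs rw,nosuid,nodev,noexec,relatime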


So I first looked at /etc/fstab on each node, and none of them have the debugfs line present.

Then I looked at the /sys/kernel/debug/ directory. There is a 'ceph' folder present on all the nodes except the one with the error.

Good Nodes:

ls /sys/kernel/debug/
acpi cleancache dri gpio kprobes pkg_temp_thermal regulator sunrpc virtio-ports
bdi clear_warn_once dynamic_debug hid kvm pm_genpd resctrl suspend_stats wakeup_sources
block clk error_injection i40e mce pm_qos sched_debug swiotlb x86
btrfs device_component extfrag intel_powerclamp memcg_slabinfo pwm sched_features sync zram
cec devices_deferred fault_around_bytes iosf_sb opp ras sleep_time tracing zswap
ceph dma_buf frontswap ixgbe pinctrl regmap split_huge_pages usb

Error Node:
ls /sys/kernel/debug/
acpi clear_warn_once dynamic_debug hid kvm pm_genpd resctrl suspend_stats wakeup_sources
bdi clk error_injection i40e mce pm_qos sched_debug swiotlb x86
block device_component extfrag intel_powerclamp memcg_slabinfo pwm sched_features sync zram
btrfs devices_deferred fault_around_bytes iosf_sb opp ras sleep_time tracing zswap
cec dma_buf frontswap ixgbe pinctrl regmap split_huge_pages usb
cleancache dri gpio kprobes pkg_temp_thermal regulator sunrpc virtio-ports

Any ideas on how to fix this?

I also found this article:
https://ceph.com/geen-categorie/see-what-the-ceph-client-sees/

But ceph.conf is the same on all nodes, so I don't understand why 3 out of 4 are getting debug info. I've had warnings/errors from all nodes at one time or another. So it must be a particular Ceph setting that is different on the one node compared to the other 3?
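
(Side note: on a Proxmox node, /etc/ceph/ceph.conf is normally just a symlink into the cluster-wide /etc/pve/ filesystem, so a real per-node difference in that file would be unexpected; easy to verify:)

ls -l /etc/ceph/ceph.conf
# typically: /etc/ceph/ceph.conf -> /etc/pve/ceph.conf   (shared across all nodes via pmxcfs)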

I also ran the following command on the error node:
ceph daemon mon.pve12 config get mon_cluster_log_file_level
{
"mon_cluster_log_file_level": "debug"
}
The other 2 nodes with monitors have the same output...

cat /etc/ceph/ceph.conf
[global]
auth_client_required = cephx
auth_cluster_required = cephx
auth_service_required = cephx
cluster_network = 10.10.4.11/24
fsid = aff8cf10-e628-4f89-bf4b-01b451d11775
mon_allow_pool_delete = true
mon_host = 10.10.3.13 10.10.3.14 10.10.3.12
osd_pool_default_min_size = 2
osd_pool_default_size = 3
public_network = 10.10.3.11/24

[client]
keyring = /etc/pve/priv/$cluster.$name.keyring
 
Does it have any CephFS mounted or any rbd (container) images mapped? Otherwise, no Ceph kernel client is used.
 
Ceph is set up exactly the same on every node, except that Node #1 has no monitor or manager; the other 3 nodes do have a monitor/manager. Node #2 is the one that is getting the error...
 
If the node is not using the Ceph kernel client (cephfs, container, krbd), no information will be visible.
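
A quick way to confirm that on a node without running guests (a sketch; 'krbd-test' is just a hypothetical scratch image in the existing Ceph-CT-VM pool) is to map an image through the kernel client and watch the debugfs entries appear:

rbd create Ceph-CT-VM/krbd-test --size 1G   # hypothetical throwaway image, removed again below
rbd map Ceph-CT-VM/krbd-test                # loads the krbd kernel client on this node
cat /sys/kernel/debug/ceph/*/osdmap         # the osdmap should now be visible on this node as well
rbd unmap Ceph-CT-VM/krbd-test
rbd rm Ceph-CT-VM/krbd-test                 # clean up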
 
I reinstalled Ceph (not because of this issue), but now I am getting an OSD map on all nodes as long as there are active VMs/CTs running on the node...
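
In case it helps anyone else: the nodes that show an OSD map are exactly the ones with at least one active kernel client, e.g. a container disk or a VM disk on a storage with krbd enabled; which mappings a node currently has can be listed with:

rbd showmapped   # images mapped through the kernel client on this node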
 
