[SOLVED] Issues with Ceph on new installs

d2600hz

New Member
Mar 12, 2024
Hi there,

We're evaluating Proxmox as a virtualisation solution, and I've stood up a 3-node cluster in each of 3 different datacenters. By and large we're getting the same Ceph errors (HEALTH_WARN) in all of them - mainly PGs stuck in creating+peering.

I've trawled through a lot of documentation, so I'm hoping someone may have ideas on where I'm going wrong. For the most part this is a default install of Proxmox.

Configuration is as follows:

- 3 x identical servers with 6 x SSD (4 x SSD for Ceph)
- 1 GbE public / 10 GbE private network, with MTU 9000 on the 10 GbE side. The public network is a tagged VLAN on vmbr.xxx with bond0 underneath; bond1 is dedicated as an access port for the Ceph cluster network.
- Created mon/mgr on all nodes
- Created OSD on all disks
- Created pool with defaults (roughly the commands sketched below)
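
For reference, I did the Ceph setup through the GUI, but as far as I can tell it boils down to roughly the stock pveceph workflow below (device and pool names here are just examples, not my exact ones; ceph.conf and the networks were set via the GUI wizard):

Code:
# run on each node after the PVE cluster is formed
pveceph install

# monitors and managers on all three nodes
pveceph mon create
pveceph mgr create

# one OSD per Ceph SSD (example device names)
pveceph osd create /dev/sdc
pveceph osd create /dev/sdd
pveceph osd create /dev/sde
pveceph osd create /dev/sdf

# pool with the defaults (size 3 / min_size 2)
pveceph pool create vmpool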

So in the first two DCs I have varying issues - but DC1 has the following:

Code:
root@pve002-ewr:~# ceph -s
  cluster:
    health: HEALTH_WARN
            Reduced data availability: 23 pgs inactive, 22 pgs peering
            24 slow ops, oldest one blocked for 4219 sec, daemons [osd.5,osd.7] have slow ops.
 
  services:
    mon: 3 daemons, quorum pve002-ewr,pve003-ewr,pve004-ewr (age 2d)
    mgr: pve003-ewr(active, since 2d), standbys: pve002-ewr, pve004-ewr
    osd: 12 osds: 12 up (since 67m), 12 in (since 67m); 1 remapped pgs
 
  data:
    pools:   2 pools, 129 pgs
    objects: 2 objects, 128 KiB
    usage:   325 MiB used, 21 TiB / 21 TiB avail
    pgs:     17.829% pgs not active
             106 active+clean
             21  creating+peering
             1   creating+activating
             1   remapped+peering

And DC3 is in an even worse state:

Code:
root@pve002-sjc:~# ceph -s
  cluster:
    health: HEALTH_WARN
            Reduced data availability: 129 pgs inactive, 129 pgs peering
            258 slow ops, oldest one blocked for 274222 sec, daemons [osd.0,osd.3,osd.9] have slow ops.
 
  services:
    mon: 3 daemons, quorum pve002-sjc,pve004-sjc,pve003-sjc (age 3d)
    mgr: pve004-sjc(active, since 3d), standbys: pve002-sjc, pve003-sjc
    osd: 12 osds: 12 up (since 3d), 12 in (since 3d)
 
  data:
    pools:   2 pools, 129 pgs
    objects: 0 objects, 0 B
    usage:   338 MiB used, 21 TiB / 21 TiB avail
    pgs:     100.000% pgs not active
             129 creating+peering


What am I doing wrong? I must be missing a trick somewhere, but I can't find any reason for it. Any help would be appreciated! I've searched through the dumps and followed a lot of guides, but I can't work out what I'm doing wrong here.
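
For what it's worth, these are the sort of standard Ceph commands I've been using to dig into the stuck PGs (the PG id below is just an example taken from a dump):

Code:
# more detail on the warning
ceph health detail

# list the PGs that never went active and the OSDs they map to
ceph pg dump_stuck inactive

# query one of the stuck PGs directly (example id)
ceph pg 2.1a query

# sanity-check OSD layout and usage
ceph osd tree
ceph osd df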
 
The SSDs are Dell 1.92 TB ones - is that what you meant?


Code:
=== START OF INFORMATION SECTION ===
Device Model:     SSDSC2KG019TZR
Serial Number:    PHYJ325100QS1P9CGN
LU WWN Device Id: 5 5cd2e4 15647331b
Add. Product Id:  DELL(tm)
Firmware Version: 7CV1DL74
User Capacity:    1,920,383,410,176 bytes [1.92 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
TRIM Command:     Available, deterministic, zeroed
Device is:        Not in smartctl database 7.3/5319
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Mon Mar 25 19:00:29 2024 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
 
Yeah, the disks are okay then. The only thing I can think of is that there is some problem with the Ceph network and the bonding, or something like that.
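
A quick way to rule that out would be to check jumbo frames and basic reachability between the nodes over the cluster network, something like the below (I'm assuming the other nodes are 10.10.250.3/.4 - adjust to your addressing; 8972 = 9000 minus the 28 bytes of IP/ICMP headers):

Code:
# full-size frames with fragmentation forbidden, from pve002 over bond1
ping -M do -s 8972 -c 3 10.10.250.3
ping -M do -s 8972 -c 3 10.10.250.4

# optional throughput check (run "iperf3 -s" on the far end first)
iperf3 -c 10.10.250.3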
 
Ceph.conf:

Code:
[global]
         auth_client_required = cephx
         auth_cluster_required = cephx
         auth_service_required = cephx
         cluster_network = 10.10.250.2/24
         fsid = e92f5533-311d-4059-bf18-2ff413e866ab
         mon_allow_pool_delete = true
         mon_host = xx.xx.209.212 xx.xx.209.213 xx.xx.209.214
         ms_bind_ipv4 = true
         ms_bind_ipv6 = false
         osd_pool_default_min_size = 2
         osd_pool_default_size = 3
         public_network = xx.xx.209.212/27

[client]
         keyring = /etc/pve/priv/$cluster.$name.keyring

[mon.pve002-ewr]
         public_addr = xx.xx.209.212

[mon.pve003-ewr]
         public_addr = xx.xx.209.213

[mon.pve004-ewr]
         public_addr = xx.xx.209.214

cluster_network: bond1 - 10G LAG access port
public_network: vmbr0.250 (tagged VLAN for management)

Code:
auto lo
iface lo inet loopback

auto bond0
iface bond0 inet manual
        bond-slaves eno8303 eno8403
        bond-miimon 100
        bond-mode 802.3ad
        bond-xmit-hash-policy layer3+4
        bond-lacp-rate 1

auto bond1
iface bond1 inet static
        address 10.10.250.2/24
        bond-slaves eno12399np0 eno12409np1
        bond-miimon 100
        bond-mode 802.3ad
        bond-xmit-hash-policy layer3+4
        bond-lacp-rate 1
        mtu 9000

auto vmbr0
iface vmbr0 inet manual
        bridge-ports bond0
        bridge-stp on
        bridge-vlan-aware yes
        bridge-vids 2-4094

auto vmbr0.250
iface vmbr0.250 inet static
        address xx.xx.213.213/27
        gateway xx.xx.213.193

iface eno8303 inet manual

iface eno8403 inet manual

iface eno12399np0 inet manual

iface eno12409np1 inet manual


source /etc/network/interfaces.d/*
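
In case it helps, a couple of generic checks for whether the bond and the split public/cluster networks are actually in effect on each node:

Code:
# LACP / 802.3ad state of the cluster bond - both slaves should show
# "MII Status: up" and the actor/partner churn state should be "none"
cat /proc/net/bonding/bond1

# confirm each OSD has bound to both the public and the cluster network
# (ceph osd dump lists a public and a cluster address per OSD)
ceph osd dump | grep '^osd'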
 
I'm about to give up on Ceph. In fact, I'll probably give up on Proxmox.

These are production-grade boxes with a normal setup, and I cannot for the life of me get this working.

I rebuilt DC3 yesterday and I'm still getting the same kind of result:

(screenshot: ceph status after the rebuild, still showing HEALTH_WARN with PGs stuck in creating+peering)

It's still very hard to work out what's wrong, where to look, and how to fix it. I know the answer would be to buy a subscription, but the whole point of trialling a product is to judge whether it's production worthy!

Sorry - frustrated and annoyed, as I've wasted a few days building and rebuilding, and Google is not helping me.
 
For info - the bond transmit hash type is very important. While I was debugging, I noticed some LACP errors on the switch that only turned up in a horribly verbose logging mode.

This doesn't happen with the Juniper switches in our lab. Sorry for the spamming, but it's sorted.
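
For anyone who lands here with the same symptoms: the setting in question is bond-xmit-hash-policy on the bonds in /etc/network/interfaces. Which value is right depends on your switch, so treat the below purely as an illustration of where to change it - e.g. moving the cluster bond from layer3+4 to layer2+3 (the kernel bonding docs note layer3+4 is not fully 802.3ad compliant):

Code:
auto bond1
iface bond1 inet static
        address 10.10.250.2/24
        bond-slaves eno12399np0 eno12409np1
        bond-miimon 100
        bond-mode 802.3ad
        # layer2+3 shown only as an example alternative to layer3+4;
        # match whatever hash/LAG settings your switch expects
        bond-xmit-hash-policy layer2+3
        bond-lacp-rate 1
        mtu 9000

Apply with "ifreload -a" (ifupdown2) on each node and watch whether the slow ops and peering warnings clear in "ceph -s".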