Ceph problem

anton.chasnyk

New Member
Jul 12, 2016
I installed Proxmox 4.4 and Ceph Jewel according to the manual, but I have a problem with my OSDs.
If I reboot the first node, the OSDs on this node are still reported as online and my VMs are suspended.

size/min - 2/1
pg num - 128

OSDs:
node1
- osd.0
- osd.1
node 2
- osd.3
- osd.4
node 3
- no OSDs
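
(For reference, this OSD-to-host layout can be double-checked from the CRUSH tree; a minimal sketch using the standard command:)
Code:
# lists each host bucket with its OSDs, their CRUSH weights and up/down state
ceph osd tree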


In debug message I found this:
Code:
[    3.391884] systemd[1]: [/lib/systemd/system/ceph-osd@.service:18] Unknown lvalue 'TasksMax' in section 'Service'
[    3.392414] systemd[1]: [/lib/systemd/system/ceph-osd@.service:18] Unknown lvalue 'TasksMax' in section 'Service'
[    3.393806] systemd[1]: [/lib/systemd/system/ceph-mon@.service:24] Unknown lvalue 'TasksMax' in section 'Service'
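
(These "Unknown lvalue 'TasksMax'" messages are most likely harmless: the Jewel unit files contain a TasksMax= directive that the older systemd shipped with Proxmox 4.x / Debian Jessie does not understand yet, so the line is simply ignored. A quick way to confirm the installed systemd version, assuming a stock Proxmox 4.4 install:)
Code:
# the Jessie-based Proxmox 4.x releases ship a systemd version that predates the TasksMax= directive
systemctl --version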


UPD: I get this problem on both nodes that have OSDs. (I have three nodes in total; one node has no OSDs.)

UPD: Ceph Config:
Code:
[global]
     auth client required = cephx
     auth cluster required = cephx
     auth service required = cephx
     cluster network = 192.168.87.0/24
     filestore xattr use omap = true
     fsid = 5ee20a5b-eeb2-4bcb-86a5-57cd9421c0d9
     keyring = /etc/pve/priv/$cluster.$name.keyring
     osd journal size = 5120
     osd pool default min size = 1
     public network = 192.168.87.0/24

[osd]
     keyring = /var/lib/ceph/osd/ceph-$id/keyring

[mon.1]
     host = node1
     mon addr = 192.168.87.15:6789

[mon.2]
     host = node2
     mon addr = 192.168.87.13:6789

[mon.0]
     host = node3
     mon addr = 192.168.87.14:6789

Code:
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable straw_calc_version 1
tunable allowed_bucket_algs 54

# devices
device 0 osd.0
device 1 osd.1
device 2 osd.2
device 3 osd.3

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 region
type 10 root

# buckets
host node1 {
         id -2 # do not change unnecessarily
         # weight 1.851
         alg straw
         hash 0 # rjenkins1
         item osd.0 weight 0.926
         item osd.1 weight 0.926
}

host node2 {
         id -3 # do not change unnecessarily
         # weight 1.851
         alg straw
         hash 0 # rjenkins1
         item osd.2 weight 0.926
         item osd.3 weight 0.926
}

root default {
          id -1 # do not change unnecessarily
          # weight 3.703
          alg straw
          hash 0 # rjenkins1
          item node2 weight 1.851
          item node1 weight 1.851
}

# rules
rule replicated_ruleset {
          ruleset 0
          type replicated
          min_size 1
          max_size 10
          step take default
          step chooseleaf firstn 0 type host
          step emit
}

# end crush map
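
(For reference, a CRUSH map like the one above is typically dumped, edited and re-injected roughly as follows; a minimal sketch, the file names are placeholders:)
Code:
# dump the compiled CRUSH map and decompile it to editable text
ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt
# after editing crushmap.txt: recompile and load it back into the cluster
crushtool -c crushmap.txt -o crushmap.new
ceph osd setcrushmap -i crushmap.new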
 
Hi,
is your ceph-cluster healthy?
What is the output of
Code:
ceph -s
Code:
osd pool default min size = 1
"osd pool default min size = 1" is dangerous!

It's highly recommended to use 3 replicas, so you need OSDs on at least three hosts.
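
(For anyone following along: on an existing pool the replica count and min_size can be raised with the usual pool-set commands; a minimal sketch, assuming the pool is named 'rbd' as in the output below, and keeping in mind that size 3 only helps if OSDs exist on three hosts:)
Code:
ceph osd pool set rbd size 3      # number of replicas per object
ceph osd pool set rbd min_size 2  # minimum replicas that must be available for I/O to continue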

What do your pools look like?
Code:
for i in `ceph osd lspools | tr -d ",[0-9]"`
  do
    ceph osd dump | grep \'$i\'
done
Udo
 

ceph -s
Code:
 cluster 5ee20a5b-eeb2-4bcb-86a5-57cd9421c0d9
     health HEALTH_OK
     monmap e3: 3 mons at {0=192.168.87.14:6789/0,1=192.168.87.15:6789/0,2=192.168.87.13:6789/0}
            election epoch 30, quorum 0,1,2 2,0,1
     osdmap e61: 4 osds: 4 up, 4 in
            flags sortbitwise,require_jewel_osds
      pgmap v176: 128 pgs, 1 pools, 0 bytes data, 0 objects
            138 MB used, 3791 GB / 3791 GB avail
                 128 active+clean

My pool looks like this :
Code:
pool 4 'rbd' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 61 flags hashpspool stripe_width 0

"osd pool default min size = 1" is dangerous!
I will fix it soon, thx
 
Hi,
but you need three OSD hosts and replica=3, together with "osd pool default min size = 2", to achieve a safe setup where one node can go down without interruption.
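
(A minimal sketch of the corresponding defaults in ceph.conf; note that these only apply to pools created afterwards, existing pools have to be changed with "ceph osd pool set":)
Code:
[global]
     osd pool default size = 3
     osd pool default min size = 2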

Udo
I recreated my pool:
Code:
pool 8 'rbd' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 167 flags hashpspool stripe_width 0
and set up my OSDs like this:
OSDs:
node1
- osd.0
- osd.1
node 2
- osd.3
node 3
- osd.4

In this configuration the cluster is working fine. Thanks for the help.
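
(For completeness, a minimal sketch of how the new replication settings and the PG state can be verified, assuming the pool is still named 'rbd':)
Code:
ceph osd pool get rbd size       # expected: size: 3
ceph osd pool get rbd min_size   # expected: min_size: 2
ceph -s                          # all PGs should return to active+clean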
 
