Ceph problem when master node is out

Discussion in 'Proxmox VE: Installation and configuration' started by Konstantinos Pappas, Jan 7, 2015.

  1. Konstantinos Pappas

    Has anyone else had the same problem?
    For your information, I followed this guide:
    http://pve.proxmox.com/wiki/Ceph_Server

    It doesn't make sense to me. Could somebody from the Proxmox team please answer officially?
    Did I do something wrong, or is it a problem with the Proxmox cluster?
    I don't know what I should do, and if the problem is not on my side, how can I safely move this to production?
    Please, any help.
    Thanks
     
  2. symmcom

    If your pool replica count is 1, try changing it to 2 and see if it makes a difference. After you change the replica count the cluster will go through rebalancing; that is normal.
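    For reference, a quick way to check and change the replica count from the CLI would be something like this (just a sketch; the pool name mystorage is assumed here, adjust it to your pool):
    Code:
    ceph osd pool get mystorage size
    ceph osd pool set mystorage size 2
    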
     
  3. Konstantinos Pappas

    Hello Mr Wasin,

    please see the attached image. I think the replica count is two; the others are one.

    Untitled.png
     
  4. Konstantinos Pappas

    Mr Wasin,

    let me explain again. We have 3 nodes:
    1. master demo1 > pvecm create cluster
    2. demo2 > pvecm add demo1
    3. demo3 > pvecm add demo1

    On all nodes: pveceph install, pveceph createmon, etc.

    Case 1:
    For some reason node 3 is turned off. The Ceph storage keeps working without any problem.

    Case 2:
    Node 2 dies because of a hardware failure. The Ceph storage still works without problems, meaning I can still access the VMs.

    Case 3:
    Let's say the master node 1 catches fire, gets destroyed, whatever you want. Then the Ceph storage is no longer accessible, meaning that from node2 or node3 I cannot see my data, VMs, snapshots, etc.

    And my question is:

    did I do something wrong, or is it a Proxmox problem?

    thanks ;-)
     
  5. udo

    Hi Konstantinos,
    in a PVE cluster all nodes are masters; it only depends on which nodes have quorum and which do not.

    Let's see what's wrong with your installation.

    In https://forum.proxmox.com/threads/20700-Ceph-problem-when-master-node-is-out?p=105653#post105653 you posted your crushtable (unfortunately without line breaks; please use https and "Settings -> General Settings -> Miscellaneous Options -> Standard Editor", which preserves your line breaks!).
    The weight of the OSDs is 0, and that can't be right.
    A good practice is to use the size in TB as the weight, so that smaller disks are not overloaded if you mix disks of different sizes.

    To see the size in TB you can use this command:
    Code:
     df -k | grep osd | awk '{print $2/(1024^3) }'
    
    For me (with 4 TB disks and ext4 format) the output is:
    Code:
    3.58062
    3.58062
    3.58062
    3.58062
    ...
    
    In this case you would run:
    Code:
    ceph osd crush set 0 3.58 root=default host=demo1
    ceph osd crush set 1 3.58 root=default host=demo1
    ceph osd crush set 2 3.58 root=default host=demo1
    ceph osd crush set 3 3.58 root=default host=demo1
    
    ceph osd crush set 4 3.58 root=default host=demo2
    ceph osd crush set 5 3.58 root=default host=demo2
    ceph osd crush set 6 3.58 root=default host=demo2
    ceph osd crush set 7 3.58 root=default host=demo2
    
    ceph osd crush set 8 3.58 root=default host=demo3
    ceph osd crush set 9 3.58 root=default host=demo3
    ceph osd crush set 10 3.58 root=default host=demo3
    ceph osd crush set 11 3.58 root=default host=demo3
    
    Your screenshot shows about 3 GB at roughly 1% usage? That would mean you have 300 GB disks; in that case the weight would be something like 0.28.
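    If that is the case, a small loop could set all twelve weights in one go (just a sketch, assuming 300 GB disks on every OSD; adjust the value if the sizes differ):
    Code:
    for i in $(seq 0 11); do
        ceph osd crush reweight osd.$i 0.28
    done
    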

    But your stuck-VM issue sounds a little different.

    Can you run this on demo2?
    Code:
    rados -p mystorage bench 60 write
    
    If this works, does it still work when demo1 is down?
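    If you also want to test reads from demo2, a variant would be to keep the benchmark objects around and then run a sequential read pass (just a sketch; remember to delete the benchmark objects afterwards):
    Code:
    rados -p mystorage bench 60 write --no-cleanup
    rados -p mystorage bench 60 seq
    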

    Udo
     
  6. udo

    Hi Karl,
    no, it's not dangerous, it just doesn't help. Even with 4 mons only one may die! Quorum is mons/2 + 1, and Ceph doesn't know anything about network partitions... I think you mean the PVE quorum.

    Udo
     
  7. Konstantinos Pappas

    udo, thanks a lot for the useful information.
    After some deeper investigation, here are my conclusions; I hope they help other people here.

    1. The problem does not come from the Ceph storage itself.
    2. The quorum belongs to the server that created the cluster, i.e. the one where I ran pvecm create cluster.
    3. The other nodes that were added with pvecm add node1 etc. do not hold the quorum; that is why the cluster and the Ceph storage keep working when demo2 or demo3 is shut down, while everything breaks when demo1 (node1) is shut down.
    4. So it seems necessary to create a shared disk to split the quorum across the network and the nodes.

    My questions are:

    Can I create a shared folder on LVM storage and move the quorum there for when node1 is down?
    There is an example using an iSCSI target at https://pve.proxmox.com/wiki/Two-Node_High_Availability_Cluster,
    but I am not familiar with that.
    Also, what changes would need to be made in cluster.conf?

    Any help is appreciated.
     
  8. symmcom

    This may be the reason, because all Ceph configuration and keyrings are on /etc/pve, which is a cluster file system. When Proxmox loses quorum that file system becomes unavailable, rendering the Ceph configuration inaccessible.
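    To check whether that is what happens, you could look at the quorum state and whether the cluster file system still serves the Ceph config and keyring while demo1 is off (just a sketch, assuming the default pveceph paths):
    Code:
    pvecm status
    ls -l /etc/pve/ceph.conf /etc/pve/priv/ceph.client.admin.keyring
    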

    But... didn't you already add both demo2 and demo3 to the Proxmox cluster? From the Proxmox GUI, don't you see all 3 Proxmox nodes?
     
  9. Konstantinos Pappas

    Mr Wasin,

    hello again. Of course I added them: 3 nodes, demo1, demo2, demo3.
    pvecm create cluster on demo1, then pvecm add demo1 on demo2 and demo3, etc.

    Would it be possible for you to build the same demo with three nodes and Ceph and verify whether you get the same results when node1 (demo1) is down?

    I think that would help a lot of people, and it needs to be known before moving to production.

    Or could someone else test the same 3-node cluster with Ceph?
     
  10. udo

    Hi Konstantinos,
    does 1. mean that "rados -p mystorage bench 60 write" also works when demo1 is down?
    You have mixed something up there.
    PVE with Ceph on the same nodes has two different quorums (independent, but with influence on each other).
    First the PVE cluster; here you can check with
    Code:
    pvecm nodes
    pvecm status
    
    If you have quorum, /etc/pve is writable, e.g. you can do something like "touch /etc/pve/xx; rm /etc/pve/xx".

    The quorum in the Ceph cluster can be checked with
    Code:
    ceph health detail
    ceph -s
    
    Forget the shared quorum disk with three or more nodes! 3 (or more) nodes are fine for a PVE cluster.

    If 1. is OK, then I assume that something is wrong with the storage.cfg entry for your Ceph pool. E.g. if demo1 is the only accessible mon, the VMs can't reach their disks when demo1 is down!
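    By the way, a quick way to see which mons are currently in the monitor quorum (just a sketch) is:
    Code:
    ceph mon stat
    ceph quorum_status -f json-pretty
    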

    Please post the output of the following commands:
    Code:
    # on node demo1,2,3
    netstat -na | grep 6789
    
    # only from one node
    grep -A 5 rbd: /etc/pve/storage.cfg
    
    # also only from one node
    cat /etc/ceph/ceph.conf
    
    Udo
     
  11. udo

    Hi,
    yes, I have done that for testing purposes (my own setup uses an independent Ceph cluster). It works without trouble!

    Udo
     
  12. Konstantinos Pappas

    Hello udo, I will post the details tomorrow.

    The problem shows up when node1 (demo1) is down.
    Let me explain.
    Prepare a cluster of, let's say, 4 or 5 or 6 nodes, whatever;
    say 6 nodes in total.

    If node2, node3, etc. goes down for any reason, everything keeps working.
    But as soon as node1 goes down for some reason, the problems start.
    I am trying to understand how Proxmox works.
    For your information, I also made a test demo with 10 nodes:
    when node1 is shut down, the problem starts.

    So is the conclusion that node1 must never be shut down?
     
  13. udo

    Hi,
    I understand your issue, but your conclusion is wrong!

    As I wrote before:
    all nodes in a PVE cluster (should) have the same number of votes for the quorum. So it makes no difference whether you shut down the first node or any other!

    The same holds for a Ceph cluster: you can shut down any mon, as long as enough mons remain alive to hold the quorum.

    The same goes for the OSD nodes (which are all equal in the PVE+Ceph scenario): if one OSD node is down (no matter which one), the data is served by the remaining OSD hosts.
    One possible design failure would be a crushmap with the failure domain on osd instead of host (then all copies can land on different OSDs that nevertheless sit on one host!). But this is not your issue, because your earlier ceph -s output showed 50% degraded, and the remaining 50% are enough.
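    A way to double-check which failure domain the ruleset actually uses (just a sketch) is to dump the crush rules and look at the chooseleaf step:
    Code:
    ceph osd crush rule dump
    # the replicated rule should contain "op": "chooseleaf_firstn" with "type": "host"
    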

    As I wrote before, I assume the issue is in how the remaining nodes reach the mons; please provide the output from my previous post.

    Udo
     
  14. symmcom

    I had a suspicion of something like this, which is why I asked to confirm storage.cfg. I think he already added the other 2 mons in storage.cfg.


    @Konstantinos
    I know that a 3-node Proxmox+Ceph setup works just fine; I used a 3-node cluster for months before scaling out. You have been asked to post the crushmap twice. Could you please post it in an easily readable format? Just take a screenshot of the crushmap from the Proxmox GUI. Please post screenshots of the following:
    1. Crushmap (Proxmox GUI)
    2. storage.cfg (CLI)
    3. Pool list (Proxmox GUI)
    4. ceph.conf (CLI)

    Hide any information you don't want public.
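    If the GUI view cuts the crushmap off, you could also dump the full map as plain text from the CLI (just a sketch; the file names are arbitrary):
    Code:
    ceph osd getcrushmap -o crushmap.bin
    crushtool -d crushmap.bin -o crushmap.txt
    cat crushmap.txt
    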
     
  15. Konstantinos Pappas

    Hello udo and Wasin,
    below are the outputs of the commands you asked for; I also attach two images, one of the crushmap and one of the pools.
    ////////////////////////////////////////////////////////////

    # demo1 - node1
    netstat -na | grep 6789
    root@demo1:~# netstat -na | grep 6789
    tcp 0 0 192.168.1.201:6789 0.0.0.0:* LISTEN
    tcp 0 9 192.168.1.201:6789 192.168.1.202:57155 ESTABLISHED
    tcp 0 0 192.168.1.201:6789 192.168.1.203:38122 ESTABLISHED
    tcp 0 0 192.168.1.201:52970 192.168.1.202:6789 TIME_WAIT
    tcp 0 0 192.168.1.201:52977 192.168.1.202:6789 TIME_WAIT
    tcp 0 0 192.168.1.201:52961 192.168.1.202:6789 TIME_WAIT
    tcp 0 0 192.168.1.201:60426 192.168.1.203:6789 ESTABLISHED
    tcp 0 0 192.168.1.201:52982 192.168.1.202:6789 TIME_WAIT
    tcp 0 0 192.168.1.201:37704 192.168.1.201:6789 TIME_WAIT
    tcp 0 0 192.168.1.201:6789 192.168.1.202:57278 ESTABLISHED
    tcp 0 0 192.168.1.201:37681 192.168.1.201:6789 TIME_WAIT
    tcp 0 0 192.168.1.201:52998 192.168.1.202:6789 TIME_WAIT
    tcp 0 0 192.168.1.201:52991 192.168.1.202:6789 TIME_WAIT
    tcp 0 9 192.168.1.201:6789 192.168.1.203:38101 ESTABLISHED
    tcp 0 0 192.168.1.201:60417 192.168.1.203:6789 ESTABLISHED
    tcp 0 0 192.168.1.201:52980 192.168.1.202:6789 TIME_WAIT
    tcp 0 0 192.168.1.201:32780 192.168.1.203:6789 TIME_WAIT
    tcp 0 0 192.168.1.201:37131 192.168.1.201:6789 ESTABLISHED
    tcp 0 0 192.168.1.201:52400 192.168.1.202:6789 ESTABLISHED
    tcp 0 0 192.168.1.201:6789 192.168.1.201:37131 ESTABLISHED
    tcp 0 0 192.168.1.201:6789 192.168.1.203:38169 ESTABLISHED

    # demo2 - node2
    netstat -na | grep 6789

    root@demo2:~# netstat -na | grep 6789
    tcp 0 0 192.168.1.202:6789 0.0.0.0:* LISTEN
    tcp 0 0 192.168.1.202:57155 192.168.1.201:6789 ESTABLISHED
    tcp 0 0 192.168.1.202:6789 192.168.1.203:59066 ESTABLISHED
    tcp 0 0 192.168.1.202:6789 192.168.1.203:58837 ESTABLISHED
    tcp 0 0 192.168.1.202:57278 192.168.1.201:6789 ESTABLISHED
    tcp 0 0 192.168.1.202:6789 192.168.1.203:59106 ESTABLISHED
    tcp 0 0 192.168.1.202:40888 192.168.1.203:6789 ESTABLISHED
    tcp 0 0 192.168.1.202:40931 192.168.1.203:6789 ESTABLISHED
    tcp 0 0 192.168.1.202:6789 192.168.1.201:52400 ESTABLISHED
    tcp 0 0 192.168.1.202:40961 192.168.1.203:6789 ESTABLISHED

    # demo3 - node3
    netstat -na | grep 6789

    root@demo3:~# netstat -na | grep 6789
    tcp 0 0 192.168.1.203:6789 0.0.0.0:* LISTEN
    tcp 0 0 192.168.1.203:59106 192.168.1.202:6789 ESTABLISHED
    tcp 0 0 192.168.1.203:6789 192.168.1.202:40961 ESTABLISHED
    tcp 0 0 192.168.1.203:38101 192.168.1.201:6789 ESTABLISHED
    tcp 0 0 192.168.1.203:59066 192.168.1.202:6789 ESTABLISHED
    tcp 0 0 192.168.1.203:38122 192.168.1.201:6789 ESTABLISHED
    tcp 0 0 192.168.1.203:6789 192.168.1.202:40931 ESTABLISHED
    tcp 0 0 192.168.1.203:58837 192.168.1.202:6789 ESTABLISHED
    tcp 0 0 192.168.1.203:6789 192.168.1.201:60426 ESTABLISHED
    tcp 0 0 192.168.1.203:6789 192.168.1.202:40888 ESTABLISHED
    tcp 0 0 192.168.1.203:38169 192.168.1.201:6789 ESTABLISHED
    tcp 0 0 192.168.1.203:6789 192.168.1.201:60417 ESTABLISHED

    # only from one node
    grep -A 5 rbd: /etc/pve/storage.cfg

    root@demo1:~# grep -A 5 rbd: /etc/pve/storage.cfg
    rbd: storage
         monhost 192.168.1.201;192.168.1.202;192.168.1.203
         pool storage
         content images
         nodes demo3,demo2,demo1
         username admin

    # also only from one node
    cat /etc/ceph/ceph.conf

    root@demo1:~# cat /etc/ceph/ceph.conf
    [global]
    auth client required = cephx
    auth cluster required = cephx
    auth service required = cephx
    auth supported = cephx
    cluster network = 192.168.1.0/24
    filestore xattr use omap = true
    fsid = 30d1a422-ba23-4191-9355-1d8609475f3f
    keyring = /etc/pve/priv/$cluster.$name.keyring
    osd journal size = 5120
    osd pool default min size = 1
    public network = 192.168.1.0/24

    [osd]
    keyring = /var/lib/ceph/osd/ceph-$id/keyring

    [mon.0]
    host = demo1
    mon addr = 192.168.1.201:6789

    [mon.1]
    host = demo2
    mon addr = 192.168.1.202:6789

    [mon.2]
    host = demo3
    mon addr = 192.168.1.203:6789
     
  16. Konstantinos Pappas

    Here are the images:
    crushmap

    crushmap.png

    pool

    pool.png
     
  17. udo

    Hi,
    fine!
    this doesn't fit: in your screenshot the pool/storage name is mystorage, and here it is storage??

    Does it change anything if you use this monhost line (spaces instead of semicolons):
    Code:
     	monhost 192.168.1.201 192.168.1.202 192.168.1.203
    
    And you still haven't answered whether "rados bench" works with demo1 down or not.
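    A related check, in case the clients simply cannot reach the other mons: you can point the ceph tool at a single monitor to test it directly (just a sketch, using the IPs from your netstat output):
    Code:
    ceph -m 192.168.1.202:6789 -s
    ceph -m 192.168.1.203:6789 -s
    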

    The screenshot of the crushmap only shows the beginning.
    Can you run:
    Code:
    ceph osd crush dump -f json-pretty
    
    Udo
     
  18. Konstantinos Pappas

    Hello udo, thanks a lot mate for the help.
    I did a fresh installation, so the Ceph pool name changed from mystorage to storage.

    root@demo1:~# ceph osd crush dump -f json-pretty

    { "devices": [
    { "id": 0,
    "name": "osd.0"},
    { "id": 1,
    "name": "osd.1"},
    { "id": 2,
    "name": "osd.2"},
    { "id": 3,
    "name": "osd.3"},
    { "id": 4,
    "name": "osd.4"},
    { "id": 5,
    "name": "osd.5"},
    { "id": 6,
    "name": "osd.6"},
    { "id": 7,
    "name": "osd.7"},
    { "id": 8,
    "name": "osd.8"},
    { "id": 9,
    "name": "osd.9"},
    { "id": 10,
    "name": "osd.10"},
    { "id": 11,
    "name": "osd.11"}],
    "types": [
    { "type_id": 0,
    "name": "osd"},
    { "type_id": 1,
    "name": "host"},
    { "type_id": 2,
    "name": "chassis"},
    { "type_id": 3,
    "name": "rack"},
    { "type_id": 4,
    "name": "row"},
    { "type_id": 5,
    "name": "pdu"},
    { "type_id": 6,
    "name": "pod"},
    { "type_id": 7,
    "name": "room"},
    { "type_id": 8,
    "name": "datacenter"},
    { "type_id": 9,
    "name": "region"},
    { "type_id": 10,
    "name": "root"}],
    "buckets": [
    { "id": -1,
    "name": "default",
    "type_id": 10,
    "type_name": "root",
    "weight": 0,
    "alg": "straw",
    "hash": "rjenkins1",
    "items": [
    { "id": -2,
    "weight": 0,
    "pos": 0},
    { "id": -3,
    "weight": 0,
    "pos": 1},
    { "id": -4,
    "weight": 0,
    "pos": 2}]},
    { "id": -2,
    "name": "demo1",
    "type_id": 1,
    "type_name": "host",
    "weight": 0,
    "alg": "straw",
    "hash": "rjenkins1",
    "items": [
    { "id": 0,
    "weight": 0,
    "pos": 0},
    { "id": 1,
    "weight": 0,
    "pos": 1},
    { "id": 2,
    "weight": 0,
    "pos": 2},
    { "id": 3,
    "weight": 0,
    "pos": 3}]},
    { "id": -3,
    "name": "demo2",
    "type_id": 1,
    "type_name": "host",
    "weight": 0,
    "alg": "straw",
    "hash": "rjenkins1",
    "items": [
    { "id": 4,
    "weight": 0,
    "pos": 0},
    { "id": 5,
    "weight": 0,
    "pos": 1},
    { "id": 6,
    "weight": 0,
    "pos": 2},
    { "id": 7,
    "weight": 0,
    "pos": 3}]},
    { "id": -4,
    "name": "demo3",
    "type_id": 1,
    "type_name": "host",
    "weight": 0,
    "alg": "straw",
    "hash": "rjenkins1",
    "items": [
    { "id": 8,
    "weight": 0,
    "pos": 0},
    { "id": 9,
    "weight": 0,
    "pos": 1},
    { "id": 10,
    "weight": 0,
    "pos": 2},
    { "id": 11,
    "weight": 0,
    "pos": 3}]}],
    "rules": [
    { "rule_id": 0,
    "rule_name": "replicated_ruleset",
    "ruleset": 0,
    "type": 1,
    "min_size": 1,
    "max_size": 10,
    "steps": [
    { "op": "take",
    "item": -1,
    "item_name": "default"},
    { "op": "chooseleaf_firstn",
    "num": 0,
    "type": "host"},
    { "op": "emit"}]}],
    "tunables": { "choose_local_tries": 0,
    "choose_local_fallback_tries": 0,
    "choose_total_tries": 50,
    "chooseleaf_descend_once": 1,
    "profile": "bobtail",
    "optimal_tunables": 0,
    "legacy_tunables": 0,
    "require_feature_tunables": 1,
    "require_feature_tunables2": 1}}
     
  19. Konstantinos Pappas

    udo, for your information I still have the same problem: when node1 is down everything freezes;
    when I shut down node2 or node3, everything is all right.

    pfffffffffff
     
  20. symmcom

    Hi Konstantinos,
    it was not necessary to do a fresh installation. You could just have deleted the storage from the Proxmox GUI and reattached it with the proper pool name.


    From your crushmap, I do not understand why your tunables profile says bobtail. If you followed the Proxmox Ceph installation wiki (http://pve.proxmox.com/wiki/Ceph_Server) you should have at least dumpling, not bobtail. I also don't see why all your OSD weights are "0". Unless I missed something, nowhere in your crushmap is there any weight for any OSD.
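    A quick way to see all OSD weights at once, without reading the raw crushmap, would be (just a sketch):
    Code:
    ceph osd tree
    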

    Are you doing this in a virtual environment, or are the nodes actual physical machines?

    We fully understand your issue and what the problem is, so please don't keep repeating it.
     