Hello
I am testing my new Ceph installation with 3 nodes and 3 OSDs per node.
I have another Proxmox cluster with one Windows VM whose disk is mapped on Ceph.
When I stop one Ceph node, it takes nearly 1 minute before its 3 OSDs are marked down (I think that is normal).
The problem is that disk access in the VM is blocked by I/O latency (i.e. apply latency in the Proxmox GUI) during that minute, until the OSDs are marked down.
How can I resolve this freeze of the VM?
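For reference, here is my understanding of the failure-detection timing. These are the Giant (0.87) defaults as I understand them; none of them are set explicitly in my ceph.conf, so please correct me if the values are wrong:
Code:
# Assumed Giant (0.87) defaults -- not set in my ceph.conf, listed only for reference
# OSDs send heartbeats to their peers at this interval (seconds)
osd heartbeat interval = 6
# a peer that misses heartbeats for this long is reported down to the monitors
osd heartbeat grace = 20
# number of OSDs that must report a peer before the monitors mark it down
mon osd min down reporters = 1
# a down OSD is additionally marked out after this many seconds
mon osd down out interval = 300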
My Ceph configuration:
- Proxmox 3.3-5
- Ceph Giant 0.87-1
OSD tree:
Code:
# ceph osd tree
# id weight type name up/down reweight
-1 32.76 root default
-2 10.92 host ceph01
0 3.64 osd.0 up 1
2 3.64 osd.2 up 1
1 3.64 osd.1 up 1
-3 10.92 host ceph02
3 3.64 osd.3 up 1
4 3.64 osd.4 up 1
5 3.64 osd.5 up 1
-4 10.92 host ceph03
6 3.64 osd.6 up 1
7 3.64 osd.7 up 1
8 3.64 osd.8 up 1
Crush map:
Code:
# ceph osd crush dump
{ "devices": [
{ "id": 0,
"name": "osd.0"},
{ "id": 1,
"name": "osd.1"},
{ "id": 2,
"name": "osd.2"},
{ "id": 3,
"name": "osd.3"},
{ "id": 4,
"name": "osd.4"},
{ "id": 5,
"name": "osd.5"},
{ "id": 6,
"name": "osd.6"},
{ "id": 7,
"name": "osd.7"},
{ "id": 8,
"name": "osd.8"}],
"types": [
{ "type_id": 0,
"name": "osd"},
{ "type_id": 1,
"name": "host"},
{ "type_id": 2,
"name": "chassis"},
{ "type_id": 3,
"name": "rack"},
{ "type_id": 4,
"name": "row"},
{ "type_id": 5,
"name": "pdu"},
{ "type_id": 6,
"name": "pod"},
{ "type_id": 7,
"name": "room"},
{ "type_id": 8,
"name": "datacenter"},
{ "type_id": 9,
"name": "region"},
{ "type_id": 10,
"name": "root"}],
"buckets": [
{ "id": -1,
"name": "default",
"type_id": 10,
"type_name": "root",
"weight": 2146959,
"alg": "straw",
"hash": "rjenkins1",
"items": [
{ "id": -2,
"weight": 715653,
"pos": 0},
{ "id": -3,
"weight": 715653,
"pos": 1},
{ "id": -4,
"weight": 715653,
"pos": 2}]},
{ "id": -2,
"name": "ceph01",
"type_id": 1,
"type_name": "host",
"weight": 715653,
"alg": "straw",
"hash": "rjenkins1",
"items": [
{ "id": 0,
"weight": 238551,
"pos": 0},
{ "id": 2,
"weight": 238551,
"pos": 1},
{ "id": 1,
"weight": 238551,
"pos": 2}]},
{ "id": -3,
"name": "ceph02",
"type_id": 1,
"type_name": "host",
"weight": 715653,
"alg": "straw",
"hash": "rjenkins1",
"items": [
{ "id": 3,
"weight": 238551,
"pos": 0},
{ "id": 4,
"weight": 238551,
"pos": 1},
{ "id": 5,
"weight": 238551,
"pos": 2}]},
{ "id": -4,
"name": "ceph03",
"type_id": 1,
"type_name": "host",
"weight": 715653,
"alg": "straw",
"hash": "rjenkins1",
"items": [
{ "id": 6,
"weight": 238551,
"pos": 0},
{ "id": 7,
"weight": 238551,
"pos": 1},
{ "id": 8,
"weight": 238551,
"pos": 2}]}],
"rules": [
{ "rule_id": 0,
"rule_name": "replicated_ruleset",
"ruleset": 0,
"type": 1,
"min_size": 1,
"max_size": 10,
"steps": [
{ "op": "take",
"item": -1,
"item_name": "default"},
{ "op": "chooseleaf_firstn",
"num": 0,
"type": "host"},
{ "op": "emit"}]}],
"tunables": { "choose_local_tries": 0,[CODE]
"choose_total_tries": 50,
"chooseleaf_descend_once": 1,
"profile": "bobtail",
"optimal_tunables": 0,
"legacy_tunables": 0,
"require_feature_tunables": 1,
"require_feature_tunables2": 1,
"require_feature_tunables3": 0,
"has_v2_rules": 0,
"has_v3_rules": 0}}
ceph.conf:
Code:
[global]
auth client required = cephx
auth cluster required = cephx
auth service required = cephx
auth supported = cephx
cluster network = 10.10.1.0/24
filestore xattr use omap = true
fsid = 2dbbec32-a464-4bc5-bb2b-983695d1d0c6
keyring = /etc/pve/priv/$cluster.$name.keyring
osd journal size = 5120
osd pool default min size = 1
public network = 192.168.80.0/24
mon osd down out subtree limit = host
osd max backfills = 1
osd recovery max active = 1
[osd]
keyring = /var/lib/ceph/osd/ceph-$id/keyring
[mon.4]
host = ceph05
mon addr = 192.168.80.45:6789
[mon.0]
host = ceph01
mon addr = 192.168.80.41:6789
[mon.1]
host = ceph02
mon addr = 192.168.80.42:6789
[mon.3]
host = ceph04
mon addr = 192.168.80.44:6789
[mon.2]
host = ceph03
mon addr = 192.168.80.43:6789
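Would lowering the failure-detection timeouts be the right way to shorten the freeze? For example, something like this in the [global] section (just a guess on my part, values untested):
Code:
# untested guess: detect a stopped OSD faster than the default 20 s grace
osd heartbeat interval = 3
osd heartbeat grace = 10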
Thanks.
Best regards