Ceph volumes inaccessible after setup

nakedhitman

New Member
Jan 5, 2016
Hello all,

I am attempting to familiarize myself with Proxmox/Ceph/LXC so that I can hopefully use it to replace my ESXi environment. I have set up a nested virtual environment that should be sufficient and followed the configuration guides for a 3-node cluster and Ceph server, but I ultimately get a communication error when I try to use the configured Ceph volume. I do not see any errors in the logs, so I would appreciate any help you can offer. A screenshot gallery of what I am seeing is here: http://imgur.com/a/SsUJA

Below is my configuration:
Code:
##### Ceph Config #####
[global]
    auth client required = cephx
    auth cluster required = cephx
    auth service required = cephx
    cluster network = 10.64.0.0/24
    filestore xattr use omap = true
    fsid = 726980a1-9bf8-4ffa-b714-5efdfeb02d30
    keyring = /etc/pve/priv/$cluster.$name.keyring
    osd journal size = 5120
    osd pool default min size = 1
    public network = 10.64.0.0/24

[osd]
    keyring = /var/lib/ceph/osd/ceph-$id/keyring

[mon.1]
    host = pve2
    mon addr = 10.64.0.2:6789

[mon.0]
    host = pve1
    mon addr = 10.64.0.1:6789

[mon.2]
    host = pve3
    mon addr = 10.64.0.3:6789

##### Crush Map #####
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable straw_calc_version 1

# devices
device 0 osd.0
device 1 osd.1
device 2 osd.2
device 3 osd.3
device 4 osd.4
device 5 osd.5
device 6 osd.6
device 7 osd.7
device 8 osd.8
device 9 osd.9
device 10 osd.10
device 11 osd.11

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 region
type 10 root

# buckets
host pve1 {
    id -2        # do not change unnecessarily
    # weight 0.960
    alg straw
    hash 0    # rjenkins1
    item osd.0 weight 0.240
    item osd.1 weight 0.240
    item osd.2 weight 0.240
    item osd.3 weight 0.240
}
host pve2 {
    id -3        # do not change unnecessarily
    # weight 0.960
    alg straw
    hash 0    # rjenkins1
    item osd.4 weight 0.240
    item osd.5 weight 0.240
    item osd.6 weight 0.240
    item osd.7 weight 0.240
}
host pve3 {
    id -4        # do not change unnecessarily
    # weight 0.960
    alg straw
    hash 0    # rjenkins1
    item osd.8 weight 0.240
    item osd.9 weight 0.240
    item osd.10 weight 0.240
    item osd.11 weight 0.240
}
root default {
    id -1        # do not change unnecessarily
    # weight 2.880
    alg straw
    hash 0    # rjenkins1
    item pve1 weight 0.960
    item pve2 weight 0.960
    item pve3 weight 0.960
}

# rules
rule replicated_ruleset {
    ruleset 0
    type replicated
    min_size 1
    max_size 10
    step take default
    step chooseleaf firstn 0 type host
    step emit
}

# end crush map

##### Hardware #####
Host 1-2
 - Consumer-class motherboard
 - Intel i5, quad-core Haswell-generation
 - 32GB RAM
 - Intel Pro/1000 dual-port NIC
 - 4G Fibre dual-port HBA (multipathed)
 - VMware ESXi 6.0U1
Host 3
 - Supermicro A1SAM-2750F
 - 32GB RAM
 - Intel Pro/1000 dual-port NIC
 - 4G Fibre dual-port HBA (multipathed)
 - VMware ESXi 6.0U1
Storage
 - Consumer-class motherboard
 - Intel i5, quad-core Haswell-generation
 - FreeNas 9.3
 - 24GB RAM
 - 2x LSI 9211-8i SAS cards
 - 12x 2TB 7200RPM HDD in RAIDZ2
 - 4G Fibre dual-port HBA (multipathed)

Nested Hosts x3
 - vCPU: 4-core, 1-socket
 - vRAM: 8GB
 - 2x NIC - VMXNET3
 - 64GB Disk
 - 4x 256GB Disk
 - Proxmox 4.1-1/2f9650d4

##### Ceph Setup Steps #####
root@pve1:~# history
    1  pveceph install -version hammer
    2  ssh root@10.9.6.2 pveceph install -version hammer
    3  ssh root@10.9.6.3 pveceph install -version hammer
    4  pveceph init --network 10.64.0.0/24
    5  pveceph createmon

root@pve1:~# cat /etc/network/interfaces
auto lo
iface lo inet loopback

auto vmbr0
iface vmbr0 inet static
        address 10.9.6.1
        netmask 255.255.0.0
        gateway 10.9.8.1
        bridge_ports eth0
        bridge_stp off
        bridge_fd 0

auto eth1
iface eth1 inet static
        address 10.64.0.1
        netmask 255.255.255.0
 
That's normally caused by not having a mon running, or by the mons not having quorum. Check with:

ceph health
ceph health detail

If that is the case, start them with:
pveceph start mon.0
pveceph start mon.1
pveceph start mon.2
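
For reference, a quick way to confirm which mons are up and in quorum is something like the sketch below (standard Ceph commands; the pve1-pve3 hostnames are taken from your config):

Code:
# overall state plus per-mon detail
ceph health detail

# mon map and quorum membership
ceph mon stat
ceph quorum_status --format json-pretty

# confirm a ceph-mon process is actually running on each node
for h in pve1 pve2 pve3; do ssh root@$h 'pgrep -a ceph-mon'; done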


It also sometimes happens that the mons are "overloaded". The symptoms to look for in that case are high I/O wait and/or high load averages.
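
A rough way to check for that on each node (plain Linux tools, nothing Proxmox-specific; iostat comes from the sysstat package):

Code:
# load averages - compare against the 4 vCPUs per nested host
uptime

# %iowait and per-disk utilisation, refreshed every 2 seconds
iostat -x 2

# or watch the "wa" value in the CPU summary line
top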


Another thing that could be the problem is the nested KVM, especially with 12 OSDs. Make sure you DO NOT run all those vDisks from the same physical HDD; split them up to 3-4 per physical HDD (especially when the journal resides on the same vDisk as the OSD itself). Otherwise it makes for all kinds of weirdness when you try to evaluate things via nested KVM.
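
To see whether the journals sit on the OSD disks themselves, and to spot OSDs that are struggling, something like this should work (paths as in a default Hammer filestore layout):

Code:
# journal location for each OSD on this node - a symlink or file inside the
# OSD's own data directory means journal and data share the same (virtual) disk
ls -l /var/lib/ceph/osd/ceph-*/journal

# commit/apply latency per OSD; consistently high values point at overloaded disks
ceph osd perf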

P.S.: When you try to estimate what speeds/benchmarks/performance you can achieve, it is much more realistic to test using a single physical node with plenty of OSDs, and two nested KVM guests with a single OSD each, running on vDisks that reside on separate HDDs.
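
If you do want to get a rough baseline out of the nested setup anyway, rados bench against a throwaway pool is the usual starting point (the pool name "test" here is just an example; create it first):

Code:
# 60-second write benchmark, keeping the objects for the read test
rados bench -p test 60 write --no-cleanup

# sequential read of the objects written above
rados bench -p test 60 seq

# remove the benchmark objects when done
rados -p test cleanup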
 