Trying to get Ceph working

Hi,
I mean the entries in your [global] section: cluster network and public network are the same.

But I think the issue is different. In the [global] section you have also defined the keyring name as $cluster.$name.keyring, but your keyring is ssd.keyring, i.e. only $name.keyring!
It looks like you must add your cluster name plus a dot in front!

Udo

I'm not sure I understand what you mean by the entries in the [global] section. Ceph should only be talking on 10.10.10.0/24 (10.10.10.1-10.10.10.3, to be specific).

Where are you seeing this stuff about the keyring? I'm not seeing that. Everything should be defaults, and I followed the wiki post on moving over the keyring.
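(For reference, the wiki step in question essentially copies the admin keyring to a file named after the storage ID; a sketch assuming the storage is called ssd:)
Code:
mkdir -p /etc/pve/priv/ceph
cp /etc/ceph/ceph.client.admin.keyring /etc/pve/priv/ceph/ssd.keyring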
 
Hi,
I think I wrote bullshit...

there must be a typo... can you post the following output?
Code:
grep rbd -A4 /etc/pve/storage.cfg
ls -lsa /etc/ceph/
cat /etc/ceph/keyring
ls -lsa /etc/pve/priv/ceph
cat /etc/pve/priv/ceph/*.keyring
Udo
 
Here you go:

Code:
root@hv01:~# grep rbd -A4 /etc/pve/storage.cfg
rbd: ssd
        monhost 10.10.10.1 10.10.10.2 10.10.10.3
        pool ssd
        content images
        username admin
root@hv01:~# ls -lsa /etc/ceph/
total 16
4 drwxr-xr-x  2 root root 4096 Nov 14 15:45 .
4 drwxr-xr-x 86 root root 4096 Nov 17 09:04 ..
4 -rw-------  1 root root   63 Nov 11 20:14 ceph.client.admin.keyring
0 lrwxrwxrwx  1 root root   18 Nov 14 15:45 ceph.conf -> /etc/pve/ceph.conf
4 -rw-r--r--  1 root root   92 Jul 29 14:35 rbdmap
root@hv01:~# cat /etc/ceph/*.keyring
[client.admin]
        key = AQCNtGJUQMkrAhAAHNNhc4Oob1UvQR/ifhaG1A==
root@hv01:~# ls -lsa /etc/pve/priv/ceph
total 1
0 drwx------ 2 root www-data  0 Nov 17 08:01 .
0 drwx------ 2 root www-data  0 Sep 24 07:07 ..
1 -rw------- 1 root www-data 63 Nov 17 10:43 ssd.keyring
root@hv01:~# cat /etc/pve/priv/ceph/*.keyring
[client.admin]
        key = AQCNtGJUQMkrAhAAHNNhc4Oob1UvQR/ifhaG1A==
root@hv01:~#
 
Hi,
I can't find the issue...
Perhaps a special character inside the keyring?
Please post:
Code:
cat -A /etc/ceph/ceph.client.admin.keyring
cat -A /etc/pve/priv/ceph/ssd.keyring
What processes are running?
Code:
ps aux | grep ceph
Udo
 

Code:
root@hv01:~# cat -A /etc/ceph/ceph.client.admin.keyring
[client.admin]$
^Ikey = AQCNtGJUQMkrAhAAHNNhc4Oob1UvQR/ifhaG1A==$
root@hv01:~# cat -A /etc/pve/priv/ceph/ssd.keyring
[client.admin]$
^Ikey = AQCNtGJUQMkrAhAAHNNhc4Oob1UvQR/ifhaG1A==$
root@hv01:~# ps aux | grep ceph
root      673519  0.1  0.1 640252 45668 ?        Ssl  09:16   0:34 /usr/bin/ceph-osd -i 0 --pid-file /var/run/ceph/osd.0.pid -c /etc/ceph/ceph.conf --cluster ceph
root      679100  0.1  0.1 208316 28808 ?        Sl   09:33   0:33 /usr/bin/ceph-mon -i 0 --pid-file /var/run/ceph/mon.0.pid -c /etc/ceph/ceph.conf --cluster ceph
root      840192  0.0  0.0   7792   948 pts/0    S+   15:59   0:00 grep ceph
root@hv01:~#
 
Hi,
looks ok.

Last resort: have you tried restarting the node hv01 (the one you use for the GUI)?

I must get some sleep now...

Udo
 

I rebooted all 3 nodes yesterday, and I just rebooted hv01 again and see no change. This sucks...

Maybe I should just build a RAID 10 with these 3 SSDs (buying another one, obviously) and use iSCSI.

Heck, I'll buy someone a case of beer if they can help me figure this out!
 
Code:
root@hv01:~# ceph osd tree
# id    weight  type name       up/down reweight
-1      1.38    root default
-2      0.46            host hv01
0       0.46                    osd.0   up      1
-3      0.46            host hv02
1       0.46                    osd.1   up      1
-4      0.46            host hv03
2       0.46                    osd.2   up      1
root@hv01:~# ceph health detail
HEALTH_OK
root@hv01:~# ceph -s
    cluster d03c0973-7905-4806-b678-228a532c89a8
     health HEALTH_OK
     monmap e3: 3 mons at {0=10.10.10.1:6789/0,1=10.10.10.2:6789/0,2=10.10.10.3:6789/0}, election epoch 20, quorum 0,1,2 0,1,2
     osdmap e32: 3 osds: 3 up, 3 in
      pgmap v441: 256 pgs, 4 pools, 0 bytes data, 0 objects
            115 MB used, 1415 GB / 1415 GB avail
                 256 active+clean
root@hv01:~#
Hi,
we are looking at the wrong thing! pvesm shows the ssd storage, but at 100% usage.

I assume that the keyring is OK! But your pgmap is strange: pgmap v441: 256 pgs, 4 pools, 0 bytes data, 0 objects
Normally there are some data/objects in the data/metadata pools!


Udo
 
This is really my first exposure to Ceph, so I'm not familiar with what pgmap or pgs mean, or how they work. I have no data on there, so I can definitely blow those OSDs away, zap them, or whatever I need to do... I just have no idea.
 
Hi,
can you create a testdisk on the ssd cluster with your keyring?
Code:
rbd -c /etc/ceph/ceph.conf -p ssd --keyring /etc/pve/priv/ceph/ssd.keyring --id admin --size 1024 create testdisk
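# --id admin authenticates as client.admin, using the key from the given keyring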
ceph -s
rados -p ssd ls
Udo
 

Well this is interesting...

Code:
root@hv01:~# rbd -c /etc/ceph/ceph.conf -p ssd --keyring /etc/pve/priv/ceph/ssd.keyring --id admin --size 1024 create testdisk
2014-11-18 21:43:22.099250 7fa0bbece760  0 librados: client.admin authentication error (1) Operation not permitted
rbd: couldn't connect to the cluster!
root@hv01:~# ceph -s
    cluster d03c0973-7905-4806-b678-228a532c89a8
     health HEALTH_OK
     monmap e3: 3 mons at {0=10.10.10.1:6789/0,1=10.10.10.2:6789/0,2=10.10.10.3:6789/0}, election epoch 24, quorum 0,1,2 0,1,2
     osdmap e36: 3 osds: 3 up, 3 in
      pgmap v688: 256 pgs, 4 pools, 0 bytes data, 0 objects
            117 MB used, 1415 GB / 1415 GB avail
                 256 active+clean
root@hv01:~# rados -p ssd ls
root@hv01:~#

So it seems my issue is Ceph-related, something in how the cluster was configured.
 
Hi,
no! I would say it has something to do with the admin rights.

Do this and post the results:
Code:
ceph auth list
ceph auth get-or-create-key client.ssd mds "allow" mon "allow r" osd "allow rwx pool=ssd"
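# caps: mds "allow", read-only access to the monitors, and read/write/execute restricted to pool ssd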
ceph auth export client.ssd | head -2 > /etc/pve/priv/ceph/ssd.keyring

ceph auth list
In /etc/pve/storage.cfg, under 'rbd: ssd', change the username to ssd!
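The rbd block should then look something like this (monhost and pool as in your earlier output):
Code:
rbd: ssd
        monhost 10.10.10.1 10.10.10.2 10.10.10.3
        pool ssd
        content images
        username ssd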

Does the following run then?
Code:
rbd -c /etc/ceph/ceph.conf -p ssd --keyring /etc/pve/priv/ceph/ssd.keyring --id ssd --size 1024 create testdisk
BTW. I like beer ;-)

Udo
 

Code:
root@hv01:~# ceph auth list
installed auth entries:


osd.0
        key: AQD9amZUOAVmJRAAqopPQ0dD19xzrrdfdUJfLA==
        caps: [mon] allow profile osd
        caps: [osd] allow *
osd.1
        key: AQCea2ZUgMQBChAA6luUU6Hc9WTa80xwsQ6ljw==
        caps: [mon] allow profile osd
        caps: [osd] allow *
osd.2
        key: AQC8a2ZUoDb1HxAA0yOCX/StE9M6dNIsUiQTeQ==
        caps: [mon] allow profile osd
        caps: [osd] allow *
client.admin
        key: AQAAamZUCHrhChAAIS131t7P1celiBRPfXWebQ==
        caps: [mds] allow
        caps: [mon] allow *
        caps: [osd] allow *
client.bootstrap-mds
        key: AQACamZUsCi6EhAABxTUQ3SqEsh327gfGrt0tQ==
        caps: [mon] allow profile bootstrap-mds
client.bootstrap-osd
        key: AQACamZUCPWyBBAALSsESce735Uvxw0rXpHSQg==
        caps: [mon] allow profile bootstrap-osd
root@hv01:~# ceph auth get-or-create-key client.ssd mds "allow" mon "allow r" osd "allow rwx pool=ssd"
AQAGp2xUGBXxJxAAlz9k5QII0b5ra6sD9MIGmA==
root@hv01:~# ceph auth export client.ssd | head -2 > /etc/pve/priv/ceph/ssd.keyring
export auth(auid = 18446744073709551615 key=AQAGp2xUGBXxJxAAlz9k5QII0b5ra6sD9MIGmA== with 3 caps)
root@hv01:~# ceph auth list
installed auth entries:


osd.0
        key: AQD9amZUOAVmJRAAqopPQ0dD19xzrrdfdUJfLA==
        caps: [mon] allow profile osd
        caps: [osd] allow *
osd.1
        key: AQCea2ZUgMQBChAA6luUU6Hc9WTa80xwsQ6ljw==
        caps: [mon] allow profile osd
        caps: [osd] allow *
osd.2
        key: AQC8a2ZUoDb1HxAA0yOCX/StE9M6dNIsUiQTeQ==
        caps: [mon] allow profile osd
        caps: [osd] allow *
client.admin
        key: AQAAamZUCHrhChAAIS131t7P1celiBRPfXWebQ==
        caps: [mds] allow
        caps: [mon] allow *
        caps: [osd] allow *
client.bootstrap-mds
        key: AQACamZUsCi6EhAABxTUQ3SqEsh327gfGrt0tQ==
        caps: [mon] allow profile bootstrap-mds
client.bootstrap-osd
        key: AQACamZUCPWyBBAALSsESce735Uvxw0rXpHSQg==
        caps: [mon] allow profile bootstrap-osd
client.ssd
        key: AQAGp2xUGBXxJxAAlz9k5QII0b5ra6sD9MIGmA==
        caps: [mds] allow
        caps: [mon] allow r
        caps: [osd] allow rwx pool=ssd
root@hv01:~#

Code:
root@hv01:~# cat /etc/pve/storage.cfg
nfs: nus01-nfs_01
        path /mnt/pve/nus01-nfs_01
        server 172.16.1.250
        export /proxmox
        options vers=3
        content images,iso,vztmpl,rootdir,backup
        nodes hv03,hv02,hv01
        maxfiles 2


dir: local
        path /var/lib/vz
        content images,iso,vztmpl,rootdir
        maxfiles 0


rbd: ssd
        monhost 10.10.10.1 10.10.10.2 10.10.10.3
        pool ssd
        content images
        username ssd


root@hv01:~#

Code:
root@hv01:~# rbd -c /etc/ceph/ceph.conf -p ssd --keyring /etc/pve/priv/ceph/ssd.keyring --id ssd --size 1024 create testdisk
root@hv01:~# ceph -s
    cluster d03c0973-7905-4806-b678-228a532c89a8
     health HEALTH_OK
     monmap e3: 3 mons at {0=10.10.10.1:6789/0,1=10.10.10.2:6789/0,2=10.10.10.3:6789/0}, election epoch 24, quorum 0,1,2 0,1,2
     osdmap e36: 3 osds: 3 up, 3 in
      pgmap v689: 256 pgs, 4 pools, 136 bytes data, 2 objects
            117 MB used, 1415 GB / 1415 GB avail
                 256 active+clean
root@hv01:~# rados -p ssd ls
rbd_directory
testdisk.rbd
root@hv01:~#



I was also able to create a new VM on that Ceph volume, and it's imaging out right now. So it seems that my issue was related to permissions, eh? I can't wait to do some Bonnie++ testing between this Ceph volume and my previous NFS share!

PM me your PayPal address. :)
 
I've got to admit, I'm kind of disappointed in my Bonnie++ results on this Ceph volume.

[Screenshot: Bonnie++ results (1.jpg)]

So I've got 2 VMs (CentOS7-NFS and CentOS7-CEPH) which are identical: 512MB RAM, 2 vCPUs, a 10GB volume, VirtIO wherever possible, and CentOS 7.

My Ceph volume is 3 Crucial MX100 512GB SSDs, one in each PVE host. My NFS share is over a gigabit link to a server with seven 2TB 7200RPM drives in a hardware RAID 6 (LSI MegaRAID 9261-8i controller). Now, the Ceph volume is completely empty and idle, other than this one VM. The NFS share has ~15 other active VMs (each with a lowish load, but still active), and it lives on this RAID 6 array alongside other storage (mostly media), which I was streaming during these tests. I figured Ceph would surely do much better than NFS, but this doesn't appear to be the case.

The command I ran to test with was:
bonnie++ -d /tmp -r 512 -s 2048 -n 512 -u root -x 5 | bon_csv2html

Any ideas or thoughts?
 
Hi again,
Ceph is not very fast with single-stream access :-(

Perhaps you can tune it a little bit. But normally a new thread would be better for this?
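To see what the cluster itself can deliver, independent of qemu and the guest filesystem, you could first take a baseline with rados bench (a sketch; pool ssd from your storage.cfg, 60 seconds is just an example, and --no-cleanup keeps the objects so the sequential read test has data):
Code:
rados bench -p ssd 60 write --no-cleanup
rados bench -p ssd 60 seq
rados -p ssd cleanup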

What kind of filesystem do you use? XFS?
Here are some performance-related parameters from my ceph.conf (inode64 for HDDs > 1TB):
Code:
[client]
rbd cache = true
rbd cache writethrough until flush = true
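# stays in writethrough mode until the guest sends its first flush, then switches to writeback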

[osd]
osd mount options xfs = "rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M"
osd_scrub_load_threshold = 2.5
filestore_max_sync_interval = 10
osd_op_threads = 4
osd_disk_threads = 4
If your switch supports jumbo frames, you should enable jumbo frames on all Ceph-related NICs.
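On a stock PVE/Debian setup that would be something like this in /etc/network/interfaces (interface name and address are only examples; the switch ports must accept MTU 9000 as well):
Code:
iface eth2 inet static
        address 10.10.10.1
        netmask 255.255.255.0
        mtu 9000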

Udo
 
Hi,


Some notes about Ceph and SSDs:


With Firefly, there are some locks in the OSD daemons, so it doesn't scale well with multiple cores.
So, to get more IOPS, you need more OSDs / more disks.

With the Giant release, it scales a lot better now.


Here is my ceph.conf tuning for Giant:


Code:
        debug lockdep = 0/0
        debug context = 0/0
        debug crush = 0/0
        debug buffer = 0/0
        debug timer = 0/0
        debug journaler = 0/0
        debug osd = 0/0
        debug optracker = 0/0
        debug objclass = 0/0
        debug filestore = 0/0
        debug journal = 0/0
        debug ms = 0/0
        debug monc = 0/0
        debug tp = 0/0
        debug auth = 0/0
        debug finisher = 0/0
        debug heartbeatmap = 0/0
        debug perfcounter = 0/0
        debug asok = 0/0
        debug throttle = 0/0


        osd_op_threads = 5
        filestore_op_threads = 4




        osd_op_num_threads_per_shard = 1
        osd_op_num_shards = 25
        filestore_fd_cache_size = 64
        filestore_fd_cache_shards = 32


        ms_nocrc = true
        ms_dispatch_throttle_bytes = 0


        cephx sign messages = false
        cephx require signatures = false


[osd]
         keyring = /var/lib/ceph/osd/ceph-$id/keyring
         osd_client_message_size_cap = 0
         osd_client_message_cap = 0
         osd_enable_op_tracker = false


Note that if you need IOPS (not bandwidth), you need some powerful CPUs for your OSD nodes.






Another point, about writes: some consumer SSDs are pretty slow with dsync writes (which are needed for the journal).
Check this blog to test them:
http://www.sebastien-han.fr/blog/20...-if-your-ssd-is-suitable-as-a-journal-device/
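The test from that post boils down to something like this (WARNING: writes directly to the device and destroys data on it; /dev/sdX is a placeholder for the SSD under test):
Code:
dd if=/dev/zero of=/dev/sdX bs=4k count=100000 oflag=direct,dsync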






And one last thing: there are also bottlenecks inside qemu currently.
So I think you should be able to reach around 12000-20000 IOPS with 1 VM, but not more (though with more VMs, more total IOPS).
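If you want to measure IOPS inside the guest, fio is usually more telling than bonnie++; a rough starting point (file name and job parameters are just examples to tune):
Code:
fio --name=randwrite --filename=/tmp/fio.test --size=1G \
    --ioengine=libaio --direct=1 --rw=randwrite --bs=4k \
    --numjobs=4 --iodepth=32 --runtime=60 --time_based --group_reporting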
 
