Trying to get Ceph working

So I've got a 3-node Proxmox cluster that I'm trying to set up Ceph on. I have a single 512GB SSD in each node to use as an OSD, but I'm running into problems.

Code:
root@hv01:~# ceph health
HEALTH_ERR 192 pgs stuck inactive; 192 pgs stuck unclean; no osds

This is the primary thing I don't understand. I've added OSDs and removed them, so right now nothing shows up under the OSD tab, but under the Disks tab I still see my volume with the Usage column set to osd.0 (osd.1 and osd.2 on the other nodes). Essentially, I want to wipe everything related to this Ceph config and start over.

I've tried to clean things up and restart, but I don't think I'm doing it right... or I just don't understand what's going on.

Code:
root@hv01:~# pveceph purge
root@hv01:~# pveceph init --network 10.10.10.0/24
root@hv01:~# pveceph createmon
creating /etc/pve/priv/ceph.client.admin.keyring
monmaptool: monmap file /tmp/monmap
monmaptool: generated fsid 7e8e3159-42cf-46ff-9b0f-27343a8128fd
epoch 0
fsid 7e8e3159-42cf-46ff-9b0f-27343a8128fd
last_changed 2014-11-14 14:51:57.913833
created 2014-11-14 14:51:57.913833
0: 10.10.10.1:6789/0 mon.0
monmaptool: writing epoch 0 to /tmp/monmap (1 monitors)
ceph-mon: set fsid to fe1edd77-bb21-421d-8fe6-1e92653774d9
ceph-mon: created monfs at /var/lib/ceph/mon/ceph-0 for mon.0
=== mon.0 ===
Starting Ceph mon.0 on hv01...
Starting ceph-create-keys on hv01...
root@hv01:~# ceph health
HEALTH_ERR 192 pgs stuck inactive; 192 pgs stuck unclean; no osds

How can I clear the health error above and "unstick" those PGs? How can I completely wipe those disks so I can set them up again?
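
For completeness, this is roughly what I ran earlier to pull the OSDs back out (a sketch from memory, assuming the data SSD is /dev/sdb and the OSD id matches the node, so don't take it as exact):

Code:
# mark the OSD out and stop its daemon on the node that owns it
ceph osd out 0
service ceph stop osd.0

# remove it from the CRUSH map, delete its auth key, and drop it from the OSD map
ceph osd crush remove osd.0
ceph auth del osd.0
ceph osd rm 0

# wipe the partition table so the disk can be reused as a fresh OSD
ceph-disk zap /dev/sdb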

Thanks!
 
Well, I've managed to wipe things and start my Ceph setup over. Ceph is now set up and I've created a new pool, but I'm having trouble adding it as a storage volume. The pool is called "ssd_01", size/min is 3/2, and pg_num is 150.

Under storage I've added:
ID: ceph-ssd_01
Pool: ssd_01
Monitor Host: 10.10.10.1:6789 10.10.10.2:6789 10.10.10.3:6789
User name: admin
Enabled: check

When I go to that volume under Storage on one of my hosts, I see Enabled and Active are both "Yes". If I click the Content tab, I get "rbd error: rbd: couldn't connect to the cluster! (500)". Any idea what I did wrong? Where could I find a log to help me out here?
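
In case it matters, this is where I've been looking for errors so far; I'm assuming the PVE daemons log to syslog, since that's the only place I've thought to check:

Code:
grep -iE 'rbd|ceph|pvedaemon' /var/log/syslog | tail -n 20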
 
Please read the doc:

http://pve.proxmox.com/wiki/Ceph_Server

You need to use ";" between the monitor hosts:


10.10.10.1:6789;10.10.10.2:6789;10.10.10.3:6789
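
The rbd entry in /etc/pve/storage.cfg then ends up looking something like this (using the pool and storage names from your post):

Code:
rbd: ceph-ssd_01
        monhost 10.10.10.1:6789;10.10.10.2:6789;10.10.10.3:6789
        pool ssd_01
        content images
        username admin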

Thanks. This is actually what I originally started with, but I changed it after seeing a thorough blog post about setting all this up. Either way, it makes no difference. I have no clue what else to look at here; I'm still unable to view the "Content" tab of this Ceph pool in Proxmox.

Code:
root@hv01:~# pveceph status
{
   "monmap" : {
      "mons" : [
         {
            "name" : "0",
            "addr" : "10.10.10.1:6789/0",
            "rank" : 0
         },
         {
            "name" : "1",
            "addr" : "10.10.10.2:6789/0",
            "rank" : 1
         },
         {
            "name" : "2",
            "addr" : "10.10.10.3:6789/0",
            "rank" : 2
         }
      ],
      "created" : "2014-11-14 15:45:52.318117",
      "epoch" : 3,
      "modified" : "2014-11-14 15:46:08.967962",
      "fsid" : "d03c0973-7905-4806-b678-228a532c89a8"
   },
   "election_epoch" : 20,
   "health" : {
      "detail" : [],
      "overall_status" : "HEALTH_OK",
      "summary" : [],
      "timechecks" : {
         "mons" : [
            {
               "name" : "0",
               "latency" : "0.000000",
               "skew" : "0.000000",
               "health" : "HEALTH_OK"
            },
            {
               "name" : "1",
               "latency" : "0.008335",
               "skew" : "0.007720",
               "health" : "HEALTH_OK"
            },
            {
               "name" : "2",
               "latency" : "0.010144",
               "skew" : "0.000000",
               "health" : "HEALTH_OK"
            }
         ],
         "epoch" : 20,
         "round_status" : "finished",
         "round" : 2
      },
      "health" : {
         "health_services" : [
            {
               "mons" : [
                  {
                     "kb_used" : 1439140,
                     "last_updated" : "2014-11-17 09:39:35.801631",
                     "name" : "0",
                     "health" : "HEALTH_OK",
                     "kb_total" : 17546044,
                     "kb_avail" : 15215616,
                     "store_stats" : {
                        "bytes_total" : 1809190,
                        "last_updated" : "0.000000",
                        "bytes_misc" : 65552,
                        "bytes_sst" : 1547030,
                        "bytes_log" : 196608
                     },
                     "avail_percent" : 86
                  },
                  {
                     "kb_used" : 1452680,
                     "last_updated" : "2014-11-17 09:39:03.045882",
                     "name" : "1",
                     "health" : "HEALTH_OK",
                     "kb_total" : 35092160,
                     "kb_avail" : 31856904,
                     "store_stats" : {
                        "bytes_total" : 2042550,
                        "last_updated" : "0.000000",
                        "bytes_misc" : 65552,
                        "bytes_sst" : 1780390,
                        "bytes_log" : 196608
                     },
                     "avail_percent" : 90
                  },
                  {
                     "kb_used" : 1451892,
                     "last_updated" : "2014-11-17 09:39:27.402420",
                     "name" : "2",
                     "health" : "HEALTH_OK",
                     "kb_total" : 35092160,
                     "kb_avail" : 31857692,
                     "store_stats" : {
                        "bytes_total" : 1915165,
                        "last_updated" : "0.000000",
                        "bytes_misc" : 65552,
                        "bytes_sst" : 1784077,
                        "bytes_log" : 65536
                     },
                     "avail_percent" : 90
                  }
               ]
            }
         ]
      }
   },
   "osdmap" : {
      "osdmap" : {
         "num_in_osds" : 3,
         "epoch" : 27,
         "nearfull" : false,
         "num_up_osds" : 3,
         "full" : false,
         "num_osds" : 3
      }
   },
   "mdsmap" : {
      "epoch" : 1,
      "by_rank" : [],
      "in" : 0,
      "max" : 1,
      "up" : 0
   },
   "pgmap" : {
      "bytes_total" : 1519478943744,
      "pgs_by_state" : [
         {
            "count" : 342,
            "state_name" : "active+clean"
         }
      ],
      "data_bytes" : 0,
      "num_pgs" : 342,
      "version" : 429,
      "bytes_avail" : 1519359598592,
      "bytes_used" : 119345152
   },
   "quorum" : [
      0,
      1,
      2
   ],
   "quorum_names" : [
      "0",
      "1",
      "2"
   ],
   "fsid" : "d03c0973-7905-4806-b678-228a532c89a8"
}

Code:
root@hv01:~# pveceph lspools
Name                       size     pg_num                 used
data                          3         64                    0
metadata                      3         64                    0
rbd                           3         64                    0
ssd_01                        3        150                    0

Code:
root@hv01:~# cat /etc/pve/storage.cfg
nfs: nus01-nfs_01
        path /mnt/pve/nus01-nfs_01
        server 172.16.1.250
        export /proxmox
        options vers=3
        content images,iso,vztmpl,rootdir,backup
        nodes hv03,hv02,hv01
        maxfiles 2


dir: local
        path /var/lib/vz
        content images,iso,vztmpl,rootdir
        maxfiles 0


rbd: ceph-sshd_01
        monhost 10.10.10.1:6789;10.10.10.2:6789;10.10.10.3:6789
        pool ssd_01
        content images
        username admin

Code:
root@hv01:~# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 00:22:19:59:02:bc brd ff:ff:ff:ff:ff:ff
    inet6 fe80::222:19ff:fe59:2bc/64 scope link
       valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 00:22:19:59:02:be brd ff:ff:ff:ff:ff:ff
    inet6 fe80::222:19ff:fe59:2be/64 scope link
       valid_lft forever preferred_lft forever
4: eth2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether 00:22:19:59:02:c0 brd ff:ff:ff:ff:ff:ff
5: eth3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 00:22:19:59:02:c2 brd ff:ff:ff:ff:ff:ff
    inet 10.10.10.1/24 brd 10.10.10.255 scope global eth3
    inet6 fe80::222:19ff:fe59:2c2/64 scope link
       valid_lft forever preferred_lft forever
6: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN
    link/ether 66:f8:19:a3:ea:62 brd ff:ff:ff:ff:ff:ff
7: vmbr1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN
    link/ether 00:22:19:59:02:bc brd ff:ff:ff:ff:ff:ff
    inet 172.16.1.201/24 brd 172.16.1.255 scope global vmbr1
    inet6 fe80::222:19ff:fe59:2bc/64 scope link
       valid_lft forever preferred_lft forever
8: bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN
    link/ether a2:62:f5:ba:0c:a3 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::a062:f5ff:feba:ca3/64 scope link
       valid_lft forever preferred_lft forever
9: venet0: <BROADCAST,POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN
    link/void
    inet6 fe80::1/128 scope link
       valid_lft forever preferred_lft forever
10: tap112i0: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN qlen 500
    link/ether ee:e5:3b:99:fd:2b brd ff:ff:ff:ff:ff:ff
    inet6 fe80::ece5:3bff:fe99:fd2b/64 scope link
       valid_lft forever preferred_lft forever
11: tap122i0: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN qlen 500
    link/ether 7e:1a:d7:6f:12:3e brd ff:ff:ff:ff:ff:ff
    inet6 fe80::7c1a:d7ff:fe6f:123e/64 scope link
       valid_lft forever preferred_lft forever
12: tap129i0: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN qlen 500
    link/ether c6:0c:0f:b0:cc:f6 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::c40c:fff:feb0:ccf6/64 scope link
       valid_lft forever preferred_lft forever
15: tap125i0: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN qlen 500
    link/ether 4a:4c:85:a0:2b:75 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::484c:85ff:fea0:2b75/64 scope link
       valid_lft forever preferred_lft forever
16: tap118i0: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN qlen 500
    link/ether 9e:83:4a:f4:30:22 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::9c83:4aff:fef4:3022/64 scope link
       valid_lft forever preferred_lft forever

Code:
root@hv01:~# ls /var/log/ceph/
ceph.log  ceph-mon.0.log
root@hv01:~# tail -25 /var/log/ceph/ceph.log
2014-11-17 09:18:29.299093 mon.1 10.10.10.2:6789/0 1 : [INF] mon.1 calling new monitor election
2014-11-17 09:18:35.192438 mon.2 10.10.10.3:6789/0 1 : [INF] mon.2 calling new monitor election
2014-11-17 09:18:35.229295 mon.1 10.10.10.2:6789/0 2 : [INF] mon.1 calling new monitor election
2014-11-17 09:33:35.802971 mon.0 10.10.10.1:6789/0 1 : [INF] mon.0 calling new monitor election
2014-11-17 09:33:35.872906 mon.0 10.10.10.1:6789/0 2 : [INF] mon.0@0 won leader election with quorum 0,1,2
2014-11-17 09:33:35.889293 mon.0 10.10.10.1:6789/0 3 : [INF] monmap e3: 3 mons at {0=10.10.10.1:6789/0,1=10.10.10.2:6789/0,2=10.10.10.3:6789/0}
2014-11-17 09:33:35.889392 mon.0 10.10.10.1:6789/0 4 : [INF] pgmap v429: 342 pgs: 342 active+clean; 0 bytes data, 113 MB used, 1415 GB / 1415 GB avail
2014-11-17 09:33:35.889546 mon.0 10.10.10.1:6789/0 5 : [INF] mdsmap e1: 0/0/1 up
2014-11-17 09:33:35.889643 mon.0 10.10.10.1:6789/0 6 : [INF] osdmap e27: 3 osds: 3 up, 3 in
2014-11-17 09:35:03.047596 mon.1 10.10.10.2:6789/0 1 : [INF] mon.1 calling new monitor election
2014-11-17 09:35:03.101121 mon.0 10.10.10.1:6789/0 7 : [INF] mon.0 calling new monitor election
2014-11-17 09:35:03.114443 mon.0 10.10.10.1:6789/0 8 : [INF] mon.0@0 won leader election with quorum 0,1,2
2014-11-17 09:35:03.129968 mon.0 10.10.10.1:6789/0 9 : [INF] monmap e3: 3 mons at {0=10.10.10.1:6789/0,1=10.10.10.2:6789/0,2=10.10.10.3:6789/0}
2014-11-17 09:35:03.130048 mon.0 10.10.10.1:6789/0 10 : [INF] pgmap v429: 342 pgs: 342 active+clean; 0 bytes data, 113 MB used, 1415 GB / 1415 GB avail
2014-11-17 09:35:03.130127 mon.0 10.10.10.1:6789/0 11 : [INF] mdsmap e1: 0/0/1 up
2014-11-17 09:35:03.130240 mon.0 10.10.10.1:6789/0 12 : [INF] osdmap e27: 3 osds: 3 up, 3 in
2014-11-17 09:35:27.404958 mon.2 10.10.10.3:6789/0 1 : [INF] mon.2 calling new monitor election
2014-11-17 09:35:27.480721 mon.0 10.10.10.1:6789/0 13 : [INF] mon.0 calling new monitor election
2014-11-17 09:35:27.484816 mon.0 10.10.10.1:6789/0 14 : [INF] mon.0@0 won leader election with quorum 0,1,2
2014-11-17 09:35:27.493998 mon.1 10.10.10.2:6789/0 2 : [INF] mon.1 calling new monitor election
2014-11-17 09:35:27.500537 mon.0 10.10.10.1:6789/0 15 : [INF] monmap e3: 3 mons at {0=10.10.10.1:6789/0,1=10.10.10.2:6789/0,2=10.10.10.3:6789/0}
2014-11-17 09:35:27.500630 mon.0 10.10.10.1:6789/0 16 : [INF] pgmap v429: 342 pgs: 342 active+clean; 0 bytes data, 113 MB used, 1415 GB / 1415 GB avail
2014-11-17 09:35:27.500714 mon.0 10.10.10.1:6789/0 17 : [INF] mdsmap e1: 0/0/1 up
2014-11-17 09:35:27.500817 mon.0 10.10.10.1:6789/0 18 : [INF] osdmap e27: 3 osds: 3 up, 3 in


[Attached screenshots: 1.PNG, 2.PNG]

Any ideas? If there is anything additional you'd like to see, please let me know.

Thanks.
 
Did you copy the key over for proxmox to use for the communication?
Code:
mkdir -p /etc/pve/priv/ceph
cp /etc/ceph/ceph.client.admin.keyring /etc/pve/priv/ceph/ceph-sshd_01.keyring
 
Yes, I have.

Code:
root@hv01:~# ls /etc/pve/priv/ceph
ceph-ssd_01.keyring
root@hv01:~# cat /etc/pve/priv/ceph/ceph-ssd_01.keyring
[client.admin]
        key = AQCNtGJUQMkrAhAAHNNhc4Oob1UvQR/ifhaG1A==
 
You mentioned you killed your first attempt and restarted. Your keyring isn't the original from the first attempt, is it? You might get more information from the actual Proxmox logs on why it couldn't connect.

If you attempt to create a VM on that storage pool, let it fail since it can't connect to RBD, and then click on the failure log, it should give you a better error message. I'm not sure how else to get that error (perhaps it goes into syslog, I don't know).
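
You could also try talking to rbd directly from the command line with the same monitor and keyring your storage definition points at; something like this should either list the (empty) pool or print a more useful error than the GUI does (the pool name and keyring path below are just taken from your earlier posts):

Code:
rbd ls ssd_01 -m 10.10.10.1 --id admin --keyring /etc/pve/priv/ceph/ceph-ssd_01.keyring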
 

How can I make it generate a new keyring?
 
I just meant you need to make sure the keyring in /etc/ceph/ currently matches the one in /etc/pve/priv/ceph/; if it got replaced when you started over and you're still pointing at the old one, that could explain it. Really, you need a better error message to know what is going on.

I assume you've tried things like "rados ls" and "rados df" to make sure Ceph itself is reachable, right? If those commands work, then you know it is something in Proxmox.
 

Ah, the keys do match. Should I be concerned that the contents of /etc/pve/priv/ceph/ceph-ssd_01.keyring use "[client.admin]" instead of something relating to the pool name?

Code:
root@hv01:/etc/pve/priv/ceph# rados -p ssd_01 ls
root@hv01:/etc/pve/priv/ceph# rados df
pool name       category                 KB      objects       clones     degraded      unfound           rd        rd KB           wr        wr KB
data            -                          0            0            0            0           0            0            0            0            0
metadata        -                          0            0            0            0           0            0            0            0            0
rbd             -                          0            0            0            0           0            0            0            0            0
ssd_01          -                          0            0            0            0           0            0            0            0            0
  total used          116548            0
  total avail     1483749608
  total space     1483866156

This is what happens when I try to create a VM on that ceph pool:
[Attached screenshot: 1.PNG]

Nothing useful there. I see nothing in my Ceph logs, and I think you're correct that it's an issue with PVE binding to Ceph. I'll take a look through my PVE logs and see if I can find anything else.
 
What does the 'Output' column of that same screen show? Status is usually just a summary; I think the Output tab should have the real console message. But yes, it definitely looks like something to do with the Proxmox binding to Ceph. My keyring also uses client.admin and it works fine. The only real differences from my config are:

1. I space-delimit the Ceph monitor hosts rather than using semicolons
2. I reference my Ceph monitor hosts by hostnames listed in /etc/hosts
3. I do not specify the port for the monitor hosts; I'm really just using "ceph1 ceph2 ceph3" (without the quotes)
4. I kept my Ceph pool name and Proxmox storage name the same
5. I do not use any underscores or hyphens in my names

I'm just listing the differences, not saying any one of them is at fault; a rough sketch of what that looks like is below.
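
Roughly like this; the hostnames, addresses, and storage name here are just illustrative, not copied from my actual config:

Code:
# /etc/hosts on every node
10.10.10.1   ceph1
10.10.10.2   ceph2
10.10.10.3   ceph3

# /etc/pve/storage.cfg
rbd: cephssd
        monhost ceph1 ceph2 ceph3
        pool cephssd
        content images
        username admin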
 
Hi,
the name ceph-ssd_01 must match the storage config in /etc/pve/storage.cfg.

You can check the rights of client.admin (and its key) with
Code:
ceph auth list

You can also create a user restricted to a single pool, for example:
Code:
client.pve
        key: PQBqgvdSaFapHcAAgWfbrMsVrfqTKnJsn8hMQc==
        caps: [mds] allow
        caps: [mon] allow r
        caps: [osd] allow rwx pool=pve
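
Such a pool-restricted user (and its keyring file) can be created with something like the following; the entity name, caps, and output path just mirror the example above, so adapt them to your setup:

Code:
ceph auth get-or-create client.pve mds 'allow' mon 'allow r' osd 'allow rwx pool=pve' -o /etc/pve/priv/ceph/pve.keyring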
Udo
 

[Attached screenshot: 1.PNG]

Thanks for the feedback on all the little differences; I will try them all to see if I get different results.

So do you have a network for Ceph that is separate from your other traffic, and do you use alias hostnames for the Ceph network? For example, HV01 is my first Proxmox host; its LAN/MGMT IP is 172.16.1.201, while it has a second interface at 10.10.10.1 for Ceph traffic. Are you doing the same, but creating a host entry called "ceph1", for example, that points at the Ceph network address? Does my question make sense?
 

Thanks for the feedback. The file name matches the storage name in storage.cfg, but the section header inside the file says "[client.admin]", and I was wondering if that was correct. After seeing "ceph auth list", I now understand that the keyring grants permissions based on that "client.admin" entry.

Code:
root@hv01:~# ceph auth list
installed auth entries:


osd.0
        key: AQD9amZUOAVmJRAAqopPQ0dD19xzrrdfdUJfLA==
        caps: [mon] allow profile osd
        caps: [osd] allow *
osd.1
        key: AQCea2ZUgMQBChAA6luUU6Hc9WTa80xwsQ6ljw==
        caps: [mon] allow profile osd
        caps: [osd] allow *
osd.2
        key: AQC8a2ZUoDb1HxAA0yOCX/StE9M6dNIsUiQTeQ==
        caps: [mon] allow profile osd
        caps: [osd] allow *
client.admin
        key: AQAAamZUCHrhChAAIS131t7P1celiBRPfXWebQ==
        caps: [mds] allow
        caps: [mon] allow *
        caps: [osd] allow *
client.bootstrap-mds
        key: AQACamZUsCi6EhAABxTUQ3SqEsh327gfGrt0tQ==
        caps: [mon] allow profile bootstrap-mds
client.bootstrap-osd
        key: AQACamZUCPWyBBAALSsESce735Uvxw0rXpHSQg==
        caps: [mon] allow profile bootstrap-osd
root@hv01:~#

I assume all the above looks correct.
 
I created a new Ceph pool named ssd and created a new RBD storage in Proxmox, also named ssd. I renamed the keyring file to "ssd.keyring" as well.

Code:
root@hv01:/etc/pve/priv/ceph# pveceph lspools
Name                       size     pg_num                 used
data                          3         64                    0
metadata                      3         64                    0
rbd                           3         64                    0
ssd                           3         64                    0

Code:
root@hv01:/etc/pve/priv/ceph# cat /etc/pve/storage.cfg
nfs: nus01-nfs_01
        path /mnt/pve/nus01-nfs_01
        server 172.16.1.250
        export /proxmox
        options vers=3
        content images,iso,vztmpl,rootdir,backup
        nodes hv03,hv02,hv01
        maxfiles 2


dir: local
        path /var/lib/vz
        content images,iso,vztmpl,rootdir
        maxfiles 0


rbd: ssd
        monhost 10.10.10.1:6789;10.10.10.2:6789;10.10.10.3:6789
        pool ssd
        content images
        username admin

Code:
root@hv01:/etc/pve/priv/ceph# ls
ssd.keyring
root@hv01:/etc/pve/priv/ceph# cat ssd.keyring
[client.admin]
        key = AQCNtGJUQMkrAhAAHNNhc4Oob1UvQR/ifhaG1A==
root@hv01:/etc/pve/priv/ceph# cat /etc/ceph/ceph.client.admin.keyring
[client.admin]
        key = AQCNtGJUQMkrAhAAHNNhc4Oob1UvQR/ifhaG1A==
root@hv01:/etc/pve/priv/ceph#




I'm still seeing the same problem when viewing the "Content" tab of that storage in Proxmox.
Even if I change storage.cfg to:
Code:
rbd: ssd
        monhost 10.10.10.1 10.10.10.2 10.10.10.3
        pool ssd
        content images
        username admin

It still doesn't work. This is making me sad. I bought 3 512GB SSDs to really ramp up my Proxmox setup, and I can't get it to work. :(
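
The next thing I plan to try is exercising rbd directly from the shell to make sure the pool itself is usable; just a quick sanity test I came up with, not something out of a doc:

Code:
# create, list, and remove a small test image in the pool
rbd create test01 --size 128 -p ssd
rbd ls -p ssd
rbd rm test01 -p ssd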
 
Hi,
what is the output of the following commands:
Code:
ceph osd tree
ceph health detail
ceph -s
Udo
 

Code:
root@hv01:~# ceph osd tree
# id    weight  type name       up/down reweight
-1      1.38    root default
-2      0.46            host hv01
0       0.46                    osd.0   up      1
-3      0.46            host hv02
1       0.46                    osd.1   up      1
-4      0.46            host hv03
2       0.46                    osd.2   up      1
root@hv01:~# ceph health detail
HEALTH_OK
root@hv01:~# ceph -s
    cluster d03c0973-7905-4806-b678-228a532c89a8
     health HEALTH_OK
     monmap e3: 3 mons at {0=10.10.10.1:6789/0,1=10.10.10.2:6789/0,2=10.10.10.3:6789/0}, election epoch 20, quorum 0,1,2 0,1,2
     osdmap e32: 3 osds: 3 up, 3 in
      pgmap v441: 256 pgs, 4 pools, 0 bytes data, 0 objects
            115 MB used, 1415 GB / 1415 GB avail
                 256 active+clean
root@hv01:~#
 
Hi,
that doesn't look bad.
But one thing from your first post puzzles me: you use the same network for the cluster and for the Ceph network?!
I don't know if this causes trouble, because normally the mons are on one network to communicate with the clients (the PVE hosts) and the Ceph network is for OSD syncing between the OSD nodes...

Is the mon accessible? I assume yes, because the ceph commands work.
Code:
netstat -an | grep 6789
Any trouble with other storage?
Code:
pvesm status
Udo
 

I am NOT using the same network for cluster and Ceph traffic. LAN/Mgmt is on 172.16.1.0/24 while Ceph is on 10.10.10.0/24. Each network has two gigabit NICs in LACP to the switch.

Code:
root@hv01:~# ip addr show vmbr1
7: vmbr1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN
    link/ether 00:22:19:59:02:bc brd ff:ff:ff:ff:ff:ff
    inet 172.16.1.201/24 brd 172.16.1.255 scope global vmbr1
    inet6 fe80::222:19ff:fe59:2bc/64 scope link
       valid_lft forever preferred_lft forever
root@hv01:~# ip addr show eth3
5: eth3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 00:22:19:59:02:c2 brd ff:ff:ff:ff:ff:ff
    inet 10.10.10.1/24 brd 10.10.10.255 scope global eth3
    inet6 fe80::222:19ff:fe59:2c2/64 scope link
       valid_lft forever preferred_lft forever
root@hv01:~# ping -I eth3 10.10.10.2
PING 10.10.10.2 (10.10.10.2) from 10.10.10.1 eth3: 56(84) bytes of data.
64 bytes from 10.10.10.2: icmp_req=1 ttl=64 time=0.171 ms
64 bytes from 10.10.10.2: icmp_req=2 ttl=64 time=0.151 ms
64 bytes from 10.10.10.2: icmp_req=3 ttl=64 time=0.151 ms
64 bytes from 10.10.10.2: icmp_req=4 ttl=64 time=0.177 ms
64 bytes from 10.10.10.2: icmp_req=5 ttl=64 time=0.150 ms

Code:
root@hv01:~# pvesm status
local           dir 1        34954952          180272        34774680 1.02%
nus01-nfs_01    nfs 1      9760603136      8977842080       782761056 92.48%
ssd             rbd 1               0               0               0 100.00%
 
Hi,
I meant the entries in the [global] section: your cluster network and public network are the same.

But I think the issue is something different. In the [global] section you have also defined the keyring name as $cluster.$name.keyring, but your keyring is only ssd.keyring, i.e. just $name.keyring!
It looks like you need to prepend your cluster name and a dot.

Udo
 
