Trying to get Ceph working

So I've got a 3-node Proxmox cluster that I'm trying to set up Ceph on. I have a single 512GB SSD in each node to use as an OSD, but I'm running into problems.

Code:
root@hv01:~# ceph health
HEALTH_ERR 192 pgs stuck inactive; 192 pgs stuck unclean; no osds

This is the primary thing I don't understand. I've added OSDs and removed them, so right now nothing shows up under the OSD tab, but under the Disks tab I still see my volume with the Usage column set to osd.0 (osd.1 and osd.2 on the other nodes). Essentially, I want to wipe everything related to this Ceph config and start over.

I've tried to clean things up and restart, but I don't think I'm doing it right... or I just don't understand what's going on.

Code:
root@hv01:~# pveceph purge
root@hv01:~# pveceph init --network 10.10.10.0/24
root@hv01:~# pveceph createmon
creating /etc/pve/priv/ceph.client.admin.keyring
monmaptool: monmap file /tmp/monmap
monmaptool: generated fsid 7e8e3159-42cf-46ff-9b0f-27343a8128fd
epoch 0
fsid 7e8e3159-42cf-46ff-9b0f-27343a8128fd
last_changed 2014-11-14 14:51:57.913833
created 2014-11-14 14:51:57.913833
0: 10.10.10.1:6789/0 mon.0
monmaptool: writing epoch 0 to /tmp/monmap (1 monitors)
ceph-mon: set fsid to fe1edd77-bb21-421d-8fe6-1e92653774d9
ceph-mon: created monfs at /var/lib/ceph/mon/ceph-0 for mon.0
=== mon.0 ===
Starting Ceph mon.0 on hv01...
Starting ceph-create-keys on hv01...
root@hv01:~# ceph health
HEALTH_ERR 192 pgs stuck inactive; 192 pgs stuck unclean; no osds

How can I clear the health error above and "unstick" those PGs? How can I completely wipe those disks so I can set them up again?
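
For completeness, this is roughly what I ran earlier to pull the OSDs back out (a sketch from memory, assuming the data SSD is /dev/sdb and the OSD id matches the node, so don't take it as exact):

Code:
# mark the OSD out and stop its daemon on the node that owns it
ceph osd out 0
service ceph stop osd.0

# remove it from the CRUSH map, delete its auth key, and drop it from the OSD map
ceph osd crush remove osd.0
ceph auth del osd.0
ceph osd rm 0

# wipe the partition table so the disk can be reused as a fresh OSD
ceph-disk zap /dev/sdb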

Thanks!
 
Well, I've managed to wipe things and start my Ceph setup over. Ceph is now set up and I've created a new pool, but I'm having trouble adding it as a storage volume. The pool is called "ssd_01", size/min is 3/2, and pg_num is 150.

Under storage I've added:
ID: ceph-ssd_01
Pool: ssd_01
Monitor Host: 10.10.10.1:6789 10.10.10.2:6789 10.10.10.3:6789
User name: admin
Enabled: check

When I go to that volume under Storage on one of my hosts, I see Enabled and Active are both "Yes". If I click the Content tab, I get "rbd error: rbd: couldn't connect to the cluster! (500)". Any idea what I did wrong? Where could I find a log to help me out here?
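
In case it matters, this is where I've been looking for errors so far; I'm assuming the PVE daemons log to syslog, since that's the only place I've thought to check:

Code:
grep -iE 'rbd|ceph|pvedaemon' /var/log/syslog | tail -n 20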
 
Please read the doc:

http://pve.proxmox.com/wiki/Ceph_Server

You need to use ";" between the monitor hosts:


10.10.10.1:6789;10.10.10.2:6789;10.10.10.3:6789
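
The rbd entry in /etc/pve/storage.cfg then ends up looking something like this (using the pool and storage names from your post):

Code:
rbd: ceph-ssd_01
        monhost 10.10.10.1:6789;10.10.10.2:6789;10.10.10.3:6789
        pool ssd_01
        content images
        username admin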

Thanks. This is actually what I originally started with, but I changed it after seeing a thorough blog post about setting all this up. Either way, it makes no difference. I have no clue what else to look at here; I'm still unable to view the "Content" tab of this Ceph pool in Proxmox.

Code:
root@hv01:~# pveceph status
{
   "monmap" : {
      "mons" : [
         {
            "name" : "0",
            "addr" : "10.10.10.1:6789/0",
            "rank" : 0
         },
         {
            "name" : "1",
            "addr" : "10.10.10.2:6789/0",
            "rank" : 1
         },
         {
            "name" : "2",
            "addr" : "10.10.10.3:6789/0",
            "rank" : 2
         }
      ],
      "created" : "2014-11-14 15:45:52.318117",
      "epoch" : 3,
      "modified" : "2014-11-14 15:46:08.967962",
      "fsid" : "d03c0973-7905-4806-b678-228a532c89a8"
   },
   "election_epoch" : 20,
   "health" : {
      "detail" : [],
      "overall_status" : "HEALTH_OK",
      "summary" : [],
      "timechecks" : {
         "mons" : [
            {
               "name" : "0",
               "latency" : "0.000000",
               "skew" : "0.000000",
               "health" : "HEALTH_OK"
            },
            {
               "name" : "1",
               "latency" : "0.008335",
               "skew" : "0.007720",
               "health" : "HEALTH_OK"
            },
            {
               "name" : "2",
               "latency" : "0.010144",
               "skew" : "0.000000",
               "health" : "HEALTH_OK"
            }
         ],
         "epoch" : 20,
         "round_status" : "finished",
         "round" : 2
      },
      "health" : {
         "health_services" : [
            {
               "mons" : [
                  {
                     "kb_used" : 1439140,
                     "last_updated" : "2014-11-17 09:39:35.801631",
                     "name" : "0",
                     "health" : "HEALTH_OK",
                     "kb_total" : 17546044,
                     "kb_avail" : 15215616,
                     "store_stats" : {
                        "bytes_total" : 1809190,
                        "last_updated" : "0.000000",
                        "bytes_misc" : 65552,
                        "bytes_sst" : 1547030,
                        "bytes_log" : 196608
                     },
                     "avail_percent" : 86
                  },
                  {
                     "kb_used" : 1452680,
                     "last_updated" : "2014-11-17 09:39:03.045882",
                     "name" : "1",
                     "health" : "HEALTH_OK",
                     "kb_total" : 35092160,
                     "kb_avail" : 31856904,
                     "store_stats" : {
                        "bytes_total" : 2042550,
                        "last_updated" : "0.000000",
                        "bytes_misc" : 65552,
                        "bytes_sst" : 1780390,
                        "bytes_log" : 196608
                     },
                     "avail_percent" : 90
                  },
                  {
                     "kb_used" : 1451892,
                     "last_updated" : "2014-11-17 09:39:27.402420",
                     "name" : "2",
                     "health" : "HEALTH_OK",
                     "kb_total" : 35092160,
                     "kb_avail" : 31857692,
                     "store_stats" : {
                        "bytes_total" : 1915165,
                        "last_updated" : "0.000000",
                        "bytes_misc" : 65552,
                        "bytes_sst" : 1784077,
                        "bytes_log" : 65536
                     },
                     "avail_percent" : 90
                  }
               ]
            }
         ]
      }
   },
   "osdmap" : {
      "osdmap" : {
         "num_in_osds" : 3,
         "epoch" : 27,
         "nearfull" : false,
         "num_up_osds" : 3,
         "full" : false,
         "num_osds" : 3
      }
   },
   "mdsmap" : {
      "epoch" : 1,
      "by_rank" : [],
      "in" : 0,
      "max" : 1,
      "up" : 0
   },
   "pgmap" : {
      "bytes_total" : 1519478943744,
      "pgs_by_state" : [
         {
            "count" : 342,
            "state_name" : "active+clean"
         }
      ],
      "data_bytes" : 0,
      "num_pgs" : 342,
      "version" : 429,
      "bytes_avail" : 1519359598592,
      "bytes_used" : 119345152
   },
   "quorum" : [
      0,
      1,
      2
   ],
   "quorum_names" : [
      "0",
      "1",
      "2"
   ],
   "fsid" : "d03c0973-7905-4806-b678-228a532c89a8"
}

Code:
root@hv01:~# pveceph lspools
Name                       size     pg_num                 used
data                          3         64                    0
metadata                      3         64                    0
rbd                           3         64                    0
ssd_01                        3        150                    0

Code:
root@hv01:~# cat /etc/pve/storage.cfg
nfs: nus01-nfs_01
        path /mnt/pve/nus01-nfs_01
        server 172.16.1.250
        export /proxmox
        options vers=3
        content images,iso,vztmpl,rootdir,backup
        nodes hv03,hv02,hv01
        maxfiles 2


dir: local
        path /var/lib/vz
        content images,iso,vztmpl,rootdir
        maxfiles 0


rbd: ceph-sshd_01
        monhost 10.10.10.1:6789;10.10.10.2:6789;10.10.10.3:6789
        pool ssd_01
        content images
        username admin

Code:
root@hv01:~# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 00:22:19:59:02:bc brd ff:ff:ff:ff:ff:ff
    inet6 fe80::222:19ff:fe59:2bc/64 scope link
       valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 00:22:19:59:02:be brd ff:ff:ff:ff:ff:ff
    inet6 fe80::222:19ff:fe59:2be/64 scope link
       valid_lft forever preferred_lft forever
4: eth2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether 00:22:19:59:02:c0 brd ff:ff:ff:ff:ff:ff
5: eth3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 00:22:19:59:02:c2 brd ff:ff:ff:ff:ff:ff
    inet 10.10.10.1/24 brd 10.10.10.255 scope global eth3
    inet6 fe80::222:19ff:fe59:2c2/64 scope link
       valid_lft forever preferred_lft forever
6: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN
    link/ether 66:f8:19:a3:ea:62 brd ff:ff:ff:ff:ff:ff
7: vmbr1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN
    link/ether 00:22:19:59:02:bc brd ff:ff:ff:ff:ff:ff
    inet 172.16.1.201/24 brd 172.16.1.255 scope global vmbr1
    inet6 fe80::222:19ff:fe59:2bc/64 scope link
       valid_lft forever preferred_lft forever
8: bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN
    link/ether a2:62:f5:ba:0c:a3 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::a062:f5ff:feba:ca3/64 scope link
       valid_lft forever preferred_lft forever
9: venet0: <BROADCAST,POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN
    link/void
    inet6 fe80::1/128 scope link
       valid_lft forever preferred_lft forever
10: tap112i0: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN qlen 500
    link/ether ee:e5:3b:99:fd:2b brd ff:ff:ff:ff:ff:ff
    inet6 fe80::ece5:3bff:fe99:fd2b/64 scope link
       valid_lft forever preferred_lft forever
11: tap122i0: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN qlen 500
    link/ether 7e:1a:d7:6f:12:3e brd ff:ff:ff:ff:ff:ff
    inet6 fe80::7c1a:d7ff:fe6f:123e/64 scope link
       valid_lft forever preferred_lft forever
12: tap129i0: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN qlen 500
    link/ether c6:0c:0f:b0:cc:f6 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::c40c:fff:feb0:ccf6/64 scope link
       valid_lft forever preferred_lft forever
15: tap125i0: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN qlen 500
    link/ether 4a:4c:85:a0:2b:75 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::484c:85ff:fea0:2b75/64 scope link
       valid_lft forever preferred_lft forever
16: tap118i0: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN qlen 500
    link/ether 9e:83:4a:f4:30:22 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::9c83:4aff:fef4:3022/64 scope link
       valid_lft forever preferred_lft forever

Code:
root@hv01:~# ls /var/log/ceph/
ceph.log  ceph-mon.0.log
root@hv01:~# tail -25 /var/log/ceph/ceph.log
2014-11-17 09:18:29.299093 mon.1 10.10.10.2:6789/0 1 : [INF] mon.1 calling new monitor election
2014-11-17 09:18:35.192438 mon.2 10.10.10.3:6789/0 1 : [INF] mon.2 calling new monitor election
2014-11-17 09:18:35.229295 mon.1 10.10.10.2:6789/0 2 : [INF] mon.1 calling new monitor election
2014-11-17 09:33:35.802971 mon.0 10.10.10.1:6789/0 1 : [INF] mon.0 calling new monitor election
2014-11-17 09:33:35.872906 mon.0 10.10.10.1:6789/0 2 : [INF] mon.0@0 won leader election with quorum 0,1,2
2014-11-17 09:33:35.889293 mon.0 10.10.10.1:6789/0 3 : [INF] monmap e3: 3 mons at {0=10.10.10.1:6789/0,1=10.10.10.2:6789/0,2=10.10.10.3:6789/0}
2014-11-17 09:33:35.889392 mon.0 10.10.10.1:6789/0 4 : [INF] pgmap v429: 342 pgs: 342 active+clean; 0 bytes data, 113 MB used, 1415 GB / 1415 GB avail
2014-11-17 09:33:35.889546 mon.0 10.10.10.1:6789/0 5 : [INF] mdsmap e1: 0/0/1 up
2014-11-17 09:33:35.889643 mon.0 10.10.10.1:6789/0 6 : [INF] osdmap e27: 3 osds: 3 up, 3 in
2014-11-17 09:35:03.047596 mon.1 10.10.10.2:6789/0 1 : [INF] mon.1 calling new monitor election
2014-11-17 09:35:03.101121 mon.0 10.10.10.1:6789/0 7 : [INF] mon.0 calling new monitor election
2014-11-17 09:35:03.114443 mon.0 10.10.10.1:6789/0 8 : [INF] mon.0@0 won leader election with quorum 0,1,2
2014-11-17 09:35:03.129968 mon.0 10.10.10.1:6789/0 9 : [INF] monmap e3: 3 mons at {0=10.10.10.1:6789/0,1=10.10.10.2:6789/0,2=10.10.10.3:6789/0}
2014-11-17 09:35:03.130048 mon.0 10.10.10.1:6789/0 10 : [INF] pgmap v429: 342 pgs: 342 active+clean; 0 bytes data, 113 MB used, 1415 GB / 1415 GB avail
2014-11-17 09:35:03.130127 mon.0 10.10.10.1:6789/0 11 : [INF] mdsmap e1: 0/0/1 up
2014-11-17 09:35:03.130240 mon.0 10.10.10.1:6789/0 12 : [INF] osdmap e27: 3 osds: 3 up, 3 in
2014-11-17 09:35:27.404958 mon.2 10.10.10.3:6789/0 1 : [INF] mon.2 calling new monitor election
2014-11-17 09:35:27.480721 mon.0 10.10.10.1:6789/0 13 : [INF] mon.0 calling new monitor election
2014-11-17 09:35:27.484816 mon.0 10.10.10.1:6789/0 14 : [INF] mon.0@0 won leader election with quorum 0,1,2
2014-11-17 09:35:27.493998 mon.1 10.10.10.2:6789/0 2 : [INF] mon.1 calling new monitor election
2014-11-17 09:35:27.500537 mon.0 10.10.10.1:6789/0 15 : [INF] monmap e3: 3 mons at {0=10.10.10.1:6789/0,1=10.10.10.2:6789/0,2=10.10.10.3:6789/0}
2014-11-17 09:35:27.500630 mon.0 10.10.10.1:6789/0 16 : [INF] pgmap v429: 342 pgs: 342 active+clean; 0 bytes data, 113 MB used, 1415 GB / 1415 GB avail
2014-11-17 09:35:27.500714 mon.0 10.10.10.1:6789/0 17 : [INF] mdsmap e1: 0/0/1 up
2014-11-17 09:35:27.500817 mon.0 10.10.10.1:6789/0 18 : [INF] osdmap e27: 3 osds: 3 up, 3 in


[Attached screenshots: 1.PNG, 2.PNG]

Any ideas? If there is anything additional you'd like to see, please let me know.

Thanks.
 
Did you copy the key over for proxmox to use for the communication?
Code:
mkdir -p /etc/pve/priv/ceph
cp /etc/ceph/ceph.client.admin.keyring /etc/pve/priv/ceph/ceph-sshd_01.keyring
 
Yes, I have.

Code:
root@hv01:~# ls /etc/pve/priv/ceph
ceph-ssd_01.keyring
root@hv01:~# cat /etc/pve/priv/ceph/ceph-ssd_01.keyring
[client.admin]
        key = AQCNtGJUQMkrAhAAHNNhc4Oob1UvQR/ifhaG1A==
 
You mentioned you killed your first attempt and restarted. Your keyring isn't the original from the first attempt, is it? You might get more information from the actual Proxmox logs on why it couldn't connect.

If you attempt to create a VM on that storage pool, let it fail since it can't connect to RBD, and then click on the failure log, it should give you a better error message. I'm not sure how else to get that error (perhaps it goes into syslog, I don't know).
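
You could also try talking to rbd directly from the command line with the same monitor and keyring your storage definition points at; something like this should either list the (empty) pool or print a more useful error than the GUI does (the pool name and keyring path below are just taken from your earlier posts):

Code:
rbd ls ssd_01 -m 10.10.10.1 --id admin --keyring /etc/pve/priv/ceph/ceph-ssd_01.keyring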
 

How can I make it generate a new keyring?
 
I just meant you need to make sure the keyring in /etc/ceph/ currently matches the one in /etc/pve/priv/ceph/; if it got replaced when you started over and you're still pointing at the old one, that could explain it. Really, you need a better error message to know what is going on.

I assume you've tried things like "rados ls" and "rados df" to make sure Ceph itself is reachable, right? If those commands work, then you know it is something in Proxmox.
 

Ah, the keys do match. Should I be concerned that the contents of /etc/pve/priv/ceph/ceph-ssd_01.keyring use "[client.admin]" instead of something relating to the pool name?

Code:
root@hv01:/etc/pve/priv/ceph# rados -p ssd_01 ls
root@hv01:/etc/pve/priv/ceph# rados df
pool name       category                 KB      objects       clones     degraded      unfound           rd        rd KB           wr        wr KB
data            -                          0            0            0            0           0            0            0            0            0
metadata        -                          0            0            0            0           0            0            0            0            0
rbd             -                          0            0            0            0           0            0            0            0            0
ssd_01          -                          0            0            0            0           0            0            0            0            0
  total used          116548            0
  total avail     1483749608
  total space     1483866156

This is what happens when I try to create a VM on that ceph pool:
[Attached screenshot: 1.PNG]

Nothing useful there. I see nothing in my Ceph logs, and I think you're correct that it's an issue with PVE binding to Ceph. I'll take a look through my PVE logs and see if I can find anything else.
 
What does the 'Output' column of that same screen show? Status is usually just a summary; I think the Output tab should have the real console message. But yes, it definitely looks like something to do with the Proxmox binding to Ceph. My keyring also uses client.admin and it works fine. The only real differences from my config are:

1. I space-delimit the Ceph monitor hosts rather than using semicolons
2. I reference my Ceph monitor hosts by hostnames listed in /etc/hosts
3. I do not specify the port for the monitor hosts; I'm really just using "ceph1 ceph2 ceph3" (without the quotes)
4. I kept my Ceph pool name and Proxmox storage name the same
5. I do not use any underscores or hyphens in my names

I'm just listing the differences, not saying any one of them is at fault; a rough sketch of what that looks like is below.
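
Roughly like this; the hostnames, addresses, and storage name here are just illustrative, not copied from my actual config:

Code:
# /etc/hosts on every node
10.10.10.1   ceph1
10.10.10.2   ceph2
10.10.10.3   ceph3

# /etc/pve/storage.cfg
rbd: cephssd
        monhost ceph1 ceph2 ceph3
        pool cephssd
        content images
        username admin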
 
Hi,
the name ceph-ssd_01 must match the storage config in /etc/pve/storage.cfg.

You can check the rights of client.admin (and its key) with
Code:
ceph auth list

You can also create a user restricted to a single pool, for example:
Code:
client.pve
        key: PQBqgvdSaFapHcAAgWfbrMsVrfqTKnJsn8hMQc==
        caps: [mds] allow
        caps: [mon] allow r
        caps: [osd] allow rwx pool=pve
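
Such a pool-restricted user (and its keyring file) can be created with something like the following; the entity name, caps, and output path just mirror the example above, so adapt them to your setup:

Code:
ceph auth get-or-create client.pve mds 'allow' mon 'allow r' osd 'allow rwx pool=pve' -o /etc/pve/priv/ceph/pve.keyring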
Udo
 

[Attached screenshot: 1.PNG]

Thanks for the feedback on all the little differences; I will try them all to see if I get different results.

So do you have a network for Ceph that is separate from your other traffic, and do you use alias hostnames for the Ceph network? For example, HV01 is my first Proxmox host; its LAN/MGMT IP is 172.16.1.201, while it has a second interface at 10.10.10.1 for Ceph traffic. Are you doing the same, but creating a host entry called "ceph1", for example, that points at the Ceph network address? Does my question make sense?
 

Thanks for the feedback. The file name matches the storage name in storage.cfg, but the section header inside the file says "[client.admin]", and I was wondering if that was correct. After seeing "ceph auth list", I now understand that the keyring grants permissions based on that "client.admin" entry.

Code:
root@hv01:~# ceph auth list
installed auth entries:


osd.0
        key: AQD9amZUOAVmJRAAqopPQ0dD19xzrrdfdUJfLA==
        caps: [mon] allow profile osd
        caps: [osd] allow *
osd.1
        key: AQCea2ZUgMQBChAA6luUU6Hc9WTa80xwsQ6ljw==
        caps: [mon] allow profile osd
        caps: [osd] allow *
osd.2
        key: AQC8a2ZUoDb1HxAA0yOCX/StE9M6dNIsUiQTeQ==
        caps: [mon] allow profile osd
        caps: [osd] allow *
client.admin
        key: AQAAamZUCHrhChAAIS131t7P1celiBRPfXWebQ==
        caps: [mds] allow
        caps: [mon] allow *
        caps: [osd] allow *
client.bootstrap-mds
        key: AQACamZUsCi6EhAABxTUQ3SqEsh327gfGrt0tQ==
        caps: [mon] allow profile bootstrap-mds
client.bootstrap-osd
        key: AQACamZUCPWyBBAALSsESce735Uvxw0rXpHSQg==
        caps: [mon] allow profile bootstrap-osd
root@hv01:~#

I assume all the above looks correct.
 
I created a new Ceph pool named ssd and created a new RBD storage in Proxmox, also named ssd. I renamed the keyring file to "ssd.keyring" as well.

Code:
root@hv01:/etc/pve/priv/ceph# pveceph lspools
Name                       size     pg_num                 used
data                          3         64                    0
metadata                      3         64                    0
rbd                           3         64                    0
ssd                           3         64                    0

Code:
root@hv01:/etc/pve/priv/ceph# cat /etc/pve/storage.cfg
nfs: nus01-nfs_01
        path /mnt/pve/nus01-nfs_01
        server 172.16.1.250
        export /proxmox
        options vers=3
        content images,iso,vztmpl,rootdir,backup
        nodes hv03,hv02,hv01
        maxfiles 2


dir: local
        path /var/lib/vz
        content images,iso,vztmpl,rootdir
        maxfiles 0


rbd: ssd
        monhost 10.10.10.1:6789;10.10.10.2:6789;10.10.10.3:6789
        pool ssd
        content images
        username admin

Code:
root@hv01:/etc/pve/priv/ceph# ls
ssd.keyring
root@hv01:/etc/pve/priv/ceph# cat ssd.keyring
[client.admin]
        key = AQCNtGJUQMkrAhAAHNNhc4Oob1UvQR/ifhaG1A==
root@hv01:/etc/pve/priv/ceph# cat /etc/ceph/ceph.client.admin.keyring
[client.admin]
        key = AQCNtGJUQMkrAhAAHNNhc4Oob1UvQR/ifhaG1A==
root@hv01:/etc/pve/priv/ceph#




I'm still seeing the same problem when viewing the "Content" tab of that storage in Proxmox.
Even if I change storage.cfg to:
Code:
rbd: ssd
        monhost 10.10.10.1 10.10.10.2 10.10.10.3
        pool ssd
        content images
        username admin

It still doesn't work. This is making me sad. I bought 3 512GB SSDs to really ramp up my Proxmox setup, and I can't get it to work. :(
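
The next thing I plan to try is exercising rbd directly from the shell to make sure the pool itself is usable; just a quick sanity test I came up with, not something out of a doc:

Code:
# create, list, and remove a small test image in the pool
rbd create test01 --size 128 -p ssd
rbd ls -p ssd
rbd rm test01 -p ssd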
 
Hi,
what is the output of the following commands:
Code:
ceph osd tree
ceph health detail
ceph -s
Udo
 

Code:
root@hv01:~# ceph osd tree
# id    weight  type name       up/down reweight
-1      1.38    root default
-2      0.46            host hv01
0       0.46                    osd.0   up      1
-3      0.46            host hv02
1       0.46                    osd.1   up      1
-4      0.46            host hv03
2       0.46                    osd.2   up      1
root@hv01:~# ceph health detail
HEALTH_OK
root@hv01:~# ceph -s
    cluster d03c0973-7905-4806-b678-228a532c89a8
     health HEALTH_OK
     monmap e3: 3 mons at {0=10.10.10.1:6789/0,1=10.10.10.2:6789/0,2=10.10.10.3:6789/0}, election epoch 20, quorum 0,1,2 0,1,2
     osdmap e32: 3 osds: 3 up, 3 in
      pgmap v441: 256 pgs, 4 pools, 0 bytes data, 0 objects
            115 MB used, 1415 GB / 1415 GB avail
                 256 active+clean
root@hv01:~#
 
Hi,
that doesn't look bad.
But one thing from your first post puzzles me: you use the same network for the cluster and for the Ceph network?!
I don't know if this causes trouble, because normally the mons are on one network to communicate with the clients (the PVE hosts) and the Ceph network is for OSD syncing between the OSD nodes...

Is the mon accessible? I assume yes, because the ceph commands work.
Code:
netstat -an | grep 6789
Any trouble with other storage?
Code:
pvesm status
Udo
 

I am NOT using the same network for cluster and Ceph traffic. LAN/Mgmt is on 172.16.1.0/24 while Ceph is on 10.10.10.0/24. Each network has two gigabit NICs in LACP to the switch.

Code:
root@hv01:~# ip addr show vmbr1
7: vmbr1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN
    link/ether 00:22:19:59:02:bc brd ff:ff:ff:ff:ff:ff
    inet 172.16.1.201/24 brd 172.16.1.255 scope global vmbr1
    inet6 fe80::222:19ff:fe59:2bc/64 scope link
       valid_lft forever preferred_lft forever
root@hv01:~# ip addr show eth3
5: eth3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 00:22:19:59:02:c2 brd ff:ff:ff:ff:ff:ff
    inet 10.10.10.1/24 brd 10.10.10.255 scope global eth3
    inet6 fe80::222:19ff:fe59:2c2/64 scope link
       valid_lft forever preferred_lft forever
root@hv01:~# ping -I eth3 10.10.10.2
PING 10.10.10.2 (10.10.10.2) from 10.10.10.1 eth3: 56(84) bytes of data.
64 bytes from 10.10.10.2: icmp_req=1 ttl=64 time=0.171 ms
64 bytes from 10.10.10.2: icmp_req=2 ttl=64 time=0.151 ms
64 bytes from 10.10.10.2: icmp_req=3 ttl=64 time=0.151 ms
64 bytes from 10.10.10.2: icmp_req=4 ttl=64 time=0.177 ms
64 bytes from 10.10.10.2: icmp_req=5 ttl=64 time=0.150 ms

Code:
root@hv01:~# pvesm status
local           dir 1        34954952          180272        34774680 1.02%
nus01-nfs_01    nfs 1      9760603136      8977842080       782761056 92.48%
ssd             rbd 1               0               0               0 100.00%
 
Hi,
I meant the entries in the [global] section: your cluster network and public network are the same.

But I think the issue is something different. In the [global] section you have also defined the keyring name as $cluster.$name.keyring, but your keyring is only ssd.keyring, i.e. just $name.keyring!
It looks like you need to prepend your cluster name and a dot.

Udo
 
