Hanging CEPH storage

luphi

Renowned Member
Nov 9, 2015
Hi all,

I have some issues with my first CEPH deployment.
I have just 3 servers available, which have to handle everything (VMs, monitors, and OSDs).
All 3 servers have the latest community packages installed.
Here is what I have tested so far: ceph -s reports a healthy cluster, but pvesm status takes more than five minutes to return:

Code:
root@pve1:~# ceph -s
  cluster 9efa1aeb-21e5-46ec-9087-a4b2ea28b32c
  health HEALTH_OK
  monmap e21: 3 mons at {0=172.31.255.3:6789/0,1=172.31.255.2:6789/0,2=172.31.255.1:6789/0}
  election epoch 42, quorum 0,1,2 2,1,0
  osdmap e37: 7 osds: 7 up, 7 in
  pgmap v9880: 512 pgs, 1 pools, 0 bytes data, 0 objects
  248 MB used, 1634 GB / 1634 GB avail
  512 active+clean
Code:
root@pve1:~# time pvesm status
CEPH  rbd 1  0  0  0 100.00%
LVM1  lvm 1  143204352  125829120  17375232 88.37%
freenas  nfs 1  840947328  652749888  120921664 84.87%

real 5m1.050s
user 0m0.608s
sys 0m0.116s

Any hints are much appreciated.

Cheers,
Martin
 
Hello Thomas,

Thank you for your reply. I used pveceph and followed the wiki entry you mentioned above.

Code:
# cat /etc/pve/ceph.conf
[global]
  auth client required = cephx
  auth cluster required = cephx
  auth service required = cephx
  cluster network = 172.31.255.0/24
  filestore xattr use omap = true
  fsid = 9efa1aeb-21e5-46ec-9087-a4b2ea28b32c
  keyring = /etc/pve/priv/$cluster.$name.keyring
  osd journal size = 5120
  osd pool default min size = 1
  public network = 172.31.255.0/24

[osd]
  keyring = /var/lib/ceph/osd/ceph-$id/keyring

[mon.1]
  host = pve2
  mon addr = 172.31.255.2:6789

[mon.2]
  host = pve1
  mon addr = 172.31.255.1:6789

[mon.0]
  host = pve3
  mon addr = 172.31.255.3:6789
Code:
# cat /etc/pve/storage.cfg
nfs: freenas
  server 145.16.214.21
  path /mnt/pve/freenas
  export /mnt/backup/proxmox
  maxfiles 2
  content backup,iso,vztmpl
  options vers=3

dir: local
  disable
  path /var/lib/vz
  content rootdir,images,iso,vztmpl
  maxfiles 0

lvm: LVM1
  vgname pve
  content images,rootdir
  nodes pve1

lvm: LVM2
  vgname pve
  content rootdir,images
  nodes pve2

lvm: LVM3
  vgname pve
  content rootdir,images
  nodes pve3

rbd: CEPH
  monhost pve1,pve2,pve3
  pool ceph
  content rootdir,images
  username admin
 
I checked "ps -ax" and found a couple of hanging rados processes:
Code:
31641 ?  Sl  0:00 /usr/bin/rados -p rbd -m pve1,pve2,pve3 -n client.admin --keyring /etc/pve/priv/ceph/ceph.keyring --auth_supported cephx df
32239 ?  Sl  0:00 /usr/bin/rados -p rbd -m pve1,pve2,pve3 -n client.admin --keyring /etc/pve/priv/ceph/ceph.keyring --auth_supported cephx df
32271 ?  Sl  0:00 /usr/bin/rados -p rbd -m pve1,pve2,pve3 -n client.admin --keyring /etc/pve/priv/ceph/ceph.keyring --auth_supported cephx df
32382 ?  Sl  0:00 /usr/bin/rados -p rbd -m pve1,pve2,pve3 -n client.admin --keyring /etc/pve/priv/ceph/ceph.keyring --auth_supported cephx df

When I run the same command in a shell, I get the following:
Code:
# /usr/bin/rados -p rbd -m pve1,pve2,pve3 -n client.admin --keyring /etc/pve/priv/ceph/ceph.keyring --auth_supported cephx df
2016-05-06 13:04:58.257543 7fccdaebf700  0 -- :/4051575739 >> 145.16.214.52:6789/0 pipe(0x29d4050 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x29d1170).fault
2016-05-06 13:05:01.257743 7fccd0ad4700  0 -- :/4051575739 >> 145.16.214.53:6789/0 pipe(0x7fccc8000c00 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7fccc8004ef0).fault
2016-05-06 13:05:04.257958 7fccdaebf700  0 -- :/4051575739 >> 145.16.214.52:6789/0 pipe(0x7fccc80081b0 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7fccc800c450).fault
2016-05-06 13:05:07.258020 7fccd0ad4700  0 -- :/4051575739 >> 145.16.214.51:6789/0 pipe(0x7fccc8000c00 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7fccc80065d0).fault
2016-05-06 13:05:10.258445 7fccdaebf700  0 -- :/4051575739 >> 145.16.214.52:6789/0 pipe(0x7fccc80081b0 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7fccc80058b0).fault

But when I omit "-m pve1,pve2,pve3", it looks much better:
Code:
# /usr/bin/rados -p rbd -n client.admin --keyring /etc/pve/priv/ceph/ceph.keyring --auth_supported cephx df
pool name  KB  objects  clones  degraded  unfound  rd  rd KB  wr  wr KB
rbd  0  0  0  0  0  0  0  0  0
  total used  200048  0
  total avail  1288033512
  total space  1288233560

So I assume there is an issue with the monitors.
Is there anything else I can check?

Cheers,
Martin
 
Please use IP addresses for the monitor hosts and separate them with ";" in /etc/pve/storage.cfg.
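A minimal sketch of the corrected rbd entry, assuming the monitor addresses from the ceph.conf posted above (adjust if your monitors listen elsewhere):
Code:
rbd: CEPH
  monhost 172.31.255.1;172.31.255.2;172.31.255.3
  pool ceph
  content rootdir,images
  username admin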
 
I hope you all had a refreshing weekend, but I still need your help.

Cheers,
Martin
 
Which IPs did you use?

I see from the logs that you use 172.31.255.x for the monitors,
but the names pve1, pve2, pve3 resolve to 145.16.214.y.
You have to use the 172.31.255.x IPs, because the monitors only listen there.
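You can check both sides of the mismatch yourself (a quick sketch; the hostnames are the ones from this thread):
Code:
# how the names given with -m resolve on the client
getent hosts pve1 pve2 pve3
# the addresses the monitors actually listen on, per the monmap
ceph mon dump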
 
Hello Dominik,

Great, that solved my issue.
Thank you very much.

Cheers,
Martin
 
