I just upgraded all 6 Proxmox nodes and now the Ceph cluster members have stopped talking to each other. I had Ceph 0.72 previously; after the upgrade it is 0.80. Did I miss an upgrade step?
Hi,
Upgrade daemons in the following order:
Monitors
OSDs
MDSs and/or radosgw
If the ceph-mds daemon is restarted first, it will wait until all OSDs have been upgraded before finishing its startup sequence. If the ceph-mon daemons are not restarted prior to the ceph-osd daemons, they will not correctly register their new capabilities with the cluster and new features may not be usable until they are restarted a second time.
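As a rough sketch of that order with the sysvinit script used elsewhere in this thread: this assumes /etc/init.d/ceph accepts a daemon type as its last argument, and it leaves out radosgw, which has its own init script. The helper name restart_plan is made up for illustration. It only prints the commands so they can be reviewed first; nothing is restarted.

```shell
# Hypothetical helper: print (do not run) the per-type restart commands
# in the order monitors -> OSDs -> MDSs.
restart_plan() {
    for daemon_type in mon osd mds; do
        echo "/etc/init.d/ceph restart ${daemon_type}"
    done
}

# Show the plan for review.
restart_plan
```

The order is meant cluster-wide, so the mon line would be run on every monitor node before moving on to the osd line anywhere.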
mon warn on legacy crush tunables = false
netstat -an | grep 6789 | grep -i listen
df -k
du -hs /var/lib/ceph/mon/*/store.db
ceph-mon -i b -d -c /etc/ceph/ceph.conf
root@proxmox4:~# ls -lsa /var/lib/ceph/mon/ceph-b/store.db/LOCK
0 -rw-r--r-- 1 root root 0 Feb 3 2014 /var/lib/ceph/mon/ceph-b/store.db/LOCK
root@proxmox4:~# fuser /var/lib/ceph/mon/ceph-b/store.db/LOCK
/var/lib/ceph/mon/ceph-b/store.db/LOCK: 6357
Hi,
perhaps more than one mon-process?
use 0 instead of b ;-)
Is there enough free space inside /var/lib/ceph?
Udo
root@CA-00-01-01-01:/etc/ceph# ls -lsa /var/lib/ceph/mon/ceph-0/store.db/LOCK
0 -rw-r--r-- 1 root root 0 Jul 26 17:09 /var/lib/ceph/mon/ceph-0/store.db/LOCK
root@CA-00-01-01-01:/etc/ceph# fuser /var/lib/ceph/mon/ceph-0/store.db/LOCK
/var/lib/ceph/mon/ceph-0/store.db/LOCK: 3199
2014-09-18 01:46:10.879483 7fcd13e80700 0 monclient(hunting): authenticate timed out after 300
2014-09-18 01:46:10.879523 7fcd13e80700 0 librados: client.admin authentication error (110) Connection timed out
Error connecting to cluster: TimedOut
0 is what I actually used, since that is the mon ID. With ls -lsa and fuser I get the output shown above.
Plenty of space on the local OS disk. All 6 nodes are behaving exactly the same: none of the MONs will come up.
But your mon is running (and has been since 26.07.14!!). What is the output of
ps aux | grep 3199
Hi,
I am curious why you are using ceph-deploy and not pveceph.
The recommended Proxmox method is to use pveceph.
Is there an advantage to using ceph-deploy?
What changes must we make to use ceph-deploy and still take advantage of the Proxmox cluster replication and Ceph management?
I am interested because I would like to try installing one or more MDS daemons, and it appears to be very easy using ceph-deploy.
Hi Wasim,
netstat shows it is listening.
After trying to run the mon in the foreground, this is what I got:
~# ceph-mon -i 0 -d -c /etc/ceph/ceph.conf
2014-09-18 01:29:10.057089 7f2700d13780 0 ceph version 0.80.5 (xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx), process ceph-mon, pid 7267
2014-09-18 01:29:10.058536 7f2700d13780 -1 asok(0x3444d20) AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed to bind the UNIX domain socket to '/var/run/ceph/ceph-mon.0.asok': (17) File exists
IO error: lock /var/lib/ceph/mon/ceph-0/store.db/LOCK: Resource temporarily unavailable
IO error: lock /var/lib/ceph/mon/ceph-0/store.db/LOCK: Resource temporarily unavailable
2014-09-18 01:29:10.058655 7f2700d13780 -1 failed to create new leveldb store
It looks like the MONs are not starting at all.
apt-get install --only-upgrade ceph ceph-common ceph-fs-common ceph-fuse ceph-mds libcephfs1 python-ceph
root@proxmox2:~# /etc/init.d/ceph restart
=== mon.a ===
=== mon.a ===
Stopping Ceph mon.a on proxmox2...done
=== mon.a ===
Starting Ceph mon.a on proxmox2...
IO error: lock /var/lib/ceph/mon/ceph-a/store.db/LOCK: Resource temporarily unavailable
IO error: lock /var/lib/ceph/mon/ceph-a/store.db/LOCK: Resource temporarily unavailable
2014-09-26 21:25:53.146853 7f6b1133b780 -1 failed to create new leveldb store
failed: 'ulimit -n 32768; /usr/bin/ceph-mon -i a --pid-file /var/run/ceph/mon.a.pid -c /etc/ceph/ceph.conf --cluster ceph '
Starting ceph-create-keys on proxmox2...
root@proxmox2:~# fuser /var/lib/ceph/mon/ceph-a/store.db/LOCK
/var/lib/ceph/mon/ceph-a/store.db/LOCK: 652183
root@proxmox2:~# ps aux | grep 652183
root 652183 1.4 0.4 934032 134872 ? Sl Jun16 2084:34 ceph-mon -i a
root 773100 0.0 0.0 7792 948 pts/4 S+ 21:27 0:00 grep 652183
root@proxmox2:~# kill 652183
root@proxmox2:~# kill 652183
root@proxmox2:~# kill 652183
-bash: kill: (652183) - No such process
root@proxmox2:~# /etc/init.d/ceph restart
=== mon.a ===
=== mon.a ===
Stopping Ceph mon.a on proxmox2...done
=== mon.a ===
Starting Ceph mon.a on proxmox2...
Starting ceph-create-keys on proxmox2...
root@proxmox2:~# ceph -s
cluster 591db070-15c1-4c7a-b107-67717bdb87d9
health HEALTH_WARN some monitors are running older code
...
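What the session above amounts to: the old ceph-mon (running since Jun16, so from before the upgrade) was still holding the store.db LOCK file, so the new binary could not open the store until the old process was gone. A minimal sketch of the lookup, assuming the default cluster name "ceph" and the data path seen in this thread; both helper names are made up for illustration.

```shell
# Hypothetical helpers; the names are illustrative, not part of Ceph or Proxmox.

# Print the path of the store.db LOCK file for a given monitor ID.
mon_lock_path() {
    printf '/var/lib/ceph/mon/ceph-%s/store.db/LOCK\n' "$1"
}

# Strip the "path:" prefix from a pasted fuser line, leaving the PID(s).
pids_from_fuser_line() {
    printf '%s\n' "$1" | sed 's/^[^:]*:[[:space:]]*//'
}

# Example with the line pasted above:
pids_from_fuser_line '/var/lib/ceph/mon/ceph-a/store.db/LOCK: 652183'
```

On a live node one would run fuser on the path from mon_lock_path, check the PID with ps as in the transcript, stop that process, and only then start the monitor again.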
I had 4 OSDs defined on each proxmox cluster member. The ceph.conf file has nothing below the monitor configuration.
I see no reference to any OSDs in /etc/pve/ceph.
Hi,...
Rebooting makes no difference and I do not know where to begin.
How can I manually shut down and restart Ceph in a controlled manner?
Serge
ceph health detail
ceph -s
ceph osd tree
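If the monitors answer at all, the health line is the first thing to look at. A small sketch for pulling just the status word out of ceph -s output, assuming the plain-text layout shown earlier in this thread ("health HEALTH_WARN ..."); health_status is a made-up helper name.

```shell
# Hypothetical helper: read `ceph -s` text on stdin and print the health
# status word (e.g. HEALTH_OK, HEALTH_WARN). Prints nothing if no health
# line is present, for example when no monitor could be reached.
health_status() {
    awk '$1 == "health" { print $2; exit }'
}

# Example with the output pasted earlier in the thread:
printf '%s\n' \
    '    cluster 591db070-15c1-4c7a-b107-67717bdb87d9' \
    '     health HEALTH_WARN some monitors are running older code' \
    | health_status
```

On a node it would be used as: ceph -s | health_status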
Hi Serge,
I am getting this for all:
2014-10-02 20:55:37.279808 7fde044b3700 0 -- :/1261613 >> 10.10.10.50:6789/0 pipe(0xd151f0 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0xd15460).fault
2014-10-02 20:55:40.279868 7fde043b2700 0 -- :/1261613 >> 10.10.10.51:6789/0 pipe(0xd14390 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0xd14600).fault
2014-10-02 20:55:43.280361 7fde044b3700 0 -- :/1261613 >> 10.10.10.50:6789/0 pipe(0xd15bf0 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0xd15e60).fault
2014-10-02 20:55:46.280463 7fde043b2700 0 -- :/1261613 >> 10.10.10.51:6789/0 pipe(0xd16160 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0xd163d0).fault
...
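As far as I can tell, each .fault line above is the client failing to reach the monitor address after the ">>", so the list of distinct addresses shows which monitors to check first. A minimal sketch for extracting them from saved output; fault_targets is a made-up name and the log format is assumed to be exactly as pasted.

```shell
# Hypothetical helper: read client log lines on stdin and print the
# distinct monitor addresses (ip:port) that appear in .fault lines.
fault_targets() {
    grep '\.fault' | grep -o '>> [0-9.]*:[0-9]*' | awk '{ print $2 }' | sort -u
}

# Example with two of the lines pasted above:
printf '%s\n' \
    '2014-10-02 20:55:37.279808 7fde044b3700 0 -- :/1261613 >> 10.10.10.50:6789/0 pipe(0xd151f0 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0xd15460).fault' \
    '2014-10-02 20:55:40.279868 7fde043b2700 0 -- :/1261613 >> 10.10.10.51:6789/0 pipe(0xd14390 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0xd14600).fault' \
    | fault_targets
```

Here that gives 10.10.10.50:6789 and 10.10.10.51:6789, the two monitor nodes where the ceph-mon process and port should be checked.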
/etc/init.d/ceph restart
/etc/init.d/ceph stop
/etc/init.d/ceph start
Has no output and no effect.
Serge
netstat -an | grep 6789
ls /var/lib/ceph/mon/
ceph-mon -i 0 -d -c /etc/ceph/ceph.conf
df -h
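Two of these checks are easy to get wrong by eye, so here is a rough sketch of how their output can be read. It assumes the usual netstat -an column layout and the default "ceph-<id>" directory naming under /var/lib/ceph/mon/; both helper names are made up for illustration.

```shell
# Hypothetical helpers; the names are illustrative only.

# Read `netstat -an` output on stdin; succeed if something is listening
# on the monitor port 6789 (local address column ends in ":6789").
mon_listening() {
    grep ':6789 ' | grep -qi 'listen'
}

# Turn a directory name from `ls /var/lib/ceph/mon/` into the mon ID
# that ceph-mon -i expects, e.g. ceph-0 -> 0.
mon_id_from_dir() {
    printf '%s\n' "$1" | sed 's/^ceph-//'
}

# Example: the ID to pass to `ceph-mon -i` for the directory ceph-0.
mon_id_from_dir ceph-0
```

On a node the first helper would be used as: netstat -an | mon_listening && echo "mon port is open". The second one is the reason for "use 0 instead of b" earlier in the thread: the ID after -i has to match the directory name.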