[SOLVED] No ceph-mon@<MON-ID>.service after upgrade

Raymond Burns

I know this is long, but I have tried to include every command possible to help with troubleshooting.

I cannot start ceph-mon@<MON-ID>.service with systemctl; the unit is simply not there.

I needed to upgrade from Hammer to Jewel in order to install a new node.
I've done this several times before. I am trying my best to follow the wiki.
I have read https://pve.proxmox.com/wiki/Ceph_Hammer_to_Jewel over ten times.
But I CANNOT figure this out.
I think I may have stumbled upon some kind of BUG.

I will list my versions below, along with every command that I think will be helpful to you. I have a lot of data, and I don't think I did anything wrong outside the commands in the wiki.

Code:
prox-e:~# pveversion
pve-manager/4.4-22/2728f613 (running kernel: 4.4.98-6-pve)

Code:
root@prox-e:~# cat /etc/ceph/ceph.conf
[global]
         auth client required = cephx
         auth cluster required = cephx
         auth service required = cephx
         cluster network = 192.168.12.0/24
         filestore xattr use omap = true
         fsid = 94e14478-19e4-4b5f-89e5-aff03579632d
         keyring = /etc/pve/priv/$cluster.$name.keyring
         osd journal size = 5120
         osd pool default min size = 1
         public network = 192.168.12.0/24

[osd]
         keyring = /var/lib/ceph/osd/ceph-$id/keyring

[mon.1]
         host = prox-f
         mon addr = 192.168.12.25:6789

[mon.3]
         host = prox-e
         mon addr = 192.168.12.24:6789

[mon.0]
         host = prox-b
         mon addr = 192.168.12.21:6789

[mon.2]
         host = prox-c
         mon addr = 192.168.12.22:6789

Starting from the beginning of the wiki.

Code:
prox-e:~# cat /etc/apt/sources.list.d/ceph.list
deb http://download.ceph.com/debian-jewel jessie main

Code:
prox-e:~# systemctl stop ceph-mon.3.1513810191.138222489.service

Code:
prox-e:~# df -h

Filesystem        Size  Used Avail Use% Mounted on
udev               10M     0   10M   0% /dev
tmpfs             4.8G   18M  4.7G   1% /run
rpool/ROOT/pve-1  217G  2.5G  214G   2% /
tmpfs              12G   45M   12G   1% /dev/shm
tmpfs             5.0M     0  5.0M   0% /run/lock
tmpfs              12G     0   12G   0% /sys/fs/cgroup
rpool             214G  128K  214G   1% /rpool
rpool/ROOT        214G  128K  214G   1% /rpool/ROOT
rpool/data        214G  128K  214G   1% /rpool/data
/dev/fuse          30M   44K   30M   1% /etc/pve

/dev/sdl1         2.8T  2.1T  660G  77% /var/lib/ceph/osd/ceph-37

/dev/sdn1         2.8T  1.9T  930G  67% /var/lib/ceph/osd/ceph-31
/dev/sdc1         2.8T  1.9T  856G  70% /var/lib/ceph/osd/ceph-39
/dev/sdb1         2.8T  2.1T  705G  75% /var/lib/ceph/osd/ceph-36
/dev/sdj1         2.8T  2.0T  762G  73% /var/lib/ceph/osd/ceph-29
/dev/sda1         1.9T  1.2T  660G  65% /var/lib/ceph/osd/ceph-40
/dev/sdf1         2.8T  1.8T 1003G  65% /var/lib/ceph/osd/ceph-33
/dev/sdd1         2.8T  1.9T  854G  70% /var/lib/ceph/osd/ceph-30
/dev/sdh1         2.8T  2.1T  657G  77% /var/lib/ceph/osd/ceph-28
/dev/sdm1         2.8T  1.6T  1.2T  59% /var/lib/ceph/osd/ceph-38
/dev/sdi1         2.8T  1.8T  1.1T  63% /var/lib/ceph/osd/ceph-32
/dev/sdk1         2.8T  2.0T  785G  72% /var/lib/ceph/osd/ceph-34
/dev/sdg1         1.9T  1.2T  716G  62% /var/lib/ceph/osd/ceph-27

Code:
prox-e:~# readlink -f /var/lib/ceph/osd/ceph-37/journal
/dev/sdl2

Code:
prox-e:~# blkid -o udev -p /dev/sdl2
ID_PART_ENTRY_SCHEME=gpt
ID_PART_ENTRY_NAME=ceph\x20journal
ID_PART_ENTRY_UUID=d3d49ff8-6234-4f9b-ba3f-d19cbd902318
ID_PART_ENTRY_TYPE=45b0969e-9b03-4f30-b4c6-b4b80ceff106
ID_PART_ENTRY_NUMBER=2
ID_PART_ENTRY_OFFSET=2048
ID_PART_ENTRY_SIZE=10483713
ID_PART_ENTRY_DISK=8:176

Code:
root@prox-e:~# ls -halt /var/lib/ceph/
total 29K
drwxr-xr-x  2 ceph ceph  4 Mar 14 20:41 tmp
drwxr-x---  9 ceph ceph  9 Mar 13 16:58 .
drwxr-xr-x 47 root root 48 Mar 13 16:57 ..
drwxr-xr-x  2 ceph ceph  2 Oct  4 10:17 mds
drwxr-xr-x 20 ceph ceph 20 Mar 28  2017 osd
drwxr-xr-x  2 ceph ceph  3 Mar 28  2017 bootstrap-mds
drwxr-xr-x  2 ceph ceph  3 Mar 28  2017 bootstrap-rgw
drwxr-xr-x  2 ceph ceph  3 Mar 28  2017 bootstrap-osd
drwxr-xr-x  3 ceph ceph  3 Mar 28  2017 mon
root@prox-e:~# ls -halt /var/lib/ceph/osd/
total 20K
drwxr-xr-x  3 ceph ceph 217 Mar 14 16:18 ceph-37
drwxr-xr-x  3 ceph ceph 217 Mar 14 16:16 ceph-28
drwxr-xr-x  3 ceph ceph 217 Mar 14 16:15 ceph-34
drwxr-xr-x  3 ceph ceph 217 Mar 14 16:13 ceph-29
drwxr-xr-x  3 ceph ceph 217 Mar 14 16:11 ceph-31
drwxr-xr-x  3 ceph ceph 217 Mar 14 16:10 ceph-32
drwxr-xr-x  3 ceph ceph 217 Mar 14 16:08 ceph-27
drwxr-xr-x  3 ceph ceph 217 Mar 14 16:07 ceph-33
drwxr-xr-x  3 ceph ceph 217 Mar 14 16:05 ceph-30
drwxr-xr-x  3 ceph ceph 217 Mar 14 16:04 ceph-40
drwxr-xr-x  3 ceph ceph 217 Mar 14 16:02 ceph-36
drwxr-xr-x  3 ceph ceph 217 Mar 14 16:01 ceph-38
drwxr-xr-x  3 ceph ceph 217 Mar 14 15:59 ceph-39
drwxr-x---  9 ceph ceph   9 Mar 13 16:58 ..
drwxr-xr-x 20 ceph ceph  20 Mar 28  2017 .
drwxr-xr-x  2 root root   2 Mar 28  2017 ceph-41
drwxr-xr-x  2 ceph ceph   2 Mar 28  2017 ceph-35
drwxr-xr-x  2 ceph ceph   2 Mar 28  2017 ceph-26
drwxr-xr-x  2 ceph ceph   2 Mar 28  2017 ceph-1
drwxr-xr-x  2 ceph ceph   2 Mar 28  2017 ceph-0
root@prox-e:~# ls -halt /var/lib/ceph/osd/ceph-37/
total 65K
-rw-r--r--   1 ceph ceph    0 Mar 14 20:19 systemd
drwxr-xr-x   3 ceph ceph  217 Mar 14 16:18 .
drwxr-xr-x 225 ceph ceph 8.0K Jan 22 10:51 current
drwxr-xr-x  20 ceph ceph   20 Mar 28  2017 ..
-rw-r--r--   1 ceph ceph    3 Mar 28  2017 active
-rw-------   1 ceph ceph   57 Mar 28  2017 keyring
-rw-r--r--   1 ceph ceph    6 Mar 28  2017 ready
-rw-r--r--   1 ceph ceph   53 Mar 28  2017 superblock
-rw-r--r--   1 ceph ceph    4 Mar 28  2017 store_version
-rw-r--r--   1 ceph ceph  610 Mar 28  2017 activate.monmap
-rw-r--r--   1 ceph ceph    3 Mar 28  2017 whoami
-rw-r--r--   1 ceph ceph   21 Mar 28  2017 magic
-rw-r--r--   1 ceph ceph   37 Mar 28  2017 journal_uuid
-rw-r--r--   1 ceph ceph   37 Mar 28  2017 fsid
-rw-r--r--   1 ceph ceph   37 Mar 28  2017 ceph_fsid
lrwxrwxrwx   1 ceph ceph   58 Mar 28  2017 journal -> /dev/disk/by-partuuid/d3d49ff8-6234-4f9b-ba3f-d19cbd902318

Code:
root@prox-e:~# cat /etc/systemd/system/ceph.service
[Unit]
Description=PVE activate Ceph OSD disks
After=pve-cluster.service
Requires=pve-cluster.service

[Service]
ExecStart=/usr/sbin/ceph-disk --log-stdout activate-all
Type=oneshot

[Install]
WantedBy=multi-user.target

Code:
root@prox-e:~# ls -halt /etc/systemd/system/
total 61K
drwxr-xr-x  2 root root  15 Mar 14 20:41 ceph-osd.target.wants
drwxr-xr-x  2 root root  30 Mar 14 16:16 multi-user.target.wants
-rw-r--r--  1 root root 220 Mar 14 16:16 ceph.service
drwxr-xr-x 15 root root  18 Mar 14 16:16 .
drwxr-xr-x  2 root root   5 Mar 13 16:59 ceph.target.wants
drwxr-xr-x  6 root root  13 May  7  2017 ..
drwxr-xr-x  2 root root   4 Mar 28  2017 sockets.target.wants
drwxr-xr-x  2 root root   6 Mar 27  2017 sysinit.target.wants
drwxr-xr-x  2 root root   4 Mar 27  2017 getty.target.wants
lrwxrwxrwx  1 root root  31 Mar 27  2017 sshd.service -> /lib/systemd/system/ssh.service
lrwxrwxrwx  1 root root  35 Mar 27  2017 syslog.service -> /lib/systemd/system/rsyslog.service
drwxr-xr-x  2 root root   4 Dec  9  2016 local-fs.target.wants
drwxr-xr-x  2 root root   4 Dec  9  2016 zfs-mount.service.wants
drwxr-xr-x  2 root root   6 Dec  9  2016 zfs.target.wants
drwxr-xr-x  2 root root   3 Dec  9  2016 zfs-share.service.wants
drwxr-xr-x  2 root root   3 Dec  9  2016 halt.target.wants
drwxr-xr-x  2 root root   3 Dec  9  2016 poweroff.target.wants
drwxr-xr-x  2 root root   3 Dec  9  2016 reboot.target.wants

At this point, systemctl start ceph-mon.3.service does not work, and tab completion of systemctl start ceph-mon finds nothing.

I seem to have lost my Ceph monitor.
I've rebooted several times since then, because I lost two hard disks during the Ceph ownership change, which took two days.
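
(As a generic sanity check, not something I ran at the time: listing the unit files shows whether the ceph-mon@ template is even installed; the template normally ships with the ceph-mon package.)

Code:
# generic systemd check; adjust as needed
systemctl list-unit-files 'ceph*'
ls -l /lib/systemd/system/ceph-mon@.service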

Code:
root@prox-e:~# ceph -s
2018-03-14 21:14:43.777375 7f02a8415700  0 -- :/3284778895 >> 192.168.12.21:6789/0 pipe(0x7f02a405fa40 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7f02a405d4d0).fault
2018-03-14 21:14:46.777440 7f02a8314700  0 -- :/3284778895 >> 192.168.12.24:6789/0 pipe(0x7f0298000c80 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f0298001f90).fault
2018-03-14 21:14:49.777929 7f02a8415700  0 -- :/3284778895 >> 192.168.12.22:6789/0 pipe(0x7f0298005160 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f0298006420).fault
^C^CTraceback (most recent call last):
  File "/usr/bin/ceph", line 954, in <module>
    retval = main()
  File "/usr/bin/ceph", line 858, in main
    prefix='get_command_descriptions')
  File "/usr/lib/python2.7/dist-packages/ceph_argparse.py", line 1308, in json_command
    inbuf, timeout, verbose)
  File "/usr/lib/python2.7/dist-packages/ceph_argparse.py", line 1185, in send_command_retry
    return send_command(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/ceph_argparse.py", line 1239, in send_command
    cluster.mon_command, cmd, inbuf, timeout)
  File "/usr/lib/python2.7/dist-packages/ceph_argparse.py", line 1155, in run_in_thread
    t.start()
  File "/usr/lib/python2.7/threading.py", line 750, in start
    self.__started.wait()
  File "/usr/lib/python2.7/threading.py", line 621, in wait
    self.__cond.wait(timeout)
  File "/usr/lib/python2.7/threading.py", line 340, in wait
    waiter.acquire()
KeyboardInterrupt

I saw these troubleshooting steps in other threads, so I will list them here.

Code:
prox-e:~# lsblk
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda      8:0    0   1.8T  0 disk
├─sda1   8:1    0   1.8T  0 part /var/lib/ceph/osd/ceph-40
└─sda2   8:2    0     5G  0 part
sdb      8:16   0   2.7T  0 disk
├─sdb1   8:17   0   2.7T  0 part /var/lib/ceph/osd/ceph-36
└─sdb2   8:18   0     5G  0 part
sdc      8:32   0   2.7T  0 disk
├─sdc1   8:33   0   2.7T  0 part /var/lib/ceph/osd/ceph-39
└─sdc2   8:34   0     5G  0 part
sdd      8:48   0   2.7T  0 disk
├─sdd1   8:49   0   2.7T  0 part /var/lib/ceph/osd/ceph-30
└─sdd2   8:50   0     5G  0 part
sde      8:64   0   2.7T  0 disk
sdf      8:80   0   2.7T  0 disk
├─sdf1   8:81   0   2.7T  0 part /var/lib/ceph/osd/ceph-33
└─sdf2   8:82   0     5G  0 part
sdg      8:96   0   1.8T  0 disk
├─sdg1   8:97   0   1.8T  0 part /var/lib/ceph/osd/ceph-27
└─sdg2   8:98   0     5G  0 part
sdh      8:112  0   2.7T  0 disk
├─sdh1   8:113  0   2.7T  0 part /var/lib/ceph/osd/ceph-28
└─sdh2   8:114  0     5G  0 part
sdi      8:128  0   2.7T  0 disk
├─sdi1   8:129  0   2.7T  0 part /var/lib/ceph/osd/ceph-32
└─sdi2   8:130  0     5G  0 part
sdj      8:144  0   2.7T  0 disk
├─sdj1   8:145  0   2.7T  0 part /var/lib/ceph/osd/ceph-29
└─sdj2   8:146  0     5G  0 part
sdk      8:160  0   2.7T  0 disk
├─sdk1   8:161  0   2.7T  0 part /var/lib/ceph/osd/ceph-34
└─sdk2   8:162  0     5G  0 part
sdl      8:176  0   2.7T  0 disk
├─sdl1   8:177  0   2.7T  0 part /var/lib/ceph/osd/ceph-37
└─sdl2   8:178  0     5G  0 part
sdm      8:192  0   2.7T  0 disk
├─sdm1   8:193  0   2.7T  0 part /var/lib/ceph/osd/ceph-38
└─sdm2   8:194  0     5G  0 part
sdn      8:208  0   2.7T  0 disk
├─sdn1   8:209  0   2.7T  0 part /var/lib/ceph/osd/ceph-31
└─sdn2   8:210  0     5G  0 part
sdo      8:224  0 232.9G  0 disk
├─sdo1   8:225  0  1007K  0 part
├─sdo2   8:226  0 232.9G  0 part
└─sdo9   8:233  0     8M  0 part
zd0    230:0    0     8G  0 disk [SWAP]

Code:
prox-e:~# cat /etc/network/interfaces
auto lo
iface lo inet loopback

auto bond0
iface bond0 inet manual
        slaves eth0 eth1
        bond_miimon 100
        bond_mode active-backup

auto bond1
iface bond1 inet manual
        slaves eth2 eth3
        bond_miimon 100
        bond_mode active-backup

auto vlan600
iface vlan600 inet manual
        vlan-raw-device bond0

auto vmbr0
iface vmbr0 inet static
        address 10.1.12.24
        netmask 255.255.252.0
        gateway 10.1.12.6
        bridge_ports bond0
        bridge_stp off
        bridge_fd 0

auto vmbr1
iface vmbr1 inet static
        address 10.4.12.24
        netmask 255.255.255.0
        bridge_ports vlan600
        bridge_stp off
        bridge_fd 0
        bridge_vlan_aware yes
        network 10.4.12.0

auto vmbr2
iface vmbr2 inet static
        address 192.168.12.24
        netmask 255.255.255.0
        bridge_ports bond1
        bridge_stp off
        bridge_fd 0
        bridge_vlan_aware yes
        network 192.168.12.0

Code:
root@prox-e:~# systemctl status
● prox-e
    State: degraded
     Jobs: 13 queued
   Failed: 1 units
    Since: Wed 2018-03-14 20:18:52 CDT; 58min ago
   CGroup: /
           ├─1 /sbin/init
           └─system.slice
             ├─ksmtuned.service
             │ ├─ 3102 /bin/bash /usr/sbin/ksmtuned
             │ └─23692 sleep 60
             ├─dbus.service
             │ └─3067 /usr/bin/dbus-daemon --system --address=systemd: --nofo
             ├─cron.service
             │ └─3275 /usr/sbin/cron -f
             ├─nfs-common.service
             │ ├─2918 /sbin/rpc.statd
             │ └─2932 /usr/sbin/rpc.idmapd
             ├─pve-ha-lrm.service
             │ └─3975 pve-ha-lr
             ├─postfix.service
             │ ├─3298 /usr/lib/postfix/master
             │ ├─3302 pickup -l -t unix -u -c
             │ └─3303 qmgr -l -t unix -u
             ├─spiceproxy.service
             │ ├─3093 spiceprox
             │ └─3095 spiceproxy worke
             ├─open-iscsi.service
             │ ├─2905 /usr/sbin/iscsid
             │ └─2908 /usr/sbin/iscsid
             ├─system-ceph\x2dosd.slice
             │ ├─ceph-osd@39.service
             │ │ └─control
             │ │   ├─23030 /bin/sh /usr/lib/ceph/ceph-osd-prestart.sh --clust
             │ │   └─23336 /usr/bin/python /usr/bin/ceph --cluster=ceph --nam
             │ ├─ceph-osd@32.service
             │ │ └─control
             │ │   ├─23046 /bin/sh /usr/lib/ceph/ceph-osd-prestart.sh --clust
             │ │   └─23358 /usr/bin/python /usr/bin/ceph --cluster=ceph --nam
             │ ├─ceph-osd@40.service
             │ │ └─control
             │ │   ├─23034 /bin/sh /usr/lib/ceph/ceph-osd-prestart.sh --clust
             │ │   └─23357 /usr/bin/python /usr/bin/ceph --cluster=ceph --nam
             │ ├─ceph-osd@34.service
             │ │ └─control
             │ │   ├─23043 /bin/sh /usr/lib/ceph/ceph-osd-prestart.sh --clust
             │ │   └─23341 /usr/bin/python /usr/bin/ceph --cluster=ceph --nam
             │ ├─ceph-osd@28.service
             │ │ └─control
             │ │   ├─23024 /bin/sh /usr/lib/ceph/ceph-osd-prestart.sh --clust
             │ │   └─23313 /usr/bin/python /usr/bin/ceph --cluster=ceph --nam
             │ ├─ceph-osd@36.service
             │ │ └─control
             │ │   ├─23020 /bin/sh /usr/lib/ceph/ceph-osd-prestart.sh --clust
             │ │   └─23275 /usr/bin/python /usr/bin/ceph --cluster=ceph --nam
             │ ├─ceph-osd@38.service
             │ │ └─control
             │ │   ├─23021 /bin/sh /usr/lib/ceph/ceph-osd-prestart.sh --clust
             │ │   └─23290 /usr/bin/python /usr/bin/ceph --cluster=ceph --nam
             │ ├─ceph-osd@31.service
             │ │ └─control
             │ │   ├─23032 /bin/sh /usr/lib/ceph/ceph-osd-prestart.sh --clust
             │ │   └─23342 /usr/bin/python /usr/bin/ceph --cluster=ceph --nam
             │ ├─ceph-osd@33.service
             │ │ └─control
             │ │   ├─23038 /bin/sh /usr/lib/ceph/ceph-osd-prestart.sh --clust
             │ │   └─23354 /usr/bin/python /usr/bin/ceph --cluster=ceph --nam
             │ ├─ceph-osd@27.service
             │ │ └─control
             │ │   ├─23050 /bin/sh /usr/lib/ceph/ceph-osd-prestart.sh --clust
             │ │   └─23323 /usr/bin/python /usr/bin/ceph --cluster=ceph --nam
             │ ├─ceph-osd@29.service
             │ │ └─control
             │ │   ├─23026 /bin/sh /usr/lib/ceph/ceph-osd-prestart.sh --clust
             │ │   └─23315 /usr/bin/python /usr/bin/ceph --cluster=ceph --nam
             │ ├─ceph-osd@37.service
             │ │ └─control
             │ │   ├─23036 /bin/sh /usr/lib/ceph/ceph-osd-prestart.sh --clust
             │ │   └─23356 /usr/bin/python /usr/bin/ceph --cluster=ceph --nam
             │ └─ceph-osd@30.service
             │   └─control
             │     ├─23027 /bin/sh /usr/lib/ceph/ceph-osd-prestart.sh --clust
             │     └─23316 /usr/bin/python /usr/bin/ceph --cluster=ceph --nam
             ├─corosync.service
             │ └─3332 corosync
             ├─pve-firewall.service
             │ └─3874 pve-firewal
             ├─pve-cluster.service
             │ └─3176 /usr/bin/pmxcfs
             ├─atd.service
             │ └─3042 /usr/sbin/atd -f
             ├─systemd-journald.service
             │ └─1649 /lib/systemd/systemd-journald
             ├─pve-ha-crm.service
             │ └─3972 pve-ha-cr
             ├─systemd-timesyncd.service
             │ └─2379 /lib/systemd/systemd-timesyncd
             ├─rrdcached.service
             │ └─3085 /usr/bin/rrdcached -l unix:/var/run/rrdcached.sock -j /
             ├─pvestatd.service
             │ ├─ 3921 pvestat
             │ └─21184 /usr/bin/rados -p Ceph_Triple -m 192.168.12.25,192.168
             ├─ssh.service
             │ ├─ 3127 /usr/sbin/sshd -D
             │ ├─ 6001 sshd: root@pts/0
             │ ├─ 6023 -bash
             │ ├─23802 systemctl status
             │ └─23803 pager
             ├─systemd-logind.service
             │ └─3057 /lib/systemd/systemd-logind
             ├─watchdog-mux.service
             │ └─3038 /usr/sbin/watchdog-mux
             ├─system-getty.slice
             │ └─getty@tty1.service
             │   ├─ 3138 /bin/login --
             │   ├─10541 -bash
             │   └─22313 ssh prox-f
             ├─pvefw-logger.service
             │ └─2408 /usr/sbin/pvefw-logger
             ├─systemd-udevd.service
             │ └─1662 /lib/systemd/systemd-udevd
             ├─rpcbind.service
             │ └─2901 /sbin/rpcbind -w
             ├─rsyslog.service
             │ └─3112 /usr/sbin/rsyslogd -n
             ├─smartd.service
             │ └─3050 /usr/sbin/smartd -n
             ├─lxc-monitord.service
             │ └─3169 /usr/lib/x86_64-linux-gnu/lxc/lxc-monitord --daemon
             ├─ntp.service
             │ └─3094 /usr/sbin/ntpd -p /var/run/ntpd.pid -g -u 110:118
             ├─pveproxy.service
             │ ├─3076 pveprox
             │ ├─3077 pveproxy worke
             │ ├─3078 pveproxy worke
             │ └─3079 pveproxy worke
             ├─lxcfs.service
             │ └─3044 /usr/bin/lxcfs /var/lib/lxcfs/
             └─pvedaemon.service
               ├─3966 pvedaemo
               ├─3967 pvedaemon worke
               ├─3968 pvedaemon worke
               └─3969 pvedaemon worke

Code:
root@prox-e:~# cat /etc/pve/storage.cfg
dir: local
        path /var/lib/vz
        content iso,vztmpl,backup

zfspool: local-zfs
        pool rpool/data
        sparse
        content rootdir,images

rbd: Ceph_Triple
        monhost 192.168.12.25 192.168.12.21 192.168.12.22
        krbd
        username admin
        content images,rootdir
        pool Ceph_Triple

Code:
root@prox-e:~# cat /etc/apt/sources.list
deb http://ftp.us.debian.org/debian jessie main contrib
# PVE pve-no-subscription repository provided by proxmox.com, NOT recommended for production use
deb http://download.proxmox.com/debian jessie pve-no-subscription
# security updates
deb http://security.debian.org jessie/updates main contrib

Code:
root@prox-e:~# ping 192.168.12.25
PING 192.168.12.25 (192.168.12.25) 56(84) bytes of data.
64 bytes from 192.168.12.25: icmp_seq=1 ttl=64 time=0.158 ms
64 bytes from 192.168.12.25: icmp_seq=2 ttl=64 time=0.189 ms
^C
--- 192.168.12.25 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.158/0.173/0.189/0.020 ms
root@prox-e:~# ping 192.168.12.21
PING 192.168.12.21 (192.168.12.21) 56(84) bytes of data.
64 bytes from 192.168.12.21: icmp_seq=1 ttl=64 time=0.190 ms
^C
--- 192.168.12.21 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.190/0.190/0.190/0.000 ms
root@prox-e:~# ping 192.168.12.22
PING 192.168.12.22 (192.168.12.22) 56(84) bytes of data.
64 bytes from 192.168.12.22: icmp_seq=1 ttl=64 time=0.138 ms
^C
--- 192.168.12.22 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.138/0.138/0.138/0.000 ms
 
Oh yes, I forgot to list:
Code:
root@prox-e:~# pvecm status
Quorum information
------------------
Date:             Wed Mar 14 21:31:51 2018
Quorum provider:  corosync_votequorum
Nodes:            4
Node ID:          0x00000004
Ring ID:          1/4988
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   4
Highest expected: 4
Total votes:      4
Quorum:           3
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 10.4.12.21
0x00000003          1 10.4.12.22
0x00000004          1 10.4.12.24 (local)
0x00000002          1 10.4.12.25

I have a separate Corosync network.
 
Also
Code:
prox-e:~# pveversion -v
proxmox-ve: 4.4-107 (running kernel: 4.4.98-6-pve)
pve-manager: 4.4-22 (running version: 4.4-22/2728f613)
pve-kernel-4.4.35-1-pve: 4.4.35-77
pve-kernel-4.4.59-1-pve: 4.4.59-87
pve-kernel-4.4.44-1-pve: 4.4.44-84
pve-kernel-4.4.67-1-pve: 4.4.67-92
pve-kernel-4.4.98-6-pve: 4.4.98-107
lvm2: 2.02.116-pve3
corosync-pve: 2.4.2-2~pve4+1
libqb0: 1.0.1-1
pve-cluster: 4.0-54
qemu-server: 4.0-115
pve-firmware: 1.1-11
libpve-common-perl: 4.0-96
libpve-access-control: 4.0-23
libpve-storage-perl: 4.0-76
pve-libspice-server1: 0.12.8-2
vncterm: 1.3-2
pve-docs: 4.4-4
pve-qemu-kvm: 2.9.1-9~pve4
pve-container: 1.0-104
pve-firewall: 2.0-33
pve-ha-manager: 1.0-41
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u3
lxc-pve: 2.0.7-4
lxcfs: 2.0.6-pve1
criu: 1.6.0-1
novnc-pve: 0.5-9
smartmontools: 6.5+svn4324-1~pve80
zfsutils: 0.6.5.9-pve15~bpo80
ceph: 10.2.10-1~bpo80+1
 
Please double-check that the ceph-mon package is actually installed.

The output of "ceph versions" would also be interesting, and the last few entries from /var/log/apt/history.log pertaining to the Ceph upgrade might also help shed some light on your issue.
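
For example, something along these lines (a generic check, assuming the standard Debian packaging):

Code:
dpkg -l ceph-mon                        # is the package installed at all?
dpkg -L ceph-mon | grep '\.service'     # should list /lib/systemd/system/ceph-mon@.service
tail -n 60 /var/log/apt/history.log     # the recent upgrade entries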
 
As you can see in the log, I upgraded to the latest Proxmox 4.4 with a working Ceph cluster on 2018-03-11. I made sure the PVE cluster was healthy, quorate, and running fine with the latest packages.

Then I proceeded to run the Hammer to Jewel upgrade on 2018-03-13.

The only reason I was doing any of this was to add an additional Ceph host, which would have been Proxmox 5.1. So I wanted to update my working systems to the latest. Other than that, I didn't touch this system. It had been working for a long while, so I left it be.

Code:
prox-e:~# cat /var/log/apt/history.log

Start-Date: 2018-03-11  18:01:31
Commandline: apt-get dist-upgrade
Install: pve-kernel-4.4.98-6-pve:amd64 (4.4.98-107, automatic)
Upgrade: bind9-host:amd64 (9.9.5.dfsg-9+deb8u13, 9.9.5.dfsg-9+deb8u15), libio-socket-ssl-perl:amd64 (2.002-2+deb8u2, 2.002-2+deb8u3), libdbi1:amd64 (0.9.0-4, 0.9.0-4+deb8u1), libdb5.3:amd64 (5.3.28-9, 5.3.28-9+deb8u1), liblwres90:amd64 (9.9.5.dfsg-9+deb8u13, 9.9.5.dfsg-9+deb8u15), ncurses-term:amd64 (5.9+20140913-1, 5.9+20140913-1+deb8u2), libtinfo5:amd64 (5.9+20140913-1+b1, 5.9+20140913-1+deb8u2), libquadmath0:amd64 (4.9.2-10, 4.9.2-10+deb8u1), libpve-common-perl:amd64 (4.0-95, 4.0-96), libxfixes3:amd64 (5.0.1-2+b2, 5.0.1-2+deb8u1), libdns100:amd64 (9.9.5.dfsg-9+deb8u13, 9.9.5.dfsg-9+deb8u15), perl:amd64 (5.20.2-3+deb8u8, 5.20.2-3+deb8u9), libisc-export95:amd64 (9.9.5.dfsg-9+deb8u13, 9.9.5.dfsg-9+deb8u15), libisccfg90:amd64 (9.9.5.dfsg-9+deb8u13, 9.9.5.dfsg-9+deb8u15), libssl1.0.0:amd64 (1.0.1t-1+deb8u6, 1.0.1t-1+deb8u7), perl-base:amd64 (5.20.2-3+deb8u8, 5.20.2-3+deb8u9), isc-dhcp-common:amd64 (4.3.1-6+deb8u2, 4.3.1-6+deb8u3), libbind9-90:amd64 (9.9.5.dfsg-9+deb8u13, 9.9.5.dfsg-9+deb8u15), tcpdump:amd64 (4.9.0-1~deb8u1, 4.9.2-1~deb8u1), pve-manager:amd64 (4.4-15, 4.4-22), libgssapi-krb5-2:amd64 (1.12.1+dfsg-19+deb8u2, 1.12.1+dfsg-19+deb8u4), libx11-xcb1:amd64 (1.6.2-3, 1.6.2-3+deb8u1), cpp-4.9:amd64 (4.9.2-10, 4.9.2-10+deb8u1), libxi6:amd64 (1.7.4-1+b2, 1.7.4-1+deb8u1), libicu52:amd64 (52.1-8+deb8u5, 52.1-8+deb8u6), libncurses5:amd64 (5.9+20140913-1+b1, 5.9+20140913-1+deb8u2), qemu-server:amd64 (4.0-110, 4.0-115), libxcursor1:amd64 (1.1.14-1+b1, 1.1.14-1+deb8u1), libkrb5-3:amd64 (1.12.1+dfsg-19+deb8u2, 1.12.1+dfsg-19+deb8u4), libgdk-pixbuf2.0-common:amd64 (2.31.1-2+deb8u5, 2.31.1-2+deb8u7), libcups2:amd64 (1.7.5-11+deb8u1, 1.7.5-11+deb8u2), sensible-utils:amd64 (0.0.9, 0.0.9+deb8u1), libtiff5:amd64 (4.0.3-12.3+deb8u4, 4.0.3-12.3+deb8u5), sudo:amd64 (1.8.10p3-1+deb8u4, 1.8.10p3-1+deb8u5), libx11-data:amd64 (1.6.2-3, 1.6.2-3+deb8u1), libkadm5clnt-mit9:amd64 (1.12.1+dfsg-19+deb8u2, 1.12.1+dfsg-19+deb8u4), openssh-server:amd64 (6.7p1-5+deb8u3, 6.7p1-5+deb8u4), libncursesw5:amd64 (5.9+20140913-1+b1, 5.9+20140913-1+deb8u2), ncurses-bin:amd64 (5.9+20140913-1+b1, 5.9+20140913-1+deb8u2), libgcc1:amd64 (4.9.2-10, 4.9.2-10+deb8u1), procmail:amd64 (3.22-24, 3.22-24+deb8u1), libxtst6:amd64 (1.2.2-1+b1, 1.2.2-1+deb8u1), openssh-sftp-server:amd64 (6.7p1-5+deb8u3, 6.7p1-5+deb8u4), dnsutils:amd64 (9.9.5.dfsg-9+deb8u13, 9.9.5.dfsg-9+deb8u15), base-files:amd64 (8+deb8u9, 8+deb8u10), gnupg:amd64 (1.4.18-7+deb8u3, 1.4.18-7+deb8u4), libxml-libxml-perl:amd64 (2.0116+dfsg-1+deb8u1, 2.0116+dfsg-1+deb8u2), pve-cluster:amd64 (4.0-52, 4.0-54), ssh:amd64 (6.7p1-5+deb8u3, 6.7p1-5+deb8u4), krb5-locales:amd64 (1.12.1+dfsg-19+deb8u2, 1.12.1+dfsg-19+deb8u4), libxml2:amd64 (2.9.1+dfsg1-5+deb8u4, 2.9.1+dfsg1-5+deb8u6), pve-qemu-kvm:amd64 (2.7.1-4, 2.9.1-9~pve4), perl-modules:amd64 (5.20.2-3+deb8u8, 5.20.2-3+deb8u9), pve-container:amd64 (1.0-101, 1.0-104), samba-libs:amd64 (4.2.14+dfsg-0+deb8u7+b1, 4.2.14+dfsg-0+deb8u9), openssh-client:amd64 (6.7p1-5+deb8u3, 6.7p1-5+deb8u4), wget:amd64 (1.16-1+deb8u2, 1.16-1+deb8u4), libkrad0:amd64 (1.12.1+dfsg-19+deb8u2, 1.12.1+dfsg-19+deb8u4), isc-dhcp-client:amd64 (4.3.1-6+deb8u2, 4.3.1-6+deb8u3), libnss3:amd64 (3.26-1+debu8u2, 3.26-1+debu8u3), smbclient:amd64 (4.2.14+dfsg-0+deb8u7+b1, 4.2.14+dfsg-0+deb8u9), gpgv:amd64 (1.4.18-7+deb8u3, 1.4.18-7+deb8u4), libgfortran3:amd64 (4.9.2-10, 4.9.2-10+deb8u1), proxmox-ve:amd64 (4.4-92, 4.4-107), libgdk-pixbuf2.0-0:amd64 (2.31.1-2+deb8u5, 2.31.1-2+deb8u7), tzdata:amd64 (2017b-0+deb8u1, 2017c-0+deb8u1), openssl:amd64 (1.0.1t-1+deb8u6, 
1.0.1t-1+deb8u7), libwbclient0:amd64 (4.2.14+dfsg-0+deb8u7+b1, 4.2.14+dfsg-0+deb8u9), rsync:amd64 (3.1.1-3, 3.1.1-3+deb8u1), libkdb5-7:amd64 (1.12.1+dfsg-19+deb8u2, 1.12.1+dfsg-19+deb8u4), libisccfg-export90:amd64 (9.9.5.dfsg-9+deb8u13, 9.9.5.dfsg-9+deb8u15), samba-common:amd64 (4.2.14+dfsg-0+deb8u7, 4.2.14+dfsg-0+deb8u9), libdns-export100:amd64 (9.9.5.dfsg-9+deb8u13, 9.9.5.dfsg-9+deb8u15), libgssrpc4:amd64 (1.12.1+dfsg-19+deb8u2, 1.12.1+dfsg-19+deb8u4), libstdc++6:amd64 (4.9.2-10, 4.9.2-10+deb8u1), libx11-6:amd64 (1.6.2-3, 1.6.2-3+deb8u1), libxrandr2:amd64 (1.4.2-1+b1, 1.4.2-1+deb8u1), libkadm5srv-mit9:amd64 (1.12.1+dfsg-19+deb8u2, 1.12.1+dfsg-19+deb8u4), libirs-export91:amd64 (9.9.5.dfsg-9+deb8u13, 9.9.5.dfsg-9+deb8u15), libkrb5support0:amd64 (1.12.1+dfsg-19+deb8u2, 1.12.1+dfsg-19+deb8u4), libk5crypto3:amd64 (1.12.1+dfsg-19+deb8u2, 1.12.1+dfsg-19+deb8u4), ncurses-base:amd64 (5.9+20140913-1, 5.9+20140913-1+deb8u2), libisccc90:amd64 (9.9.5.dfsg-9+deb8u13, 9.9.5.dfsg-9+deb8u15), libsmbclient:amd64 (4.2.14+dfsg-0+deb8u7+b1, 4.2.14+dfsg-0+deb8u9), libisc95:amd64 (9.9.5.dfsg-9+deb8u13, 9.9.5.dfsg-9+deb8u15), gcc-4.9-base:amd64 (4.9.2-10, 4.9.2-10+deb8u1), libcurl3-gnutls:amd64 (7.38.0-4+deb8u5, 7.38.0-4+deb8u9)
End-Date: 2018-03-11  18:08:20

Start-Date: 2018-03-13  16:57:47
Commandline: apt-get dist-upgrade
Install: ceph-fuse:amd64 (10.2.10-1~bpo80+1, automatic), libopts25:amd64 (5.18.4-3, automatic), libradosstriper1:amd64 (10.2.10-1~bpo80+1, automatic), libboost-random1.55.0:amd64 (1.55.0+dfsg-3, automatic), ceph-base:amd64 (10.2.10-1~bpo80+1, automatic), ceph-osd:amd64 (10.2.10-1~bpo80+1, automatic), ntp:amd64 (4.2.6.p5+dfsg-7+deb8u2, automatic), ceph-fs-common:amd64 (10.2.10-1~bpo80+1, automatic), libfcgi0ldbl:amd64 (2.4.0-8.3, automatic), ceph-mds:amd64 (10.2.10-1~bpo80+1, automatic), librgw2:amd64 (10.2.10-1~bpo80+1, automatic), ceph-mon:amd64 (10.2.10-1~bpo80+1, automatic), libboost-regex1.55.0:amd64 (1.55.0+dfsg-3, automatic)

Upgrade: python-cephfs:amd64 (0.94.10-1~bpo80+1, 10.2.10-1~bpo80+1), librbd1:amd64 (0.94.10-1~bpo80+1, 10.2.10-1~bpo80+1), librados2:amd64 (0.94.10-1~bpo80+1, 10.2.10-1~bpo80+1), python-ceph:amd64 (0.94.10-1~bpo80+1, 10.2.10-1~bpo80+1), ceph:amd64 (0.94.10-1~bpo80+1, 10.2.10-1~bpo80+1), ceph-common:amd64 (0.94.10-1~bpo80+1, 10.2.10-1~bpo80+1), python-rbd:amd64 (0.94.10-1~bpo80+1, 10.2.10-1~bpo80+1), libcephfs1:amd64 (0.94.10-1~bpo80+1, 10.2.10-1~bpo80+1), python-rados:amd64 (0.94.10-1~bpo80+1, 10.2.10-1~bpo80+1)
End-Date: 2018-03-13  16:59:08

Start-Date: 2018-03-14  16:29:26
Commandline: apt-get dist-upgrade
Upgrade: libcurl3-gnutls:amd64 (7.38.0-4+deb8u9, 7.38.0-4+deb8u10)
End-Date: 2018-03-14  16:29:35

Start-Date: 2018-03-14  20:28:09
Commandline: apt-get install ethtool
Install: ethtool:amd64 (3.16-1)
End-Date: 2018-03-14  20:28:19
I am not able to run "ceph versions"

Code:
root@prox-e:~# ceph versions
2018-03-15 06:38:12.876841 7fe9f84d1700  0 -- :/1328948860 >> 192.168.12.21:6789/0 pipe(0x7fe9f405fa90 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7fe9f405d520).fault
2018-03-15 06:38:15.877014 7fe9f83d0700  0 -- :/1328948860 >> 192.168.12.22:6789/0 pipe(0x7fe9e8000c80 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7fe9e8001f90).fault
 
Also, I tried to list ALL of the packages I have installed, but the output was too long. Below are the relevant packages filtered with grep. I don't see any conflicting Ceph packages.

The main reason this is so weird is that it happened on all 4 of my nodes, which makes me think it's a bug.

Code:
root@prox-e:~# dpkg-query -l | grep ceph
ii  ceph                                 10.2.10-1~bpo80+1                  amd64        distributed storage and file system
ii  ceph-base                            10.2.10-1~bpo80+1                  amd64        common ceph daemon libraries and management tools
ii  ceph-common                          10.2.10-1~bpo80+1                  amd64        common utilities to mount and interact with a ceph storage cluster
ii  ceph-fs-common                       10.2.10-1~bpo80+1                  amd64        common utilities to mount and interact with a ceph file system
ii  ceph-fuse                            10.2.10-1~bpo80+1                  amd64        FUSE-based client for the Ceph distributed file system
ii  ceph-mds                             10.2.10-1~bpo80+1                  amd64        metadata server for the ceph distributed file system
ii  ceph-mon                             10.2.10-1~bpo80+1                  amd64        monitor server for the ceph storage system
ii  ceph-osd                             10.2.10-1~bpo80+1                  amd64        OSD server for the ceph storage system
ii  libcephfs1                           10.2.10-1~bpo80+1                  amd64        Ceph distributed file system client library
ii  python-ceph                          10.2.10-1~bpo80+1                  amd64        Meta-package for python libraries for the Ceph libraries
ii  python-cephfs                        10.2.10-1~bpo80+1                  amd64        Python libraries for the Ceph libcephfs library

root@prox-e:~# dpkg-query -l | grep pve
ii  corosync-pve                         2.4.2-2~pve4+1                     amd64        Standards-based cluster framework (daemon and modules)
ii  dmeventd                             2:1.02.93-pve3                     amd64        Linux Kernel Device Mapper event daemon
ii  dmsetup                              2:1.02.93-pve3                     amd64        Linux Kernel Device Mapper userspace library
ii  grub-common                          2.02-pve5                          amd64        GRand Unified Bootloader (common files)
ii  grub-efi-amd64-bin                   2.02-pve5                          amd64        GRand Unified Bootloader, version 2 (EFI-AMD64 binaries)
ii  grub-efi-ia32-bin                    2.02-pve5                          amd64        GRand Unified Bootloader, version 2 (EFI-IA32 binaries)
ii  grub-pc                              2.02-pve5                          amd64        GRand Unified Bootloader, version 2 (PC/BIOS version)
ii  grub-pc-bin                          2.02-pve5                          amd64        GRand Unified Bootloader, version 2 (PC/BIOS binaries)
ii  grub2-common                         2.02-pve5                          amd64        GRand Unified Bootloader (common files for version 2)
ii  libcorosync4-pve                     2.4.2-2~pve4+1                     amd64        Standards-based cluster framework (libraries)
ii  libdevmapper-event1.02.1:amd64       2:1.02.93-pve3                     amd64        Linux Kernel Device Mapper event support library
ii  libdevmapper1.02.1:amd64             2:1.02.93-pve3                     amd64        Linux Kernel Device Mapper userspace library
ii  liblvm2app2.2:amd64                  2.02.116-pve3                      amd64        LVM2 application library
ii  liblvm2cmd2.02:amd64                 2.02.116-pve3                      amd64        LVM2 command library
ii  libnvpair1linux                      0.6.5.9-pve15~bpo80                amd64        Solaris name-value library for Linux
ii  libpve-access-control                4.0-23                             amd64        Proxmox VE access control library
ii  libpve-common-perl                   4.0-96                             all          Proxmox VE base library
ii  libpve-guest-common-perl             1.0-2                              all          Proxmox VE common guest-related modules
ii  libpve-http-server-perl              1.0-4                              all          Proxmox Asynchrounous HTTP Server Implementation
ii  libpve-storage-perl                  4.0-76                             all          Proxmox VE storage management library
ii  libuutil1linux                       0.6.5.9-pve15~bpo80                amd64        Solaris userland utility library for Linux
ii  libzfs2linux                         0.6.5.9-pve15~bpo80                amd64        OpenZFS filesystem library for Linux
ii  libzpool2linux                       0.6.5.9-pve15~bpo80                amd64        OpenZFS pool library for Linux
ii  lvm2                                 2.02.116-pve3                      amd64        Linux Logical Volume Manager
ii  lxc-pve                              2.0.7-4                            amd64        Linux containers usersapce tools
ii  lxcfs                                2.0.6-pve1                         amd64        LXC userspace filesystem
ii  novnc-pve                            0.5-9                              amd64        HTML5 VNC client
ii  pve-cluster                          4.0-54                             amd64        Cluster Infrastructure for Proxmox Virtual Environment
ii  pve-container                        1.0-104                            all          Proxmox VE Container management tool
ii  pve-docs                             4.4-4                              all          Proxmox VE Documentation
ii  pve-firewall                         2.0-33                             amd64        Proxmox VE Firewall
ii  pve-firmware                         1.1-11                             all          Binary firmware code for the pve-kernel
ii  pve-ha-manager                       1.0-41                             amd64        Proxmox VE HA Manager
ii  pve-kernel-4.4.35-1-pve              4.4.35-77                          amd64        The Proxmox PVE Kernel Image
ii  pve-kernel-4.4.44-1-pve              4.4.44-84                          amd64        The Proxmox PVE Kernel Image
ii  pve-kernel-4.4.59-1-pve              4.4.59-87                          amd64        The Proxmox PVE Kernel Image
ii  pve-kernel-4.4.67-1-pve              4.4.67-92                          amd64        The Proxmox PVE Kernel Image
ii  pve-kernel-4.4.98-6-pve              4.4.98-107                         amd64        The Proxmox PVE Kernel Image
ii  pve-libspice-server1                 0.12.8-2                           amd64        SPICE remote display system server library
ii  pve-manager                          4.4-22                             amd64        The Proxmox Virtual Environment
ii  pve-qemu-kvm                         2.9.1-9~pve4                       amd64        Full virtualization on x86 hardware
ii  smartmontools                        6.5+svn4324-1~pve80                amd64        control and monitor storage systems using S.M.A.R.T.
ii  spl                                  0.6.5.9-pve8~bpo80                 amd64        Solaris Porting Layer user-space utilities for Linux
ii  tar                                  1.27.1+pve.3                       amd64        GNU version of the tar archiving utility
ii  zfs-initramfs                        0.6.5.9-pve15~bpo80                all          OpenZFS root filesystem capabilities for Linux - initramfs
ii  zfsutils                             0.6.5.9-pve15~bpo80                all          transitional package
ii  zfsutils-linux                       0.6.5.9-pve15~bpo80                amd64        command-line tools to manage OpenZFS filesystems

root@prox-e:~# dpkg-query -l | grep Prox
ii  libpve-access-control                4.0-23                             amd64        Proxmox VE access control library
ii  libpve-common-perl                   4.0-96                             all          Proxmox VE base library
ii  libpve-guest-common-perl             1.0-2                              all          Proxmox VE common guest-related modules
ii  libpve-http-server-perl              1.0-4                              all          Proxmox Asynchrounous HTTP Server Implementation
ii  libpve-storage-perl                  4.0-76                             all          Proxmox VE storage management library
ii  proxmox-ve                           4.4-107                            all          The Proxmox Virtual Environment
ii  pve-cluster                          4.0-54                             amd64        Cluster Infrastructure for Proxmox Virtual Environment
ii  pve-container                        1.0-104                            all          Proxmox VE Container management tool
ii  pve-docs                             4.4-4                              all          Proxmox VE Documentation
ii  pve-firewall                         2.0-33                             amd64        Proxmox VE Firewall
ii  pve-ha-manager                       1.0-41                             amd64        Proxmox VE HA Manager
ii  pve-kernel-4.4.35-1-pve              4.4.35-77                          amd64        The Proxmox PVE Kernel Image
ii  pve-kernel-4.4.44-1-pve              4.4.44-84                          amd64        The Proxmox PVE Kernel Image
ii  pve-kernel-4.4.59-1-pve              4.4.59-87                          amd64        The Proxmox PVE Kernel Image
ii  pve-kernel-4.4.67-1-pve              4.4.67-92                          amd64        The Proxmox PVE Kernel Image
ii  pve-kernel-4.4.98-6-pve              4.4.98-107                         amd64        The Proxmox PVE Kernel Image
ii  pve-manager                          4.4-22                             amd64        The Proxmox Virtual Environment
 
I thought I was missing something, so I tried to use pveceph, but to no avail.

Code:
root@prox-e:~# pveceph createmon
monitor address '192.168.12.24:6789' already in use by 'mon.3'

root@prox-e:~# pveceph install -version jewel
download and import ceph repository keys
update available package list
Reading package lists... Done
Building dependency tree
Reading state information... Done
gdisk is already the newest version.
ceph is already the newest version.
ceph-common is already the newest version.
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.

Could this be a network issue? Nothing has changed with my network, but I have no idea what is going on.
Code:
prox-c:~# journalctl |grep ceph
Mar 14 16:26:37 prox-c systemd[1]: Starting ceph target allowing to start/stop all ceph*@.service instances at once.
Mar 14 16:26:37 prox-c systemd[1]: Reached target ceph target allowing to start/stop all ceph*@.service instances at once.
Mar 14 16:26:37 prox-c systemd[1]: Starting ceph target allowing to start/stop all ceph-mds@.service instances at once.
Mar 14 16:26:37 prox-c systemd[1]: Reached target ceph target allowing to start/stop all ceph-mds@.service instances at once.
Mar 14 16:26:37 prox-c systemd[1]: Starting ceph target allowing to start/stop all ceph-mon@.service instances at once.
Mar 14 16:26:37 prox-c systemd[1]: Reached target ceph target allowing to start/stop all ceph-mon@.service instances at once.
Mar 14 16:26:37 prox-c systemd[1]: Starting ceph target allowing to start/stop all ceph-osd@.service instances at once.
Mar 14 16:26:37 prox-c systemd[1]: Reached target ceph target allowing to start/stop all ceph-osd@.service instances at once.
Mar 14 16:27:03 prox-c kernel: Key type ceph registered
Mar 14 16:27:03 prox-c kernel: libceph: loaded (mon/osd proto 15/24)
Mar 14 16:27:03 prox-c kernel: libceph: mon2 192.168.12.22:6789 socket error on write
Mar 14 16:27:14 prox-c kernel: libceph: mon1 192.168.12.21:6789 socket closed (con state CONNECTING)
Mar 14 16:27:24 prox-c kernel: libceph: mon0 192.168.12.25:6789 socket closed (con state CONNECTING)
Mar 14 16:27:34 prox-c kernel: libceph: mon0 192.168.12.25:6789 socket closed (con state CONNECTING)
Mar 14 16:27:44 prox-c kernel: libceph: mon1 192.168.12.21:6789 socket closed (con state CONNECTING)
Mar 14 16:27:54 prox-c kernel: libceph: mon0 192.168.12.25:6789 socket closed (con state CONNECTING)
Mar 14 16:30:39 prox-c kernel: libceph: mon2 192.168.12.22:6789 socket error on write
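
(One generic way to tell a real network problem from "the monitor daemon simply isn't running" is to check, on each mon host, whether anything is listening on the monitor port at all; this is just a sanity check, not from the thread.)

Code:
# run on each monitor host; 6789 is the default mon port from ceph.conf above
ss -tlnp | grep 6789 || echo "nothing listening on 6789"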
 
https://pve.proxmox.com/wiki/Ceph_Hammer_to_Jewel
Code:
Start the daemon
To ensure that Ceph starts up in the correct order, you should do the following steps.

cp /usr/share/doc/pve-manager/examples/ceph.service /etc/systemd/system/ceph.service
systemctl daemon-reload
systemctl enable ceph.service
The first daemon which we start is the Monitor, but from now on we use systemd.

systemctl start ceph-mon@<MON-ID>.service
systemctl enable ceph-mon@<MON-ID>.service
Did you do this step also?
 
Yes, but I did not do:
Code:
systemctl start ceph-mon@<MON-ID>.service
systemctl enable ceph-mon@<MON-ID>.service

The ceph-mon@<MON-ID>.service was not there to be started.
Tab completion did not work, and manually starting ceph-mon@3.service did not work either.

Code:
prox-e:~# systemctl start ceph-mon.3.service
Failed to start ceph-mon.3.service: Unit ceph-mon.3.service failed to load: No such file or directory.
 
systemctl start ceph-mon.3.service
The ceph-mon service has not been enabled yet, and yes, it doesn't exist at first.

Try a:
Code:
systemctl start ceph-mon@3.service
systemctl enable ceph-mon@3.service
Or a:
Code:
systemctl start ceph-mon@prox-e.service
systemctl enable ceph-mon@prox-e.service
 
Well, this is new...
Code:
prox-e:~# journalctl -u ceph-osd@34.service
-- Logs begin at Wed 2018-03-14 20:18:57 CDT, end at Thu 2018-03-15 07:58:43 CDT. --
Mar 14 20:19:22 prox-e systemd[1]: Starting Ceph object storage daemon...
Mar 14 20:19:25 prox-e ceph-osd-prestart.sh[3265]: 2018-03-14 20:19:25.314261 7f193d75b700 -1 auth: unable to find a keyring on /var/lib/ceph/osd/ceph-34/keyring: (2) No such fil
Mar 14 20:19:25 prox-e ceph-osd-prestart.sh[3265]: 2018-03-14 20:19:25.314315 7f193d75b700 -1 monclient(hunting): ERROR: missing keyring, cannot use cephx for authentication
Mar 14 20:19:25 prox-e ceph-osd-prestart.sh[3265]: 2018-03-14 20:19:25.314319 7f193d75b700  0 librados: osd.34 initialization error (2) No such file or directory
Mar 14 20:19:25 prox-e ceph-osd-prestart.sh[3265]: Error connecting to cluster: ObjectNotFound
Mar 14 20:19:25 prox-e systemd[1]: Started Ceph object storage daemon.
Mar 14 20:19:26 prox-e ceph-osd[3820]: 2018-03-14 20:19:26.386364 7fb989287800 -1  ** ERROR: unable to open OSD superblock on /var/lib/ceph/osd/ceph-34: (2) No such file or direc
Mar 14 20:19:26 prox-e systemd[1]: ceph-osd@34.service: main process exited, code=exited, status=1/FAILURE
Mar 14 20:19:26 prox-e systemd[1]: Unit ceph-osd@34.service entered failed state.
Mar 14 20:19:26 prox-e systemd[1]: [/lib/systemd/system/ceph-osd@.service:18] Unknown lvalue 'TasksMax' in section 'Service'
Mar 14 20:19:26 prox-e systemd[1]: [/lib/systemd/system/ceph-osd@.service:18] Unknown lvalue 'TasksMax' in section 'Service'
Mar 14 20:19:26 prox-e systemd[1]: [/lib/systemd/system/ceph-osd@.service:18] Unknown lvalue 'TasksMax' in section 'Service'
Mar 14 20:19:45 prox-e systemd[1]: ceph-osd@34.service holdoff time over, scheduling restart.
Mar 14 20:19:45 prox-e systemd[1]: Stopping Ceph object storage daemon...

Double-checking permissions:
Code:
prox-e:~# ls -halt /var/lib/ceph/osd/ceph-34
total 69K
-rw-r--r--   1 ceph ceph   0 Mar 14 20:39 systemd
drwxr-xr-x   3 ceph ceph 217 Mar 14 16:15 .
drwxr-xr-x 199 ceph ceph 12K Jan 22 10:54 current
drwxr-xr-x  20 ceph ceph  20 Mar 28  2017 ..
-rw-r--r--   1 ceph ceph   3 Mar 28  2017 active
-rw-------   1 ceph ceph  57 Mar 28  2017 keyring
-rw-r--r--   1 ceph ceph   6 Mar 28  2017 ready
-rw-r--r--   1 ceph ceph  53 Mar 28  2017 superblock
-rw-r--r--   1 ceph ceph   4 Mar 28  2017 store_version
-rw-r--r--   1 ceph ceph 610 Mar 28  2017 activate.monmap
-rw-r--r--   1 ceph ceph   3 Mar 28  2017 whoami
-rw-r--r--   1 ceph ceph  21 Mar 28  2017 magic
-rw-r--r--   1 ceph ceph  37 Mar 28  2017 journal_uuid
-rw-r--r--   1 ceph ceph  37 Mar 28  2017 fsid
-rw-r--r--   1 ceph ceph  37 Mar 28  2017 ceph_fsid
lrwxrwxrwx   1 ceph ceph  58 Mar 28  2017 journal -> /dev/disk/by-partuuid/928eb448-90ba-417f-9e2f-82b6e58141eb

Code:
root@prox-e:~# readlink -f /var/lib/ceph/osd/ceph-34/journal
/dev/sdk2

root@prox-e:~# ls -halt /dev/sdk2
brw-rw---- 1 ceph ceph 8, 162 Mar 14 20:18 /dev/sdk2

I must have missed a step somewhere.
Should permissions be "ceph:ceph" or "ceph:root"?
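
(For what it's worth, the Hammer-to-Jewel notes have the daemons run as the ceph user, so ceph:ceph is what the data directories and journal partitions normally end up as, which matches the listings above. A rough check/fix sketch for one OSD, with my IDs and devices as placeholders:)

Code:
# hedged sketch; substitute your own OSD ID and journal device
ls -ld /var/lib/ceph/osd/ceph-34 /dev/sdk2
chown -R ceph:ceph /var/lib/ceph/osd/ceph-34   # only if anything is still root-owned
chown ceph:ceph /dev/sdk2                      # journal partition; udev normally handles this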
 
Is the monitor running now? Does the cluster have a quorum?

Do the directory '/var/lib/ceph/osd/ceph-34' and its keyring file exist?
 
OH THANK GOD!

Code:
systemctl enable ceph-mon@3.service

That seems to have worked for some reason. Then I was able to start the service.
So maybe the wiki should reverse those two commands. Also, I was under the impression that the service would be listed somewhere in systemd, so after starting it manually did not work, I tried tab completion in the hope that it would show up somewhere.

For anyone else having similar problems: you have to manually enable the service with the proper monitor ID.
I found my monitor ID by using "pveceph createmon", which said this monitor was already created as mon.<MON-ID>.
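In short, the sequence that worked on this node (mon ID 3 in my case; substitute your own):

Code:
pveceph createmon             # fails, but the error message reveals the existing mon ID
systemctl enable ceph-mon@3.service
systemctl start ceph-mon@3.service
systemctl status ceph-mon@3.service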

My Ceph cluster is operational right now. Thanks, everyone!
 
The ceph-mon@, ceph-osd@, ceph-mgr@ (needed for Luminous and later) and ceph-mds@ (not needed for PVE unless you manually set up CephFS) services are so-called template units. The part after the @ is the ID of the respective instance, so only the template exists after upgrading, and naturally systemd does not know about any Ceph IDs, which is why there can't be tab completion there. Once you have enabled the instance (e.g., ceph-mon@1), systemd knows about it and tab-completes it for start or stop actions and the like. The ceph-osd service instances get runtime-enabled and started automatically (via udev/ceph-disk), so only the monitors (and managers for Luminous) need to be enabled and started manually, once.
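
For example (generic systemd behaviour, re-using the mon ID from this thread):

Code:
systemctl cat ceph-mon@.service        # the template shipped by the ceph-mon package
systemctl enable ceph-mon@3.service    # instantiates it for ID 3 (creates the .wants/ symlink)
systemctl start ceph-mon@3.service
systemctl status ceph-mon@3.service    # the instance now exists and tab-completes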
 
