We have an older 7-node Proxmox 3.4 test lab (Ceph Hammer 0.94.9 on 4 of the nodes, VMs only on the other 3) which we wanted to patch up today. After rebooting, the OSDs won't start; it seems the ceph CLI can't connect to the cluster. Any idea why that might be?
Previous version before patching:
root@node2:~# pveversion -verbose
proxmox-ve-2.6.32: 3.4-177 (running kernel: 2.6.32-46-pve)
pve-manager: 3.4-15 (running version: 3.4-15/e1daa307)
pve-kernel-2.6.32-45-pve: 2.6.32-174
pve-kernel-2.6.32-46-pve: 2.6.32-177
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.7-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.10-3
pve-cluster: 3.0-20
qemu-server: 3.4-9
pve-firmware: 1.1-5
libpve-common-perl: 3.0-27
libpve-access-control: 3.0-16
libpve-storage-perl: 3.0-35
pve-libspice-server1: 0.12.4-3
vncterm: 1.1-8
vzctl: 4.0-1pve6
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 2.2-27
ksm-control-daemon: 1.1-1
glusterfs-client: 3.5.2-1
Version after patching:
root@node1:~# pveversion
pve-manager/3.4-16/40ccc11c (running kernel: 2.6.32-48-pve)
root@node1:~# pveversion -verbose
proxmox-ve-2.6.32: 3.4-187 (running kernel: 2.6.32-48-pve)
pve-manager: 3.4-16 (running version: 3.4-16/40ccc11c)
pve-kernel-2.6.32-48-pve: 2.6.32-187
pve-kernel-2.6.32-46-pve: 2.6.32-177
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.7-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.10-3
pve-cluster: 3.0-20
qemu-server: 3.4-9
pve-firmware: 1.1-6
libpve-common-perl: 3.0-27
libpve-access-control: 3.0-16
libpve-storage-perl: 3.0-35
pve-libspice-server1: 0.12.4-3
vncterm: 1.1-8
vzctl: 4.0-1pve6
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 2.2-28
ksm-control-daemon: 1.1-1
glusterfs-client: 3.5.2-1
Ceph status: the monitor starts, but none of the OSDs do
root@node1:~# /etc/init.d/ceph status
=== osd.7 ===
osd.7: not running.
=== osd.4 ===
osd.4: not running.
=== osd.16 ===
osd.16: not running.
=== osd.5 ===
osd.5: not running.
=== osd.17 ===
osd.17: not running.
=== osd.6 ===
osd.6: not running.
=== mon.2 ===
mon.2: running {"version":"0.94.9"}
Attempt to start OSDs fails due to timeout
root@node1:~# /etc/init.d/ceph start osd.4
=== osd.4 ===
failed: 'timeout 30 /usr/bin/ceph -c /etc/ceph/ceph.conf --name=osd.4 --keyring=/var/lib/ceph/osd/ceph-4/keyring osd crush create-or-move -- 4 0.13 host=node1 root=default'
It looks like the ceph CLI simply can't connect to the cluster. But what exactly does ceph do to connect, open a Unix or TCP socket, and to what/where?
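For reference, my understanding is that the ceph CLI does not use a local Unix socket for this; it reads the monitor addresses from /etc/ceph/ceph.conf (mon_host / mon addr) and opens TCP connections to the monitors, which on Hammer listen on port 6789 by default. A rough sketch of how one might probe that path (the 10.0.0.1 address is a placeholder, substitute a monitor IP from your own ceph.conf):

```shell
# The init script wraps the ceph call in `timeout 30`; timeout exits with
# status 124 when the wrapped command hits the limit, which matches the
# failure seen when starting osd.4:
timeout 2 sleep 5; echo "exit=$?"   # prints exit=124

# The monitor addresses the client will try come from ceph.conf:
grep -E 'mon[ _]host|mon[ _]addr' /etc/ceph/ceph.conf

# 10.0.0.1 is a placeholder monitor IP; Hammer mons listen on TCP 6789
# by default, so a raw reachability check could look like:
nc -z -w 2 10.0.0.1 6789 && echo "mon reachable" || echo "mon unreachable"

# Client-side messenger debugging shows which mon addresses the CLI is
# actually trying and where the connection stalls:
timeout 30 ceph --debug-ms 1 -s
```

The grep/nc/ceph lines obviously depend on the local cluster config; the point is just that if TCP 6789 to every monitor is unreachable (firewall change after the patch, wrong network after reboot, etc.), every ceph command will hang until `timeout` kills it, exactly as in the osd.4 start attempt above.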