After update of Nov 14 - monitors fail to start

PutnamCountyIT · Nov 15, 2019

I ran the updates, which installed a new kernel. After the reboot the monitor did not start. I attempted to start it from the command line; this is the resulting status:

systemctl status ceph-mon@proxp01.service
● ceph-mon@proxp01.service - Ceph cluster monitor daemon
Loaded: loaded (/lib/systemd/system/ceph-mon@.service; enabled; vendor preset: enabled)
Drop-In: /lib/systemd/system/ceph-mon@.service.d
└─ceph-after-pve-cluster.conf
Active: failed (Result: exit-code) since Fri 2019-11-15 11:57:55 EST; 47s ago
Process: 11469 ExecStart=/usr/bin/ceph-mon -f --cluster ${CLUSTER} --id proxp01 --setuser ceph --setgroup ceph (code=exited, status=1/FAILURE)
Main PID: 11469 (code=exited, status=1/FAILURE)

Nov 15 11:57:55 putsproxp01 systemd[1]: ceph-mon@proxp01.service: Service RestartSec=10s expired, scheduling restart.
Nov 15 11:57:55 putsproxp01 systemd[1]: ceph-mon@proxp01.service: Scheduled restart job, restart counter is at 5.
Nov 15 11:57:55 putsproxp01 systemd[1]: Stopped Ceph cluster monitor daemon.
Nov 15 11:57:55 putsproxp01 systemd[1]: ceph-mon@proxp01.service: Start request repeated too quickly.
Nov 15 11:57:55 putsproxp01 systemd[1]: ceph-mon@proxp01.service: Failed with result 'exit-code'.
Nov 15 11:57:55 putsproxp01 systemd[1]: Failed to start Ceph cluster monitor daemon.

The log tail looks like this:

2019-11-15 11:57:45.560 7f5f099fc440 4 rocksdb: Options.compaction_readahead_size: 0
2019-11-15 11:57:45.560 7f5f099fc440 4 rocksdb: Compression algorithms supported:
2019-11-15 11:57:45.560 7f5f099fc440 4 rocksdb: kZSTDNotFinalCompression supported: 0
2019-11-15 11:57:45.560 7f5f099fc440 4 rocksdb: kZSTD supported: 0
2019-11-15 11:57:45.560 7f5f099fc440 4 rocksdb: kXpressCompression supported: 0
2019-11-15 11:57:45.560 7f5f099fc440 4 rocksdb: kLZ4HCCompression supported: 1
2019-11-15 11:57:45.560 7f5f099fc440 4 rocksdb: kLZ4Compression supported: 1
2019-11-15 11:57:45.560 7f5f099fc440 4 rocksdb: kBZip2Compression supported: 0
2019-11-15 11:57:45.560 7f5f099fc440 4 rocksdb: kZlibCompression supported: 1
2019-11-15 11:57:45.560 7f5f099fc440 4 rocksdb: kSnappyCompression supported: 1
2019-11-15 11:57:45.560 7f5f099fc440 4 rocksdb: Fast CRC32 supported: Supported on x86
2019-11-15 11:57:45.560 7f5f099fc440 4 rocksdb: [db/db_impl.cc:390] Shutdown: canceling all background work
2019-11-15 11:57:45.560 7f5f099fc440 4 rocksdb: [db/db_impl.cc:563] Shutdown complete
2019-11-15 11:57:45.560 7f5f099fc440 -1 rocksdb: IO error: while open a file for lock: /var/lib/ceph/mon/ceph-proxp01/store.db/LOCK: Permission denied
2019-11-15 11:57:45.560 7f5f099fc440 -1 error opening mon data directory at '/var/lib/ceph/mon/ceph-proxp01': (22) Invalid argument

I have 6 nodes, and they are all doing the same thing.

t.lamprecht · Nov 16, 2019

Did you try to start it as root directly once?

What do the following commands output?

Code:

ls -al /var/lib/ceph/mon/*
find /var/lib/ceph/ ! -user ceph

The second one in particular should only list the bootstrap-osd ceph keyring, and maybe the crash directory.
Are all files owned by the "ceph" user, especially the ones ceph tries to access in the logs?
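The ownership check above can be wrapped in a small helper so it is easy to repeat on every node. This is only a sketch: the function name list_not_owned is made up for illustration, and the default owner "ceph" is taken from the commands in this post.

```shell
#!/bin/sh
# list_not_owned DIR [OWNER]
# Print every path under DIR that is NOT owned by OWNER.
# OWNER may be a user name or a numeric uid; it defaults to "ceph".
# (Hypothetical helper, not part of ceph or Proxmox.)
list_not_owned() {
    dir=$1
    owner=${2:-ceph}
    # Same test as the find command in the post: negate the -user match.
    find "$dir" ! -user "$owner" -print
}
```

On a monitor node it could be called as list_not_owned /var/lib/ceph ceph. If the output includes something like store.db/LOCK under the mon directory, that would match the "Permission denied" line in the log.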

PutnamCountyIT · Nov 18, 2019

Thank you for the tip...
The files and directories under /var/lib/ceph/mon and /var/lib/ceph/osd were not owned by ceph. A quick chown -R ceph:ceph mon osd fixed that.
I also noticed that timesyncd was not configured on all nodes. See https://pve.proxmox.com/wiki/Time_Synchronization if your nodes are having time sync issues.
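For anyone repeating the fix on several nodes, it can be sketched roughly as follows. The helper names run and repair_mon are hypothetical, and the monitor id proxp01 and the /var/lib/ceph path are simply the ones from this thread. Because the status output showed "Start request repeated too quickly", the sketch also clears the unit's failed state with systemctl reset-failed before starting it. It only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch: restore ceph ownership and restart one monitor.
# Dry-run by default; set DRY_RUN=0 (as root) to really execute.

run() {
    # Print the command in dry-run mode, otherwise execute it.
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# repair_mon MON_ID [CEPH_DIR]  (hypothetical helper)
repair_mon() {
    mon_id=$1
    ceph_dir=${2:-/var/lib/ceph}
    # Give the mon and osd trees back to the ceph user and group.
    run chown -R ceph:ceph "$ceph_dir/mon" "$ceph_dir/osd"
    # Clear the failed state left by the restart limit, then start.
    run systemctl reset-failed "ceph-mon@${mon_id}.service"
    run systemctl start "ceph-mon@${mon_id}.service"
}

repair_mon proxp01
```

Reviewing the printed commands first and only then running with DRY_RUN=0 keeps a recursive chown from hitting the wrong directory.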
