After update of Nov 14 - monitors fail to start

I ran the updates, which installed a new kernel. After the reboot the monitor did not start. I attempted to start it from the command line:

systemctl status ceph-mon@proxp01.service
ceph-mon@proxp01.service - Ceph cluster monitor daemon
Loaded: loaded (/lib/systemd/system/ceph-mon@.service; enabled; vendor preset: enabled)
Drop-In: /lib/systemd/system/ceph-mon@.service.d
└─ceph-after-pve-cluster.conf
Active: failed (Result: exit-code) since Fri 2019-11-15 11:57:55 EST; 47s ago
Process: 11469 ExecStart=/usr/bin/ceph-mon -f --cluster ${CLUSTER} --id proxp01 --setuser ceph --setgroup ceph (code=exited, status=1/FAILURE)
Main PID: 11469 (code=exited, status=1/FAILURE)

Nov 15 11:57:55 putsproxp01 systemd[1]: ceph-mon@proxp01.service: Service RestartSec=10s expired, scheduling restart.
Nov 15 11:57:55 putsproxp01 systemd[1]: ceph-mon@proxp01.service: Scheduled restart job, restart counter is at 5.
Nov 15 11:57:55 putsproxp01 systemd[1]: Stopped Ceph cluster monitor daemon.
Nov 15 11:57:55 putsproxp01 systemd[1]: ceph-mon@proxp01.service: Start request repeated too quickly.
Nov 15 11:57:55 putsproxp01 systemd[1]: ceph-mon@proxp01.service: Failed with result 'exit-code'.
Nov 15 11:57:55 putsproxp01 systemd[1]: Failed to start Ceph cluster monitor daemon.

The log tail looks like this:

2019-11-15 11:57:45.560 7f5f099fc440 4 rocksdb: Options.compaction_readahead_size: 0
2019-11-15 11:57:45.560 7f5f099fc440 4 rocksdb: Compression algorithms supported:
2019-11-15 11:57:45.560 7f5f099fc440 4 rocksdb: kZSTDNotFinalCompression supported: 0
2019-11-15 11:57:45.560 7f5f099fc440 4 rocksdb: kZSTD supported: 0
2019-11-15 11:57:45.560 7f5f099fc440 4 rocksdb: kXpressCompression supported: 0
2019-11-15 11:57:45.560 7f5f099fc440 4 rocksdb: kLZ4HCCompression supported: 1
2019-11-15 11:57:45.560 7f5f099fc440 4 rocksdb: kLZ4Compression supported: 1
2019-11-15 11:57:45.560 7f5f099fc440 4 rocksdb: kBZip2Compression supported: 0
2019-11-15 11:57:45.560 7f5f099fc440 4 rocksdb: kZlibCompression supported: 1
2019-11-15 11:57:45.560 7f5f099fc440 4 rocksdb: kSnappyCompression supported: 1
2019-11-15 11:57:45.560 7f5f099fc440 4 rocksdb: Fast CRC32 supported: Supported on x86
2019-11-15 11:57:45.560 7f5f099fc440 4 rocksdb: [db/db_impl.cc:390] Shutdown: canceling all background work
2019-11-15 11:57:45.560 7f5f099fc440 4 rocksdb: [db/db_impl.cc:563] Shutdown complete
2019-11-15 11:57:45.560 7f5f099fc440 -1 rocksdb: IO error: while open a file for lock: /var/lib/ceph/mon/ceph-proxp01/store.db/LOCK: Permission denied
2019-11-15 11:57:45.560 7f5f099fc440 -1 error opening mon data directory at '/var/lib/ceph/mon/ceph-proxp01': (22) Invalid argument

I have 6 nodes, and they are all doing the same thing.
 
Did you try starting it directly as root once?

What do the following commands output?
Code:
ls -al /var/lib/ceph/mon/*
find /var/lib/ceph/ ! -user ceph

The second one in particular should only list the bootstrap-osd ceph keyring, and maybe the crash directory.
Are all files, especially the ones ceph tries to access in the logs, owned by the "ceph" user?
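
If it helps, the ownership check can be wrapped in a small helper so it is easy to repeat on all six nodes. This is only a sketch: the function name `list_foreign_owned` is made up for illustration, and the path in the comments is the default monitor data directory taken from the log above. The commented-out fix at the end assumes the check really does show files owned by someone other than "ceph", which has not been confirmed here yet.

```shell
# Hypothetical helper (not part of ceph or Proxmox): print every path
# under a directory that is NOT owned by the given user.
# Usage: list_foreign_owned <directory> <user-name-or-uid>
list_foreign_owned() {
    dir="$1"
    owner="$2"
    # Same test as "find /var/lib/ceph/ ! -user ceph", but parameterised.
    find "$dir" ! -user "$owner" -print
}

# Example on a monitor node (path taken from the error in the log):
#   list_foreign_owned /var/lib/ceph/mon/ceph-proxp01 ceph
#
# ASSUMPTION: if that lists store.db/LOCK or other store files, one
# possible fix is to hand them back to the ceph user and clear the
# "Start request repeated too quickly" state before starting again:
#   chown -R ceph:ceph /var/lib/ceph/mon/ceph-proxp01
#   systemctl reset-failed ceph-mon@proxp01.service
#   systemctl start ceph-mon@proxp01.service
```

An empty result for the monitor directory means ownership is fine and the "Permission denied" has to come from somewhere else.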
 
