Thank you, spirit! The workaround fixed the problem and the system is up and running. So the conclusion was right! Super.
I just wanted to share that this post saved my butt. Thank you tmolberg and everyone else! For what it's worth, I had a similar issue which presented itself with:
Code:
Jul 15 00:28:53 sh-prox02 systemd[1]: ceph-osd@2.service: Scheduled restart job, restart counter is at 6.
Jul 15 00:28:53 sh-prox02 systemd[1]: Stopped Ceph object storage daemon osd.2.
Jul 15 00:28:53 sh-prox02 systemd[1]: ceph-osd@2.service: Consumed 14.647s CPU time.
Jul 15 00:28:53 sh-prox02 systemd[1]: ceph-osd@2.service: Start request repeated too quickly.
Jul 15 00:28:53 sh-prox02 systemd[1]: ceph-osd@2.service: Failed with result 'signal'.
Jul 15 00:28:53 sh-prox02 systemd[1]: Failed to start Ceph object storage daemon osd.2.
Digging through the logs revealed:
Code:
bluefs _allocate allocation failed, needed 0x400000
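If you want to confirm you're hitting the same failure, the bluefs errors show up in the OSD's own log. This is only a sketch, assuming the default Ceph log location and using osd.2 from the journal above as the example id:
Code:
# check the systemd journal for the failing OSD
journalctl -u ceph-osd@2.service | grep -i bluefs
# or the Ceph OSD log directly (default path on PVE/Debian)
grep -i "bluefs _allocate" /var/log/ceph/ceph-osd.2.log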
Looking at this bug report, there were some comments stating that changing bluestore_allocator from hybrid to bitmap seemed to work around the issue.
For my part, I've set bluestore_allocator to bitmap on my pure NVMe OSDs, which allowed the OSDs to start.
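For a single OSD that looks roughly like the following (osd.2 is just the example from the journal above; the reset-failed is needed because systemd stopped retrying once the restart counter ran out):
Code:
# set the allocator override for this OSD
ceph config set osd.2 bluestore_allocator bitmap
# clear the "Start request repeated too quickly" state and start it again
systemctl reset-failed ceph-osd@2.service
systemctl start ceph-osd@2.service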
https://tracker.ceph.com/issues/50656
EDIT: I ran into this issue some time after upgrading from Ceph Octopus to Pacific, which followed a recent upgrade from PVE 6.4 to PVE 7.
Weirdly enough, it seemed to affect only the NVMe drives. All other drives (spinners with db/wal on NVMe LVM, and ghetto SSDs) have started fine and seem to be operational.
To apply the setting to all NVMe OSDs in one go:
Code:
ceph osd tree | grep -v root | grep \ nvme | awk '{ system ("ceph config set osd." $1 " bluestore_allocator bitmap")}'
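To double-check that the overrides actually landed (again just a sketch, with osd.2 as the example id):
Code:
# list all bluestore_allocator overrides in the cluster config
ceph config dump | grep bluestore_allocator
# or query a single OSD
ceph config get osd.2 bluestore_allocator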