`pveceph createosd` not creating mountpoints or mounting the new OSD if the mountpoint exists

Dec 19, 2017
Background:
I have a 6-host cluster that I built and imaged with the PVE 4.4 ISO, and I am working out a process for upgrading this cluster to PVE 5.1 installed on top of a vanilla Debian Stretch image, as per the various wiki instructions.
This is part of a project to upgrade the production cluster my company runs. I'm using this as a test cluster to hammer out any problems in the process so we don't break our production cluster.
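For reference, the install-on-Stretch part boils down to roughly the following (repo and package names as in the wiki; this is just a sketch, not the full procedure):

    # add the Proxmox VE repository and its key, then install on top of Stretch
    echo "deb http://download.proxmox.com/debian/pve stretch pve-no-subscription" > /etc/apt/sources.list.d/pve-install-repo.list
    wget http://download.proxmox.com/debian/proxmox-ve-release-5.x.gpg -O /etc/apt/trusted.gpg.d/proxmox-ve-release-5.x.gpg
    apt update && apt dist-upgrade
    apt install proxmox-ve postfix open-iscsi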

I start by upgrading Ceph on all 6 nodes, then I pick a node to pull. I destroy any OSDs and monitors it has, migrate all VMs away, and power it off to install Debian Stretch, then Proxmox 5.1 on top of that. I then add the node back into the cluster and add it back to Ceph. This goes mostly smoothly; the teardown steps are sketched below. The main issue is that when I try to create an OSD to replace the ones that were on the host before I pulled it, pveceph skips some crucial steps.
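The per-node teardown looks roughly like this (the OSD ID, monitor ID, and VMID are just examples from my setup):

    # take the node's ceph services out of the cluster first
    ceph osd out 3
    systemctl stop ceph-osd@3
    pveceph destroyosd 3 --cleanup    # repeat per OSD on this host
    pveceph destroymon <monid>
    # move guests off, then power the node down for the reinstall
    qm migrate 101 <targetnode> --online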

First, it does not create a mount point for the new OSD. Even if I manually create the mount point, give it the ownership it should have (ceph:ceph), and then create the OSD with pveceph, it still will not mount the OSD. It also does not update systemd to autostart the OSD on reboot. I was able to get around this by manually adding the partition UUIDs to fstab and running a systemctl daemon-reload (see below), but I did not need to do any of this in Proxmox 4.4. Is this new behavior in PVE 5.1, or is it because I installed the software on a Debian installation instead of from the ISO?
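My workaround looks roughly like this (osd.3, the device, the UUID, and the xfs filesystem type are examples for my setup):

    # create and own the mount point pveceph didn't make
    mkdir -p /var/lib/ceph/osd/ceph-3
    chown ceph:ceph /var/lib/ceph/osd/ceph-3
    pveceph createosd /dev/sdb
    # the OSD is still neither mounted nor set to mount at boot, so:
    blkid /dev/sdb1                   # note the partition UUID
    echo "UUID=<uuid-from-blkid> /var/lib/ceph/osd/ceph-3 xfs defaults 0 0" >> /etc/fstab
    mount /var/lib/ceph/osd/ceph-3
    systemctl daemon-reload
    systemctl start ceph-osd@3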

Secondly, I noticed that if the OSD I am recreating was bluestore before I destroyed it, neither the destruction nor the creation process overwrites the old fsid left over from that OSD. This means that even if I delete the partitions on the disk and recreate the OSD, the second partition will carry a stale fsid and the OSD won't start. I had to use dd to write zeros to the front of the disk for the old fsid to be wiped out. Is this intentional? I would think that if you are destroying an OSD, its fsid should be destroyed along with it.
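Concretely, something like this got rid of the stale fsid for me (device and count are examples; the point is just to zero enough of the start of the disk to cover the old superblocks):

    # wipe the old partition table and the stale bluestore fsid behind it
    dd if=/dev/zero of=/dev/sdb bs=1M count=200 conv=fsync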

Lastly, when I go to pull and re-add the next node, I'm unable to join it to the cluster by specifying the IP of a node that I installed with Debian; I get 'unable to copy ssh ID: exit code 1'. I can join it by specifying the IP of a 4.4 node, though. The authorized_keys files have the same ownership and permissions on all the nodes, so I'm not sure what's going on there.
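For context, the failing step is just the normal join (the IP is an example), plus the manual ssh test I used to compare nodes:

    # on the freshly reinstalled node
    pvecm add 10.0.0.11        # IP of an existing cluster member
    # the ssh ID copy it performs can be checked by hand:
    ssh root@10.0.0.11 true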
Edit: never mind on that last one; it's because of the ssh config I have on the Stretch hosts. Disregard this last issue.
 
