`pveceph createosd` not creating mountpoints or mounting the new OSD if the mountpoint exists

Discussion in 'Proxmox VE: Installation and configuration' started by Daniel De Lellis, Jan 25, 2018.

  1. Daniel De Lellis

    Daniel De Lellis New Member

    Joined:
    Dec 19, 2017
    Messages:
    3
    Likes Received:
    0
    Background:
    I have a 6-host cluster that I built and imaged with the PVE4.4 ISO, and am working on a process of upgrading this cluster to PVE 5.1 that is installed on top of a vanilla Debian Stretch image as per the various wiki instructions:
    This is part of a project to upgrade the production cluster my company runs. I'm using this as a test cluster to hammer out any problems in the process so we don't break our production cluster.

    I start by upgrading Ceph on all 6 nodes, then I pick a node to pull. I destroy any OSDs and monitors it has, migrate all VMs away, and power it off so I can install Debian Stretch on it, then Proxmox VE 5.1 on top of that. I then add the node back into the cluster and add it to Ceph. This goes MOSTLY smoothly. The main issue is that when I create OSDs to replace the ones that were on the host before I pulled it, pveceph doesn't perform some crucial steps.

    First, it does not create a mountpoint for the new OSD. Even if I manually create the mountpoint, give it the right ownership (ceph:ceph), and then create the OSD with pveceph, it does not mount the OSD. It also does not configure systemd to start the OSD automatically on reboot. I worked around this by manually adding the partition UUIDs to fstab and running systemctl daemon-reload, but I just noticed that I never had to do this in Proxmox 4.4. Is this new behavior in PVE 5.1, or is it because I installed the software on top of Debian instead of from the ISO?
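    A minimal sketch of the fstab workaround described above, as a small shell helper. The helper name osd_fstab_line, the use of PARTUUID (the post only says "partition UUIDs"), the xfs filesystem type and the default /var/lib/ceph/osd/<cluster>-<id> mount path are all assumptions; check them against what blkid reports on your node. The commented example also enables the ceph-osd@<id> systemd unit, which goes beyond what the post itself describes.

```shell
#!/usr/bin/env bash
# Sketch of the manual workaround: persist an OSD's data mount in /etc/fstab,
# then let systemd regenerate its mount units from fstab.
# ASSUMPTIONS (not from the original post): the partition is identified by
# PARTUUID, is formatted xfs, and mounts at /var/lib/ceph/osd/<cluster>-<id>.
# osd_fstab_line is a hypothetical helper name.

osd_fstab_line() {
    local partuuid=$1          # partition PARTUUID, e.g. from blkid
    local osd_id=$2            # numeric OSD id
    local cluster=${3:-ceph}   # Ceph cluster name, "ceph" by default
    printf 'PARTUUID=%s /var/lib/ceph/osd/%s-%s xfs defaults,noatime 0 0\n' \
        "$partuuid" "$cluster" "$osd_id"
}

# Example (run as root; review the generated line before appending it):
#   PARTUUID=$(blkid -s PARTUUID -o value /dev/sdb1)
#   mkdir -p /var/lib/ceph/osd/ceph-3 && chown ceph:ceph /var/lib/ceph/osd/ceph-3
#   osd_fstab_line "$PARTUUID" 3 >> /etc/fstab
#   systemctl daemon-reload && mount /var/lib/ceph/osd/ceph-3
#   systemctl enable ceph-osd@3
```

    Printing the line instead of writing /etc/fstab directly keeps the helper side-effect free, so you can inspect the entry before committing it.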

    Secondly, I noticed that if the OSD I am recreating was BlueStore before I destroyed it, neither the destruction nor the creation process overwrites the old fsid left behind by that OSD. Even if I delete the partitions on the disk and recreate the OSD, the second partition still carries the stale fsid and the OSD won't start. I had to use dd to write zeros to the front of the disk to wipe out the old fsid. Is this intentional? I would expect that destroying an OSD also destroys its fsid.
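    A hedged sketch of the dd wipe described above, wrapped in a hypothetical helper wipe_head. The default of 200 MiB is an assumption meant to reach past the small first partition and into the start of the second one, where the stale fsid was found; adjust it to your actual partition layout. This destroys data on the target.

```shell
#!/usr/bin/env bash
# Sketch: zero the first N MiB of a disk (or image file) so a stale
# BlueStore fsid/label on a reused disk is not picked up again.
# ASSUMPTIONS: wipe_head is a hypothetical helper; the 200 MiB default is
# illustrative, not a documented threshold. THIS DESTROYS DATA.

wipe_head() {
    local target=$1        # block device (e.g. /dev/sdX) or image file
    local mib=${2:-200}    # how many MiB to zero from the start
    # conv=notrunc stops regular (image) files from being truncated;
    # conv=fsync flushes the zeros to the target before dd exits.
    dd if=/dev/zero of="$target" bs=1M count="$mib" conv=notrunc,fsync status=none
}
```

    Usage, only on a disk with no mounted partitions and no running OSD: wipe_head /dev/sdX 200, then recreate the OSD with pveceph.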

    Lastly, when I pull and re-add the next node, I can't join it to the cluster by specifying the IP of a node I installed on Debian; I get 'unable to copy ssh ID: exit code 1'. I can, however, join it by specifying the IP of a 4.4 node. The authorized_keys files have the same ownership and permissions on all the nodes, so I'm not sure what's going on there.
    Edit: never mind on that last one; it's because of the ssh config I have on the Stretch hosts. Disregard this last issue.
     
    #1 Daniel De Lellis, Jan 25, 2018
    Last edited: Jan 25, 2018
  2. Alwin

    Alwin Proxmox Staff Member
    Staff Member

    Joined:
    Aug 1, 2017
    Messages:
    2,538
    Likes Received:
    221