[SOLVED] Reinstall Proxmox-Ceph Node after Crash

Discussion in 'Proxmox VE: Installation and configuration' started by Fladi, Sep 16, 2018.

  1. Fladi

    Fladi New Member

    Joined:
    Feb 27, 2015
    Messages:
    19
    Likes Received:
    4
    Hi all,

    We had a major outage of our cluster, which consists of 3 PVE hosts serving containers and VMs. In addition, there are 3 Proxmox-based Ceph servers.
    I was preparing to upgrade from 4.4 to 5.x.

    1. Ceph-3 didn't come up after a reboot. It turned out that the HBA controller holding the boot disks is no longer visible to the BIOS and thus can't be used as a boot device. The controller itself does not seem to be at fault; it works fine. I tried different controllers, and no add-on controller shows up in the BIOS. I worked around this by attaching the boot disks to an onboard SAS controller.

    2. While I was working on 1), another Ceph node crashed. Ceph-1 had two SATA DOMs holding the Proxmox installation. Both seem to be damaged; I can't access them anymore.

    So, Ceph-3 has booted with a lot of ZFS errors but is running. The Ceph cluster is in the process of healing and some VMs are available again, but the overall state is still error (see image).

    However, I would like to get Ceph-1 online again as soon as possible. It has an "unused" SSD that is configured as part of a separate SSD pool. As this pool is empty, I could use the SSD as a system disk and boot from it.

    What would be the best way to reinstall Ceph-1? Do a "normal" install, giving it the same IP/ID as before, and join the cluster? Will Ceph detect the OSDs on this machine again? Or reinstall it as a "new" node?
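    Whichever reinstall path is chosen, one way to see whether the OSDs on a host are reported as up again is to inspect the output of `ceph osd tree --format json`. The following is only a minimal sketch, not a reinstall procedure. It assumes the JSON contains a `nodes` list whose entries carry `id`, `name`, `type`, `children` and (for OSDs) `status` fields; these field names should be checked against the Ceph release in use. The helper name `down_osds_by_host`, the host names and the sample data are hypothetical.

```python
from typing import Dict, List


def down_osds_by_host(osd_tree: dict) -> Dict[str, List[str]]:
    """Map each host name to the names of its OSDs that are not reported 'up'.

    Assumes the layout described above: a "nodes" list with "id", "name",
    "type", "children" and, for OSD entries, a "status" field.
    """
    # Index all nodes by id so that host children can be resolved.
    nodes = {node["id"]: node for node in osd_tree.get("nodes", [])}
    result: Dict[str, List[str]] = {}
    for node in nodes.values():
        if node.get("type") != "host":
            continue
        result[node["name"]] = [
            nodes[child]["name"]
            for child in node.get("children", [])
            if child in nodes
            and nodes[child].get("type") == "osd"
            and nodes[child].get("status") != "up"
        ]
    return result


# Hypothetical sample data, not taken from the cluster in this thread.
SAMPLE_TREE = {
    "nodes": [
        {"id": -1, "name": "default", "type": "root", "children": [-2, -3]},
        {"id": -2, "name": "ceph-1", "type": "host", "children": [0, 1]},
        {"id": -3, "name": "ceph-2", "type": "host", "children": [2]},
        {"id": 0, "name": "osd.0", "type": "osd", "status": "down"},
        {"id": 1, "name": "osd.1", "type": "osd", "status": "down"},
        {"id": 2, "name": "osd.2", "type": "osd", "status": "up"},
    ]
}

if __name__ == "__main__":
    for host, osds in down_osds_by_host(SAMPLE_TREE).items():
        print(host, osds)
```

    On a live cluster the dictionary could be obtained with `json.loads(subprocess.check_output(["ceph", "osd", "tree", "--format", "json"]))` instead of the sample data. A host whose list is empty has all of its known OSDs reported as up.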

    Oh, what a weekend...

    Thanks and best regards
     


  2. Fladi

    Damn, I picked the wrong forum. Could a mod move this to the English one?
     
  3. Fladi

    I solved this. I will update this post with a description later.
     