How to replace the Proxmox OS disk, for a node in a cluster?

Discussion in 'Proxmox VE: Installation and configuration' started by victorhooi, Apr 5, 2019.

  1. victorhooi

    victorhooi Member

    Joined:
    Apr 3, 2018
    Messages:
    132
    Likes Received:
    6
    We have a 3-node Proxmox cluster, with Ceph HA storage.

    The primary OS disk (i.e. the Proxmox boot disk) in one of the nodes is failing. We are hoping to replace it - but will be going from a 2.5" SSD to an M.2 SSD.

    What is the safest way of doing this?

    Will a 1:1 copy of the disk suffice, and we simply change boot order?

    Or do we have to do a fresh install, and somehow re-integrate it back into both the Proxmox cluster, as well as the Ceph cluster?
     
  2. victorhooi

    victorhooi Member

    Joined:
    Apr 3, 2018
    Messages:
    132
    Likes Received:
    6
    Also - I just realised the new M.2 disk is smaller (i.e. 1TB) versus the old 2.5" disk (2TB).

    So a 1:1 bit copy won't work.

    Is there another way to migrate a Proxmox installation from one disk to another?

    Or an easy way to reinstall Proxmox, say, and re-import all the configuration, including cluster and Ceph details?
     
  3. lhorace

    lhorace Member

    Joined:
    Oct 17, 2015
    Messages:
    141
    Likes Received:
    14
    Hey there,

    From a live ISO, rsync should do it: copy the data across, reinstall the bootloader (including recreating grub.cfg), and update /etc/fstab with the UUID of the new disk.
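    A minimal sketch of that procedure, assuming the old disk is /dev/sda and the new one is /dev/nvme0n1 (both hypothetical names). The rsync and GRUB steps are left as comments since they need real devices mounted; the fstab edit is demonstrated on a scratch copy so it can be verified safely:

```shell
# Full procedure (run from the live ISO against real mounts; all device
# names below are assumptions, not taken from the poster's hardware):
#   rsync -aAXH /mnt/old/ /mnt/new/           # copy the root filesystem
#   chroot /mnt/new grub-install /dev/nvme0n1 # reinstall the bootloader
#   chroot /mnt/new update-grub               # recreate grub.cfg
# The fstab step, demonstrated on a throwaway copy:
OLD_UUID="old-disk-uuid"    # placeholder; read the real one with blkid
NEW_UUID="new-disk-uuid"    # placeholder
fstab=$(mktemp)
printf 'UUID=%s / ext4 errors=remount-ro 0 1\n' "$OLD_UUID" > "$fstab"
sed -i "s/$OLD_UUID/$NEW_UUID/" "$fstab"
grep "UUID=$NEW_UUID" "$fstab"   # root entry now points at the new disk
rm -f "$fstab"
```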

    Cheers
     
  4. victorhooi

    victorhooi Member

    Joined:
    Apr 3, 2018
    Messages:
    132
    Likes Received:
    6
    OK - so boot to a live Linux DVD/CD.

    And then run rsync?

    How does this work though, if Proxmox is set up on an LVM drive etc.?
     
  5. lhorace

    lhorace Member

    Joined:
    Oct 17, 2015
    Messages:
    141
    Likes Received:
    14
    It's entirely up to you. If you want to continue to use LVM, then you would obviously create the PV, VG, and LVs beforehand. If you want to drop LVM, you can just format using whatever FS you want. Then rsync the data and make sure everything points to the new disk. Alternatively, you can also look into https://linux.die.net/man/8/pvmove.
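    A sketch of the pvmove route, assuming the volume group is named "pve" (the Proxmox default) and the old/new data partitions are /dev/sda3 and /dev/nvme0n1p3 (hypothetical names). Since these commands rewrite disks, the helper below only prints the sequence for review; also note pvmove onto a smaller disk only works if the allocated extents actually fit on the new PV:

```shell
# Dry-run helper: prints the LVM migration steps instead of running them.
# All device names and the VG name are assumptions.
lvm_migration_plan() {
    old_pv="$1"; new_pv="$2"; vg="$3"
    echo "pvcreate $new_pv"         # initialise the new partition as a PV
    echo "vgextend $vg $new_pv"     # add it to the existing volume group
    echo "pvmove $old_pv $new_pv"   # move all allocated extents off the old PV
    echo "vgreduce $vg $old_pv"     # remove the old PV from the VG
    echo "pvremove $old_pv"         # wipe the LVM label from the old disk
}
lvm_migration_plan /dev/sda3 /dev/nvme0n1p3 pve
```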

    Cheers
     
  6. victorhooi

    victorhooi Member

    Joined:
    Apr 3, 2018
    Messages:
    132
    Likes Received:
    6
    Right - so I can re-create the LVM setup, then use something like pvmove.

    I was thinking of doing ZFS, actually.

    Is there some way of re-installing via the Proxmox installer (to get the partitions set up), then somehow overwriting it with the config from the old disk? Or is that a stupid idea?
     
  7. alexskysilk

    alexskysilk Active Member

    Joined:
    Oct 16, 2015
    Messages:
    559
    Likes Received:
    59
    The safest way is to eject the node from the cluster, reinstall Proxmox, and rejoin.
    Alternatively, you can use partclone instead of a 1:1 copy to clone each partition, then restore any disk images from backup (assuming you had any on the boot device) once you've rejoined the cluster.
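    For the partclone route, a hypothetical clone-and-restore pair for one partition might look like the following. The device and image paths are assumptions, and the helper prints the commands rather than executing them, since they write raw partitions; note also that partclone cannot restore onto a partition smaller than the source filesystem without shrinking the filesystem first:

```shell
# Dry-run helper: prints a partclone clone/restore pair for one partition.
# Source partition, image path, and target partition are all placeholders.
partclone_plan() {
    src="$1"; img="$2"; dst="$3"
    echo "partclone.ext4 -c -s $src -o $img"   # -c: clone only used blocks into an image
    echo "partclone.ext4 -r -s $img -o $dst"   # -r: restore the image onto the new partition
}
partclone_plan /dev/sda1 /mnt/backup/root.img /dev/nvme0n1p1
```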
     
  8. victorhooi

    victorhooi Member

    Joined:
    Apr 3, 2018
    Messages:
    132
    Likes Received:
    6
    OK, I'm happy to reinstall Proxmox and rejoin.

    I noticed Proxmox 5.4 just came out - so we could create a new USB installer for that.

    This node was node #1 (i.e. the first node in the cluster) - but I assume that's not relevant now.

    I can then rejoin the Proxmox cluster. How about on the Ceph side - how should I rejoin it there?
     
  9. alexskysilk

    alexskysilk Active Member

    Joined:
    Oct 16, 2015
    Messages:
    559
    Likes Received:
    59
    If you named the node the same, be aware that you'll have SSH key conflicts you will need to resolve manually.
    If you named the node differently, you'll need to clean up your CRUSH map to remove mention of the old node, but then the process is the same:
    Code:
    # pveceph init
    If you only have 3 nodes, you also want to create a monitor:
    Code:
    # pveceph createmon
     
  10. victorhooi

    victorhooi Member

    Joined:
    Apr 3, 2018
    Messages:
    132
    Likes Received:
    6
    I plan on naming the node the same.

    How do I resolve the SSH key conflicts?
     
  11. alexskysilk

    alexskysilk Active Member

    Joined:
    Oct 16, 2015
    Messages:
    559
    Likes Received:
    59
    Once you've rejoined the cluster, perform the following on each node as root, where [nodename] is the name of the replaced node:
    Code:
    sed -i '/[nodename]/d' /root/.ssh/known_hosts
    ssh [nodename]
    (this will prompt you to accept the new node's host key)
     
  12. victorhooi

    victorhooi Member

    Joined:
    Apr 3, 2018
    Messages:
    132
    Likes Received:
    6
  13. alexskysilk

    alexskysilk Active Member

    Joined:
    Oct 16, 2015
    Messages:
    559
    Likes Received:
    59
    It's easiest if you do. Reattaching existing Ceph resources on another node is possible, but difficult.
     
  14. victorhooi

    victorhooi Member

    Joined:
    Apr 3, 2018
    Messages:
    132
    Likes Received:
    6
    I was able to do this last night. Do you think this is worth documenting in the Proxmox wiki?

    First part was to remove the node from Ceph.
    1. From the Web UI, I went to Ceph, then Monitor, and removed the Ceph monitor daemon on that node.
    2. I then went to Ceph, CephFS, and removed the MDS (Metadata Server) that was on that node as well.
    3. I then went to Ceph, OSD and for the four OSDs on that node, I clicked on "Out", then "Stop", then "Destroy".
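    For reference, the Web UI steps above roughly correspond to the following CLI commands (a sketch only: the OSD IDs are placeholders, and the helper prints the commands instead of running them, since they permanently destroy OSDs):

```shell
# Dry-run helper mirroring the Out -> Stop -> Destroy sequence per OSD.
# "ceph osd purge" (Ceph Luminous and later) combines crush remove,
# auth del, and osd rm in one step. OSD IDs passed here are placeholders.
osd_removal_plan() {
    for id in "$@"; do
        echo "ceph osd out $id"                            # Web UI "Out"
        echo "systemctl stop ceph-osd@$id"                 # Web UI "Stop"
        echo "ceph osd purge $id --yes-i-really-mean-it"   # Web UI "Destroy"
    done
}
osd_removal_plan 0 1 2 3
```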
    Second part was to remove the node (following the steps here):

    I ran pvecm nodes to list the nodes:
    Code:
    root@vwnode2:~# pvecm nodes
    
    Membership information
    ----------------------
        Nodeid      Votes Name
             1          1 10.7.17.3
             2          1 10.7.17.4 (local)
             3          1 10.7.17.5
    
    I then shutdown the node.

    There are two quirks (or maybe bugs?) here:
    1. When you run pvecm nodes again, with the node shut down, that node doesn't appear at all - not even as "offline". There doesn't appear to be any flag to show it?
    2. The value in the "Name" column is the IP address, not the actual name of the node that you need to pass to pvecm delnode. It's just that I happened to know what it was in this case.
    Does anybody know if this is expected behaviour, or if I should file bugs for these two?

    After the node is shut down, I then delete it:
    Code:
    root@vwnode2:~# pvecm delnode vwnode1
    Killing node 1
    
    The third part is removing the SSH key - alexskysilk@ mentioned above removing it from /root/.ssh/known_hosts - however, in my case, this file didn't exist. Because I am running a Proxmox cluster, the known_hosts file actually seems to live in the shared PVE cluster filesystem. Hence, on one of the remaining two nodes, I edited /etc/pve/priv/known_hosts and removed the line for that node.
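    The same line deletion can be done with sed against the cluster-wide file. Demonstrated below on a scratch copy (the node names and keys are placeholder values; on a real node, the file to edit is /etc/pve/priv/known_hosts):

```shell
# Demonstration on a throwaway copy of a known_hosts-style file;
# the entries are placeholders, not real keys.
kh=$(mktemp)
cat > "$kh" <<'EOF'
vwnode1 ssh-rsa AAAAB3PLACEHOLDERKEY1
vwnode2 ssh-rsa AAAAB3PLACEHOLDERKEY2
vwnode3 ssh-rsa AAAAB3PLACEHOLDERKEY3
EOF
sed -i '/^vwnode1 /d' "$kh"   # drop the replaced node's stale entry
cat "$kh"                     # vwnode1's line is gone
rm -f "$kh"
```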

    I then restarted the node, and reinstalled Proxmox from scratch.

    (I did notice at this point I was getting a whole bunch of errors about invalid options in a cephpoolname_cephfs.secret file at the console. Not sure why this was, but it seemed to go away once I got Ceph running again).

    I booted it up, configured the network interfaces and static IPs as before (including for my Ceph interfaces), and was able to rejoin the cluster.

    For Ceph, it was simply running:
    Code:
    pveceph install
    pveceph init
    
    I then re-created the Ceph monitor daemon and CephFS MDS via the Web UI.

    For the OSDs, I had to do this manually via the command-line, as I wanted four OSDs per disk (which isn't possible via the Web UI - maybe that's a feature request).

    Code:
    ceph-volume lvm zap --destroy /dev/nvme0n1
    ceph-volume lvm batch --osds-per-device 4 /dev/nvme0n1
    
     
    alexskysilk likes this.