CEPH Cluster - sharing ZFS problem

michal_jot

New Member
May 2, 2022
Hi, I created a 2-node HA cluster. The cluster join worked, I installed Ceph and created OSDs on both servers, but when I try to set up replication I get an error like "no replicatable device found" or something similar. I assigned one "local-zfs" storage (created on the 1st server [ctservertest]) to both servers. Now I have an issue: when I click on the shared disk [local-zfs] I get an error like in the screenshot below. Can you help me solve it?
 

Attachments

  • proxmox_error.JPG (68.6 KB)
I think you are conflating a few things here. For VM replication to work, you need ZFS storages that are available on both nodes with the same name and same ZFS pool underneath.

Looks like the top node does not have a matching ZFS pool. Did you install it exactly the same way as the other?

You do not need Ceph at all in such a situation. It is a different storage technology than ZFS and needs at least 3 nodes to work properly.

Rather, destroy these OSDs and all other Ceph services. Then use the disks to create a ZFS pool on each of your nodes. Ideally with a form of redundancy (Mirror - 2 disks, or RAID 10 - 4 or 6 or ... disks). Check out this section in the admin guide regarding considerations when creating a ZFS pool: https://pve.proxmox.com/pve-docs/pve-admin-guide.html#sysadmin_zfs_raid_considerations
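As a rough CLI sketch of those steps (the OSD ID, the pool name "tank" and the disk paths are placeholders, adapt them to your disks):

Code:
# on each node: remove the Ceph OSDs again (repeat per OSD ID)
ceph osd out 0
systemctl stop ceph-osd@0.service
pveceph osd destroy 0

# then build a redundant ZFS pool from the freed disks, e.g. a mirror of two disks ...
zpool create -o ashift=12 tank mirror /dev/sda /dev/sdb

# ... or a RAID 10 equivalent out of four disks
zpool create -o ashift=12 tank mirror /dev/sda /dev/sdb mirror /dev/sdc /dev/sdd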

If you create the ZFS pools via the GUI, leave the "Add Storage" checkbox enabled for the first node. On the second node, disable it. Then edit the storage (Datacenter->Storage) and edit the list of nodes to include the second node as well.
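If you prefer the CLI over the GUI, the equivalent is roughly the following (the storage ID, pool name and node names are just examples):

Code:
# on the first node: create the storage entry for the pool (what the "Add Storage" checkbox does)
pvesm add zfspool local-zfs --pool tank --content images,rootdir

# once the pool also exists on the second node, allow the storage on both nodes
pvesm set local-zfs --nodes node1,node2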


Lastly, if you want to run a small 2 node cluster with the Proxmox VE HA stack, you will need a 3rd vote to have a majority of votes if one of the nodes goes down. You don't need a full Proxmox VE node, but can make use of the QDevice mechanism: https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_corosync_external_vote_support
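The QDevice setup is roughly as follows (package names as in the admin guide; the IP of the external machine is a placeholder):

Code:
# on both cluster nodes
apt install corosync-qdevice

# on the external machine (e.g. a Raspberry Pi or any always-on box)
apt install corosync-qnetd

# then, from one of the cluster nodes, register the QDevice
pvecm qdevice setup 192.168.1.50

# verify that the cluster now expects 3 votes
pvecm status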
 
Live migration can also work without a shared storage or replication, it will just take a while. With a shared storage, the disk image itself does not need to be transferred. With VM replication between nodes, only the delta since the last replication needs to be transferred. If you want to use HA, then you need the disk image available on all nodes, either via a shared storage or via replication. With replication though, you will most likely have some data loss, depending on how long after the last successful replication the node fails.
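As an illustration, a live migration of a VM with only local disks could be triggered like this (the VM ID and node name are made-up examples):

Code:
# live-migrate VM 100 to node2, copying its local disk images along with it
qm migrate 100 node2 --online --with-local-disks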
 
Ok, so I have 2 nodes. I want the VMs to be automatically copied from the 1st server to the 2nd server when a server failure occurs. What do I need to do for this?

1. Create a cluster + join the 2nd node to it
2. Create an RBD?
3. Create a VM with its disk stored on the RBD shared storage
4. Where can I set up automatic live migration of VMs from a failed server to a working one?
 
Ok, so I have 2 nodes. I want the VMs to be automatically copied from the 1st server to the 2nd server when a server failure occurs. What do I need to do for this?
This would be physically impossible. If the server has failed, i.e. is unavailable, you can't copy data from it automatically.
1. Create a cluster + join the 2nd node to it
For a Highly Available cluster you need a minimum of 3 nodes.
https://forum.proxmox.com/threads/minimum-requirements-for-full-high-availability-pve-ceph.42015/
2. Create an RBD?
For a properly configured Ceph cluster you need a minimum of 3 nodes.
https://forum.proxmox.com/threads/minimum-requirements-for-full-high-availability-pve-ceph.42015/
3. Create a VM with its disk stored on the RBD shared storage
4. Where can I set up automatic live migration of VMs from a failed server to a working one?
https://pve.proxmox.com/wiki/High_Availability_Cluster

Overall you are thinking in the right direction; for seamless HA you need a proper cluster with shared storage.


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
There are several things to clarify regarding HA:
  • Virtualization: For a properly working cluster you need at least three nodes. If one node in a 2-node environment fails, the other won't be able to get a majority of votes and will fence itself, which ends up with no node working at all.
  • There is a workaround if you take a third hardware device and define it as a quorum device.

  • Storage: In order to start the VM on another node, the data (hard disk) must be available on a surviving node. For that you need a shared storage, so either another storage box connected to all nodes where the data is saved, or a software solution like Ceph.
  • I don't know if that is possible with ZFS. Since replication is done asynchronously, you would end up with the VM state at the last replication point and face data loss (and consistency problems too).
  • CEPH/RBD: 'The' software-based shared storage solution in Proxmox. Do note that Ceph needs at least 3 nodes - a workaround similar to the quorum device above does not exist. The block storage provided by Ceph is in the form of RADOS block devices (RBD).
 
Ok, so I have 2 nodes. I want the VMs to be automatically copied from the 1st server to the 2nd server when a server failure occurs. What do I need to do for this?

Forget anything related to Ceph and RBD in a 2-node cluster :)

You need 3 or more nodes to make use of Ceph.

For a 2-node cluster the following can work:

- Create a ZFS pool on both nodes with the same name. Make sure to have some kind of redundancy on the pool level (mirror, RAID 10).
- The storage configuration (Datacenter -> Storage) should exist for this pool (more on that below)
- Make sure that all disks of the VMs are on that storage.
- Create a replication job (the shortest interval is every minute, */1); see the sketch below.
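A rough CLI equivalent of that last step, with a made-up VM ID, job number and target node name:

Code:
# replicate the disks of VM 100 to node2 every minute
pvesr create-local-job 100-0 node2 --schedule "*/1"

# check when the last replication ran and whether it succeeded
pvesr status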

Once the initial replication is done, a live migration should be very fast.
You can now combine that with the HA functionality, which will start the VM on the other node, should the first one (where it used to run) fail.
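Adding the VM as an HA resource can be done in the GUI (Datacenter -> HA) or, as a rough sketch with a made-up VM ID, on the CLI:

Code:
# let the HA stack manage VM 100; it will be recovered on another node if its current node fails
ha-manager add vm:100

# check the state of all HA resources
ha-manager status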

But for that to work, you need a 3rd vote in the Proxmox VE cluster. A Proxmox VE cluster works by forming a majority. If one of 2 nodes is down, the remaining node has only 50% of the votes and cannot start a VM because it does not have the majority of votes.

You can use the QDevice to get one more vote, without a full Proxmox VE installation. The external part can run on a Raspberry Pi or on another machine you have that is not part of the cluster. With that, the remaining node + the Qdevice still has 2 out of 3 votes and can start the VM.

The only downside with replicated VMs and HA is that the VM will start with the disk image that was transferred in the last successful replication. Depending on the interval, the amount of lost data can be larger or smaller.


Setting up local storage with the same name on multiple nodes:
- Create the ZFS pool on the first node, give it a name and leave the "Add Storage" checkbox enabled.
- On the second node, create the ZFS pool as well, but disable the "Add Storage" checkbox.
- Under Datacenter->Storage, edit the storage. In the top right, you have a list of nodes; select the second node as well. This tells Proxmox VE on which nodes it can expect that storage to be present.
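The resulting entry in /etc/pve/storage.cfg should then look roughly like this (storage ID, pool name and node names are only examples):

Code:
zfspool: local-zfs
        pool tank
        content images,rootdir
        nodes node1,node2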
 
