Below is my current home lab setup with regard to disks. It's a pretty standard ZFS-style setup that you'll all be familiar with. Note that it's still in the experimental stage, so I can tear things down if I want.
Node one:
- OS = 2 x 500GB SSDs in ZFS raid 1 mirror
- VM pool = 2 x 1TB SSDs in ZFS raid 1 mirror
- TrueNAS disk pass-through via HBA in IT mode = 2 x 4TB SSDs in ZFS raid 1 mirror. The plan being I can add more disks over time and convert to RAIDZ1/RAIDZ2 (ZFS's raid 5/6 equivalents) if I need more storage space.
Node two is exactly the same.
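For reference, here's the rough capacity maths per node as I see it (just a Python back-of-envelope; the pool labels are my own and the sizes are marketing GB, not GiB):

```python
# Rough per-node capacity maths for the layout above (illustrative only).
pools = {
    "OS (2 x 500GB mirror)": (2, 500),
    "VM pool (2 x 1TB mirror)": (2, 1000),
    "TrueNAS (2 x 4TB mirror)": (2, 4000),
}

raw = usable = 0
for name, (disks, size_gb) in pools.items():
    raw += disks * size_gb
    usable += size_gb  # a mirror gives one disk's worth of usable space
    print(f"{name}: raw {disks * size_gb} GB, usable {size_gb} GB")

print(f"Per node: raw {raw} GB, usable {usable} GB")
# -> per node: 11000 GB raw for 5500 GB usable, and node two doubles that again
#    since its TrueNAS is a backup copy of node one's file server.
```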
My reasoning for two servers is that if I need to take one down for maintenance, I can migrate VMs over to the other and keep my network running. Also, the TrueNAS on node one is the file server while the TrueNAS on node two is the backup server.
I liked the concept of high availability but it was not something I was interested in… until I tried it this weekend by adding a Raspberry Pi QDevice. Now I think it's ace and want to use it. Failing over small VMs is not a problem, but my TrueNAS setup, with the disks being passed through via an HBA, is a different story. I've always known the ultimate setup is having a separate storage area network, but for a home lab that means too many servers ($$$).
At the weekend I discovered that you can run Ceph on the Proxmox nodes. This changes everything!!! If I run Ceph I could then fail over TrueNAS. I might actually just use plain old Samba in a container rather than TrueNAS, but you get the idea: I need to move a VM with TBs of data.
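To put a number on why moving that VM is the sticking point, here's my back-of-envelope (the 1 Gbit link speed and ~70% effective throughput are just my assumptions, not measurements):

```python
# Back-of-envelope: how long would it take to move a multi-TB VM disk over the LAN?
# Assumptions (mine, not measured): 1 Gbit/s link, ~70% effective throughput.
data_tb = 4          # roughly the size of the file server pool
link_gbit = 1
efficiency = 0.7

gbytes_per_s = link_gbit * efficiency / 8   # Gbit/s -> GB/s
hours = data_tb * 1000 / gbytes_per_s / 3600
print(f"~{hours:.1f} hours to copy {data_tb} TB at {link_gbit} Gbit/s")
# -> roughly 12-13 hours, which is why shared storage appeals: on a failover
#    the data wouldn't have to move at all.
```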
I’ve been reading up on Ceph but there are a few things I’m not sure about:
- It seems you need a minimum of 3 nodes, which sounds a lot like Proxmox clustering. The question is: do all 3 nodes need to be full-blown servers with all the disks? With Proxmox clustering I can get away with a simple QDevice. I can easily get a 3rd device for Ceph and also use it as the 3rd Proxmox node, freeing up my Raspberry Pi for other uses. My main concern is disks. If I need to buy more 4TB SSDs for the file server, that's a lot of money for a home lab. Does Ceph need to store the data across all 3 nodes? In my mind I'm thinking: why can't two nodes have a full copy of the data and the 3rd node just be there for voting? That might not be how Ceph works, though, which is why I've written this post (my rough disk-cost maths is after this list). I wouldn't plan on having any VMs running on the 3rd node.
- Do I have too many disks with my current setup? For example, having two 1TB drives for the VM pool in a ZFS raid 1 mirror makes sense when using ZFS, but do I still need the two disks if using Ceph? If one disk failed, another node would have the data, so no need for a 2nd disk? (There's a toy sketch of how I'm picturing this after the list.)
- If the data is present on all nodes, which node would serve the data to Proxmox? Ideally Proxmox would get the data from the Ceph disks that are on the same physical machine. It seems wasteful from a network bandwidth point of view to be using a VM's disk stored on a different node if there is a copy of the data on the same node.
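For context on the disk-cost worry in the first question, here's the maths I've been doing, assuming Ceph's default of 3 replicas for a replicated pool; whether a 2-replica pool with a disk-less third node is even sensible is exactly what I'm unsure about:

```python
# What does replicated Ceph storage cost in raw disk for ~4TB of file-server data?
# Assumption: a plain replicated pool (no erasure coding), 4TB SSDs as the building block.
usable_tb = 4

for replicas in (3, 2):   # 3 = the Ceph default; 2 = what I'm hoping I can get away with
    raw_tb = usable_tb * replicas
    disks = raw_tb / 4
    print(f"size={replicas}: {raw_tb} TB raw = {disks:.0f} x 4TB SSDs spread across the nodes")
# size=3: 12 TB raw -> a 4TB SSD's worth on each of the 3 nodes (so the 3rd node needs real disks?)
# size=2: 8 TB raw  -> only two nodes need data disks (3rd node just votes? is that even allowed?)
```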
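And for the second question, this is the toy model in my head of what survives a single dead disk; it's purely my mental picture (the node/OSD names are made up), so please correct me if Ceph doesn't work like this:

```python
# Toy model: where copies of one chunk of data live, and what's left after one disk dies.
# Purely my mental picture - I know real CRUSH placement is more involved than this.

zfs_mirror_local = ["node1/disk-a", "node1/disk-b"]                  # 2 copies, same host
ceph_one_osd_per_node = ["node1/osd0", "node2/osd0", "node3/osd0"]   # 3 copies, one per host

def survivors(copies, failed_disk):
    return [c for c in copies if c != failed_disk]

print(survivors(zfs_mirror_local, "node1/disk-a"))     # 1 copy left, still only on node1
print(survivors(ceph_one_osd_per_node, "node1/osd0"))  # 2 copies left, on the other nodes
```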
As you can probably tell from my post I am very new to Ceph.