Disk setup recommendation for new cluster

silke

New Member
Apr 15, 2025
Hi, I just bought some new hardware and would like some recommendations on a good disk setup.
My still very uninformed idea (no prior experience, just lots of little bits and pieces from the net) is to do it like this:
I have three new machines with three NVMe slots each and a few older Intel NUCs.
- put a 1 TB disk in each of the three new machines for the OS (Proxmox)
- add 2x 4 TB to the first two of these, with either ZFS or Ceph for VMs and containers
- add 2x 2 TB to the third new machine, with the same file system as the first two
- an old NUC as backup server with external storage (USB and NAS)

The idea is to mainly use the first two new machines for my VMs, with the option to migrate them back and forth for maintenance. The third is just there for HA and some redundancy; it has a little less CPU power and RAM.

Do you think this is a sensible idea? Of course I am very open to suggestions for improvement. But even if the plan is OK as it stands, I still have a few questions:
- The Proxmox installation will also create VM storage on the first disk. Is there still an easy way to just back up the OS?
- Any good use for the additional storage on the first disk? I guess it cannot be included in the HA setup, can it?
- ZFS or Ceph? Any good tutorial you can recommend to set up HA with either of these?
- Thinking about a massive disaster: is it possible to put a portable Proxmox system (and backup server?) on a USB disk? If everything in production is gone, I would like to grab a random PC, plug the USB disk in, boot, perhaps set the IP, recover a few really important VMs from the USB disk or an offsite backup, and be back in business -- at least with an emergency system?
 
Hi, I just bought some new hardware and would like some recommendations
My recommendation: first have a problem. Then look for solutions. Then choose one solution. Then buy hardware.

Of course you can start at the end of the line too :)

- ZFS or Ceph?

ZFS!


For Ceph to be useful and stable you need more than three nodes.
 
My recommendation: first have a problem. Then look for solutions. Then choose one solution. Then buy hardware.
What led you to the conclusion I didn't follow this sequence?

- Problem: Get a fast and reliable virtualisation system
- Solution: Recent hardware with Proxmox and at least three nodes for HA.
- Buy the hardware
- Install Proxmox
- Configure it in a way that best suits the intended solution

The last point is where I am now.

In the past I had several old (7th-generation) NUCs running VMware Workstation. Since it is (semi-)commercial, not just a homelab, I needed more speed, and after a few hardware breakdowns I wanted less downtime in case of problems. A VMware-based solution was either not good enough or far too expensive, so the above solution seemed natural to me. But even with the general idea settled there are lots of questions concerning the details. This is why I hope for help here.

Thanks for the advice about the file system. Any ideas concerning my other questions?
 
What led you to the conclusion I didn't follow this sequence?
Well, the very first sentence in post #1 ;-)

Any ideas concerning my other questions?
The bottom block of #1?

- The Proxmox installation will also create VM storage on the first disk. Is there still an easy way to just back up the OS?
No. Not officially. Node backup has been on the roadmap for some time now; maybe it comes with the next release. You can find some scripts and a command-line client for PBS. Personally I do not use these.
- Any good use for the additional storage on the first disk? I guess it cannot be included in the HA setup, can it?
Why not? The default setup creates a storage named "local-zfs" directly on the ZFS pool named "rpool". It is usable for VMs (and also for containers), and these can get replicated on an automatic schedule. I do that every two hours for most VMs.
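Such a schedule can be configured in the GUI (Datacenter -> Replication) or on the command line with pvesr. A minimal sketch, assuming VM ID 100 and a target node named "pve2" (both placeholders):

```
# Replicate VM 100 to node "pve2" every two hours (the job ID "100-0" is arbitrary).
pvesr create-local-job 100-0 pve2 --schedule '*/2:00'

# List configured jobs and check when they last ran:
pvesr list
pvesr status
```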
- ZFS or Ceph? Any good tutorial you can recommend to set up HA with either of these?
The normal documentation should be sufficient.
- Thinking about a massive disaster: is it possible to put a portable Proxmox system (and backup server?) on a USB disk?
No.

You have a cluster. When one node dies just add another (new) one. And clean up by removing the dead one.
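A minimal sketch of that procedure, assuming a surviving cluster member at 192.168.1.10 and a dead node named "pve3" (both placeholders):

```
# On the freshly installed node: join the existing cluster.
pvecm add 192.168.1.10

# On any surviving member: remove the dead node from the cluster config.
pvecm delnode pve3
```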

Just start playing with it. Start with the idea that the first construct is purely for learning. Test it. Damage it, repair it. Make backups; restore backups. Look for pitfalls. Verify that it does what you want.

Don't forget a backup system following the 3-2-1 strategy. Opt for one or more PBS instances with SSDs if possible. Make sure that backups are created automatically. Verify them by actually restoring some. Until you have done this, all backups are "Heisenberg-Backups" with unknown state = unknown whether a restore actually works.
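A hedged sketch of such a test cycle, assuming a PBS storage entry named "pbs" and VM ID 100 (both placeholders):

```
# Create a backup on the PBS storage (snapshot mode keeps the VM running).
vzdump 100 --storage pbs --mode snapshot

# List the resulting backup volumes, then restore one to a spare VM ID
# to prove the backup actually works:
pvesm list pbs
qmrestore <archive-volume-id-from-the-list> 200 --storage local-zfs
```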

:)
 
You have a cluster. When one node dies just add another (new) one. And clean up by removing the dead one.

Just start playing with it. Start with the idea that the first construct is purely for learning. Test it. Damage it, repair it. Make backups; restore backups. Look for pitfalls. Verify that it does what you want.
Couldn't have said it better!
And restores should be re-tested from time to time, again and again, to make sure they still work.
 
Well, the very first sentence in post #1 ;-)
I still cannot see how leaving out the decision history can imply that there were no prior thoughts - but I get your smiley.

Your answers are really helpful, thanks a lot! I already intend to (and have already started to) play around the way you describe. Since the cluster is not yet in production I can test everything extensively, much as you suggest; then I plan a complete reinstall for the production environment.

Still one question left: I just saw a YouTube video about ZFS replication where replication did not work for the local storage; only after the VM was moved to a cluster-wide ZFS storage could a replication target be selected. How did you configure the local storage to enable replication?

And one more: since I have three NVMe slots, do you think it is a good idea to install Proxmox on a single disk (no RAID) and use the remaining two as a RAID1 cluster storage for VMs? Or use just two disks with RAID1 and put everything on these (if I can figure out how to set up replication there, see above)?
 
How did you configure the local storage to enable replication?
All nodes should have the same storage configuration. After a normal installation each one has an "rpool" = three instances, one on each node.

There is only one storage definition for this "rpool", named "local-zfs". This storage is declared to be available on all nodes; it is "shared". The names must be consistent on all cluster members. Under this precondition you can replicate VMs from one node to another.
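For reference, this is roughly what the default definition looks like in /etc/pve/storage.cfg after a ZFS installation (a sketch; without a "nodes" line the entry applies to every cluster member):

```
zfspool: local-zfs
        pool rpool/data
        content images,rootdir
        sparse 1
```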

And you can move VMs from one node to another, both offline (with the VM stopped) and online = while the VM is continuously running.
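On the command line this is a one-liner; the VM ID and node name are placeholders:

```
# Offline migration (VM stopped):
qm migrate 100 pve2

# Online/live migration while the VM keeps running:
qm migrate 100 pve2 --online
```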

The above is "works as expected".

Then there is the chance of a hardware malfunction - this is when "High Availability" (HA) kicks in. It needs "Shared Storage"; ZFS replication is accepted as such, although it isn't really shared. In a disaster, when a node crashes hard, a configured VM is restarted on another node. That's basically all it does. The data on the dead node was replicated some time ago (days, hours, minutes - your choice). Any data changed since that last replication run is lost. For some use cases this is a show stopper. For my VMs it is acceptable, both at my job and in the Homelab.
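Enabling HA for a VM is a single step; a sketch, again with VM ID 100 as a placeholder:

```
# Manage VM 100 as an HA resource; the cluster restarts it elsewhere
# if its current node fails.
ha-manager add vm:100 --state started

# Show what the HA stack currently thinks of the cluster:
ha-manager status
```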

And one more: since I have three NVMe slots, do you think it is a good idea to install Proxmox on a single disk (no RAID) and use the remaining two as a RAID1 cluster storage for VMs? Or use just two disks with RAID1 and put everything on these (if I can figure out how to set up replication there, see above)?
Good question, for which I have no good answer. In my Homelab I have some mini PCs with the same options: two SATA slots plus one single NVMe. I really want redundancy, so I went with 2x SATA for the main OS and the mentioned "rpool" with a single mirrored vdev. In this incarnation everything is stored on this single ZFS pool.
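If you go with your first option (OS on the single 1 TB disk, mirror the other two), the data pool could be created like this. A sketch only: the pool name "vmdata" and the device names are made up, and the storage name must be identical on all three nodes for replication to work:

```
# Create a mirrored ZFS pool from the two remaining NVMe disks:
zpool create -o ashift=12 vmdata mirror /dev/nvme1n1 /dev/nvme2n1

# Register it as a Proxmox storage for VM disks and containers:
pvesm add zfspool vmdata --pool vmdata --content images,rootdir
```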

I have no good recommendation for what to do with a single NVMe. I cannot use it as a "Special Device" without redundancy. And a cheap NVMe as a cache (or a SLOG) makes no sense in front of enterprise SSDs.

((
Anecdote, not a recommendation: so I had those unused but single-per-node NVMe. This was two years ago, at the very beginning of my Ceph journey. I wanted to utilize those three single NVMe and went down the road described in my linked "FabU" ... "because... why not?". In the end it was six nodes with 12 OSDs (plus three nodes without OSDs). My slow Homelab network (2.5 Gbit/s really is too slow!) and the high energy costs made me drop that approach a short while ago. Now I am shrinking the number of nodes again and returning to ZFS only, kicking Ceph completely out of my boat.
))
 
Your anecdote led me to a new idea:
Is it possible to do replication between Ceph storage and ZFS? Then I could use the third single disk for Ceph and define a replication from this Ceph storage to the ZFS storage. Expected result: as long as Ceph is working I have real HA. If it somehow fails, because it is not really reliable in this minimal setup, I still have the "almost HA" on ZFS from the last replication. Nonsense or doable?
Network speed should not be a problem. I have two 10G ports on each machine. With these it should be possible to set up a fast network between the nodes. And if that is still too slow, I found a tutorial on how to set up the USB4 Thunderbolt ports as a network with at least 20G speed.
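A dedicated replication/migration link between the nodes would then just be a static point-to-point network. A minimal sketch for /etc/network/interfaces, with the interface name and subnet as assumptions:

```
# Dedicated 10G link for cluster traffic (name and addresses are examples):
auto enp2s0f0
iface enp2s0f0 inet static
        address 10.10.10.1/24
        mtu 9000
```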
 
Is it possible to do replication between Ceph storage and ZFS?
No, sorry. In Proxmox the term "replication" refers exclusively to the ZFS copying capability, which uses the send/receive mechanism to minimize the actual traffic.
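To illustrate what happens under the hood (simplified; dataset, snapshot, and node names are placeholders): only the blocks changed since the previous snapshot travel over the wire.

```
# Incremental ZFS send of a VM disk to another node:
zfs snapshot rpool/data/vm-100-disk-0@repl_new
zfs send -i @repl_prev rpool/data/vm-100-disk-0@repl_new | \
    ssh pve2 zfs receive -F rpool/data/vm-100-disk-0
```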