Disk setup recommendation for new cluster

silke

New Member
Apr 15, 2025
Hi, I just bought some new hardware and would like some recommendations on a good disk setup.
My still very uninformed idea (no prior experience, just lots of little bits and pieces from the net) is to do it like this:
I have three new machines with three NVMe slots each and a few older Intel NUCs.
- put a 1 TB disk in each of the three new machines for the OS (Proxmox)
- add 2x 4 TB to the first two of these, with either ZFS or Ceph for VMs and containers
- add 2x 2 TB to the third new machine, with the same file system as the first two
- an old NUC as backup server with external storage (USB and NAS)

The idea is to mainly use the first two new machines for my VMs, with the option to migrate them back and forth for maintenance. The third is just there for HA and some redundancy; it has a little less CPU power and RAM.

Do you think this is a sensible idea? Of course I am very open to suggestions for improvement. But even if the plan is OK as it stands, I still have a few questions:
- The Proxmox installation will also create VM storage on the first disk. Is there still an easy way to just back up the OS?
- Any good use for the additional storage on the first disk? I guess it cannot be included in the HA setup, can it?
- ZFS or Ceph? Any good tutorial you can recommend to set up HA with either of these?
- Thinking about a massive disaster: is it possible to put a portable Proxmox system (and backup server?) on a USB disk? If everything in production is gone, I would like to grab a random PC, plug the USB disk in, boot, perhaps set the IP, recover a few really important VMs from the USB disk or an offsite backup, and be back in business -- at least with an emergency system?
 
Hi, I just bought some new hardware and would like some recommendations
My recommendation: first have a problem. Then look for solutions. Then choose one solution. Then buy hardware.

Of course you can start at the end of the line too :)

- ZFS or Ceph?

ZFS!


For Ceph to be useful and stable you need more than three nodes.
 
My recommendation: first have a problem. Then look for solutions. Then choose one solution. Then buy hardware.
What led you to the conclusion I didn't follow this sequence?

- Problem: Get a fast and reliable virtualisation system
- Solution: Recent hardware with Proxmox and at least three nodes for HA.
- Buy the hardware
- Install Proxmox
- Configure it in a way that best suits the intended solution

The last point is where I am now.

In the past I had several old (7th-generation) NUCs running VMware Workstation. Since it is (semi-)commercial, not just a homelab, I needed more speed, and after a few hardware breakdowns I wanted less downtime in case of problems. A VMware-based solution was either not good enough or far too expensive, so the above solution seemed natural to me. But even with the general idea settled there are lots of questions concerning the details. This is why I hope for help here.

Thanks for the advice about the file system. Any ideas concerning my other questions?
 
What led you to the conclusion I didn't follow this sequence?
Well, the very first sentence in post #1 ;-)

Any ideas concerning my other questions?
The bottom block of #1?

- The Proxmox installation will also create VM storage on the first disk. Is there still an easy way to just back up the OS?
No. Not officially. Node backup has been on the roadmap for some time now; maybe it comes with the next release. You can find some scripts and a command-line client for PBS. Personally I do not use these.
- Any good use for the additional storage on the first disk? I guess it cannot be included in the HA setup, can it?
Why not? The default setup creates a storage named "local-zfs" directly on the ZFS pool named "rpool". It is usable for VMs (and also for containers), and these can get replicated on an automatic schedule. I do that every two hours for most VMs.
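Such a schedule can be configured in the GUI (Datacenter -> Replication) or on the command line with pvesr. A minimal sketch, assuming VM ID 100 and a target node named "pve2" (both placeholders):

```
# Replicate VM 100 to node "pve2" every two hours (the job ID "100-0" is arbitrary).
pvesr create-local-job 100-0 pve2 --schedule '*/2:00'

# List configured jobs and check when they last ran:
pvesr list
pvesr status
```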
- ZFS or Ceph? Any good tutorial you can recommend to set up HA with either of these?
The normal documentation should be sufficient.
- Thinking about a massive disaster: is it possible to put a portable Proxmox system (and backup server?) on a USB disk?
No.

You have a cluster. When one node dies just add another (new) one. And clean up by removing the dead one.
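A minimal sketch of that procedure, assuming a surviving cluster member at 192.168.1.10 and a dead node named "pve3" (both placeholders):

```
# On the freshly installed node: join the existing cluster.
pvecm add 192.168.1.10

# On any surviving member: remove the dead node from the cluster config.
pvecm delnode pve3
```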

Just start playing with it. Start with the idea that the first construct is purely for learning. Test it. Damage it, repair it. Make backups; restore backups. Look for pitfalls. Verify that it does what you want.

Don't forget a backup system following the 3-2-1 strategy. Opt for one or more PBS instances with SSDs if possible. Make sure that backups are created automatically. Verify them by actually restoring some. Until you have done this, all backups are "Heisenberg-Backups" with unknown state = unknown whether a restore actually works.
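A hedged sketch of such a test cycle, assuming a PBS storage entry named "pbs" and VM ID 100 (both placeholders):

```
# Create a backup on the PBS storage (snapshot mode keeps the VM running).
vzdump 100 --storage pbs --mode snapshot

# List the resulting backup volumes, then restore one to a spare VM ID
# to prove the backup actually works:
pvesm list pbs
qmrestore <archive-volume-id-from-the-list> 200 --storage local-zfs
```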

:)
 
You have a cluster. When one node dies just add another (new) one. And clean up by removing the dead one.

Just start playing with it. Start with the idea that the first construct is purely for learning. Test it. Damage it, repair it. Make backups; restore backups. Look for pitfalls. Verify that it does what you want.
Couldn't have said it better!
And restores should be re-tested from time to time, again and again, to make sure they still work.
 
Well, the very first sentence in post #1 ;-)
I still cannot see how leaving out the decision history can imply that there were no prior thoughts - but I get your smiley.

Your answers are really helpful, thanks a lot! I already intend to (and have already started to) play around the way you describe. Since the cluster is not yet in production I can test everything extensively, much as you suggest; then I plan a complete reinstall for the production environment.

Still one question left: I just saw a YouTube video about ZFS replication where replication did not work for the local storage; only after the VM was moved to a cluster-wide ZFS storage could a replication target be selected. How did you configure the local storage to enable replication?

And one more: since I have three NVMe slots, do you think it is a good idea to install Proxmox on a single disk (no RAID) and use the remaining two as a RAID1 cluster storage for VMs? Or use just two disks with RAID1 and put everything on these (if I can figure out how to set up replication there, see above)?
 
How did you configure the local storage to enable replication?
All nodes should have the same storage configuration. After a normal installation each one has an "rpool" = three instances, one on each node.

There is only one storage definition for this "rpool", named "local-zfs". This storage is declared to be available on all nodes; it is "shared". The names must be consistent on all cluster members. Under this precondition you can replicate VMs from one node to another.
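For reference, this is roughly what the default definition looks like in /etc/pve/storage.cfg after a ZFS installation (a sketch; without a "nodes" line the entry applies to every cluster member):

```
zfspool: local-zfs
        pool rpool/data
        content images,rootdir
        sparse 1
```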

And you can move VMs from one node to another, both offline (with the VM stopped) and online = while the VM is continuously running.
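On the command line this is a one-liner; the VM ID and node name are placeholders:

```
# Offline migration (VM stopped):
qm migrate 100 pve2

# Online/live migration while the VM keeps running:
qm migrate 100 pve2 --online
```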

The above is "works as expected".

Then there is the chance of a hardware malfunction - this is when "High Availability" (HA) kicks in. It needs "Shared Storage"; ZFS replication is accepted as such, although it isn't really shared. In a disaster, when a node crashes hard, a configured VM is restarted on another node. That's basically all it does. The data on the dead node was replicated some time ago (days, hours, minutes - your choice). Any data changed since that last replication run is lost. For some use cases this is a show stopper. For my VMs it is acceptable, both at my job and in the Homelab.
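Enabling HA for a VM is a single step; a sketch, again with VM ID 100 as a placeholder:

```
# Manage VM 100 as an HA resource; the cluster restarts it elsewhere
# if its current node fails.
ha-manager add vm:100 --state started

# Show what the HA stack currently thinks of the cluster:
ha-manager status
```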

And one more: since I have three NVMe slots, do you think it is a good idea to install Proxmox on a single disk (no RAID) and use the remaining two as a RAID1 cluster storage for VMs? Or use just two disks with RAID1 and put everything on these (if I can figure out how to set up replication there, see above)?
Good question, for which I have no good answer. In my Homelab I have some mini PCs with the same options: two SATA slots plus one single NVMe. I really want redundancy, so I went with 2x SATA for the main OS and the mentioned "rpool" with a single mirrored vdev. In this incarnation everything is stored on this single ZFS pool.
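If you go with your first option (OS on the single 1 TB disk, mirror the other two), the data pool could be created like this. A sketch only: the pool name "vmdata" and the device names are made up, and the storage name must be identical on all three nodes for replication to work:

```
# Create a mirrored ZFS pool from the two remaining NVMe disks:
zpool create -o ashift=12 vmdata mirror /dev/nvme1n1 /dev/nvme2n1

# Register it as a Proxmox storage for VM disks and containers:
pvesm add zfspool vmdata --pool vmdata --content images,rootdir
```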

I have no good recommendation for what to do with a single NVMe. I cannot use it as a "Special Device" without redundancy. And a cheap NVMe as a cache (or a SLOG) makes no sense in front of enterprise SSDs.

((
Anecdote, not a recommendation: so I had those unused but single-per-node NVMe. This was two years ago, at the very beginning of my Ceph journey. I wanted to utilize those three single NVMe and went down the road described in my linked "FabU" ... "because... why not?". In the end it was six nodes with 12 OSDs (plus three nodes without OSDs). My slow Homelab network (2.5 Gbit/s really is too slow!) and the high energy costs made me drop that approach a short while ago. Now I am shrinking the number of nodes again and returning to ZFS only, kicking Ceph completely out of my boat.
))
 
Your anecdote led me to a new idea:
Is it possible to do replication between Ceph storage and ZFS? Then I could use the third single disk for Ceph and define a replication from this Ceph storage to the ZFS storage. Expected result: as long as Ceph is working I have real HA. If it somehow fails, because it is not really reliable in this minimal setup, I still have the "almost HA" on ZFS from the last replication. Nonsense or doable?
Network speed should not be a problem. I have two 10G ports on each machine. With these it should be possible to set up a fast network between the nodes. And if that is still too slow, I found a tutorial on how to set up the USB4 Thunderbolt ports as a network with at least 20G speed.
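A dedicated replication/migration link between the nodes would then just be a static point-to-point network. A minimal sketch for /etc/network/interfaces, with the interface name and subnet as assumptions:

```
# Dedicated 10G link for cluster traffic (name and addresses are examples):
auto enp2s0f0
iface enp2s0f0 inet static
        address 10.10.10.1/24
        mtu 9000
```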
 
Is it possible to do replication between Ceph storage and ZFS?
No, sorry. In Proxmox the term "replication" refers exclusively to the ZFS copying capability, which uses the send/receive mechanism to minimize the actual traffic.
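To illustrate what happens under the hood (simplified; dataset, snapshot, and node names are placeholders): only the blocks changed since the previous snapshot travel over the wire.

```
# Incremental ZFS send of a VM disk to another node:
zfs snapshot rpool/data/vm-100-disk-0@repl_new
zfs send -i @repl_prev rpool/data/vm-100-disk-0@repl_new | \
    ssh pve2 zfs receive -F rpool/data/vm-100-disk-0
```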