Proxmox VE configuration considerations

JohnyMielony

New Member
Feb 12, 2024
Hi,
I'm building a virtual environment. I have one server for now, but I'm going to build a cluster after migrating the physical servers to VMs.
As Ceph is not an option in my configuration, I wanted to go with ZFS and offline migration of VMs.
I have a dilemma: someone before me ordered the server with SATA SSDs instead of SAS SSDs. I've ordered new SAS SSDs, but I still have those SATA drives left over.
So which is better: installing the OS on the SATA drives and using the SAS drives for a ZFS pool where all the VMs will sit, or just using the SAS drives as one single RAIDZ for both the OS and the VMs?
 
SATA SSDs are not a bad thing, as long as they're enterprise SSDs. AFAIK, the most noticeable difference is the speed: SAS is mostly 12G nowadays, whereas SATA is still 6G.
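If you want to double-check which transport and link speed your drives actually negotiate, something like this works (the device name /dev/sda is just an example):

    # transport type per disk (sata / sas / nvme)
    lsblk -o NAME,TRAN,MODEL,SIZE
    # negotiated SATA link speed for one disk (replace /dev/sda with your device)
    smartctl -i /dev/sda | grep -i 'sata version'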

Why is CEPH not an option for you? You will not have fun with ZFS in a cluster (= 3 nodes or two with qdevice).
 
All drives are enterprise grade. In my experience SAS drives are more durable, so it's not just speed that matters.
I would like to have Ceph for online migration, but from what I was told, I would need a 10 Gbps network and at least a few nodes to begin with.
I have a 1 Gbps network and one node at the start. The other nodes will be equipped with HDDs. Do you think Ceph makes sense under these conditions?
For my use case, offline migration is acceptable.
What problems should I expect when using ZFS in a cluster?
 
I have a 1 Gbps network and one node at the start. The other nodes will be equipped with HDDs. Do you think Ceph makes sense under these conditions?
No, Ceph over 1 GbE is not an option.

What problems should I expect when using ZFS in a cluster?
ZFS is not a cluster filesystem; you just use it locally on the nodes of a cluster. Therefore you won't have features like almost-zero-loss failover, easy live migration, an easy replication setup including failback configuration, multi-node setups, etc.
 
No, Ceph over 1 GbE is not an option.
So that's why it's out of consideration.

ZFS is not a cluster filesystem; you just use it locally on the nodes of a cluster. Therefore you won't have features like almost-zero-loss failover, easy live migration, an easy replication setup including failback configuration, multi-node setups, etc.
I'm aware of that, and I accept it. I'm thinking about Gluster on top of ZFS, but I need to read more about it to see whether it makes sense and what the drawbacks are.

So what about my original question? One RAIDZ for OS and VMs, or one RAID(Z) for the OS on SATA and a second RAIDZ for the VMs on SAS?
 
One RAIDZ for OS and VMs, or one RAID(Z) for the OS on SATA and a second RAIDZ for the VMs on SAS?
The OS needs almost no IOPS and at most 8 GiB of space, so IMHO a separate pool is way too much. It can, however, help if you ever need to reinstall everything because you can't solve a problem (that mostly applies to people not familiar with Debian and Linux). In your particular circumstances you would need two pools anyway because of the SATA and SAS disks, and I would not mix them. Is this optimal with respect to speed or utilisation? No, not at all ... but the hardware is what it is. Technically, your system would be faster with only one pool, because you would not split your ARC across two pools and you would have more slots available for future capacity expansions.

Are you really installing with RAIDz? Why not mirroring, which is ALWAYS faster? Performance in ZFS scales with the number of vdevs. RAIDz is slower and can yield padding issues and weird space usage if you're not aware of the padding problem (just search the forum). If you use two pools, just use one mirrored SSD vdev as the pool for the OS.
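A rough sketch of the two layouts on the command line (pool and device names are only placeholders; in practice you'd use /dev/disk/by-id paths):

    # striped mirrors: performance scales with the number of mirror vdevs
    zpool create -o ashift=12 tank mirror /dev/sdc /dev/sdd mirror /dev/sde /dev/sdf
    # versus a single RAIDz1 vdev over the same four disks
    zpool create -o ashift=12 tank raidz1 /dev/sdc /dev/sdd /dev/sde /dev/sdf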
 
When I wrote RAIDZ for the OS, I meant the ZFS equivalent of RAID1, or hardware RAID1. I don't want to mix SATA and SAS in one pool.
The SAS drives I want to combine into a RAIDZ-1 (leaving the option for future expansion if needed).
So in conclusion, there is no point in having two ZFS pools, one for the OS and a second for VMs, since there is a performance loss, right?
 
When I wrote RAIDZ for the OS, I meant the ZFS equivalent of RAID1, or hardware RAID1. I don't want to mix SATA and SAS in one pool.
Good.

The SAS drives I want to combine into a RAIDZ-1 (leaving the option for future expansion if needed).
Every zpool can be expanded with another vdev. Performance-wise, the best option is to "copy" the setup: so if your pool has vdevs of 3 disks in RAIDz1, add another 3 disks as an additional RAIDz1 vdev, as sketched below.
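Roughly like this (pool and device names are placeholders):

    # grow the pool by "copying" the layout: add a second raidz1 vdev of 3 disks
    zpool add tank raidz1 /dev/sdf /dev/sdg /dev/sdh
    # the pool should now show two vdevs, raidz1-0 and raidz1-1
    zpool status tank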

So in conclusion, there is no point in having two ZFS pools, one for the OS and a second for VMs, since there is a performance loss, right?
Besides the "distinctness", the "arc-split" and the free slots in your machine: no, not in my book.
 
Besides the "distinctness", the "arc-split" and the free slots in your machine: no, not in my book.
"No" as in "no point"?

So now it's time for experimenting and testing. I installed Proxmox on the SATA drives with a ZFS mirror just to see how it goes; in the meantime the SAS drives arrived, so I will do some tests until my final configuration becomes clear.
I'm wondering... if I have one ZFS pool for everything and then turn on ZFS replication, will it replicate the OS files too?
 
If I have one ZFS pool for everything and then turn on ZFS replication, will it replicate the OS files too?
Replication in ZFS is done on a per-dataset basis, and each dataset is normally a KVM/QEMU VM disk. You can set up replication in the PVE GUI, and then the whole VM with all its ZFS datasets (at least 1 for a BIOS VM and 2 for a UEFI VM) will be replicated. So the files of the host OS will not be replicated.
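On the command line it looks roughly like this on a default PVE ZFS install (the node name pve2, VMID 100 and the schedule are made-up examples):

    # VM disks are individual zvol datasets under the pool, e.g. rpool/data/vm-100-disk-0
    zfs list -t volume -r rpool/data
    # replication jobs can also be created on the CLI instead of the GUI
    pvesr create-local-job 100-0 pve2 --schedule "*/15"
    pvesr list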
 
I obviously need to read more about ZFS.
For now I think my config is clear. I'll go with one pool. The OS currently takes less than 2 GB of space, so there's practically no loss of space. Thanks for your help.
 
I will do that. That is one of the reasons why I'm building the VE: to have a place where I can play around without the risk of breaking anything.

I created a VM with a PVE node installed on it. Then I created a cluster on the physical machine and added the virtual node to it. I did this to test creating a cluster when I already have VMs running. I didn't run into any issues. So it looks like I can create a cluster at any point in time and then attach nodes to it.
Am I right? Or is it better to create the cluster at the beginning with virtual nodes just for quorum purposes and then replace them with physical nodes?
 
You should not mix virtual PVE nodes with physical ones.
You may create a situation where you need quorum to start the PVE VM, but cannot get it without starting it.
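If a node ever ends up in that quorum deadlock, the usual way to inspect and (only as a last resort) override it is roughly:

    pvecm status      # shows votes, quorum state and cluster membership
    pvecm expected 1  # emergency only: lower the expected votes so the node becomes quorate again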
 
I don't want to do that. My question was whether, when I want to have a cluster, I have to create it at the beginning when setting up the VE, or whether I can create it at any point in time. My test showed that it can be done at any time without harm to the existing VMs. But maybe I should expect some issues over time, after running the VE for a while as a standalone server.
 
Hi,

My question was whether, when I want to have a cluster, I have to create it at the beginning when setting up the VE, or whether I can create it at any point in time. My test showed that it can be done at any time without harm to the existing VMs. But maybe I should expect some issues over time, after running the VE for a while as a standalone server.
Yes, you can create a new cluster at any time from a node, regardless of whether it already has guests or not.

The only thing to note here is that you cannot join nodes to a cluster which already have VMs/CTs - only empty nodes can be joined to an existing cluster.
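For reference, the whole flow on the CLI looks roughly like this (the cluster name and address are placeholders):

    # on the existing node (it may already have VMs/CTs):
    pvecm create mycluster
    # on the new, still empty node, pointing at the existing node's address:
    pvecm add 192.0.2.10
    # verify membership and quorum:
    pvecm status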
 
The only thing to note here is that you cannot join nodes to a cluster which already have VMs/CTs - only empty nodes can be joined to an existing cluster.
I know that the new node must be empty. I just have to move the existing services running on other machines to the VE, and then prepare that machine (check the hardware setup, install PVE) to join the cluster. I don't know how long that will take me.

I tried to perform my last test, and it failed. The test was to try to extend the ZFS pool. I thought that ZFS could do everything that HW RAID can do and much more. Turns out that's not the case.
My test result: "cannot attach /dev/sdf to raidz1-0: can only attach to mirrors and top-level disks".
 
I tried to perform my last test, and it failed. The test was to try to extend the ZFS pool. I thought that ZFS could do everything that HW RAID can do and much more. Turns out that's not the case.
My test result: "cannot attach /dev/sdf to raidz1-0: can only attach to mirrors and top-level disks".
Actually, you tried to extend a vdev, which is not possible at the moment. Extending the zpool with another vdev would be possible.
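In zpool terms, the difference is roughly this (pool and device names are placeholders):

    # attach adds a disk to a mirror (or turns a single disk into one) - this works:
    zpool attach rpool /dev/sda3 /dev/sdb3
    # attaching a single disk to a raidz vdev is what failed; grow the pool with a new vdev instead:
    zpool add tank raidz1 /dev/sdf /dev/sdg /dev/sdh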
 
I found that it is possible, but not in the current release version of OpenZFS.
So in conclusion, I can't extend my pool by adding one disk at a time. At least not now; maybe in a few years.
I would have to recreate the pool, and with only one pool for the OS and VMs that means reinstalling the whole node.
It's fairly easy to do, so I'll go with it.
 
Normally, you extend your zpool or your RAID in groups (a vdev for ZFS, a RAID group for hardware RAID) and not with single disks. At least that's what I've done for the last three decades. I'm aware that some RAID controllers can do this, yet I never did it and never saw the necessity. Have you done this often, and why?
 
