Proxmox with SAS HDDs

zecas

Hi,

I'm currently starting the process of setting up 2 Proxmox servers for very similar objectives, both with very similar specs.

The main hardware specs are:
- 2x Xeon E5-2695V2 processors;
- 128GB PC3L-12800R RAM;
- SAS controllers in HBA/IT mode.

The usage will be to install a couple of VM machines:
- 1x Windows Server 2022 for user authentication and file sharing;
- 1x Windows Server 2022 for some services, like SQL Server Express with some small databases (all less than 400MB, one of them with a maximum size of 3GB, and I'm being very generous since that DB currently has 1.5GB running on another non-virtual system);
- 1x or 2x Linux machines for some other services (a likely possibility once virtualization is available).

For both servers, the user base will be around 1-3 people, working with documents on a file share and an application that accesses SQL Server.

I do believe these machines will be up to the task. Do you have a different opinion, or should I have any concerns about them?

Now, the choice of disks is another thing that raises some questions in my mind. I was thinking about using some HDDs I have lying around, in the following way:

- 1x HP EH0146FARWD 146GB 2.5" SAS 15K - for the Proxmox OS installation only, and for storing the ISO repository;
- 4x HGST HUC109060CSS600 600GB 2.5" SAS 10K - ZFS RAID 10 for VM data, which will give me 1.2TB of usable space (sketched below).
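
Just to make the intended layout concrete, I'm thinking of something along these lines (device names are only placeholders; on the real system I'd use the stable /dev/disk/by-id/... paths):

Code:
# striped mirrors ("RAID 10"): two mirror vdevs of 2x 600GB each
zpool create tank \
    mirror /dev/sdb /dev/sdc \
    mirror /dev/sdd /dev/sde
zpool list tank   # should report roughly 1.1TiB of pool capacity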

They are not comparable to SSDs in speed, but they are all enterprise-grade SAS disks, and I'm thinking they would still be up to the task.

I've bought some enterprise SSDs for some other systems, but now, to reduce costs and to put the HDDs to good use, I was considering them as a possibility.

Backups are obviously important and will be made to a remote location, covering both the Proxmox OS settings and the VM data.

What is your opinion on this solution with those HDDs?

Any opinions are very much appreciated.

Thank you.
 
If the OS on a single disk is OK for you, sure, why not.

This will be fine.

Well, I can put in a pair of HP EH0146FARWD 146GB 2.5" SAS 15K disks and set them up in RAID1 for the Proxmox OS. That would give me more reliability for the OS.

Maybe power consumption is an argument for SSDs. With 2x SSDs you could replace 5 spinning disks and get more performance.

Yes, going for SSDs would give me lower power consumption and greater performance, even over SATA connectors. The thing is I wanted to keep costs as low as possible, and I have those disks lying around.


What about ashift settings?

Thinking about those HGST disks that will receive the VMs, should I set them up with ashift=12? I'm having some difficulty finding the sector size of those disks, and I've been reading in multiple places that ashift=12 should be the "default" option to rely on, even if the disks have a 512B sector size.

Thank you.
 
You can inspect the disks with fdisk -l. It will show you what the disks report as their "physical sector size". If they report 512B you could use ashift=9. Just keep in mind that you won't later be able to replace a failed disk with a 4K-sector model if you're not using ashift=12.
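
For a quick overview of all disks at once, something like this should also work:

Code:
# logical/physical sector size as reported by each drive
lsblk -d -o NAME,MODEL,LOG-SEC,PHY-SEC
# or per disk, including the drive's identity data
smartctl -i /dev/sdX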
 
You can inspect the disks with fdisk -l. It will show you what the disks report as their "physical sector size". If they report 512B you could use ashift=9. Just keep in mind that you won't later be able to replace a failed disk with a 4K-sector model if you're not using ashift=12.

Thank you for your tip, I'll check them out to see what they are reporting.

I'm aware that ashift can't be changed later on, and that if I replace one disk with a 4K-sector model it will have a negative impact through excessive writes. That's one of the reasons I've read in some places that nowadays it would always be the better choice to set ashift=12, whichever disks we are using, because the penalty of setting ashift=12 on a 512B-sector disk shouldn't be that great.

For that same reason, someone was also suggesting defining ashift=12 as the default in ZFS, as it would make more sense and would have less impact if someone just goes with the default value.
 
I'm aware that ashift can't be changed later on, and that if I replace one disk with a 4K-sector model it will have a negative impact through excessive writes. That's one of the reasons I've read in some places that nowadays it would always be the better choice to set ashift=12, whichever disks we are using, because the penalty of setting ashift=12 on a 512B-sector disk shouldn't be that great.
Yes, but ashift=9 also has its benefits when all disks are really 512B native. You will have less write and read amplification, as you don't need to increase the volblocksize as much.

Let's for example say you want to run a 5-disk raidz1. Here you would need a volblocksize that is at least 8 times the ashift sector size, or otherwise you will lose too much capacity because of padding overhead. Ashift=9 would allow you to use 8x 512B, so a 4K volblocksize. With ashift=12 it would be 8x 4K, so a 32K volblocksize. Now let's say you do a lot of 4K sync writes. Every 4K read/write operation would then need to read/write a full 32K block, so 8 times the disk wear and only 1/8 of the performance. That wouldn't be a problem with 512B sectors and a 4K volblocksize.

And then there is block-level compression. Let's say you use an 8K volblocksize and ashift=12. No matter how compressible an 8K block is, if it is less than 50% compressible it still consumes the full 8K, as ZFS can only allocate in multiples of the ashift. So every block has to be stored as either a full 4K or 8K. With ashift=9 this would be less of a problem, as a block can be stored as a multiple of 512B. So a 40% compressible 8K block would only consume 5K of space instead of 8K, as it can be stored as 10x 512B sectors.
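
Just to illustrate where the volblocksize comes from: if I remember right, PVE takes it from the "Block Size" field of the ZFS storage when it creates a VM disk, and on the CLI a zvol would be created roughly like this (pool and zvol names here are just examples):

Code:
# sparse 32G zvol with a 4K volblocksize; volblocksize can only be set at creation time
zfs create -s -V 32G -o volblocksize=4K tank/vm-100-disk-0
zfs get volblocksize tank/vm-100-disk-0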
 
Yes, but ashift=9 also has its benefits when all disks are really 512B native. You will have less write and read amplification, as you don't need to increase the volblocksize as much.

Let's for example say you want to run a 5-disk raidz1. Here you would need a volblocksize that is at least 8 times the ashift sector size, or otherwise you will lose too much capacity because of padding overhead. Ashift=9 would allow you to use 8x 512B, so a 4K volblocksize. With ashift=12 it would be 8x 4K, so a 32K volblocksize. Now let's say you do a lot of 4K sync writes. Every 4K read/write operation would then need to read/write a full 32K block, so 8 times the disk wear and only 1/8 of the performance. That wouldn't be a problem with 512B sectors and a 4K volblocksize.

And then there is block-level compression. Let's say you use an 8K volblocksize and ashift=12. No matter how compressible an 8K block is, if it is less than 50% compressible it still consumes the full 8K, as ZFS can only allocate in multiples of the ashift. So every block has to be stored as either a full 4K or 8K. With ashift=9 this would be less of a problem, as a block can be stored as a multiple of 512B. So a 40% compressible 8K block would only consume 5K of space instead of 8K, as it can be stored as 10x 512B sectors.

Thank you for that detailed answer. I'm still struggling to understand as much of it as I can...

As for the disk info, I've run the fdisk -l command and got these results for the HP 146GB and HGST 600GB disks:

Code:
Disk /dev/sdd: 136.73 GiB, 146815737856 bytes, 286749488 sectors
Disk model: EH0146FARWD    
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes


Disk /dev/sdb: 558.79 GiB, 600000000000 bytes, 1171875000 sectors
Disk model: HUC10906 CLAR600
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

For the Proxmox installation, I would be choosing 2x HP 146GB disks as RAID1 for the OS, with lz4 compression. But I was thinking the Proxmox installer would inspect those disks and propose the most correct ashift. I don't know if that is the case, because I was presented with ashift=12, but perhaps that is just a hardcoded default?

How can one tell if these are 512B-native drives, when manufacturers may make their drives report that value for compatibility reasons (or any other reason), even if they are 4K drives (probably not the case here)?

This ashift should be a much easier setting to choose...

I'd very much appreciate any opinions. Thanks in advance.
 
For the Proxmox installation, I would be choosing 2x HP 146GB disks as RAID1 for the OS, with lz4 compression. But I was thinking the Proxmox installer would inspect those disks and propose the most correct ashift. I don't know if that is the case, because I was presented with ashift=12, but perhaps that is just a hardcoded default?
PVE isn't doing any ZFS optimizations. It's all just the ZFS defaults, and you will have to choose better values yourself depending on your hardware, workload and pool layout.
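
You can check afterwards what the installer actually used, for example (the installer's advanced ZFS options also let you set it yourself, if I'm not mistaken):

Code:
zpool get ashift rpool        # pool-wide ashift property
zdb -C rpool | grep ashift    # per-vdev ashift from the cached pool config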

How can one tell if these are 512B-native drives, when manufacturers may make their drives report that value for compatibility reasons (or any other reason), even if they are 4K drives (probably not the case here)?
You can't know when using SSDs. You can only see what the disks are reporting, and here it is 512B for both physical and logical sectors. As you have HDDs, it should be native 512B. So ashift=9 should be fine as long as you don't plan to ever add or replace disks with 4K models.
 
I have a very similar setup to yours, but I use two enterprise SSDs (instead of the 146 GB disks for the OS) as a ZFS special device in my zpool, alongside 22x 600 GB SAS 10k drives, in order to speed up my pool tremendously. That way I only have the ZFS pool, without any dedicated OS disks. Maybe you can look into that.
 
I have a very similar setup to yours, but I use two enterprise SSDs (instead of the 146 GB disks for the OS) as a ZFS special device in my zpool, alongside 22x 600 GB SAS 10k drives, in order to speed up my pool tremendously. That way I only have the ZFS pool, without any dedicated OS disks. Maybe you can look into that.

So you only have a single zpool and use the SSDs as cache?

But you still had to choose which ashift to use, right? Did you do any testing with ashift=9 vs ashift=12, or in your case might you be switching later on to 4K disks, and so opted for ashift=12 right from the start?

Thanks.
 
But you still had to choose which ashift to use, right?
AFAIK it is per vdev, so you can extend the pool with bigger drives and remove the older ones.

A ZFS special device is not a cache. It will offload the metadata from the HDDs to the SSDs for better IOPS performance.
Yes, and in addition you can also put actual data on them if you have data that needs more IOPS.
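
If you want to try it, adding a special vdev to an existing pool looks roughly like this (device names are placeholders; the special vdev should have the same redundancy as the rest of the pool, because losing it means losing the pool):

Code:
# mirrored special vdev for metadata (placeholder device names)
zpool add tank special mirror /dev/disk/by-id/ata-SSD1 /dev/disk/by-id/ata-SSD2
# optionally also store small blocks (not just metadata) on the SSDs
zfs set special_small_blocks=64K tank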
 
A ZFS special device is not a cache. It will offload the metadata from the HDDs to the SSDs for better IOPS performance.
AFAIK it is per vdev, so you can extend the pool with bigger drives and remove the older ones.


Yes, and in addition you can also put actual data on them if you have data that needs more IOPS.

Reality check: Whenever I feel I'm getting more into it, something new just pops up reminding me that I still have a long road ahead... :) But hey, that's the fun part of it, and that's what makes me keep coming back...

From what I've read in a few minutes about ZFS special devices, it really looks very nice, and one can put small files on them, increasing performance a bit further. I hadn't heard about it, but it seems to be a feature that has been around for quite some time.

For my ashift selection, I don't know if I'll replace the drives (OS or main pool) with 4K disks. And upgrading an OS disk vs a main pool disk would be different, since they will contain distinct data and therefore have distinct usage profiles. Maybe the OS would receive SSDs (I believe it's not just a replace but would also require copying the boot data), and the main pool might receive SSDs or higher-capacity HDDs (4K for sure, at least). But then again, maybe the main pool would be replaced with a completely new one and the files moved across pools.
 
Reality check: Whenever I feel I'm getting more into it, something new just pops up reminding me that I still have a long road ahead... :) But hey, that's the fun part of it, and that's what makes me keep coming back...
Isn't that the case with everything? The more you know, the more you don't know ;)

For my ashift selection, I don't know if I'll replace the drives (OS or main pool) with 4K disks.
The main message here is that you can have a per-vdev ashift, so you can set whatever you want. IIRC, I have the same 600GB 10k disks and they are 512B physical, so an ashift of 9 is perfect. I use them in striped mirrors and they're perfect for storing compressible data. ashift=9 is very good if you run normal 4K or 8K blocks (volblocksize) on them that are compressible, so that you often store less than that. This is only true for those 512B-sector disks; with 4096B sectors you will not have any advantage from compression with a regular volblocksize of 4K/8K.
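
You can check how much you actually gain from that per zvol, e.g. something like (the name is just an example):

Code:
zfs get volblocksize,compression,compressratio,logicalused,used tank/vm-100-disk-0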
 
The main message here is that you can have a per-vdev ashift, so you can set whatever you want. IIRC, I have the same 600GB 10k disks and they are 512B physical, so an ashift of 9 is perfect. I use them in striped mirrors and they're perfect for storing compressible data. ashift=9 is very good if you run normal 4K or 8K blocks (volblocksize) on them that are compressible, so that you often store less than that. This is only true for those 512B-sector disks; with 4096B sectors you will not have any advantage from compression with a regular volblocksize of 4K/8K.

Ah, I just saw what you are saying about the per-vdev ashift setting; the openzfs#566 commit also allowed the setting to be included in the add/attach/replace commands:

Code:
zpool add -o ashift=12 tank disk1
zpool attach -o ashift=12 tank disk1 disk2
zpool replace -o ashift=12 tank disk1 disk2

Which reminded me of a situation: will ZFS balance vdev contents when adding a new vdev?

Some time ago I read some info that said it didn't; maybe it does already in the current version?

So if I create a pool of:
  • vdev1: mirror 2x 600GB HDD, 512B sector, ashift=9
  • vdev2: mirror 2x 600GB HDD, 512B sector, ashift=9

Later on, with about 50% usage, I add a new vdev to the pool, let's say:
  • vdev3: mirror 2x 4TB HDD, 4K sector, ashift=12

So if I don't mix disks with different sector sizes, I can have the correct, most performant ashift on each vdev, and it will not impact performance. That is it, right?

But then:
  1. Will ZFS move some data from vdev1 and vdev2 to the new (clean) vdev3, to balance the data? Or how will it behave?
  2. In this scenario, I should only replace a failed HDD in vdev1 or vdev2 with another 512B-sector disk, correct? To avoid the penalty of having a 4K disk inheriting an ashift=9 setting.
  3. Is it possible to remove a vdev from the pool? For instance, if a disk fails in vdev1, instead of replacing it, could I just add a vdev3 with 4K disks and ashift=12 as a full replacement for vdev1? As in moving data from vdev1 to vdev3 and removing vdev1 from the pool (honestly I haven't yet searched around for this, it just popped up as I was writing this message).

Thanks again.
 
Will ZFS move some data from vdev1 and vdev2 to the new (clean) vdev3, to balance the data? Or how will it behave?
No, just new writes. In order to rebalance, you have to send/receive your data.
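
As a rough sketch of such a rebalance for a single zvol (hypothetical name, and with the VM stopped):

Code:
zfs snapshot tank/vm-100-disk-0@rebalance
zfs send tank/vm-100-disk-0@rebalance | zfs receive tank/vm-100-disk-0-new
# after verifying the copy, swap the old and new zvol
zfs destroy -r tank/vm-100-disk-0
zfs rename tank/vm-100-disk-0-new tank/vm-100-disk-0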

In this scenario, I should only replace a failed HDD in vdev1 or vdev2 with another 512B-sector disk, correct? To avoid the penalty of having a 4K disk inheriting an ashift=9 setting.
Yes, that would be best. I don't know how a vdev behaves if you have different ashift values.

Is it possible to remove a vdev from the pool? For instance, if a disk fails in vdev1, instead of replacing it, could I just add a vdev3 with 4K disks and ashift=12 as a full replacement for vdev1? As in moving data from vdev1 to vdev3 and removing vdev1 from the pool (honestly I haven't yet searched around for this, it just popped up as I was writing this message).
Yes, you can.
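
For example (a sketch; the vdev name comes from zpool status, and AFAIR top-level vdev removal requires all top-level vdevs to have the same ashift and no raidz in the pool, so the mixed ashift=9/ashift=12 scenario might refuse to evacuate):

Code:
zpool status tank            # note the vdev name, e.g. mirror-0
zpool remove tank mirror-0   # data is evacuated to the remaining vdevs
zpool status tank            # shows the evacuation/removal progress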
 
