Recommendations - Proxmox Workstation

marklinton

I'm looking for some guidance on optimizing and validating some design decisions.

- AMD Threadripper platform
- 512 GB RAM
- 8x NVMe M.2 drives
- 8x SATA bulk storage drives

Currently I have the following storage setup with ZFS pools:

rpool - root/boot pool with the PVE installation, also used for storing ISOs
CoreNVME - main pool used for VM disk storage
CoreStorage - dedicated pool for additional fast storage on my VM desktop
CoreSpin - bulk storage pool: additional long-term storage drives for VMs, and the location of the storage device allocated to backups (using a Proxmox Backup Server VM)

I used raidz1 for the main VM storage pool and raidz2 for the spinning disks, thinking this would provide a good balance of redundancy and capacity, but I'm not sure that was the right decision. I also used a mirror for the dedicated VM storage pool in the hope that it would provide a high-performance disk for the VM.

I currently use one of the VMs as my primary desktop (with a passed-through GPU), which is working quite well, but I don't know whether my configuration is optimized; pointers on how to evaluate that would be appreciated (including the configuration of the assigned virtual disks (writeback cache, etc.)).

If it's not optimized, I'm also not sure how best to restructure it while retaining the VM data.

Feedback and ideas are appreciated.




Code:
root@pve:~# zpool status
  pool: CoreNVME
 state: ONLINE
  scan: scrub repaired 0B in 00:45:05 with 0 errors on Sun Feb 11 01:09:06 2024
config:


    NAME                                              STATE     READ WRITE CKSUM
    CoreNVME                                          ONLINE       0     0     0
      raidz1-0                                        ONLINE       0     0     0
        nvme-Samsung_SSD_980_PRO_2TB_S6B0NG0R715972J  ONLINE       0     0     0
        nvme-Samsung_SSD_980_PRO_2TB_S6B0NG0R715938R  ONLINE       0     0     0
        nvme-Samsung_SSD_980_PRO_2TB_S6B0NG0R715974L  ONLINE       0     0     0
        nvme-Samsung_SSD_980_PRO_2TB_S6B0NG0R715459H  ONLINE       0     0     0


errors: No known data errors


  pool: CoreSpin
 state: ONLINE
  scan: scrub repaired 44K in 09:05:14 with 0 errors on Sun Feb 11 09:29:16 2024
config:


    NAME                                  STATE     READ WRITE CKSUM
    CoreSpin                              ONLINE       0     0     0
      raidz2-0                            ONLINE       0     0     0
        ata-ST8000VN0022-2EL112_ZA17XAX8  ONLINE       0     0     0
        ata-ST8000VN004-2M2101_WKD05QH4   ONLINE       0     0     0
        ata-ST8000VN0022-2EL112_ZA17XQ3M  ONLINE       0     0     0
        ata-ST8000VN0022-2EL112_ZA1BN8QK  ONLINE       0     0     0
        ata-ST8000VN004-2M2101_WKD09MZE   ONLINE       0     0     0
        ata-ST8000VN0022-2EL112_ZA1BMST8  ONLINE       0     0     0


errors: No known data errors


  pool: CoreStorage
 state: ONLINE
  scan: scrub repaired 0B in 02:41:33 with 0 errors on Sun Feb 11 03:05:37 2024
config:


    NAME                                STATE     READ WRITE CKSUM
    CoreStorage                         ONLINE       0     0     0
      mirror-0                          ONLINE       0     0     0
        nvme-CT4000P3SSD8_2240E67185E7  ONLINE       0     0     0
        nvme-CT4000P3SSD8_2240E6718625  ONLINE       0     0     0


errors: No known data errors


  pool: rpool
 state: ONLINE
  scan: scrub repaired 0B in 00:00:10 with 0 errors on Sun Feb 11 00:24:15 2024
config:


    NAME                                       STATE     READ WRITE CKSUM
    rpool                                      ONLINE       0     0     0
      nvme-ADATA_SX8200PNP_2J3120088291-part3  ONLINE       0     0     0
      nvme-ADATA_SX8200PNP_2J3220095989-part3  ONLINE       0     0     0
 
CoreNVME - main pool used for VM disk storage
CoreStorage - dedicated pool for additional fast storage on my VM desktop
Why separate them? More disks mean better performance for everything. Or are you passing the NVMe drives through to avoid ZFS/virtualization overhead?

I used raidz1 for the main VM storage pool and raidz2 for the spinning disks, thinking this would provide a good balance of redundancy and capacity, but I'm not sure that was the right decision. I also used a mirror for the dedicated VM storage pool in the hope that it would provide a high-performance disk for the VM.
Raidz won't be great for IOPS, in case your workload cares about such things, and it comes with some additional limitations (padding overhead, you can't remove vdevs, ...).
Your CoreNVME needs a volblocksize of 16K or even 64K to avoid wasting capacity.
Your CoreSpin needs a volblocksize of 16K.
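In case it helps, here's a minimal sketch of how that could be changed through PVE, assuming your ZFS storage IDs match the pool names. Note that volblocksize only applies to newly created zvols, so existing VM disks would have to be moved or recreated to pick it up:

Code:
# Assumed storage IDs - verify yours with 'pvesm status'
pvesm set CoreNVME --blocksize 16k
pvesm set CoreSpin --blocksize 16k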

(including the configuration of the assigned virtual disks (writeback cache, etc.))
When using ZFS you usually don't want writeback, as that would cache the same data a third time in RAM (on top of the ARC and the guest's own page cache).
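As a sketch, switching a disk back to cache=none could look like this (the VM ID, bus and volume name are placeholders; check yours with 'qm config <vmid>'):

Code:
# Hypothetical VM ID and volume name - adjust to your setup
qm set 100 --scsi0 CoreNVME:vm-100-disk-0,cache=none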

PS:
Keep in mind that enterprise/datacenter-grade SSDs with power-loss protection are highly recommended when using ZFS and expecting good write performance and durability. Your SSDs aren't that. Something like a Samsung PM983 would have been a better choice than those 980 PROs. Also make sure to upgrade their firmware in case you got them with the buggy firmware that causes the SSDs to destroy themselves within months.
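A quick way to check the current firmware revision, assuming nvme-cli and smartmontools are installed (the device path below is a placeholder):

Code:
# Lists all NVMe devices including their FW revision
nvme list
# Or per device (hypothetical path):
smartctl -i /dev/nvme0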
 
@Dunuin Thanks for the feedback; this is exactly what I was hoping to get help with.

They are separated simply because they were not purchased at the same time and they are not all the same size: the 980s are 2TB, the two Crucial drives are 4TB, and the two ADATA drives are 4TB. They are all installed using two quad-M.2 PCIe NVMe expansion cards in two bifurcated slots.

I'm planning on replacing all of the spinning disks (and maybe some of the NVMe drives) with two 15TB Intel U.2 NVMe drives; I was thinking a mirror of those would be best for IOPS?

Any suggestions on what my approach should be for migrating to the new disks? Although if one of the disks dies, I'm not too concerned about having to rebuild, as I can recover the data from the PVE backups.

I've attached the list of the current properties for each pool as well (ashift is set to 12 for all of the pools):

https://pastebin.com/jUJPCQfy
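For reference, a minimal way to re-check those values from the shell (the zvol name below is a placeholder):

Code:
zpool get ashift CoreNVME CoreStorage CoreSpin rpool
# volblocksize is a per-zvol property - hypothetical zvol name
zfs get volblocksize CoreNVME/vm-100-disk-0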
 
They are separated simply because they were not purchased at the same time and they are not all the same size: the 980s are 2TB, the two Crucial drives are 4TB, and the two ADATA drives are 4TB. They are all installed using two quad-M.2 PCIe NVMe expansion cards in two bifurcated slots.
That wouldn't prevent a raid10 (striped mirrors) from working, which would then give you 12TB of capacity (or 9.6TB usable, as roughly 20% should be kept free when using ZFS).
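A minimal sketch of what such a striped-mirror pool could look like, with a hypothetical pool name and placeholder device IDs; pairing same-sized disks within each mirror keeps the capacity predictable:

Code:
# Hypothetical pool and device names - pair same-sized disks per mirror
zpool create -o ashift=12 CoreFast \
    mirror nvme-DISK_A nvme-DISK_B \
    mirror nvme-DISK_C nvme-DISK_D \
    mirror nvme-DISK_E nvme-DISK_F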
 
OK, good info to consider. Would adding an Optane drive or two help with caching in this setup? Something like the Intel P4800X?
 
An Optane SLOG would help a lot with sync writes, as consumer SSDs suffer terrible wear and performance handling those. For that, something like 16GB should be fine. You could use the remaining space for L2ARC, but usually that's not that useful (especially when you are already reading from a raid10 of NVMe SSDs); reading data from the raid10 NVMe pool might actually be faster than reading from a single-disk L2ARC.
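A sketch of attaching a small Optane partition as a SLOG (the device path is a placeholder; a roughly 16GB partition is plenty, since the SLOG only ever holds a few seconds' worth of in-flight sync writes):

Code:
# Hypothetical partition path - a small partition is enough for a SLOG
zpool add CoreNVME log /dev/disk/by-id/nvme-INTEL_P4800X_PLACEHOLDER-part1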
 
