LVM Thin vs ZFS root pool

inxsible

I have a 2U server with a rear drive caddy that supports 2x 2.5" drives, and 2x 500GB HDDs that I sourced from older laptops. So I thought I'd set up a ZFS mirror as my boot pool and also use the remaining space for my VMs, as they would be faster being local. Besides, I can only have 2 drives max for the OS since the rear caddy only supports 2 drives. The 12 drives up front I intend to use for a ZFS pool of 2 vdevs of 6 drives each.

However, reading up on it a bit, I came across conflicting advice regarding the following two points:
  1. some people advise against having the root OS and the VM storage on the same disk, so that you don't lose the VMs if the OS bites the bullet, or vice versa
  2. ZFS also consumes memory.

I thought I could mitigate point 1 by simply having backups of the LXCs and VMs on my NAS and restoring them from those backups if shit hits the fan. Any issues with that approach? Can someone elaborate on the need to keep the OS separate from the VM data (especially if the backups are stored on a separate machine)?
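To be concrete, the flow I have in mind is plain vzdump to an NFS/CIFS storage on the NAS and a restore from there after a reinstall, roughly like this (the storage name "nas" and the guest IDs are just placeholders):

  # back up a VM and an LXC container to the NAS storage
  vzdump 100 --storage nas --mode snapshot --compress zstd
  vzdump 101 --storage nas --mode snapshot --compress zstd
  # restore after a fresh install (actual archive names will differ)
  qmrestore /mnt/pve/nas/dump/vzdump-qemu-100-....vma.zst 100
  pct restore 101 /mnt/pve/nas/dump/vzdump-lxc-101-....tar.zst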

Pros:
  • Re-use old 500GB HDDs
  • Re-use the space on the 500GB drives rather than waste them on just the ~20GB OS install
  • HDDs would be better than consumer SSDs in terms of wear & tear
Cons:
  • HDDs would be slower than SSDs

I am a bit more concerned about point 2 though. After installation of Proxmox, I plan on creating a RAIDZ2 pool of 6x3TB drives, which will act as my NAS backup. I intend to have ZFS sync between my TrueNAS (bare metal) ZFS pool and this new pool. Will the ZFS on the root pool compete for memory with the ZFS pool on the 3TB drives? My server is maxed out at 32GB of RAM, so adding more is not possible.
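By "ZFS sync" I mean plain incremental send/receive from the TrueNAS box to the new pool, something along these lines (pool, dataset and snapshot names are only placeholders):

  # run on the TrueNAS side: send the newest snapshot incrementally to the Proxmox host
  zfs snapshot tank/data@2022-01-15
  zfs send -i tank/data@2022-01-14 tank/data@2022-01-15 | ssh root@pve zfs receive -F backup/data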

In the above described scenario:
  • Would it make more sense to use LVM Thin for the root OS and only use ZFS for the main RAIDZ2 pool, or can I use ZFS for both?
  • Should I separate the root OS (on the 500GB HDDs or 128GB SSDs) and move the VM data over to the 6x3TB RAIDZ2 pool?
  • Any other root OS configuration that I haven't listed?

Thanks in advance.
 
I thought I could mitigate point 1 by simply having backups of the LXCs and VMs on my NAS and restoring them from those backups if shit hits the fan. Any issues with that approach?
Backups are good and can get you started again. Separate disks can help in some cases to get you up and running faster, but even then you should have restore-tested backups, as nobody can guarantee that just one disk dies.
Can someone elaborate on the need to keep the OS separate from the VM data (especially if the backups are stored on a separate machine)?
It decouples things. For example, with ZFS, booting with the newest features enabled has been a problem when using GRUB until recently, as GRUB's ZFS implementation was always a bit lacking, so a new ZFS version plus a zpool upgrade could cause the system to not boot anymore. Since PVE 6.4 that particular pitfall has been addressed by reusing the EFI vfat partition even for the non-EFI case, but there may be other, more subtle ones still lurking.
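If you do go with a ZFS root, you can always check how the boot side is set up on your system with, for example:

  proxmox-boot-tool status   # lists the ESPs in use and whether they boot via GRUB or systemd-boot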
Also, having both the system's IO and the guests' workload IO go to the same storage means they can interfere with each other, resulting in a less responsive system; this depends on the general IO workload and the performance of the storage, though.
HDDs would be better than consumer SSDs in terms of wear & tear
IMO that's hardly ever true for consumer SSDs either, at least not for those released in the last 7 years or so, especially if they are TLC or better. The one issue you do have there is missing power-loss protection, but I'd figure that also holds for your HDDs. Fact is, no moving parts make for pretty low wear and tear. Granted, if you have a very high write workload, rewriting the whole space of the SSD every few weeks or so, you get some more significant wear-out, so this really depends on the specific workload you plan to have.

After installation of Proxmox, I plan on creating a RAIDZ2 pool of 6x3TB drives, which will act as my NAS backup.
I'd recommend reading our ZFS raid level considerations in the docs:
https://pve.proxmox.com/pve-docs/chapter-sysadmin.html#sysadmin_zfs_raid_considerations
That said, if you only use it for backups, RAIDZ2 should be fine.
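Just to illustrate the shape of such a pool, creating a 6-disk RAIDZ2 on the CLI would look roughly like this (pool name and device paths are placeholders, and you can of course also do it via the web UI):

  zpool create -o ashift=12 backup raidz2 \
    /dev/disk/by-id/ata-DISK1 /dev/disk/by-id/ata-DISK2 /dev/disk/by-id/ata-DISK3 \
    /dev/disk/by-id/ata-DISK4 /dev/disk/by-id/ata-DISK5 /dev/disk/by-id/ata-DISK6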

In general, it could also be worth checking out Proxmox Backup Server: https://pbs.proxmox.com/docs/introduction.html
It can be co-installed on a Proxmox VE system and would allow for a more efficient backup strategy due to its deduplicating, content-addressable storage (deduplication with ZFS, while possible, is extremely memory hungry and cannot compete with application-level dedup).
Will the ZFS on the root pool compete for memory with the ZFS pool on the 3TB drives? My server is maxed out at 32GB of RAM, so adding more is not possible.
Hmm, I do not think it'd be significantly more than in a combined pool, as the ARC is system-wide and workloads can also compete with each other if using the same pool.
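If you are worried about the 32GB, you can also just cap the ARC, e.g. to 8 GiB (the value here is only an example, tune it to your workload):

  # /etc/modprobe.d/zfs.conf
  options zfs zfs_arc_max=8589934592
  # apply at runtime without a reboot:
  echo 8589934592 > /sys/module/zfs/parameters/zfs_arc_max
  # and refresh the initramfs so it sticks across reboots:
  update-initramfs -u -k all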
 
@t.lamprecht, firstly, thank you for your detailed response.

Separate disks can help in some cases to get you up and running faster, but even then you should have restore-tested backups, as nobody can guarantee that just one disk dies.
Uptime is not a huge issue for me because my server sits in my home network. If it's down, it's down. I just don't want to lose hours trying to get my data back. So I think having vzdump backups on the NAS will suffice in the beginning (see below regarding PBS).
as GRUB's ZFS implementation was always a bit lacking, so a new ZFS version plus a zpool upgrade could cause the system to not boot anymore. Since PVE 6.4 that particular pitfall has been addressed by reusing the EFI vfat partition even for the non-EFI case, but there may be other, more subtle ones still lurking.
These are the kind of things that worry me -- the ones you don't know upfront and cannot mitigate ahead of time. I currently have Proxmox running and had used LVM for the root OS, because I started using Proxmox during the 5.3 releases. I am now installing 7.1 on a new, bigger server and was considering whether going ZFS for the boot pool would make sense.
IMO that's hardly ever true for consumer SSDs either, at least not for those released in the last 7 years or so, especially if they are TLC or better.
I didn't elaborate on my statement, but I generally meant that the SMART attributes for HDDs are standardized to an extent, whereas SMART attributes for SSDs are all over the place, and unless you use proprietary software from the SSD manufacturer some of the attributes are not clearly explained. Also, SSDs provide a "Remaining Life" value but don't quite explain how they got that number -- as in, how many sectors had they originally reserved for wear leveling? How many are currently in use? At least the consumer SSDs don't provide much detailed info. I have a Kingston A400 and a PNY CS?00, which provide very, very basic SMART data. There is a lot of data, but very little of it is useful for determining the current wear status of the drive.
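For reference, this is about all I can do on those drives (device names are placeholders; the attribute names differ per vendor, which is exactly my complaint):

  smartctl -A /dev/sda    # vendor-specific attribute table on SATA SSDs, including whatever "life remaining" attribute the vendor exposes
  smartctl -a /dev/nvme0  # NVMe drives at least report a standardized "Percentage Used"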
Granted, if you have a very high write workload, rewriting the whole space of the SSD every few weeks or so, you get some more significant wear-out, so this really depends on the specific workload you plan to have.
Yeah, this leans mostly towards having a separate OS pool (whether LVM or ZFS) and separate VM data. All my LXCs are always up and running. I use a lot of services, but I don't usually sit and take account of how heavy the writes are for each container -- if I need the service, I use it. So putting it all on SSDs may or may not reduce the life of those SSDs (I honestly don't know).
I have 2 HDDs, so initially I will use those. I will of course set up monitoring for the drives and replace them with SSDs once these HDDs die.
I'd recommend reading our ZFS raid level considerations in the docs:
https://pve.proxmox.com/pve-docs/chapter-sysadmin.html#sysadmin_zfs_raid_considerations
That said, if you only use it for backups, RAIDZ2 should be fine.
I do know the basics of ZFS, having used it for TrueNAS, although ashift was something I hadn't mucked around with.
I have 512-byte formatted drives. Should I be using ashift=9 or ashift=12 for those? Does it depend on the drive format?
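In case it matters, this is how I'm checking what the drives actually report (device name is a placeholder):

  lsblk -o NAME,LOG-SEC,PHY-SEC   # logical vs. physical sector size per disk
  smartctl -i /dev/sda            # also prints "Sector Sizes: ... logical, ... physical"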
In general, it could also be worth checking out Proxmox Backup Server
Yes, after setting up the new Proxmox server, my intention is to set up PBS on an old Core 2 Duo SFF desktop or on an HP T730 thin client. I have both of these machines just lying around. I will add a 500GB or 1TB drive to either one for storage of all backups, thereby moving away from vzdump backups. The SFF takes 3.5" or 2.5" drives, whereas the thin client takes an mSATA SSD.
Hmm, I do not think it'd be significantly more than in a combined pool, as the ARC is system-wide and workloads can also compete with each other if using the same pool.
This is good to know. Although, after reading your post, I am leaning more towards an LVM root pool (tried and tested) and a ZFS pool for data storage. I still don't see an obvious advantage of a ZFS root pool other than a bit of redundancy -- but again, uptime is not very high on my list.

My strategy -- even for my Arch Linux desktop -- is usually to back up the configs, home, etc. every 2 weeks. If something goes wrong, I just re-install and then copy the configs back to the appropriate places to get up and running again. I might follow the same approach for Proxmox -- especially if I back up the configs properly.
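Something as simple as this would probably cover the Proxmox side (just a sketch; the paths are the ones I'd care about, not an exhaustive list):

  # grab the cluster config, network and hosts file, then copy the archive to the NAS
  tar czf pve-config-$(date +%F).tar.gz /etc/pve /etc/network/interfaces /etc/hosts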


Thanks again.
 
Went ahead with a ZFS root pool. Backed up the containers and VMs from my Proxmox 6 install and then restored them on the Proxmox 7.1 install. Then I created a new ZFS pool for backups of the data from my TrueNAS pool and set up ZFS replication to run every night for my 4 important datasets. Everything works. Pretty uneventful if you follow the directions correctly -- which is what you want.
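The nightly run is nothing fancy, just cron calling a small send/receive script I wrote (the script path is mine, adapt as needed):

  # /etc/cron.d/zfs-replication
  0 2 * * * root /root/replicate-datasets.sh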


Next step is looking into PBS and deciding whether to use the HP T730 Thin Client or my old Core2Duo Dell Optiplex 755 SFF for the PBS install.
 
Next step is looking into PBS and deciding whether to use the HP T730 Thin Client or my old Core2Duo Dell Optiplex 755 SFF for the PBS install.
SHA-256 (checksumming + content-addressable storage) and AES-GCM (encryption) throughput can be relevant for PBS performance too; you can use the proxmox-backup-client benchmark to get an idea whether either of those hosts performs those algorithms significantly better. That said, most often the storage IO is the bottleneck anyway.
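I believe the benchmark can also be run without a datastore to compare just the CPU side (add --repository to additionally measure TLS throughput to a PBS instance):

  proxmox-backup-client benchmark   # prints SHA256, compression/decompression and AES256-GCM throughput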
 
Thanks @t.lamprecht. I know that my Dell 755 is quite old with an E8200 CPU. The HP thin client is relatively newer but doesn't pack a punch. So it's a toss-up between the two.
 
