ZFS or BTRFS for Proxmox 3.1?

jinjer

Oct 4, 2010
I would like to create a self-contained node for Proxmox and need to choose between ZFS and BTRFS for the local storage.

I normally use ZFS for its snapshot capabilities, but am not sure if it's a viable solution for a self-contained Proxmox node.

I would try to keep both the root and the storage on the same disks, and this is a bit of a no-no for ZFS (it likes to use whole disks).

Has anyone tried booting Proxmox from multi-disk ZFS storage? This is a GRUB issue, but perhaps also an initrd issue for Proxmox.

On the other hand, btrfs can be booted from (although with the same issues as ZFS), but I'm not sure how reliable it is.

Any suggestions are welcome.

jinjer.
 

...if you only have that *one* single disk, you could try to set up LVM on that disk first.
You may need to install Proxmox via the Debian install method... not sure whether the Wheezy installer supports booting from LVM nowadays, though.

A second option is to use a small USB stick for booting.
 
The ISO installer makes an LVM-based installation by default. As for Debian, booting from LVM has been supported since Squeeze or Lenny.
 
Well,

Something I did not think about: I can always use a smaller partition and install Proxmox on only two disks (software RAID and LVM).

Then I can use the rest of those two disks as a mirrored vdev of the ZFS pool.

The remaining disks can be added to the ZFS pool as raw (whole) disks forming further vdevs.
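A minimal sketch of the layout described above, assuming two-way mirrors, a pool named `tank`, and hypothetical device names: `/dev/sda3` and `/dev/sdb3` for the leftover partitions on the two OS disks, `/dev/sdc` to `/dev/sdh` for the whole disks. It only prints the `zpool create` command, pairing consecutive arguments into mirror vdevs, so the command can be reviewed before anything destructive is run.

```shell
#!/bin/sh
# Sketch only: prints (does not run) a "zpool create" command that
# stripes data across two-way mirror vdevs built from consecutive
# device pairs. Pool and device names are hypothetical placeholders.
build_pool_cmd() {
  pool="$1"
  shift
  if [ "$#" -eq 0 ] || [ $(( $# % 2 )) -ne 0 ]; then
    echo "error: need a non-zero, even number of devices" >&2
    return 1
  fi
  cmd="zpool create $pool"
  while [ "$#" -gt 0 ]; do
    # each consecutive pair becomes one mirror vdev
    cmd="$cmd mirror $1 $2"
    shift 2
  done
  echo "$cmd"
}

# First mirror: leftover partitions on the two OS disks (hypothetical
# /dev/sda3 and /dev/sdb3); remaining mirrors: six whole disks.
build_pool_cmd tank /dev/sda3 /dev/sdb3 \
  /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh
```

This prints `zpool create tank mirror /dev/sda3 /dev/sdb3 mirror /dev/sdc /dev/sdd mirror /dev/sde /dev/sdf mirror /dev/sdg /dev/sdh`. In practice, stable `/dev/disk/by-id/` paths are generally preferred over `sdX` names, which can change between boots.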
 
The ISO installer makes an LVM-based installation by default. As for Debian, booting from LVM has been supported since Squeeze or Lenny.

True, but the boot partition will not be LVM-based, and AFAIU the OP, ZFS as the one-and-only filesystem is the requirement here.
 
...this is not really a big deal.
Based on your first post, I've gathered that you only want to use a single disk.
If you're happy with /boot *not* being on ZFS, go the Debian way and use ZFS on LVM for all filesystems, including root and storage.
 
Sorry... I just need to redo the /var/lib/vz mount so that it uses what is left of the original LVM partition plus the rest of the disks.

BTW, is there a way to install Proxmox directly on top of a RAID-1 setup without having to hack it manually after installation?
 
No problem if you use a HW RAID controller.
I'm a die-hard who hates hardware RAID on Linux :)

Hey... I can't seem to find pve-headers for the pve-kernel, and those are needed to install ZFS on Proxmox 3.1. Any hints?
 

-> did the instructions in the wiki not work? -> http://pve.proxmox.com/wiki/ZFS
  • make sure the pve headers are installed. If not:
aptitude install pve-headers-$(uname -r)

Edit: bummer! What a coincidence... looks like you're not the only one... -> http://forum.proxmox.com/threads/16326-pve-headers-install-problem
Edit2: check your /etc/apt/sources.list ... the pve repos might be missing.
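If the headers package cannot be found, one quick way to follow up on Edit2 is to check whether any active APT source line points at the Proxmox repository. This is a sketch assuming the repository host is `download.proxmox.com`; `has_pve_repo` is a hypothetical helper, not a Proxmox tool, and the exact repository line for your release should be taken from the Proxmox wiki.

```shell
#!/bin/sh
# Sketch: returns 0 if any given APT source file contains an active
# (non-commented) "deb" line pointing at download.proxmox.com.
# has_pve_repo is a hypothetical helper, not part of Proxmox.
has_pve_repo() {
  for f in "$@"; do
    # skip unreadable paths, e.g. an unmatched glob
    [ -r "$f" ] || continue
    # commented-out entries do not match because of the ^...deb anchor
    if grep -Eq '^[[:space:]]*deb[[:space:]].*download\.proxmox\.com' "$f"; then
      return 0
    fi
  done
  return 1
}

if has_pve_repo /etc/apt/sources.list /etc/apt/sources.list.d/*.list; then
  echo "Proxmox repository found; run: apt-get update && aptitude install pve-headers-\$(uname -r)"
else
  echo "No Proxmox repository line found in the APT sources"
fi
```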
 
I would look into using Ceph on top of ext4 instead of ZFS or BTRFS.

Ceph is pretty much out of the question since, as far as I can tell, you are shooting for a single-node setup. To set up Ceph you need more than one node, or else it is just a plain waste of time with hardly any benefit. For a single-node setup, ZFS can't be beat. I am not sure how mission-critical your setup is, but the fact that you are going for single-node Proxmox tells me redundancy is not a big issue in your case. A simple, headache-free setup would be to put Proxmox on 1 SSD (or 2 SSDs if you need some redundancy), then use the rest of the local HDDs to create a ZFS pool. ZFS is resilient enough to tackle just about any disaster on a single node.
 
Meanwhile, ZFS on Linux was only declared ready for production [theregister.co.uk] this spring (March), so it's not as old and stable as people like to think.
Every year I hate LVM more and more and hope btrfs will finally come of age (i.e. be announced as stable), but I fear they are in no hurry to do so.
Btrfs (the B-tree file system) has dedup, which would greatly help in reducing storage needs, since most OpenVZ containers are built from mostly the same files.
The fact is, hard drives cost money and we are not all rich, so I for one would love to see solutions that save me money. There is an interesting thread about new filesystems here.
And I really hate the way LVM-based servers reboot after they've been in use for a while. Whose idea was it that it's acceptable for a server to take an hour to reboot? Especially when it's a headless server and you can't really see what is taking so long.
 
Ceph is pretty much out of the question since, as far as I can tell, you are shooting for a single-node setup. To set up Ceph you need more than one node, or else it is just a plain waste of time with hardly any benefit. For a single-node setup, ZFS can't be beat. I am not sure how mission-critical your setup is, but the fact that you are going for single-node Proxmox tells me redundancy is not a big issue in your case. A simple, headache-free setup would be to put Proxmox on 1 SSD (or 2 SSDs if you need some redundancy), then use the rest of the local HDDs to create a ZFS pool. ZFS is resilient enough to tackle just about any disaster on a single node.
I ended up building a node with 8 x 1TB 2.5-inch hard disks spinning at 7200 rpm.
Proxmox was installed on two of these hard disks by partitioning them manually and using only 64GB per disk. I then manually converted the install to Linux md software RAID-1.
The rest of those two disks became the first vdev of the striped-mirror ZFS pool. That is about 3.5TB of formatted capacity.
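As a rough cross-check of the "about 3.5" formatted-capacity figure: assuming the pool is four two-way mirror vdevs (roughly 936GB left on each OS disk after the 64GB system partition, plus three mirrors of whole 1TB disks), usable space is the sum of the smaller member of each mirror. The sketch below adds those decimal GB values, as disk vendors count them, and converts the total to TiB. It ignores ZFS metadata overhead and reservations, and `mirror_capacity_tib` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: usable capacity of a pool striped across two-way mirrors.
# Each argument is the size in decimal GB of the smaller member of one
# mirror vdev (a mirror only holds as much as its smallest member).
# Prints TiB (2^40 bytes), ignoring ZFS metadata overhead.
# mirror_capacity_tib is a hypothetical helper.
mirror_capacity_tib() {
  total_gb=0
  for gb in "$@"; do
    total_gb=$((total_gb + gb))
  done
  awk -v gb="$total_gb" 'BEGIN { printf "%.2f\n", gb * 1000000000 / (1024 ^ 4) }'
}

# Assumed layout: 1000GB - 64GB = 936GB on each OS disk,
# plus three mirrors of whole 1TB disks
mirror_capacity_tib 936 1000 1000 1000
```

It prints 3.58 (TiB), which is consistent with the "about 3.5" reported in the post.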

The node is mission-critical, but downtime is a minor issue. I have a secondary backup node that receives daily snapshots of all the ZFS filesystems.

BTW: I must look at Ceph for a more distributed solution. Performance-wise, I can't make up my mind whether to use Ceph or GlusterFS.
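The thread does not say how the daily snapshots reach the backup node. One common approach is a recursive snapshot followed by an incremental replication stream piped over SSH. A minimal sketch, assuming a pool named `tank`, a hypothetical host `backup-node`, and a target dataset `backup/tank` that already holds the previous day's snapshot (the very first run would need a full `zfs send -R` without `-i`). It only prints the commands, so they can be checked before being wired into cron.

```shell
#!/bin/sh
# Sketch: prints the commands for one day's incremental replication.
# POOL, BACKUP_HOST and BACKUP_DS are hypothetical placeholders; the
# previous snapshot must already exist on both sides.
POOL=tank
BACKUP_HOST=backup-node
BACKUP_DS=backup/tank

replicate_cmds() {
  prev="daily-$1"   # e.g. daily-2013-10-27
  cur="daily-$2"    # e.g. daily-2013-10-28
  # recursive snapshot of every filesystem in the pool
  echo "zfs snapshot -r $POOL@$cur"
  # replication stream (-R) with only the changes since prev (-i);
  # -F lets the receiver roll back to match the incoming stream
  echo "zfs send -R -i $POOL@$prev $POOL@$cur | ssh $BACKUP_HOST zfs receive -F $BACKUP_DS"
}

replicate_cmds 2013-10-27 2013-10-28
```

On a Debian-based node, the two dates could be generated with GNU `date -d yesterday +%F` and `date +%F`.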
 
It is not recommended to share ZFS disks with other file systems as this can lead to performance issues and data loss.
"ZFS can use individual slices or partitions, though the recommended mode of operation is to use whole disks"
"For pools to be portable, you must give the zpool command whole disks, not just slices, so that ZFS can label the disks with portable EFI labels. Otherwise, disk drivers on platforms of different endianness will not recognize the disks."
http://www.manpagez.com/man/8/zpool/
 

Thank you for pointing that out. My take is that some of this information is not current.

The endianness issue across platforms is only theoretical. I don't plan to mount this pool on Motorola or RISC hardware, just as I don't plan to mount it on Solaris. Even so, I'm quite sure I could mount it on Solaris amd64.

Performance is a non-issue too. The Proxmox base distro uses around 10 IOPS from the disks, which is not a problem given the 800+ IOPS of the 8 disks. That figure is only theoretical, though; real-world bonnie++ runs give around 400 seeks/sec and about 480 MB/s read, 380 MB/s write and 180 MB/s rewrite for the pool (not bad for only 8 disks).

The race conditions from using ZFS disks across pools might still be there. However, the problem lies within ZFS itself, not between different types of filesystems. I see no problem with running ext4+ZFS on a single set of disks. Also, it's not common practice to run log and cache devices on the same set of SSDs with no issues (the log is a negligible portion of the SSD, and the rest is pure L2ARC cache).

From a cost perspective, that is $100 saved on a $400 disk pool, plus two SATA ports freed up for more disks.

So far, the array has passed several bonnie++ benchmarks and is now running a couple of Windows KVM machines. I have also copied a few TB of data to and from backup storage, with no issues so far.
 
Also, it's not common practice to run log and cache devices on the same set of SSDs with no issues (the log is a negligible portion of the SSD, and the rest is pure L2ARC cache).
If you know of common practice, you must also be aware of the rule of thumb: as long as your hardware supports it, add more RAM instead of moving the ZIL and L2ARC to separate physical devices, since RAM will always be faster than a device.
 
Oops. I wrote "not common practice" when I meant "it is now common practice".

Sure, RAM is king for ZFS. In the end it all depends on your data and access pattern. Say you need a separate ZIL but are cost-conscious: it's a PITA to use a big mirror of SSDs just to store 1-2GB (at best) of ZIL. So you partition the SSDs, build a mirrored ZIL across 2 (or 3) of them, and the rest goes to striped L2ARC: this is the standard setup.
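The SSD split described above maps onto two `zpool add` calls: a mirrored log vdev on a small partition of each SSD, and cache devices on the large remaining partitions. ZFS does not mirror cache devices; several cache devices are simply used side by side, which is the "striped L2ARC" mentioned here. A sketch assuming a pool `tank` and hypothetical SSDs `/dev/sdi` and `/dev/sdj`, already partitioned so that partition 1 is the log slice and partition 2 the cache slice. It prints the commands rather than running them.

```shell
#!/bin/sh
# Sketch: prints "zpool add" commands that put a mirrored ZIL (log vdev)
# on partition 1 of each SSD and striped L2ARC (cache) on partition 2.
# Pool and device names are hypothetical placeholders.
build_slog_l2arc_cmds() {
  pool="$1"
  shift
  if [ "$#" -lt 2 ]; then
    echo "error: a mirrored log needs at least two SSDs" >&2
    return 1
  fi
  log="zpool add $pool log mirror"
  cache="zpool add $pool cache"
  for ssd in "$@"; do
    log="$log ${ssd}1"     # small partition: member of the log mirror
    cache="$cache ${ssd}2" # large partition: one more cache device
  done
  echo "$log"
  echo "$cache"
}

build_slog_l2arc_cmds tank /dev/sdi /dev/sdj
```

With two SSDs this prints `zpool add tank log mirror /dev/sdi1 /dev/sdj1` followed by `zpool add tank cache /dev/sdi2 /dev/sdj2`; passing a third SSD gives a three-way log mirror and a third cache device.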