Proxmox 5.0 works great with btrfs! :D

Pablo Alcaraz

Jul 6, 2017
Hello

First, I want to thank the Proxmox team and express my admiration for their great job! I am evaluating Proxmox on a single node and would like to report my progress.

I installed a PVE node by setting up a Debian 9 server and installing Proxmox 5.0 on top of it. I used btrfs as the main filesystem.

This node runs a PostgreSQL 9.6 server inside a container, with local SSD storage attached as a directory; that is, the database lives on an external btrfs volume outside the container. The database handles a lot of queries and is heavily loaded, using 16 GB of RAM. So far, so good.
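For reference, "attached as a directory" usually means a bind mount point. In Proxmox VE this is a line of the form `mpN: <host path>,mp=<container path>` in the container's config, which `pct set <vmid> -mpN ...` can also add. The sketch below only renders that line and the equivalent `pct` argv; the container ID `100` and both paths are hypothetical, not taken from the post.

```python
from __future__ import annotations


def bind_mount_entry(index: int, host_path: str, container_path: str) -> str:
    """Render a Proxmox VE LXC bind mount point line (mpN: host,mp=target)."""
    if index < 0:
        raise ValueError("mount point index must be non-negative")
    if not host_path.startswith("/") or not container_path.startswith("/"):
        raise ValueError("both paths must be absolute")
    return f"mp{index}: {host_path},mp={container_path}"


def pct_set_command(vmid: int, index: int, host_path: str, container_path: str) -> list[str]:
    """Equivalent `pct set` invocation as an argv list (built, not executed)."""
    return ["pct", "set", str(vmid), f"-mp{index}", f"{host_path},mp={container_path}"]


if __name__ == "__main__":
    # Hypothetical paths: a btrfs SSD volume on the host, PostgreSQL's data dir in the CT.
    print(bind_mount_entry(0, "/mnt/ssd-btrfs/pgdata", "/var/lib/postgresql"))
    print(" ".join(pct_set_command(100, 0, "/mnt/ssd-btrfs/pgdata", "/var/lib/postgresql")))
```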

My idea is to have independent volumes, one per storage unit, with no RAID. I plan to implement HA by replicating snapshots of the btrfs volumes to other Proxmox instances. My HA needs are on the scale of hours (a server can be down for up to 1 hour). We are fine if we lose some hours of data, so this configuration works for us. We believe there are better solutions than RAID for data that does not change much: RAID volumes are needed for near-real-time data, but they have a cost in space and flexibility that does not need to be paid for data that rarely changes.
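Replicating btrfs snapshots to another host is commonly done with `btrfs send` piped into `btrfs receive`; with `-p <parent>` only the difference since an earlier snapshot is sent. The post does not say which mechanism is planned, so treat this as an assumption. The sketch builds the `btrfs send | ssh <host> btrfs receive` pipeline; host names and snapshot paths are hypothetical. Both snapshots must be read-only (a requirement of `btrfs send`), and for an incremental send the parent must already exist on the receiving side.

```python
from __future__ import annotations

import subprocess
from typing import Optional


def send_receive_argv(snapshot: str, remote_host: str, remote_dir: str,
                      parent: Optional[str] = None) -> tuple[list[str], list[str]]:
    """Build argv lists for `btrfs send [-p parent] snap | ssh host btrfs receive dir`."""
    send = ["btrfs", "send"]
    if parent is not None:
        send += ["-p", parent]  # incremental: only changes since `parent` are sent
    send.append(snapshot)
    receive = ["ssh", remote_host, "btrfs", "receive", remote_dir]
    return send, receive


def replicate(snapshot: str, remote_host: str, remote_dir: str,
              parent: Optional[str] = None) -> None:
    """Run the pipeline (needs root, btrfs-progs on both ends and SSH access)."""
    send_cmd, recv_cmd = send_receive_argv(snapshot, remote_host, remote_dir, parent)
    sender = subprocess.Popen(send_cmd, stdout=subprocess.PIPE)
    receiver = subprocess.Popen(recv_cmd, stdin=sender.stdout)
    sender.stdout.close()  # lets the sender see a broken pipe if the receiver exits early
    recv_rc = receiver.wait()
    send_rc = sender.wait()
    if send_rc != 0 or recv_rc != 0:
        raise RuntimeError(f"btrfs send/receive failed (send={send_rc}, receive={recv_rc})")
```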

I would like to point out that btrfs is more stable now, and Debian 9 ships a reasonably recent version. We have been using it since 2015 and it has always worked great. It does not consume as much memory as ZFS, and it is less intrusive from an administrative point of view.

I am reporting all of this in the hope that you will consider supporting btrfs in Proxmox for local storage and Storage Replication. Backups and snapshots could benefit from btrfs too.

I would like to hear about other Proxmox deployments using btrfs. How did it work for you?

Regards,

Pablo
 
Did you do the chattr +C for the database directories? I heard that it is good to do on log file directories and the like, to disable the copy-on-write feature for that file or folder. Since I use btrfs on my laptop, I find I at least need to do that for the directories that VirtualBox machines run from. I imagine that at some point there will be schemas of folders that need to be modified this way, or applications that install database directories will test for btrfs and apply chattr +C to those data directories.
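`chattr +C` sets the No_COW attribute. On btrfs it is only reliable for files that are created empty, so the usual approach is to set it on the still-empty directory so that new files inherit it; existing files have to be copied into the directory afterwards. `lsattr -d <dir>` shows a `C` in the attribute field when the flag is set. A minimal check, assuming `lsattr` (from e2fsprogs) is installed; the paths in the examples are illustrative only.

```python
import subprocess


def has_nocow_flag(lsattr_line: str) -> bool:
    """Return True if an `lsattr -d` output line shows the 'C' (No_COW) attribute.

    lsattr prints the attribute field first, then the path, e.g.
    "---------------C------ /var/lib/postgresql". Only the first field is inspected,
    so a path that happens to contain a 'C' does not give a false positive.
    """
    fields = lsattr_line.split(maxsplit=1)
    return bool(fields) and "C" in fields[0]


def directory_has_nocow(path: str) -> bool:
    """Run `lsattr -d` on a directory and check for the No_COW attribute."""
    out = subprocess.run(["lsattr", "-d", path], capture_output=True, text=True, check=True)
    return has_nocow_flag(out.stdout.strip())
```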
 
I did not do it. I am collecting some stats this week and will apply it next week, so I have enough stats to compare.
 
Did you do the chattr +C for the database directories? I heard that it is good to do on log file directories and the like, to disable the copy-on-write feature for that file or folder.

btrfs without COW is a bad decision. Yes, because performance is bad for databases, it is tempting to disable COW, and that way you can get good performance from btrfs (in most cases). Postgres is an exception, because it uses an 8k block size, as btrfs does. But others (MySQL/MariaDB/Percona) use mixed values and need different sizes (16k for database files and 128k for log files).
In this case btrfs cannot be tuned the way ZFS can. Also, on btrfs, if you have a mirror and one disk breaks, you will have problems at reboot.
And as a final note, without COW you do not have any checksums, and no snapshots. Checksum errors are detected but are not corrected at runtime; only a scrub can correct these errors.
So btrfs is good in terms of performance (memory usage), but not if you care about data safety. If you do not care about your data, btrfs is very good, like much software that is not ready for production.


Think about what you want :)
 
I did some experiments. The speed improvements from chattr +C are marginal in my PostgreSQL deployment.

As @guletz pointed out, it kind of defeats the purpose of using btrfs if we deactivate COW, and the 12% extra performance does not really pay off. I will continue using COW in this configuration.

I started using btrfs with Ubuntu 14.04 (kernel 3.13). With that btrfs version, if something happened to an HDD in a multi-disk volume (not only mirrors), you lost all the data! That does not happen with kernel 4.4. I expect btrfs to be even more stable with kernel 4.10.

I will do some tests next week with multi-volume configurations just to confirm it. I do not need that in my configuration, but it will be good to find out.

I believe it is not about caring; it is about reliability. We are fine if we recover from a storage problem in 1 hour. But if the database is slow, that is BAD for us. We prefer to lose 1 hour of work one day rather than 4 hours of work every day because the speed is 50% of what we are getting now.
Therefore, in my case, I am trading extra HA that I do not need for performance:

* The PostgreSQL server lives in a container, so I do not get all the good things of a VM.
* It is connected to external btrfs SSD storage via a directory path. If I need to move the container, I will have to do it by hand. No biggie.
* No RAID is used. We will be able to add or replace SSDs with drives of other sizes/speeds. This one is risky and noisy for me, but the fact is that RAID has costs, and we believe we can get more flexibility by providing HA at other levels.
* I am creating btrfs snapshots and rsyncing them to another volume. Recovery can be quick if I use a btrfs snapshot, or slow if I have to copy the data back.
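The snapshot-and-rsync routine in the last bullet can be sketched as three small helpers: a read-only, timestamp-named snapshot (`btrfs subvolume snapshot -r`), an `rsync -a --delete` of that snapshot to another volume, and a simple count-based retention rule. The directory layout, naming scheme and retention count are assumptions for illustration; pruned snapshots would be removed with `btrfs subvolume delete`. Note that a snapshot of a running PostgreSQL data directory is crash-consistent at best, which PostgreSQL can recover from only if its WAL lives in the same snapshotted subvolume.

```python
from __future__ import annotations

from datetime import datetime


def snapshot_argv(subvolume: str, snapshot_dir: str, when: datetime) -> list[str]:
    """Read-only snapshot named by timestamp: btrfs subvolume snapshot -r SRC DEST."""
    name = when.strftime("%Y-%m-%d_%H%M")
    return ["btrfs", "subvolume", "snapshot", "-r", subvolume, f"{snapshot_dir}/{name}"]


def rsync_argv(snapshot_path: str, target_dir: str) -> list[str]:
    """Copy a snapshot's contents to another volume, mirroring deletions."""
    # A trailing slash on the source copies the directory's contents, not the directory.
    return ["rsync", "-a", "--delete", snapshot_path.rstrip("/") + "/", target_dir]


def snapshots_to_prune(names: list[str], keep: int) -> list[str]:
    """Given timestamp-named snapshots, return the oldest ones beyond `keep`."""
    if keep < 0:
        raise ValueError("keep must be non-negative")
    ordered = sorted(names)  # the %Y-%m-%d_%H%M format sorts chronologically
    return ordered[:-keep] if keep else ordered
```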

The solution works; it is good enough to recover in 1 hour. That is what we need. ;)

Of course, if I worked at an exchange, I would use RAID 6, hot backups, redundant hardware, 1 TB RAM servers, bare metal, etc., and I would have a different operations budget.
 
Pablo,

... be smart and wise ;) When I was a junior IT admin, an old guy told me this: in any situation/environment, when you use X software, find a replacement in advance. Nothing is good forever. Be prepared at any moment to replace your X software with Y. I know this is not so simple, but on some occasions this idea saved my job.

Have a nice day, without any bad event :)
 
@Pablo Alcaraz - could you explain what advantage you get with BTRFS vs ZFS? It seems like ZFS meets all of your requirements, is stable, and is portable across a wide variety of operating environments. On the other hand, BTRFS is nascent, its stability is open to question, and it doesn't really add any new or specialized capabilities. Plus, with ZFS you don't have to disable key capabilities (e.g., COW) to maintain performance.

I'd understand if you were working on a development project and had some need that BTRFS could support, but it seems your main drivers are (1) software-only RAID-like disk groupings, (2) performance and (3) reliability.

Don't misunderstand - I'm not bashing. I'd really like to understand what motivates the desire for BTRFS.
 
in any situation/environment, when you use X software, find a replacement in advance.

I am doing that: I keep my backups in sync with local and remote destinations, and I do not promise what I cannot deliver. It is good advice and I appreciate it.
 
could you explain what advantage you get with BTRFS vs ZFS?

I like BTRFS because it is simpler than ZFS to manage (for me). I like that I can convert a volume from single to duplicated (data/metadata duplicated on the same device) to RAID 0, RAID 1, RAID 10 and back again, online and without interruption, while it is in use. I am not crazy, so I am not considering RAID 5 or 6 in BTRFS. I plan to upgrade my devices at different times with different drive sizes. BTRFS supports adding devices of different sizes to the same volume, maximizing available space.
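To put a number on "maximizing available space" with mixed drive sizes: with the RAID 1 profile btrfs keeps two copies of each chunk on two different devices, so a common rule of thumb for usable space is min(total / 2, total - largest device). The sketch below uses that simplified model (it ignores metadata, chunk granularity and unallocated slack) and covers only the single and RAID 1 profiles; the device sizes in the test are made up.

```python
from __future__ import annotations


def usable_capacity(sizes_gb: list[float], profile: str) -> float:
    """Rough usable data capacity of a btrfs volume built from mixed-size devices.

    Simplified model:
      single: every byte is used once                -> sum(sizes)
      raid1:  two copies on two different devices    -> min(total / 2, total - largest)
    """
    if not sizes_gb or any(s <= 0 for s in sizes_gb):
        raise ValueError("need at least one device with a positive size")
    total = sum(sizes_gb)
    if profile == "single":
        return total
    if profile == "raid1":
        if len(sizes_gb) < 2:
            raise ValueError("raid1 needs at least two devices")
        # If the largest device exceeds all others combined, its surplus cannot be mirrored.
        return min(total / 2, total - max(sizes_gb))
    raise ValueError(f"profile not modelled in this sketch: {profile}")
```

For example, 1000 + 500 + 250 GB in RAID 1 gives about 750 GB under this model, because the largest drive can only be mirrored against the 750 GB offered by the other two.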

I am not using features where ZFS shines, like online deduplication and stability. In my scenario BTRFS is useful because I must deal with commodity hardware bought at different times and redeployed in different server configurations. I cannot establish a hardware purchasing policy (as I would like) without stressing myself or my bosses. Basically, my infrastructure is not stable enough :)

For example, I HAD a hardware RAID 5 configuration and was asked to provide more space, but I did not get the storage I needed to rebuild a bigger RAID 5: just 2 new SSDs... In this scenario, flexibility and online reconfiguration are better than long-term stability.
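The "2 new SSDs" situation maps onto two btrfs operations that run on a mounted filesystem: `btrfs device add` and a `btrfs balance start` with convert filters (`-dconvert=` for data, `-mconvert=` for metadata). This sketch only builds the argv lists; the device names and mount point are hypothetical, and RAID 5/6 are rejected to match the post. A balance rewrites every chunk, so on a loaded volume it can take a long time and compete for I/O.

```python
from __future__ import annotations

# RAID 5/6 are deliberately left out, as in the post.
SAFE_PROFILES = {"single", "dup", "raid0", "raid1", "raid10"}


def grow_and_convert(mountpoint: str, new_devices: list[str],
                     data_profile: str, metadata_profile: str) -> list[list[str]]:
    """Commands to add devices to a mounted btrfs volume and convert its profiles online."""
    for profile in (data_profile, metadata_profile):
        if profile not in SAFE_PROFILES:
            raise ValueError(f"refusing profile {profile!r}")
    cmds = [["btrfs", "device", "add", dev, mountpoint] for dev in new_devices]
    # A balance with convert filters rewrites existing chunks into the new profiles
    # while the filesystem stays mounted and in use.
    cmds.append(["btrfs", "balance", "start",
                 f"-dconvert={data_profile}", f"-mconvert={metadata_profile}", mountpoint])
    return cmds
```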

I compensate for HA with backups + restore procedures. My data is not online; it is generated and updated in batches. My organization can wait up to 4 hours without being affected, so I have time to restore information. For us it is better to spend every resource on processing data faster than on an HA environment. This is due to the nature of our work.

This constant hardware redistribution is not the usual scenario, but it is not uncommon (I believe).
 
Huh? ZFS uses the amount you give it. The fact it has defaults shouldn't prevent you from tuning it for your situation. Unwise to pick an unstable filesystem over a very stable one.
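On the point that ZFS uses the amount you give it: the ARC size can be capped with the `zfs_arc_max` module parameter, set in bytes, for example via an `options zfs zfs_arc_max=...` line in `/etc/modprobe.d/zfs.conf` (with root on ZFS the initramfs has to be refreshed with `update-initramfs -u` for it to apply at boot). The module has a lower bound on this value, so this is a cap rather than literally "no cache". The 8 GiB figure in the test is an arbitrary example.

```python
GIB = 1024 ** 3


def arc_max_option(limit_gib: float) -> str:
    """modprobe.d line capping the ZFS ARC at `limit_gib` GiB (value written in bytes)."""
    if limit_gib <= 0:
        raise ValueError("limit must be positive")
    return f"options zfs zfs_arc_max={int(limit_gib * GIB)}"
```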

I want to give it zero extra memory and no cache, and I want it to deliver fast like the wind. BTRFS lets me do that. The price is that its volumes will not last as long as ZFS volumes. I am fine with that, compensating with backups + time to restore them.

My scenario is not common. We are crunching numbers, and we are fine working faster even if we risk some hours of data loss. We can compensate by reprocessing lost data. I am not saying ZFS is worse. BTRFS is definitely not better, but it is more flexible and easier to manage.
 
I am not sure. If it does not, I hope it integrates BTRFS one day. I have been using BTRFS with Proxmox since 2017 and it works fine. No issues at all.

You can use it with the Directory storage (type dir) in https://pve.proxmox.com/wiki/Storage and https://pve.proxmox.com/wiki/Storage:_Directory
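As a rough illustration of the Directory storage route, a `dir` entry in `/etc/pve/storage.cfg` is a `dir: <id>` header followed by tab-indented `key value` lines, and `pvesm add dir` creates the same entry from the CLI. The storage ID `btrfs-ssd`, the mount point and the chosen content types are hypothetical; the btrfs volume is assumed to be mounted at that path already.

```python
from __future__ import annotations


def dir_storage_stanza(storage_id: str, path: str, content: list[str]) -> str:
    """Render a `dir` storage entry in the /etc/pve/storage.cfg format."""
    lines = [f"dir: {storage_id}", f"\tpath {path}", f"\tcontent {','.join(content)}"]
    return "\n".join(lines) + "\n"


def pvesm_add_argv(storage_id: str, path: str, content: list[str]) -> list[str]:
    """Equivalent CLI call: pvesm add dir <id> --path <path> --content <list>."""
    return ["pvesm", "add", "dir", storage_id, "--path", path, "--content", ",".join(content)]
```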

btrfs is good when you are looking for features similar to ZFS but do not have the hardware. It is solid and lets you be happy with very few commands. If you are thinking of RAID in btrfs, it works fine with RAID 0, 1 and 10. RAID 5 and 6 are unstable (and they do not mean the same as hardware RAID 5 or 6). You can check the status here: https://btrfs.wiki.kernel.org/index.php/Status

Other than that, you get deduplication, compression, and metadata or data duplication, even in JBOD if you like.
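Two of those features can be illustrated concretely: a JBOD-style volume that stores data once but mirrors metadata (`mkfs.btrfs -d single -m raid1`), and transparent compression via the `compress=` mount option (`lzo` or `zlib` on the kernels discussed here; `zstd` needs kernel 4.14 or later). Deduplication on btrfs is out-of-band, done by external tools such as duperemove, and is not shown. Device names, label and UUID are placeholders, and `mkfs.btrfs` destroys whatever is on the listed devices.

```python
from __future__ import annotations


def mkfs_jbod_argv(devices: list[str], label: str) -> list[str]:
    """mkfs.btrfs across several devices: data stored once (single), metadata mirrored (raid1).

    WARNING: running this command wipes the listed devices.
    """
    if len(devices) < 2:
        raise ValueError("raid1 metadata needs at least two devices")
    return ["mkfs.btrfs", "-L", label, "-d", "single", "-m", "raid1", *devices]


def fstab_line(uuid: str, mountpoint: str, compress: str = "lzo") -> str:
    """fstab entry enabling transparent compression for newly written data."""
    return f"UUID={uuid} {mountpoint} btrfs defaults,compress={compress} 0 0"
```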
 