LizardFS anyone?

I have been using LizardFS for more than 10 years now. I have never lost any data because of failed hard drives, and I've had my fair share of hard drive failures over those 10 years. I run one chunkserver per disk, each in an LXC container, with the master in a container as well, all on the same server. Essentially, I use LizardFS as software RAID running on a single machine (an on-the-fly, per-folder configurable RAID... it's pretty neat).
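For illustration, the per-folder "RAID level" is just a goal you set with the lizardfs CLI. A rough sketch follows; the goal names, IDs and the mfsgoals.cfg path are from memory and may differ on your install:

    # goals are defined once in the master's goal file (e.g. /etc/mfs/mfsgoals.cfg)
    #  id  name    : definition
    3  3copies : _ _ _         # three plain copies on any chunkservers
    11 ec21    : $ec(2,1)      # erasure coding, 2 data parts + 1 parity part

    # then assigned per folder (recursively) on a mounted client
    lizardfs setgoal -r 3copies /mnt/lizardfs/documents
    lizardfs setgoal -r ec21    /mnt/lizardfs/scratch
    lizardfs getgoal /mnt/lizardfs/documents   # check what a path currently uses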

Having said that, there are quirks you have to be aware of:

1. The project has been stale for a few years now. It's not a big deal, but there are bugs that won't be fixed: https://github.com/leil-io/saunafs/issues/7

2. After 10 years, my LizardFS master server uses almost 5 GB of RAM (the amount is directly tied to the number of files stored in LizardFS).

3. I use EC2_1, EC6_2 (you need at least 8 chunkservers for the latter, but you get 2 disks of redundancy while using only about 1.33x the size of the data) and plain replication goals from 2 to 6 copies. Any essential data I keep at 4-6 copies, which means the data is replicated to 4 to 6 disks. I have lost files with EC2_1 before, but never with EC6_2. From my experience, you need a minimum of 2 redundancies (3 disks) with LizardFS/MooseFS, or you risk losing data.

4. The BIGGEST problem with both LizardFS and MooseFS is the amount of time it takes to replicate the needed copies when files are undergoal and, most importantly, the extreme amount of time it takes to DELETE chunks that are no longer needed. The only way to free up space in a reasonable time is to change the master config so it uses much more resources to do things faster; only then does replication/deletion really get going (see the config sketch after this list).

5. The most important undocumented problem, one that can cause loss of data: I just found out last week that the master server seems to "forget" to save the metadata file every hour. In 10 years that was never a problem, but last week the master crashed after I had added a few extra terabytes of data to the filesystem. In the morning, I noticed that all my files created since March 11, 2025 had disappeared. Then I noticed the last metadata file was from March 11!
So, essentially, because it didn't save the metadata, when it restarted it could only load the metadata from a month ago, so all the changes in the filesystem since then were lost.
The biggest issue with this is that all the chunk files created to hold the data of the new files from the last month are still on disk, and there's no way to find those orphans and delete them to free up space.
I am now waiting to see if the master will figure out the orphan chunks by itself and delete them, but since chunk deletion is slow, it will take some time to free up that space.
To correct the "forgetfulness" problem, I set up an hourly cron job that runs "lizardfs-admin save-metadata", which forces the master server to save the metadata file (see the crontab sketch after this list). So this problem should never happen again.

So, after 10 years, I'm considering migrating from LizardFS to SaunaFS, which is a fork of LizardFS under active development. There's even compatibility between chunkservers from Lizard and Sauna, so in theory it's possible to start by migrating just the master server...

LizardFS has worked pretty well for me for 10 years... It did give me a few headaches (like last week), but it does the job well, and I have 2 disks of redundancy on about 32 TB of data that only uses 47 TB of disk space.

My 2 cents... hope this helps someone who is still thinking about LizardFS!

If you are, I would recommend trying SaunaFS instead, since it's under active development... though I'm not sure yet how trustworthy the code is... LizardFS has been good to me for the past 10 years!
 
Updating in this thread, as @hradec already found the solution to their issue here :) https://github.com/leil-io/saunafs/issues/349
In summary, it turned out to be a problem with the system clock.

I'll try SaunaFS; it's recommended by one of the LizardFS team members, as already mentioned. This is the original source: https://github.com/lizardfs/lizardfs/issues/805#issuecomment-2238866486
 
Urmas Rist from Leil Storage (https://leil.io) has ported SaunaFS to Debian, and I've installed SaunaFS on my Proxmox cluster using deb packages that I built under Debian 13 (trixie) from the sources Urmas uploaded for a Debian ITP (Intent To Package) at https://salsa.debian.org/hpc-team/saunafs. I'm using it as directory storage for backups and for filesystem pass-through using "virtiofs". Leil Storage only supports SaunaFS on Ubuntu at present, but they are interested in the experiments we have been doing to evaluate SaunaFS as the SDS shared filesystem for Proxmox 9 running under a Debian 13 (trixie) host OS.
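For anyone who wants to reproduce the package build, it was roughly the standard Debian source-package workflow. The clone URL is derived from the salsa project above and the exact package names produced will vary, so treat this as a sketch:

    # build SaunaFS .deb packages from the ITP sources on Debian 13 (trixie)
    git clone https://salsa.debian.org/hpc-team/saunafs.git
    cd saunafs
    sudo apt build-dep ./             # install build dependencies from debian/control
    dpkg-buildpackage -us -uc -b      # build unsigned binary packages into ../
    sudo apt install ../saunafs*.deb  # install whatever binary packages were produced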
 
Hi all,

Leil Storage's CEO here (the company behind SaunaFS). I would like to clarify a few things for you, as people who had an interest in LFS in the past and may want to know more about the direction the project has taken since then.

We hired all of the former LFS team in early 2023, when the project finally went bankrupt, including the LFS creator. The idea was to use the code as a base, fix birth defects and technical debt, and extend the FS to support technological innovations and a scale unachievable earlier.

Now, almost 3 years and thousands of merged PRs later, we have a very different, much more performant, much more stable product that still offers the best that LFS had to offer back in the day: simplicity and ease of use. There is no intention to maintain compatibility for long, as we treat the project not as an "extension" of LFS but rather as moving it much further, to be a modern file system that is still built on the great pillars of the Google File System design, like many file systems created years and even decades ago. These pillars are still solid and strong, but they need to be re-architected in certain ways, and this is what we are doing. So we are definitely a "relative", but more like a great-grandson of the Google File System, as are many systems out there, including one of the best-known ones and the most closely related to LFS: MooseFS.

My question to you, as people who apparently were deeply familiar with the system and with technology in general: what would you outline as critical to have, must-have, and nice to have in functionality/features/tools for a distributed file system that would look close to ideal for you (assuming the love for LFS was real)? Even if your views date from years past, it will be very valuable for us to know.

There is no sales motive between the lines. I am only looking for ways to collect the opinions and feedback of fans from the past (and maybe today), and then see if there is something we can add to the product, be it the Proxmox use case or something else. Your opinion will matter a lot, and I would appreciate it if you could share it here.

Thank you.
 
what would you outline as critical to have

I am not a multi-filesystem expert but more of a user/administrator. My very limited but honest opinion is this: reliability, reliability, redundancy, performance, snapshots, compression, encryption. In that order.

Yes, reliability first and second. Compare it with (local) ZFS - it will never deliver damaged data. At least that's the plan ;-)

Probably the elephant in the network room for distributed storage is Ceph, right? What might a comparison with SaunaFS look like, feature-wise?

Good luck! :-)
 
I'm running SaunaFS on a 3-node PVE cluster plus a PBS server, with 4x 8 TB Seagate Barracuda DM-SMR SATA drives in each one. I tried Ceph and performance was terrible; reading around the issue, I now understand that Ceph works best with SSDs. I had a similar experience running Ceph on a Beowulf cluster with DM-SMR drives, which is why I switched to LizardFS. However, the LizardFS project ended badly, with an FTBFS error resulting in it being removed from Debian. SaunaFS is being developed as a replacement for LizardFS by a team that includes the original LizardFS developers. It performs well on a 4-node Beowulf cluster under Ubuntu 22.04, and on that basis I tried to port it to Debian myself. I asked the SaunaFS developers to submit a Debian ITP so that, eventually, SaunaFS will be available in the Debian repositories (at present Leil Storage only supports SaunaFS under Ubuntu 22.04 and 24.04). I think this is a great opportunity for Proxmox users to try out the SaunaFS SDS system under Debian 13 (trixie). Please post comments here if you do. I am also using ZFS with filesystem pass-through on a single-node Proxmox server, and it works well.
 
What is the use case for SaunaFS in Proxmox: image storage like CephFS, or do you run VMs on it? Since you are using SATA drives, did you run some fio tests or similar to compare the performance to Ceph?
 
My use case is shared storage for VMs using "virtiofs" filesystem pass-through, so I can run e.g. Slurm or have the same home directories on different VMs, as I did on Beowulf clusters. I want to use Proxmox for node provisioning, because it's more flexible and easier to administer remotely than running on bare metal. Ceph ran so slowly on my DM-SMR drives that I switched immediately to LizardFS and now SaunaFS. I've been running the fio-cdm benchmark to 'tune' SaunaFS on my COTS hardware, but write performance is terrible on DM-SMR drives if SaunaFS uses "fsync" to flush writes to disk immediately. Running without "fsync" allows the DM-SMR drives to catch up because writes are deferred. I've had about 30-60% of my 10G network bandwidth occupied and disks 60-80% busy during simultaneous Proxmox backups to 'Dir' storage from three PVE nodes. PBS is not a good use case for SDS, because my PBS server has to write to three other SaunaFS chunkservers as well as receive the backup stream from the Proxmox VM being backed up. Let's just say this is a work in progress, but it's interesting and I think it has a lot of potential for COTS Proxmox clusters. Leil Storage is working on much more advanced support for HM-SMR drives, but that's not applicable to the type of small-scale COTS servers used in scientific laboratories that interest me.
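For reference, the fsync behaviour I mean is a chunkserver-side setting. In LizardFS it was HDD_FSYNC_BEFORE_CLOSE in mfschunkserver.cfg; I'm assuming SaunaFS kept an equivalent option in its chunkserver config, so verify the exact name against your version's documentation before relying on it:

    # chunkserver config (mfschunkserver.cfg in LizardFS; SaunaFS equivalent assumed)
    # 0 = don't fsync every chunk on close -> DM-SMR drives can defer/reorder writes
    # 1 = fsync on close -> safer on power loss, but painfully slow on DM-SMR
    HDD_FSYNC_BEFORE_CLOSE = 0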
 
My question to you, as people who apparently were deeply familiar with the system and with technology in general: what would you outline as critical to have, must-have, and nice to have in functionality/features/tools for a distributed file system that would look close to ideal for you (assuming the love for LFS was real)? Even if your views date from years past, it will be very valuable for us to know.
Laughs. Easy question ;)

So... as I am fairly certain you already understand, this can't have a single answer, because the variables differ with the application/use case. If you want to make a file system that does it all, you're going to try to out-Ceph Ceph, and that's likely a losing proposition, if for no other reason than that they have such a long lead. Instead, I imagine that unless you have infinite developer resources, you probably want to identify a vertical to attack; one where you may already have some in-house expertise and sales resources.

Without that, I can just describe a few features that don't seem to exist in this space and could be killer features, in no specific order:
1. Multithreaded OSD. NVMe drives have redefined the capability of the lowest storage building blocks, to the point where all the old ways have effectively become their own bottleneck. Crimson is still 2-3 years away from being on any production cluster I'd be deploying; there is an opportunity there.
2. Dynamically adaptive placement groups. In all current implementations I'm aware of, a pool has to be defined with a rigid one-rule-for-the-whole-pool policy. The ability to specify rules by block size, initiator groups, or other system- or user-supplied logic could be a game-changer: 4k sync writes go to a rep3 pool on NVMe, 128k writes go on an 8k2n EC stripe, etc. ZFS is taking some positive steps here with the special device, but taken to its logical conclusion this could be a killer feature.
3. WORKING cache layers.
4. Be faster than Lustre on silos of RAID.
5. A clustered file system (CFS) for a shared LUN (this is one of PVE's weakest points; not really the target of what you're doing, but there's some overlap...)