Glusterfs is still maintained. Please don't drop support!

Who cares if it's old, if it works (vs. it not working)?
I do. Because software which doesn't get security updates belongs in museums, not in production, even in homelabs.
2) The goal here is to demonstrate with data, from within a VM, that gluster is still viable from a technology standpoint.
This won't change anything for Proxmox VE though. GlusterFS support is deprecated and will be removed from qemu, and thus also from Proxmox VE, due to the stalled development by Red Hat. Benchmarks (no matter on which version) won't change this; only convincing the qemu developers that glusterfs support should be kept would. Ideally also volunteering for maintaining glusterfs and its support in qemu.
 
Seemed to have been a time-specific bug, because I currently do have at least one server with a MegaRAID SAS which is conveniently still in the Linux kernel: https://github.com/torvalds/linux/blob/master/drivers/scsi/megaraid/megaraid_sas.h - it works on modern Ubuntu kernels, you seem to be pointing to a very time-specific bug in the kernel around 6.8 which seems to have been resolved regarding JBOD mode? I actually have multiple servers running with Proxmox that have some form of MegaRAID controller in them (not my choice, conversion from VMware garbage).
:shrug: dunno if it is time specific.

(I mean, one would only be able to say this in retrospect. At the time, when it happened, there was little means to know whether it was going to be a (permanent) issue or one that would've been resolved as a function of time.) Either way, the point still remains: at the time, I couldn't update on account of it.

And now, whilst I could upgrade to PVE8, you'll invariably get others who'll ask the natural question, "why not just upgrade to PVE9 anyway?" (The answer to that question is that PVE9 brings other issues with it; cf. the e1000 NIC issue.)

Thus, if said e1000 and 9361-8i work in PVE7, why break it?
NVIDIA Ethernet is cheaper than NVIDIA InfiniBand fabric and NVIDIA is pretty much the most expensive solution out there today. Arista is cheaper and they're still not a 'cheap' option whilst Arista has even lower latency options. Talking datacenter networks here. We just purchased ~300 usable ports worth of NVIDIA 400G IB switches with optics - that's a $600k investment and we don't even have the annual management software license or the NIC-side (ConnectX 8) and NIC-side optics or cabling, all-in all, I'm estimating $1.2M over a 5 year period. There is no 400G Ethernet fabric that costs $4k/link, it's about half to a quarter of that cost depending on your switch gear. I think it's a waste of money, but the religion of IB is strong amongst some people.
Depends on the ethernet adapter.

Right now, you can buy the MCX515A-CCAT off of eBay for $125.92. Conversely, you can buy the IB version (MCX555-ECAT) off of eBay for $119.95.

Nvidia (read: Mellanox - and yes, I still call it Mellanox) is expensive because Mellanox has pretty much always been expensive. And it's only recently that other vendors are starting to come out with their own line of products, but in many cases, Mellanox still takes the crown. Myrinet tried. OmniPath tried. Mellanox/IB won.

In terms of what your company purchased - again, it depends.

You can pick up an Edgecore DCS510 AS9716-32D 32-Port 400GbE Bare Metal Switch with ONIE - Part ID: 9716-32D-O-AC-F-US from Colfax Direct, for example, for $16560, which works out to $517.50 per port. Conversely, you could pick up a Mellanox Quantum-2 MQM9790 64-port Non-blocking Unmanaged NDR 400Gb/s InfiniBand Switch - Part ID: MQM9790-NS2F, also from Colfax Direct, for $31125, which works out to about $486.33 per port.

Cabling varies depending on how far your runs are going to be and whether the ends are QSFP-DD and/or QSFP112.

In either case, as the data that I have presented shows, you can get IB stuff cheaper than ethernet. And that was still very much the case back in like 2019 when I bought my switch, because I think it was an 18-port 100 GbE switch that cost almost as much as my 36-port Mellanox 100 Gbps IB switch. I looked at it because the ConnectX-4 cards were VPI cards, which means I could set the port LINK_TYPE to either ETH or IB using mstconfig. So I could've gone either way, and IB was cheaper than ETH. (At least now, the ETH premium over IB isn't as outrageous as it used to be. It's only a 6.4% premium now. It used to be anywhere from 15-40% more for 100 GbE vs. 100 Gbps IB.)
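
For anyone who wants to check the per-port arithmetic, here is a quick Python sketch. The prices and port counts are the Colfax Direct figures quoted above; it covers the switches only and assumes nothing about optics, cabling, or licences.

```python
# Per-port switch cost from the two list prices quoted in this post.
# Switch-only figures: optics, cabling, and licences are not included.
switches = {
    "Edgecore AS9716-32D (400GbE)": (16560.00, 32),
    "Mellanox MQM9790 (NDR IB)": (31125.00, 64),
}

per_port = {name: price / ports for name, (price, ports) in switches.items()}
for name, cost in per_port.items():
    print(f"{name}: ${cost:,.2f} per port")

eth = per_port["Edgecore AS9716-32D (400GbE)"]
ib = per_port["Mellanox MQM9790 (NDR IB)"]

# Ethernet premium over IB, as a percentage of the IB per-port price.
premium_pct = (eth / ib - 1) * 100
print(f"Ethernet premium over IB: {premium_pct:.1f}%")
```

That is where the 6.4% figure comes from.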

IB is great, if you know how to take advantage of it.

(I didn't buy 100 Gbps IB for HDD based storage. I bought it because the HPC apps that I was running at the time were able to regularly hit 80-90 Gbps out of the 100 Gbps possible for RDMA/MPI.)

The ability to offload storage traffic onto said 100 Gbps IB was really just a bonus at that point.

My point was you can't push 100Gbps from a single spinning disk that gives at best 1-10Mbps of throughput (if not reading from cache).
Two things:
1) I agree that you can't push 100 Gbps from a single spinning disk. You might be able to hit 1 Gbps for sequential writes, where the HDD cache might have limited use/benefit. But that, again, wasn't the point of having 100 Gbps IB either. It was a fringe/leftover benefit from running HPC applications that use IB/RDMA/MPI.

2) This has very little to do with the performance difference between ceph (~5% of a drive's capability) vs. gluster (~22% of a drive's capability).
 
I do. Because software which doesn't get security updates belongs in museums, not in production, even in homelabs.
Yes, and as CVE-2026-43134, CVE-2026-43284, and CVE-2026-43500 show, updates are great. [/s]

(i.e. if you didn't update your kernels, then you wouldn't have given yourself these LPE exploits that you otherwise, previously, didn't have.)

Same thing with CVE-2024-3094, where, again, if you didn't update, then you wouldn't have given yourself this backdoor.

Your statement argues that updated software is more secure and yet, these are just four of the more recent CVEs where the CVSS is 7.8, 7.8, 7.8, and 10.0 respectively.

If you didn't update, then you might not have "invited" these issues into your production systems, homelab or otherwise.

Who knows how many more others there are where it was an update that gave or exposed the system to issues, where, if you didn't update, your system would've been fine.


Ideally also volunteering for maintaining glusterfs and its support in qemu
I don't program; therefore, any programming that I would do to try and help/contribute would be done entirely via vibe coding. And we've already seen what that's done to Nvidia's own GPU drivers.

If anything, my vibe-coding to "help" maintain gluster and/or qemu is a sure-fire way to kill off any remnants of gluster and/or qemu, in the same way that Nvidia's vibe-coding of their own GPU drivers is a sure-fire way to kill off their own drivers.

Perhaps this is the real reason why you suggested it (so that I would be sure to kill it off, for good) by vibe-coding it, quite literally, to its own death.

(So far, no one has been able to answer my question "if a program is stable, then why does it matter that there aren't as many commits happening?".)

If that's the metric that qemu is using as their rationale for dropping support for gluster, then by that logic, bad code that constantly needs to get fixed would win this battle/race, because the number of commits per month for crappy code would be astronomical when you're always trying to fix something that's fundamentally and critically broken.

But in terms of commits per month, it'd be a winner according to that metric, and if that's what qemu devs use to determine what will be supported and what won't, then bad code, by this commits per month metric, would get more adoption than good code that doesn't need perpetual fixes just to get it to work properly in the first place.

And if that is the logic/metric that they're using, maybe I should vibe code glusterfs back to being supported by qemu, because I can commit every piece of garbage output that AI generates and thus inflate the number of commits per month with AI/vibe coded slop, just to send the number of commits per month through the roof.

Something tells me that I can probably automate that with n8n.

(Sidebar: responses, but still no technical discussion about the fact that gluster is 4.4x faster than ceph (in terms of % of drive capabilities used). Interesting.)
 
There is a gluster-like replacement and there are others that regularly pop up that run entirely in userspace. Gluster being unstable as per my prior comment is just personal experience. If you do hammer it in real production, it is very likely to eat your data. Almost everything about the disperse volumes is highly unstable code and will bite you in the long run, and 3-way replicated volumes are no better than Ceph when it comes to data usage.

Once you get to thousands of file objects, every file/directory operation triggers network round-trips to traverse bricks for DHT lookups; negative lookups especially (e.g. new files) become a bottleneck.
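
A toy model of why negative lookups hurt, as a hedged Python sketch. This is not Gluster's actual DHT code: the hash, the brick names, and the "ask every brick on a miss" fallback are simplifying assumptions for illustration only.

```python
import zlib

# Hypothetical 4-brick distributed volume; each brick is modelled as a set of names.
BRICKS = ["brick0", "brick1", "brick2", "brick3"]


def hashed_brick(name):
    """Map a file name to one brick index (a stand-in for DHT hash ranges)."""
    return zlib.crc32(name.encode()) % len(BRICKS)


def lookup(name, contents):
    """Return (found, round_trips) for a single lookup in the toy model."""
    first = hashed_brick(name)
    trips = 1  # one round-trip to the brick the hash points at
    if name in contents[first]:
        return True, trips
    # Miss on the hashed brick: ask every other brick before answering "no".
    for index, brick in enumerate(contents):
        if index == first:
            continue
        trips += 1
        if name in brick:
            return True, trips
    return False, trips


contents = [set() for _ in BRICKS]
contents[hashed_brick("existing.img")].add("existing.img")

print(lookup("existing.img", contents))  # hit on the hashed brick
print(lookup("new.img", contents))       # negative lookup touches every brick
```

In this model a hit costs one round-trip, while checking that a new file does not exist yet costs one round-trip per brick, so the penalty grows with the brick count.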

It was once a great solution, I've used it about 15 years ago in production, but Ceph was in every way better. If you want a similar-to-Gluster system: LUSTRE is still around. MooseFS is a thing (it only has a 32-bit pointer, so not very useful if you have PBs worth of lots of small files)
 
If you didn't update, then you might not have "invited" these issues into your production systems, homelab or otherwise.
This is a logical fallacy. Just because the security issues you linked are not present in an older kernel doesn't mean that your old kernel doesn't have any security issues. In fact, it definitely will have known (and unknown) security issues which are already fixed in newer kernels. This will never change, since it's quite hard, if not outright impossible, to prove that a certain piece of software doesn't have any security issue. (For folks who don't mind math: Google for the Turing halting problem and computability theory. This is the same reason why antivirus tools are rather limited in their use cases and in the kinds of threats they are able to detect.)


Regarding your "do you want me to vibecode the gluster support": You are missing my point. Since the development funded by Red Hat isn't being done any more, somebody else needs to do or fund it, since the qemu developers deprecated support for it. No amount of benchmarks or bickering here will change this.
 
If you want a technical discussion, you need to do it on technical terms. Your benchmarks (like most benchmarks) are invalid because you're not benchmarking the 'same thing'. Moreover, you're benchmarking one thing (Windows?), and if you know anything about storage, you know that Windows NTFS always tells the subsystem it is doing an async write (yeah, NTFS is bad for your data), whereas with Ceph + QEMU any write is sync.

You can find some benchmarks from 2013, when Ceph was really young, that already showed Ceph winning various major points against Gluster: IOPS and latency, though not necessarily throughput. Ceph has improved remarkably in 13 years while Gluster kind of stagnated; by 2020 Ceph was a consistent winner in most benchmarks. But those are synthetic benchmarks, often in real-world situations, not a homelab. On ancient hardware, there are all kinds of reasons modern code won't do well. On consumer hardware it's even worse. Consumer hardware has a tendency to lie about data consistency as well and (especially SSDs) will cache writes 'by default'; Ceph (like ZFS) tends to avoid those problems even on consumer hardware.

So this is a consistent trope on this forum: that ZFS and Ceph are 'slower' on consumer hardware than, let's call it, "naive storage", because "naive" benchmarks show a big (often unrealistic) gap. You see the opposite on real server hardware though. Gluster is really built for hardware RAID with BBU, as an open source alternative to GPFS and other proprietary SAN systems, whereas Ceph (and ZFS) was literally invented to manage disks directly on cheap hardware, without a mediator, because around the early 2000s professionals started noticing from real life disaster stories that proprietary/hardware/software RAID, even with BBU, cannot be trusted, does not provide the data guarantees, and does not scale. I can tell you, Gluster won't scale past ~8 nodes and it will choke during rebuilds at today's scales (several TB). Your data won't be consistent and available at all times in real world scenarios when things go wrong, and being offline while a brick gets rebuilt is not great for business.

And again, nothing wrong with trying stuff out and experimenting. But here is the big thing: if you don't care about your data, then it doesn't matter what you use. If you care slightly about your data, but you don't need consistency of your data at all times between multiple nodes, then yeah, a replicated Gluster can do that, but ZFS replicated once every minute or even every hour to another node will probably give you better guarantees about your data than Gluster.
 
Win11 VM running on Ceph finished the updates sometime between last night and this morning.

I got home from work and rebooted the VM so that said Win11 updates can go into effect.

You can see, from the screenshot below, when I started the reboot and how, almost an hour later, it's only 23% done through said Win11 updates (first round, post fresh Win11 VM install that was kicked off last night).

Yes, this is how slow ceph is.

Screenshot_2026-06-02_19-07-44.png
 
You’re running Windows 11 on a CPU from the Windows 7 era. That proves absolutely nothing other than that you can get hardware to do insane things. I can show you a Ceph cluster that does Windows 11 updates in less than 5 minutes on just a 10G backbone with SAS drives.

Look at your storage pressure stat, what does it say?
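
Assuming "storage pressure" here means the Linux pressure stall information (PSI) for I/O exposed in /proc/pressure/io (that mapping is an assumption; the graph in the GUI may be drawn from elsewhere), a minimal Python sketch to read it on the host or inside a Linux guest:

```python
from pathlib import Path


def parse_psi(text):
    """Parse PSI text such as 'some avg10=0.00 avg60=0.00 avg300=0.00 total=0'."""
    stats = {}
    for line in text.strip().splitlines():
        kind, *fields = line.split()
        # Each field looks like 'avg10=1.50'; store the values as floats.
        stats[kind] = {key: float(value) for key, value in (f.split("=") for f in fields)}
    return stats


if __name__ == "__main__":
    psi_file = Path("/proc/pressure/io")  # only present on kernels with PSI enabled
    if psi_file.exists():
        for kind, values in parse_psi(psi_file.read_text()).items():
            print(kind, values)
    else:
        print("No /proc/pressure/io on this system")
```

The "some" line is the share of time at least one task was stalled on I/O, and "full" is the share of time all non-idle tasks were stalled; the avg fields are percentages over 10, 60, and 300 seconds.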
 
I read through the chain, and I don't really understand what everyone is arguing about.

Yes, this is how slow ceph is.
Yes. Ceph requires a minimum configuration (network topology, OSD count, etc.) before it's performant.

Who cares if it's old, if it works (vs. it not working)?
This is the wrong question to ask. Whether someone else cares or not isn't particularly relevant, despite certain forum members forcefully expressing their opinion. It isn't that they're wrong; it's whether you are comfortable with operating a legacy environment without security or bug fixes. It's your headache to deal with. There was/is nothing inherently wrong with PVE7 or gluster; their devs simply moved on due to various drivers.

and onto the main event:
It is, for this reason, why a lot of companies don't deploy the bleeding edge technology for mission critical, production systems, because newer can break stuff and cause stuff to stop working which would be a huge problem, for said business.
I don't really know what you're referencing in terms of information or statistics. COMPANIES that deploy SOLUTIONS for their specific production use cases deploy supportable solutions. They have uptime requirements and security liabilities. As long as the solution meets MAC (minimum acceptance criteria), most CTOs are not going to care that much about the technology, although there are those who do. The flip side to your argument is that a company will not deploy unsupportable hardware or software, as it will not meet their business insurance criteria.

With regards to IB: it's a fantastic low latency transport, and is used in many use cases where latency is king. Unfortunately it's also missing a TON of functionality used by more typical use cases (no layer 2 functionality to speak of), which limits its utility for PVE, for example. Nevertheless, it can be used as long as you're comfortable operating SMs in production and don't attempt to use them for vmbrs. I decommissioned all IB from my deployments when 100GbE became nearly as performant but massively cheaper, and with switches that can actually do stuff. I get you already own the hardware so the price isn't a feature for you; the real question you should be asking is what are you actually after?

If you're insistent on using this for whatever purpose, use it. No one will stop you. If you intend to operate it as an enterprise, you've been warned.
 
There is a gluster-like replacement and there are others that regularly pop up that run entirely in userspace. Gluster being unstable as per my prior comment is just personal experience. If you do hammer it in real production, it is very likely to eat your data. Almost everything about the disperse volumes is highly unstable code and will bite you in the long run, and 3-way replicated volumes are no better than Ceph when it comes to data usage.

Once you get to thousands of file objects, every file/directory operation triggers network round-trips to traverse bricks for DHT lookups; negative lookups especially (e.g. new files) become a bottleneck.

It was once a great solution, I've used it about 15 years ago in production
I haven't been able to find a list of glusterfs version history with its corresponding release date, but the idea of trying to find that was to see what version of glusterfs you were running, 15 years ago.

I'm probably going to spend more time playing with it, because my current ceph cluster running on the mini PC will eventually kill said 2242 M.2 NVMe SSDs. (It's already at 33% wearout, so it's only a question of when rather than a question of if.)

There's only one way to find out how well (or how poorly) it performs - try it.

(But again, the current Win11 VM that's still rebooting, this is just one running on a brand new cluster that I set up last night.)


Once you get to thousands of file objects, every file/directory operation triggers network round-trips to traverse bricks for DHT lookups; negative lookups especially (e.g. new files) become a bottleneck.
I would have to imagine that's probably no better than erasure coded ceph, where it has to read all of the blocks in, especially for a read-modify-write. (The video from 45Drives talks about other scenarios where performance is a problem with erasure coded ceph, and again, given the results that I am sharing live (because it takes ceph soooo long to run), the erasure coded ceph performance issues are very real; enterprise grade U.2 NVMe and/or E1.S EDSFF NVMe SSDs are only masking that fundamental fact.)

I'll have to throw several thousand files at my crappy gluster cluster and see what happens.

Like if you had a U.2 NVMe or E1.S EDSFF NVMe 5.0 x4 SSD that's capable of 12 GB/s sequential reads, if you were able to get 22% of that drive's performance capability, then instead of getting ~600 MB/s from each drive, you'd get 2.64 GB/s.
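
The arithmetic behind those numbers, as a small Python sketch. It assumes the ~5% (ceph) and ~22% (gluster) utilisation figures from my HDD tests would carry over unchanged to a 12 GB/s NVMe drive, which is an untested assumption.

```python
# Hypothetical PCIe 5.0 x4 drive from the example above.
drive_seq_read_gbs = 12.0

# Fractions of drive capability observed in this thread's tests
# (assumed, not measured, to hold for NVMe as well).
ceph_fraction = 0.05
gluster_fraction = 0.22

ceph_gbs = drive_seq_read_gbs * ceph_fraction
gluster_gbs = drive_seq_read_gbs * gluster_fraction
speedup = gluster_fraction / ceph_fraction

print(f"ceph:    {ceph_gbs * 1000:.0f} MB/s per drive")
print(f"gluster: {gluster_gbs:.2f} GB/s per drive")
print(f"ratio:   {speedup:.1f}x")
```

This is the same 4.4x ratio as before, just scaled to a faster drive.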


If you want a similar-to-Gluster system: LUSTRE is still around. MooseFS is a thing (it only has a 32-bit pointer, so not very useful if you have PBs worth of lots of small files)
Yeah, I looked briefly at Lustre, but I haven't figured out how to deploy it yet. (But AI can help with that.)