Hello,
I have a Proxmox cluster of two older HPE servers (DL380 Gen7). Each has a built-in hardware RAID controller with eight 146 GB disks attached, from which I built a RAID 5 array with one hot spare.
I installed Debian Stretch and then Proxmox following the official instructions, and then set up GlusterFS, also according to the documentation.
The two servers are connected to each other via 10 GbE (DAC cable); GlusterFS also uses this link.
There are about 15 Linux VMs (also Debian Stretch). Most of them generate hardly any I/O load. In one VM I installed Icinga with pnp4nagios, which produces some I/O load by writing the RRD files (despite rrdcached).
This VM's file system (btrfs) has crashed several times in the last few months, so badly that I had to reinstall the VM each time. Once I was able to watch it happen live; the kernel complained:
2019 Jun 3 11:30:34 lnxicinga01 BTRFS: error (device vda1) in cleanup_transaction: 1846: errno = -5 IO failure
Apparently GlusterFS keeps having "hangs". In the mildest case, the file system is just remounted read-only after such a problem. In the worst case, GlusterFS reports a split brain, and then the file system is usually heavily damaged as well.
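For reference, this is how I check the heal and split-brain state with the standard gluster CLI (the volume name `gv0` below is just a placeholder for my actual volume):

```shell
# Check peer connectivity and overall volume health first
gluster peer status
gluster volume status gv0

# List files that still need healing on this volume
gluster volume heal gv0 info

# List only the files that are actually in split brain
gluster volume heal gv0 info split-brain
```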
Two weeks ago I started a second VM with Icinga and pnp4nagios. Since then, the problems mentioned above have occurred at least once a day. On Friday the file systems of three VMs died completely, and I now have to rebuild those VMs from scratch :-(
I have no idea what the problem could be. The RAID controllers show no errors, all disks are healthy, and there are no unusual kernel messages on the two Proxmox servers. The network connection also shows no problems or failures.
Is GlusterFS fundamentally not very stable? My problem is that I do not have a dedicated SAN, so all data has to be replicated across the Proxmox machines themselves.
I would be very grateful for any kind of help or suggestions.
Best regards,
Meinhard