I/O error in VM on GlusterFS when one replica is down

foxriver76

Mar 15, 2021
Hi all,

recently I changed my Proxmox setup to run an HA cluster based on GlusterFS. Now I noticed that whenever I take one of the GlusterFS hosts down (e.g. to restart it after upgrades), the VMs running on the other host get a corrupted filesystem. The corrupted VMs then show the following errors:

[Attachment: screenshot of the error messages (Bildschirmfoto von 2021-03-15 12-04-18.png)]

My GlusterFS configuration looks like this:
Code:
root@proxmox-nuc:~# gluster volume info

Volume Name: pve
Type: Replicate
Volume ID: f2e8a3f0-b73f-4354-adbe-21a87f24b981
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: 192.168.178.141:/data/proxmox/gv0
Brick2: 192.168.178.130:/data/proxmox/gv0
Brick3: 192.168.178.28:/data/proxmox/gv0 (arbiter)
Options Reconfigured:
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
features.shard: on
cluster.self-heal-daemon: on

Does anyone have an idea why this happens?
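For completeness: the quorum-related options do not appear under "Options Reconfigured" above, so they should still be at their defaults. This is roughly how they can be read back (just a sketch, using the volume name pve from the output above):
Code:
# read back the quorum and timeout related options (sketch)
gluster volume get pve cluster.quorum-type
gluster volume get pve cluster.server-quorum-type
gluster volume get pve network.ping-timeout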

best regards

fox
 
Not sure how arbiters work in GlusterFS, but I think quorum/arbiter might be the problem.

When I have 2 of 3 Gluster servers online, I can use dd successfully on a VM that is stored on the corresponding GlusterFS volume.

Code:
root@glusterHci143:~# gluster volume info
 
Volume Name: gv0
Type: Replicate
Volume ID: b7c62cc6-1635-488a-bbc5-70aa70322926
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 192.168.25.172:/data/brick1/gv0
Brick2: 192.168.25.173:/data/brick1/gv0
Brick3: 192.168.25.174:/data/brick1/gv0
Options Reconfigured:
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet



root@glusterHci143:~# gluster peer status
Number of Peers: 2

Hostname: 192.168.25.174
Uuid: 0900b5ff-3674-46a2-8741-06fc8d66d048
State: Peer in Cluster (Connected)

Hostname: 192.168.25.172
Uuid: d88a20b8-349b-4a07-99c7-d5edded7f458
State: Peer in Cluster (Disconnected)

As soon as I shut down another Gluster node (so only 1 of 3 left), I get the same I/O errors when running dd.
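For reference, the write test I run inside the VM is along these lines (only a sketch; the target file and size are arbitrary, and oflag=direct is there so the writes actually hit the Gluster-backed disk instead of the page cache):
Code:
# write 1 GiB of zeros straight to disk, bypassing the page cache (sketch)
dd if=/dev/zero of=/root/ddtest.img bs=1M count=1024 oflag=direct status=progress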
 
Thanks for your reply. As far as I understand it, the arbiter should keep the filesystem alive by maintaining quorum even when one of the two data hosts is down.
Maybe someone with more GlusterFS knowledge, or someone who also runs a 2 + 1 setup, can tell me whether it is possible to keep the VMs alive.
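For what it's worth, GlusterFS ships a predefined "virt" option group that is usually recommended for volumes holding VM images; applying it (or at least the quorum options from it) would look roughly like this. This is only a sketch, and I have not yet verified that it solves the problem here:
Code:
# apply the predefined option group for VM image workloads (sketch)
gluster volume set pve group virt
# or set the central quorum options individually
gluster volume set pve cluster.quorum-type auto
gluster volume set pve cluster.server-quorum-type server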

What's really bad is that even when the second host is back up, the VMs stay corrupted until I manually reboot the affected ones.
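At least I can check whether the bricks have finished self-healing once the host is back, before touching the VMs; again just a sketch with the volume name pve:
Code:
# list entries that still need healing on each brick (sketch)
gluster volume heal pve info
# quick per-brick count of pending heals
gluster volume heal pve statistics heal-count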
 
