Proxmox with glusterfs storage backend: VMs get corrupted

Jun 1, 2023
Hi,

we are testing the current Proxmox version with a GlusterFS storage backend and have a strange issue with files getting corrupted inside the virtual machines. For whatever reason, from one moment to the next, binaries can no longer be executed, scripts are damaged, and so on. In the logs I get errors like this:

May 30 11:22:36 ns1 dockerd[1234]: time="2023-05-30T11:22:36.874765091+02:00" level=warning msg="Running modprobe bridge br_netfilter failed with message: modprobe: ERROR: could not insert 'bridge': Exec format error\nmodprobe: ERROR: could not insert 'br_netfilter': Exec format error\ninsmod /lib/modules/5.15.0-72-generic/kernel/net/802/stp.ko \ninsmod /lib/modules/5.15.0-72-generic/kernel/net/802/stp.ko \n, error: exit status 1"

On such a broken system, file gives the following output:

root@ns1:~# file /lib/modules/5.15.0-72-generic/kernel/net/802/stp.ko
/lib/modules/5.15.0-72-generic/kernel/net/802/stp.ko: data
root@ns1:~#

On a normal system it looks like this:

root@gluster1:~# file /lib/modules/5.15.0-72-generic/kernel/net/802/stp.ko
/lib/modules/5.15.0-72-generic/kernel/net/802/stp.ko: ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), BuildID[sha1]=1084f7cfcffbd4c607724fba287c0ea7fc5775
root@gluster1:~#

It is not only kernel modules that are affected. I saw the same behaviour with scripts, Icinga check plugins, the sendmail binary, and so on; I think it is totally random :-(.

We have the problem with newly installed VMs, with VMs cloned from a template created on our Proxmox host, and with VMs which we previously used with libvirtd and migrated to our new Proxmox machine. So IMHO it cannot be related to the way we create new virtual machines. However, I have the feeling that the problem could be related to the qcow2 image format we are using, but I need to do more testing to be sure that raw images do not get corrupted, because the problem is hard to reproduce and the corruption happens absolutely randomly :-(.
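To rule the image format in or out, my plan is to put a test VM with a raw disk on the same Gluster storage and hammer it for a while; something like this (the VMID, disk size and other options are just placeholders):

Bash:
# placeholder VMID 300 with a 32G raw disk on the gluster storage
qm create 300 --name rawtest --memory 2048 --cores 1 --ostype l26 \
  --net0 virtio,bridge=vmbr0 --scsihw virtio-scsi-pci \
  --scsi0 gfs_vms:32,format=raw,discard=on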

We are using the following software:

root@proxmox1:~# pveversion -v
proxmox-ve: 7.4-1 (running kernel: 5.15.104-1-pve)
pve-manager: 7.4-3 (running version: 7.4-3/9002ab8a)
pve-kernel-5.15: 7.4-1
pve-kernel-5.15.104-1-pve: 5.15.104-2
pve-kernel-5.15.102-1-pve: 5.15.102-1
ceph-fuse: 15.2.17-pve1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.4
libproxmox-backup-qemu0: 1.3.1-1
libproxmox-rs-perl: 0.2.1
libpve-access-control: 7.4-2
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.3-4
libpve-guest-common-perl: 4.2-4
libpve-http-server-perl: 4.2-3
libpve-rs-perl: 0.7.5
libpve-storage-perl: 7.4-2
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.2-2
lxcfs: 5.0.3-pve1
novnc-pve: 1.4.0-1
proxmox-backup-client: 2.4.1-1
proxmox-backup-file-restore: 2.4.1-1
proxmox-kernel-helper: 7.4-1
proxmox-mail-forward: 0.1.1-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.6.5
pve-cluster: 7.3-3
pve-container: 4.4-3
pve-docs: 7.4-2
pve-edk2-firmware: 3.20230228-2
pve-firewall: 4.3-1
pve-firmware: 3.6-4
pve-ha-manager: 3.6.0
pve-i18n: 2.12-1
pve-qemu-kvm: 7.2.0-8
pve-xtermjs: 4.16.0-1
qemu-server: 7.4-3
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+3
vncterm: 1.7-1
zfsutils-linux: 2.1.9-pve1
root@proxmox1:~#

root@proxmox1:~# cat /etc/pve/storage.cfg
dir: local
path /var/lib/vz
content rootdir,iso,images,vztmpl,backup,snippets

zfspool: local-zfs
pool rpool/data
content images,rootdir
sparse 1

glusterfs: gfs_vms
path /mnt/pve/gfs_vms
volume gfs_vms
content images
prune-backups keep-all=1
server gluster1.linova.de
server2 gluster2.linova.de

root@proxmox1:~#

The config of a typical VM looks like this:

root@proxmox1:~# cat /etc/pve/qemu-server/101.conf
#ns1
agent: enabled=1,fstrim_cloned_disks=1
boot: c
bootdisk: scsi0
cicustom: user=local:snippets/user-data
cores: 1
hotplug: disk,network,usb
ide2: gfs_vms:101/vm-101-cloudinit.qcow2,media=cdrom,size=4M
ipconfig0: ip=10.200.32.9/22,gw=10.200.32.1
kvm: 1
machine: q35
memory: 2048
meta: creation-qemu=7.2.0,ctime=1683718002
name: ns1
nameserver: 10.200.0.5
net0: virtio=1A:61:75:25:C6:30,bridge=vmbr0
numa: 1
ostype: l26
scsi0: gfs_vms:101/vm-101-disk-0.qcow2,discard=on,size=10444M
scsihw: virtio-scsi-pci
searchdomain: linova.de
serial0: socket
smbios1: uuid=e2f503fe-4a66-4085-86c0-bb692add6b7a
sockets: 1
vmgenid: 3be6ec9d-7cfd-47c0-9f86-23c2e3ce5103

root@proxmox1:~#

Our GlusterFS storage backend consists of three servers, all running Ubuntu 22.04 and GlusterFS version 10.1. In the logs of the GlusterFS machines I sometimes see errors and warnings about files that could not be locked; maybe this is causing the problems. I can post those errors and warnings if wanted, but I was not able to correlate the times of these errors and warnings with the points in time when machines crashed and corrupted files occurred.
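If it helps with debugging, I can also run the usual health checks on the Gluster servers and post the output, i.e. something like:

Bash:
# on one of the gluster servers
gluster volume status gfs_vms
gluster volume heal gfs_vms info
gluster volume heal gfs_vms info split-brain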

If I clone a VM from a template I get the following output; maybe this is a hint to the cause of our problem:

root@proxmox1:~# qm clone 9000 200 --full --name testvm --description "testvm" --storage gfs_vms
create full clone of drive ide2 (gfs_vms:9000/vm-9000-cloudinit.qcow2)
Formatting 'gluster://gluster1.linova.de/gfs_vms/images/200/vm-200-cloudinit.qcow2', fmt=qcow2 cluster_size=65536 extended_l2=off preallocation=metadata compression_type=zlib size=4194304 lazy_refcounts=off refcount_bits=16
[2023-05-30 16:18:17.753152 +0000] I [io-stats.c:3706:ios_sample_buf_size_configure] 0-gfs_vms: Configure ios_sample_buf size is 1024 because ios_sample_interval is 0
[2023-05-30 16:18:17.876879 +0000] E [MSGID: 108006] [afr-common.c:6140:__afr_handle_child_down_event] 0-gfs_vms-replicate-0: All subvolumes are down. Going offline until at least one of them comes back up.
[2023-05-30 16:18:17.877606 +0000] E [MSGID: 108006] [afr-common.c:6140:__afr_handle_child_down_event] 0-gfs_vms-replicate-1: All subvolumes are down. Going offline until at least one of them comes back up.
[2023-05-30 16:18:17.878275 +0000] E [MSGID: 108006] [afr-common.c:6140:__afr_handle_child_down_event] 0-gfs_vms-replicate-2: All subvolumes are down. Going offline until at least one of them comes back up.
[2023-05-30 16:18:27.761247 +0000] I [io-stats.c:4038:fini] 0-gfs_vms: io-stats translator unloaded
[2023-05-30 16:18:28.766999 +0000] I [io-stats.c:3706:ios_sample_buf_size_configure] 0-gfs_vms: Configure ios_sample_buf size is 1024 because ios_sample_interval is 0
[2023-05-30 16:18:28.936449 +0000] E [MSGID: 108006] [afr-common.c:6140:__afr_handle_child_down_event] 0-gfs_vms-replicate-0: All subvolumes are down. Going offline until at least one of them comes back up.
[2023-05-30 16:18:28.937547 +0000] E [MSGID: 108006] [afr-common.c:6140:__afr_handle_child_down_event] 0-gfs_vms-replicate-1: All subvolumes are down. Going offline until at least one of them comes back up.
[2023-05-30 16:18:28.938115 +0000] E [MSGID: 108006] [afr-common.c:6140:__afr_handle_child_down_event] 0-gfs_vms-replicate-2: All subvolumes are down. Going offline until at least one of them comes back up.
[2023-05-30 16:18:38.774387 +0000] I [io-stats.c:4038:fini] 0-gfs_vms: io-stats translator unloaded
create full clone of drive scsi0 (gfs_vms:9000/base-9000-disk-0.qcow2)
Formatting 'gluster://gluster1.linova.de/gfs_vms/images/200/vm-200-disk-0.qcow2', fmt=qcow2 cluster_size=65536 extended_l2=off preallocation=metadata compression_type=zlib size=10951327744 lazy_refcounts=off refcount_bits=16
[2023-05-30 16:18:39.962238 +0000] I [io-stats.c:3706:ios_sample_buf_size_configure] 0-gfs_vms: Configure ios_sample_buf size is 1024 because ios_sample_interval is 0
[2023-05-30 16:18:40.084300 +0000] E [MSGID: 108006] [afr-common.c:6140:__afr_handle_child_down_event] 0-gfs_vms-replicate-0: All subvolumes are down. Going offline until at least one of them comes back up.
[2023-05-30 16:18:40.084996 +0000] E [MSGID: 108006] [afr-common.c:6140:__afr_handle_child_down_event] 0-gfs_vms-replicate-1: All subvolumes are down. Going offline until at least one of them comes back up.
[2023-05-30 16:18:40.085505 +0000] E [MSGID: 108006] [afr-common.c:6140:__afr_handle_child_down_event] 0-gfs_vms-replicate-2: All subvolumes are down. Going offline until at least one of them comes back up.
[2023-05-30 16:18:49.970199 +0000] I [io-stats.c:4038:fini] 0-gfs_vms: io-stats translator unloaded
[2023-05-30 16:18:50.975729 +0000] I [io-stats.c:3706:ios_sample_buf_size_configure] 0-gfs_vms: Configure ios_sample_buf size is 1024 because ios_sample_interval is 0
[2023-05-30 16:18:51.768619 +0000] E [MSGID: 108006] [afr-common.c:6140:__afr_handle_child_down_event] 0-gfs_vms-replicate-0: All subvolumes are down. Going offline until at least one of them comes back up.
[2023-05-30 16:18:51.769330 +0000] E [MSGID: 108006] [afr-common.c:6140:__afr_handle_child_down_event] 0-gfs_vms-replicate-1: All subvolumes are down. Going offline until at least one of them comes back up.
[2023-05-30 16:18:51.769822 +0000] E [MSGID: 108006] [afr-common.c:6140:__afr_handle_child_down_event] 0-gfs_vms-replicate-2: All subvolumes are down. Going offline until at least one of them comes back up.
[2023-05-30 16:19:00.984578 +0000] I [io-stats.c:4038:fini] 0-gfs_vms: io-stats translator unloaded
transferred 0.0 B of 10.2 GiB (0.00%)
[2023-05-30 16:19:02.030902 +0000] I [io-stats.c:3706:ios_sample_buf_size_configure] 0-gfs_vms: Configure ios_sample_buf size is 1024 because ios_sample_interval is 0
transferred 112.8 MiB of 10.2 GiB (1.08%)
transferred 230.8 MiB of 10.2 GiB (2.21%)
transferred 340.5 MiB of 10.2 GiB (3.26%)
...
transferred 10.1 GiB of 10.2 GiB (99.15%)
transferred 10.2 GiB of 10.2 GiB (100.00%)
transferred 10.2 GiB of 10.2 GiB (100.00%)
[2023-05-30 16:19:29.804006 +0000] E [MSGID: 108006] [afr-common.c:6140:__afr_handle_child_down_event] 0-gfs_vms-replicate-0: All subvolumes are down. Going offline until at least one of them comes back up.
[2023-05-30 16:19:29.804807 +0000] E [MSGID: 108006] [afr-common.c:6140:__afr_handle_child_down_event] 0-gfs_vms-replicate-1: All subvolumes are down. Going offline until at least one of them comes back up.
[2023-05-30 16:19:29.805486 +0000] E [MSGID: 108006] [afr-common.c:6140:__afr_handle_child_down_event] 0-gfs_vms-replicate-2: All subvolumes are down. Going offline until at least one of them comes back up.
[2023-05-30 16:19:32.044693 +0000] I [io-stats.c:4038:fini] 0-gfs_vms: io-stats translator unloaded
root@proxmox1:~#

Is this message about the subvolumes being down normal?

I have no idea how to debug the problem further, so any idea or hint would be great. Please also let me know if I can provide more information about our setup or additional logs.

BTW: I saw nothing in the syslog on our Proxmox machine related to the crashes...
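In case someone wants to point me to the right place: I assume the interesting logs would be the GlusterFS FUSE client log for the mount on the Proxmox host (path guessed from the usual naming convention, mountpoint with slashes replaced by dashes) and the journal:

Bash:
# on the proxmox host
less /var/log/glusterfs/mnt-pve-gfs_vms.log
journalctl -b | grep -i gfs_vms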

General tips on setting up a GlusterFS storage backend for Proxmox would also be great, or even a clear statement that GlusterFS does not work well with Proxmox in specific situations, e.g. when qcow2 files are used for the virtual machines.
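One thing I found while reading up on this is the predefined "virt" option group in GlusterFS, which is apparently intended for volumes holding VM images (it turns off several client-side caching translators and enables quorum/eager-lock settings). I have not tried it yet, so this is only what I would run on one of the Gluster servers:

Bash:
# applies the options from /var/lib/glusterd/groups/virt to the volume
gluster volume set gfs_vms group virt
gluster volume info gfs_vms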

Ciao and thanks for your help,

Schoeppi
 
"All subvolumes are down." looks like Network Problems to me?
I have a Gluster Storage in my Homelab. 5x Replicated. But i have set it up by my own and added it as normal storage to Proxmox to use LXC's on it.
It works great without Problems. I have 5-6 LXS's and 2 qcow2 VM's on it.
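Roughly, the idea is a manual mount on every node plus a plain directory storage on top of it; something like this, with hostnames, mountpoint and storage ID as placeholders:

Bash:
# /etc/fstab on each PVE node
gfs1:/vol0 /mnt/gfs glusterfs defaults,_netdev,backup-volfile-servers=gfs2:gfs3 0 0

# register the mountpoint as a directory storage (supports rootdir for LXC)
pvesm add dir gfs_dir --path /mnt/gfs --content images,rootdir --is_mountpoint yes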
 
Similar problem. I have a 5-node Proxmox cluster and a GlusterFS replica 3 volume on the first 3 nodes. There is a problem on nodes that have only the gluster client installed:

Bash:
root@node5:~# qm set 139 --ide2 proxmoxhdd:cloudinit --boot c --bootdisk scsi0 --serial0 socket --vga serial0
update VM 139: -boot c -bootdisk scsi0 -ide2 proxmoxhdd:cloudinit -serial0 socket -vga serial0
Formatting 'gluster://10.10.10.55/gfs-vol-proxmoxhdd/images/139/vm-139-cloudinit.qcow2', fmt=qcow2 cluster_size=65536 extended_l2=off preallocation=metadata compression_type=zlib size=4194304 lazy_refcounts=off refcount_bits=16
[2023-10-04 08:57:37.779912 +0000] I [io-stats.c:3706:ios_sample_buf_size_configure] 0-gfs-vol-proxmoxhdd: Configure ios_sample_buf  size is 1024 because ios_sample_interval is 0
[2023-10-04 08:57:37.893524 +0000] E [MSGID: 108006] [afr-common.c:6140:__afr_handle_child_down_event] 0-gfs-vol-proxmoxhdd-replicate-0: All subvolumes are down. Going offline until at least one of them comes back up.
[2023-10-04 08:57:47.785319 +0000] I [io-stats.c:4038:fini] 0-gfs-vol-proxmoxhdd: io-stats translator unloaded
[2023-10-04 08:57:48.819197 +0000] I [io-stats.c:3706:ios_sample_buf_size_configure] 0-gfs-vol-proxmoxhdd: Configure ios_sample_buf  size is 1024 because ios_sample_interval is 0
[2023-10-04 08:57:48.957411 +0000] E [MSGID: 108006] [afr-common.c:6140:__afr_handle_child_down_event] 0-gfs-vol-proxmoxhdd-replicate-0: All subvolumes are down. Going offline until at least one of them comes back up.
[2023-10-04 08:57:58.823394 +0000] I [io-stats.c:4038:fini] 0-gfs-vol-proxmoxhdd: io-stats translator unloaded
ide2: successfully created disk 'proxmoxhdd:139/vm-139-cloudinit.qcow2,media=cdrom'
generating cloud-init ISO
[2023-10-04 08:57:59.971782 +0000] I [io-stats.c:3706:ios_sample_buf_size_configure] 0-gfs-vol-proxmoxhdd: Configure ios_sample_buf  size is 1024 because ios_sample_interval is 0
[2023-10-04 08:58:00.107056 +0000] E [MSGID: 108006] [afr-common.c:6140:__afr_handle_child_down_event] 0-gfs-vol-proxmoxhdd-replicate-0: All subvolumes are down. Going offline until at least one of them comes back up.
[2023-10-04 08:58:09.977169 +0000] I [io-stats.c:4038:fini] 0-gfs-vol-proxmoxhdd: io-stats translator unloaded

If I create a new VM there are no errors.
Bash:
root@node5:~# pveversion -v
proxmox-ve: 7.4-1 (running kernel: 5.15.102-1-pve)
pve-manager: 7.4-3 (running version: 7.4-3/9002ab8a)
pve-kernel-5.15: 7.3-3
pve-kernel-5.15.102-1-pve: 5.15.102-1
ceph-fuse: 15.2.17-pve1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.4
libproxmox-backup-qemu0: 1.3.1-1
libproxmox-rs-perl: 0.2.1
libpve-access-control: 7.4-1
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.3-3
libpve-guest-common-perl: 4.2-4
libpve-http-server-perl: 4.2-1
libpve-rs-perl: 0.7.5
libpve-storage-perl: 7.4-2
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.2-2
lxcfs: 5.0.3-pve1
novnc-pve: 1.4.0-1
proxmox-backup-client: 2.3.3-1
proxmox-backup-file-restore: 2.3.3-1
proxmox-kernel-helper: 7.4-1
proxmox-mail-forward: 0.1.1-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.6.3
pve-cluster: 7.3-3
pve-container: 4.4-3
pve-docs: 7.4-2
pve-edk2-firmware: 3.20221111-1
pve-firewall: 4.3-1
pve-firmware: 3.6-4
pve-ha-manager: 3.6.0
pve-i18n: 2.11-1
pve-qemu-kvm: 7.2.0-8
pve-xtermjs: 4.16.0-1
qemu-server: 7.4-2
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+3
vncterm: 1.7-1
zfsutils-linux: 2.1.9-pve1
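One more thing that might be worth checking from a client-only node like node5 is whether it can actually reach glusterd and the brick ports on all servers; the brick port range below is just the usual default, the real ports are listed by "gluster volume status" on a server:

Bash:
# management daemon
nc -zv 10.10.10.55 24007
# brick ports (default range, adjust to what 'gluster volume status' shows)
nc -zv 10.10.10.55 49152-49160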
 
And another one here

All the nodes are in the same VLAN, on the same switch, no STP/firewall issues. Plain 10 Gbps network with no other traffic at all.
To add the storage to the PVE cluster I had to use the pvesm CLI command, because the GUI was unable to scan the volumes from the servers.
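For reference, the pvesm syntax for a GlusterFS storage looks roughly like this (storage ID, volume name and IPs here are placeholders based on my setup):

Bash:
pvesm add glusterfs gfspool --server 10.0.0.1 --server2 10.0.0.2 \
  --volume gfspool --content images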

Maybe the combination of server on v10 and client on v9 doesn't work?
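At least the server-side cluster op-version can be checked easily; whether the 9.x FUSE client on the PVE nodes then behaves correctly against a v10 server is a separate question:

Bash:
# on one of the gluster servers
gluster volume get all cluster.op-version
gluster volume get all cluster.max-op-version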

CEPH is a no-go in my setup :-/

Code:
[2024-03-16 02:39:30.208978 +0000] I [io-stats.c:3706:ios_sample_buf_size_configure] 0-gfspool: Configure ios_sample_buf  size is 1024 because ios_sample_interval is 0
[2024-03-16 02:39:30.320063 +0000] E [MSGID: 108006] [afr-common.c:6140:__afr_handle_child_down_event] 0-gfspool-replicate-0: All subvolumes are down. Going offline until at least one of them comes back up.
[2024-03-16 02:39:30.320156 +0000] E [MSGID: 108006] [afr-common.c:6140:__afr_handle_child_down_event] 0-gfspool-replicate-1: All subvolumes are down. Going offline until at least one of them comes back up.
[2024-03-16 02:39:40.212994 +0000] I [io-stats.c:4038:fini] 0-gfspool: io-stats translator unloaded
[2024-03-16 02:39:41.217344 +0000] I [io-stats.c:3706:ios_sample_buf_size_configure] 0-gfspool: Configure ios_sample_buf  size is 1024 because ios_sample_interval is 0
[2024-03-16 02:39:43.256783 +0000] E [MSGID: 108006] [afr-common.c:6140:__afr_handle_child_down_event] 0-gfspool-replicate-0: All subvolumes are down. Going offline until at least one of them comes back up. 

Server:
root@gfs-s:~ # dpkg -l | grep glus
ii  glusterfs-cli                          10.1-1ubuntu0.2                         amd64        clustered file-system (cli package)
ii  glusterfs-client                       10.1-1ubuntu0.2                         amd64        clustered file-system (client package)
ii  glusterfs-common                       10.1-1ubuntu0.2                         amd64        GlusterFS common libraries and translator modules
ii  glusterfs-server                       10.1-1ubuntu0.2                         amd64        clustered file-system (server package)
ii  libglusterd0:amd64                     10.1-1ubuntu0.2                         amd64        GlusterFS glusterd shared library
ii  libglusterfs0:amd64                    10.1-1ubuntu0.2                         amd64        GlusterFS shared library

PVE:
proxmox-ve: 7.4-1 (running kernel: 5.15.143-1-pve)
pve-manager: 7.4-17 (running version: 7.4-17/513c62be)
pve-kernel-5.15: 7.4-11
pve-kernel-5.15.143-1-pve: 5.15.143-1
pve-kernel-5.15.85-1-pve: 5.15.85-1
pve-kernel-5.15.39-3-pve: 5.15.39-3
pve-kernel-5.15.30-2-pve: 5.15.30-3
ceph: 17.2.7-pve2~bpo11+1
ceph-fuse: 17.2.7-pve2~bpo11+1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx4
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.4
libproxmox-backup-qemu0: 1.3.1-1
libproxmox-rs-perl: 0.2.1
libpve-access-control: 7.4.3
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.4-2
libpve-guest-common-perl: 4.2-4
libpve-http-server-perl: 4.2-3
libpve-rs-perl: 0.7.7
libpve-storage-perl: 7.4-3
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.2-2
lxcfs: 5.0.3-pve1
novnc-pve: 1.4.0-1
proxmox-backup-client: 2.4.6-1
proxmox-backup-file-restore: 2.4.6-1
proxmox-kernel-helper: 7.4-1
proxmox-mail-forward: 0.1.1-1
proxmox-mini-journalreader: 1.3-1
proxmox-offline-mirror-helper: 0.5.2
proxmox-widget-toolkit: 3.7.3
pve-cluster: 7.3-3
pve-container: 4.4-6
pve-docs: 7.4-2
pve-edk2-firmware: 3.20230228-4~bpo11+2
pve-firewall: 4.3-5
pve-firmware: 3.6-6
pve-ha-manager: 3.6.1
pve-i18n: 2.12-1
pve-qemu-kvm: 7.2.0-8
pve-xtermjs: 4.16.0-2
qemu-server: 7.4-4
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+3
vncterm: 1.7-1
zfsutils-linux: 2.1.14-pve1
 
Just adding my entry for the stats: I'm experiencing this as well. At this point I've about had it with Gluster; it has been giving me nothing but trouble for the past year or so. I think I'm going to switch to local ZFS with replication and see if that works.
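For anyone wondering, by replication I mean the built-in storage replication (pvesr), which only works with ZFS-backed disks; a minimal sketch with placeholder VMID, target node and schedule:

Bash:
# replicate VM 100 to node2 every 15 minutes
pvesr create-local-job 100-0 node2 --schedule "*/15"
pvesr status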
 