GlusterFS: all subvolumes down when creating a VM

chchang

Well-Known Member
Feb 6, 2018
34
4
48
47
I added a GlusterFS storage to my Proxmox cluster (4 nodes).
The GlusterFS volume mounts fine, but when I try to clone a template onto that storage, I always get an error like this:

Code:
root@pve:~# qm clone 200 189 --name test.glusterfs --full --storage testglusterfs
create full clone of drive virtio0 (seventeraonfreenas10gnetwork:200/base-200-disk-0.qcow2)
[2019-06-04 06:45:35.948617] E [MSGID: 108006] [afr-common.c:4404:afr_notify] 0-datavol-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
^C[2019-06-04 06:45:35.948675] E [MSGID: 108006] [afr-common.c:4404:afr_notify] 0-datavol-replicate-1: All subvolumes are down. Going offline until atleast one of them comes back up.
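For reference, the storage is a standard GlusterFS entry in /etc/pve/storage.cfg, roughly like the sketch below (the option lines are from memory, not copied verbatim from my config; server and volume match the logs):

Code:
glusterfs: testglusterfs
        server 192.168.11.185
        volume datavol
        content images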

But I can write files to the storage just fine:
Code:
root@pve:/mnt/pve/testglusterfs/images# dd if=/dev/zero of=testfile bs=1k count=1M status=progress
1050831872 bytes (1.1 GB, 1002 MiB) copied, 44 s, 23.9 MB/s   
1048576+0 records in
1048576+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 44.9724 s, 23.9 MB/s

I also tried mounting the storage from another workstation, and mounting plus read/write works fine there as well.
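The mount on the workstation was just the normal FUSE mount; as an extra check one could also try creating an image over libgfapi, which is the path QEMU itself uses (both commands below are a sketch, with server/volume/VM ID taken from the logs above):

Code:
# plain FUSE mount, same path the dd test above goes through
mount -t glusterfs 192.168.11.185:/datavol /mnt/glustertest

# optional: exercise the libgfapi path that qemu-img/QEMU use for gluster:// URLs
qemu-img create -f qcow2 gluster://192.168.11.185/datavol/images/189/test-gfapi.qcow2 1G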

Any suggestions?
 
I also tried creating a new VM on the GlusterFS storage. It shows the same "all subvolumes down" messages, but the create job completes.

Code:
[2019-06-04 08:28:45.414580] E [MSGID: 108006] [afr-common.c:4404:afr_notify] 0-datavol-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2019-06-04 08:28:45.414752] E [MSGID: 108006] [afr-common.c:4404:afr_notify] 0-datavol-replicate-1: All subvolumes are down. Going offline until atleast one of them comes back up.
[2019-06-04 08:29:25.838133] E [MSGID: 108006] [afr-common.c:4404:afr_notify] 0-datavol-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
Formatting 'gluster://192.168.11.185/datavol/images/146/vm-146-disk-0.qcow2', fmt=qcow2 size=34359738368 cluster_size=65536 preallocation=metadata lazy_refcounts=off refcount_bits=16
[2019-06-04 08:29:25.838314] E [MSGID: 108006] [afr-common.c:4404:afr_notify] 0-datavol-replicate-1: All subvolumes are down. Going offline until atleast one of them comes back up.
TASK OK
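The create itself was the ordinary one, something along these lines (the VM name and the memory/network options here are placeholders; VM ID, storage and the 32 GiB disk match the log above):

Code:
qm create 146 --name test-gluster --memory 2048 --net0 virtio,bridge=vmbr0 --virtio0 testglusterfs:32,format=qcow2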

And when I try to install an OS in that VM, it looks like it just works.

So why did the clone job fail?
 
I get the exact same message when trying to create a VM or migrate the storage of a VM to a GlusterFS volume.

Output:
Code:
create full clone of drive scsi0 (storage:vm-208-disk-0)
[2019-08-02 14:19:43.005478] E [MSGID: 108006] [afr-common.c:5318:__afr_handle_child_down_event] 0-backupvol-replicate-0: All subvolumes are down. Going offline until at least one of them comes back up.
[2019-08-02 14:19:43.005872] I [io-stats.c:4027:fini] 0-backupvol: io-stats translator unloaded
TASK ERROR: storage migration failed: error with cfs lock 'storage-backupvol': unable to create image: interrupted by signal
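The move itself is just the regular disk move, e.g. (VM ID, disk and target storage as in the output above):

Code:
qm move_disk 208 scsi0 backupvol --format qcow2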
 
Same here... while (successfully) restoring a VM:

Code:
restore vma archive: lzop -d -c /mnt/pve/SHAREDBACKUP/dump/vzdump-qemu-101-2019_08_22-00_00_01.vma.lzo | vma extract -v -r /var/tmp/vzdumptmp30653.fifo - /var/tmp/vzdumptmp30653
CFG: size: 609 name: qemu-server.conf
DEV: dev_id=1 size: 34359738368 devname: drive-scsi0
DEV: dev_id=2 size: 322122547200 devname: drive-scsi1
CTIME: Thu Aug 22 00:00:05 2019
[2019-08-22 15:00:37.251311] E [MSGID: 108006] [afr-common.c:5314:__afr_handle_child_down_event] 0-SHARED-replicate-0: All subvolumes are down. Going offline until at least one of them comes back up.
Formatting 'gluster://pve2/SHARED/images/101/vm-101-disk-0.qcow2', fmt=qcow2 size=34359738368 cluster_size=65536 preallocation=metadata lazy_refcounts=off refcount_bits=16
[2019-08-22 15:00:39.209448] E [MSGID: 108006] [afr-common.c:5314:__afr_handle_child_down_event] 0-SHARED-replicate-0: All subvolumes are down. Going offline until at least one of them comes back up.
new volume ID is 'GlusterFS:101/vm-101-disk-0.qcow2'
map 'drive-scsi0' to 'gluster://pve2/SHARED/images/101/vm-101-disk-0.qcow2' (write zeros = 0)
[2019-08-22 15:00:40.302453] E [MSGID: 108006] [afr-common.c:5314:__afr_handle_child_down_event] 0-SHARED-replicate-0: All subvolumes are down. Going offline until at least one of them comes back up.
Formatting 'gluster://pve2/SHARED/images/101/vm-101-disk-1.qcow2', fmt=qcow2 size=322122547200 cluster_size=65536 preallocation=metadata lazy_refcounts=off refcount_bits=16
[2019-08-22 15:00:50.203964] E [MSGID: 108006] [afr-common.c:5314:__afr_handle_child_down_event] 0-SHARED-replicate-0: All subvolumes are down. Going offline until at least one of them comes back up.
new volume ID is 'GlusterFS:101/vm-101-disk-1.qcow2'
map 'drive-scsi1' to 'gluster://pve2/SHARED/images/101/vm-101-disk-1.qcow2' (write zeros = 0)
progress 1% (read 3564830720 bytes, duration 14 sec)
[snip]
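The restore was started in the usual way, roughly equivalent to this (archive path from the log above; the target storage is the GlusterFS one, named 'GlusterFS' here):

Code:
qmrestore /mnt/pve/SHAREDBACKUP/dump/vzdump-qemu-101-2019_08_22-00_00_01.vma.lzo 101 --storage GlusterFS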

Code:
root@pve1:~# gluster volume status

Status of volume: BACKUP
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick pve1:/BackVMs                         49152     0          Y       3260
Brick pve2:/BackVMs                         49152     0          Y       3470
Brick pve3:/BackVMs                         49152     0          Y       4359

Self-heal Daemon on localhost               N/A       N/A        Y       3350
Self-heal Daemon on 10.0.100.93             N/A       N/A        Y       4377
Self-heal Daemon on 10.0.100.92             N/A       N/A        Y       3548

Task Status of Volume BACKUP
------------------------------------------------------------------------------
There are no active volume tasks



Status of volume: SHARED
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick pve1:/RunVMs                          49153     0          Y       3295
Brick pve2:/RunVMs                          49153     0          Y       3486
Brick pve3:/RunVMs                          49153     0          Y       4368

Self-heal Daemon on localhost               N/A       N/A        Y       3350
Self-heal Daemon on 10.0.100.93             N/A       N/A        Y       4377
Self-heal Daemon on 10.0.100.92             N/A       N/A        Y       3548

Task Status of Volume SHARED
------------------------------------------------------------------------------
There are no active volume tasks


root@pve1:~# gluster volume info

Volume Name: BACKUP
Type: Replicate
Volume ID: a8c1f419-4445-423a-a4c7-fb3b82195eba
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp

Bricks:
Brick1: pve1:/BackVMs
Brick2: pve2:/BackVMs
Brick3: pve3:/BackVMs

Options Reconfigured:
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off

Volume Name: SHARED
Type: Replicate
Volume ID: 538d994b-2de3-43f8-b77c-7a9ba1143ebc
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp

Bricks:
Brick1: pve1:/RunVMs
Brick2: pve2:/RunVMs
Brick3: pve3:/RunVMs


Options Reconfigured:
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
 
What versions are you running?

For the PVE clients
Code:
pveversion -v

If your host has dpkg
Code:
dpkg -l | grep gluster

or, if that is not available, simply
Code:
gluster --version
 
Code:
root@pve1:~# pveversion -v
proxmox-ve: 6.0-2 (running kernel: 5.0.15-1-pve)
pve-manager: 6.0-4 (running version: 6.0-4/2a719255)
pve-kernel-5.0: 6.0-5
pve-kernel-helper: 6.0-5
pve-kernel-5.0.15-1-pve: 5.0.15-1
ceph-fuse: 12.2.11+dfsg1-2.1
corosync: 3.0.2-pve2
criu: 3.11-3
glusterfs-client: 5.5-3
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.10-pve2~test1
libpve-access-control: 6.0-2
libpve-apiclient-perl: 3.0-2
libpve-common-perl: 6.0-2
libpve-guest-common-perl: 3.0-1
libpve-http-server-perl: 3.0-2
libpve-storage-perl: 6.0-5
libqb0: 1.0.5-1
lvm2: 2.03.02-pve3
lxc-pve: 3.1.0-61
lxcfs: 3.0.3-pve60
novnc-pve: 1.0.0-60
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.0-5
pve-cluster: 6.0-4
pve-container: 3.0-3
pve-docs: 6.0-4
pve-edk2-firmware: 2.20190614-1
pve-firewall: 4.0-5
pve-firmware: 3.0-2
pve-ha-manager: 3.0-2
pve-i18n: 2.0-2
pve-qemu-kvm: 4.0.0-3
pve-xtermjs: 3.13.2-1
qemu-server: 6.0-5
smartmontools: 7.0-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.1-pve1

Code:
root@pve1:~# dpkg -l |grep gluster
ii  glusterfs-client                     5.5-3                       amd64        clustered file-system (client package)
ii  glusterfs-common                     5.5-3                       amd64        GlusterFS common libraries and translator modules
ii  glusterfs-server                     5.5-3                       amd64        clustered file-system (server package)
ii  libglusterfs-dev                     5.5-3                       amd64        Development files for GlusterFS libraries
ii  libglusterfs0:amd64                  5.5-3                       amd64        GlusterFS shared library

Just realized there are updates (I hadn't set up the 6.0 no-subscription repo correctly); upgrading as we speak...
 
Thank you all for bringing this to our attention!

I could reproduce most of your problems:
  • creating VMs
  • full clones of templates
  • moving disks
  • restoring VMs
with a Gluster storage as the target, they all give the same error.

This happens with @pukkita's version (proxmox-ve: 6.0-2 (running kernel: 5.0.15-1-pve)) as well as with the updates from no-subscription (proxmox-ve: 6.0-2 (running kernel: 5.0.18-1-pve)), which is not a surprise, as there has only been a single unrelated patch concerning Gluster since then.

why did the clone job fail?
The only exception to that is that my clone task succeeded even though the error message appeared.


There is actually a similar bug already reported. I added more details to it, and you are welcome to add yourself to the CC list. I assume the behavior reported here is related to that bug.
 
I have 4 nodes with Gluster and similar problems... I cannot restore any VM backed up on my Gluster volume.
Same error:
"All subvolumes are down. Going offline until at least one of them comes back up"

I'm on the latest PVE / Proxmox version.

The fact that I cannot restore ANY of my VMs from backup seems like a very serious issue to me.
Is there nothing I can do about it?

Thanks.
 
Hi, could you please post
Code:
pveversion -v
and also the exact log output of your failed restore?

For me with
Code:
pve-manager/7.0-10/d2f465d3 (running kernel: 5.11.22-2-pve)
glusterfs-client: 9.2-1
restoring backups to a GlusterFS storage works. It does show the "subvolumes down" messages, but they seem to be harmless.
 
Seems like it still exists

Bash:
[2024-03-22 09:46:49.729254 +0000] I [io-stats.c:3706:ios_sample_buf_size_configure] 0-gv0: Configure ios_sample_buf  size is 1024 because ios_sample_interval is 0
[2024-03-22 09:46:51.541665 +0000] E [MSGID: 108006] [afr-common.c:6140:__afr_handle_child_down_event] 0-gv0-replicate-0: All subvolumes are down. Going offline until at least one of them comes back up.
[2024-03-22 09:46:59.730966 +0000] I [io-stats.c:4038:fini] 0-gv0: io-stats translator unloaded

Bash:
$ gluster --version
glusterfs 9.2

Bash:
$ pveversion -v
proxmox-ve: 7.4-1 (running kernel: 5.15.102-1-pve)
pve-manager: 7.4-3 (running version: 7.4-3/9002ab8a)
pve-kernel-5.15: 7.3-3
pve-kernel-5.15.102-1-pve: 5.15.102-1
ceph-fuse: 15.2.17-pve1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.4
libproxmox-backup-qemu0: 1.3.1-1
libproxmox-rs-perl: 0.2.1
libpve-access-control: 7.4-1
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.3-3
libpve-guest-common-perl: 4.2-4
libpve-http-server-perl: 4.2-1
libpve-rs-perl: 0.7.5
libpve-storage-perl: 7.4-2
libqb0: not correctly installed
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.2-2
lxcfs: 5.0.3-pve1
novnc-pve: 1.4.0-1
proxmox-backup-client: 2.3.3-1
proxmox-backup-file-restore: 2.3.3-1
proxmox-kernel-helper: 7.4-1
proxmox-mail-forward: 0.1.1-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.6.3
pve-cluster: 7.3-3
pve-container: 4.4-3
pve-docs: 7.4-2
pve-edk2-firmware: 3.20221111-1
pve-firewall: 4.3-1
pve-firmware: 3.6-4
pve-ha-manager: 3.6.0
pve-i18n: 2.11-1
pve-qemu-kvm: 7.2.0-8
pve-xtermjs: 4.16.0-1
qemu-server: 7.4-2
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+3
vncterm: 1.7-1
zfsutils-linux: 2.1.9-pve1
 