GlusterFS: all subvolumes down when creating a VM

chchang

I added a Gluster storage to my Proxmox cluster (4 nodes).
The GlusterFS volume mounts fine, but when I try to clone a template onto that storage, I always get errors like this:

Code:
root@pve:~# qm clone 200 189 --name test.glusterfs --full --storage testglusterfs
create full clone of drive virtio0 (seventeraonfreenas10gnetwork:200/base-200-disk-0.qcow2)
[2019-06-04 06:45:35.948617] E [MSGID: 108006] [afr-common.c:4404:afr_notify] 0-datavol-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
^C[2019-06-04 06:45:35.948675] E [MSGID: 108006] [afr-common.c:4404:afr_notify] 0-datavol-replicate-1: All subvolumes are down. Going offline until atleast one of them comes back up.
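
For reference, the storage entry in /etc/pve/storage.cfg looks roughly like this (address and volume name are taken from the log above; treat the snippet as an illustrative sketch, not my exact config):

Code:
glusterfs: testglusterfs
        server 192.168.11.185
        volume datavol
        content images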

But I can write files to the storage:
Code:
root@pve:/mnt/pve/testglusterfs/images# dd if=/dev/zero of=testfile bs=1k count=1M status=progress
1050831872 bytes (1.1 GB, 1002 MiB) copied, 44 s, 23.9 MB/s   
1048576+0 records in
1048576+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 44.9724 s, 23.9 MB/s

I also tried mounting the storage from another workstation; mounting and reading/writing there work fine.
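The mount on the workstation was the usual GlusterFS FUSE mount, something like this (the mount point is just an example):

Code:
mount -t glusterfs 192.168.11.185:/datavol /mnt/testglusterfs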

Any suggestions?
 
I also tried creating a new VM on the GlusterFS storage. It shows the same "all subvolumes down" messages, but the create job completes.

Code:
[2019-06-04 08:28:45.414580] E [MSGID: 108006] [afr-common.c:4404:afr_notify] 0-datavol-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2019-06-04 08:28:45.414752] E [MSGID: 108006] [afr-common.c:4404:afr_notify] 0-datavol-replicate-1: All subvolumes are down. Going offline until atleast one of them comes back up.
[2019-06-04 08:29:25.838133] E [MSGID: 108006] [afr-common.c:4404:afr_notify] 0-datavol-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
Formatting 'gluster://192.168.11.185/datavol/images/146/vm-146-disk-0.qcow2', fmt=qcow2 size=34359738368 cluster_size=65536 preallocation=metadata lazy_refcounts=off refcount_bits=16
[2019-06-04 08:29:25.838314] E [MSGID: 108006] [afr-common.c:4404:afr_notify] 0-datavol-replicate-1: All subvolumes are down. Going offline until atleast one of them comes back up.
TASK OK
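
The rough CLI equivalent of that create (VM name, memory and bridge are only examples; the 32 matches the 32 GiB disk above) would be:

Code:
qm create 146 --name testvm --memory 2048 --net0 virtio,bridge=vmbr0 --virtio0 testglusterfs:32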

And installing an OS in the VM looks like it just works.

So why did the clone job fail?
 
I get the exact same message when trying to create a VM or migrate the storage of a VM to a GlusterFS volume.

Output:
Code:
create full clone of drive scsi0 (storage:vm-208-disk-0)
[2019-08-02 14:19:43.005478] E [MSGID: 108006] [afr-common.c:5318:__afr_handle_child_down_event] 0-backupvol-replicate-0: All subvolumes are down. Going offline until at least one of them comes back up.
[2019-08-02 14:19:43.005872] I [io-stats.c:4027:fini] 0-backupvol: io-stats translator unloaded
TASK ERROR: storage migration failed: error with cfs lock 'storage-backupvol': unable to create image: interrupted by signal
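
For reference, the failing disk move corresponds to roughly this command (the target storage ID is inferred from the lock name in the error):

Code:
qm move_disk 208 scsi0 backupvol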
 
Same here... while (successfully) restoring a VM:

Code:
restore vma archive: lzop -d -c /mnt/pve/SHAREDBACKUP/dump/vzdump-qemu-101-2019_08_22-00_00_01.vma.lzo | vma extract -v -r /var/tmp/vzdumptmp30653.fifo - /var/tmp/vzdumptmp30653
CFG: size: 609 name: qemu-server.conf
DEV: dev_id=1 size: 34359738368 devname: drive-scsi0
DEV: dev_id=2 size: 322122547200 devname: drive-scsi1
CTIME: Thu Aug 22 00:00:05 2019
[2019-08-22 15:00:37.251311] E [MSGID: 108006] [afr-common.c:5314:__afr_handle_child_down_event] 0-SHARED-replicate-0: All subvolumes are down. Going offline until at least one of them comes back up.
Formatting 'gluster://pve2/SHARED/images/101/vm-101-disk-0.qcow2', fmt=qcow2 size=34359738368 cluster_size=65536 preallocation=metadata lazy_refcounts=off refcount_bits=16
[2019-08-22 15:00:39.209448] E [MSGID: 108006] [afr-common.c:5314:__afr_handle_child_down_event] 0-SHARED-replicate-0: All subvolumes are down. Going offline until at least one of them comes back up.
new volume ID is 'GlusterFS:101/vm-101-disk-0.qcow2'
map 'drive-scsi0' to 'gluster://pve2/SHARED/images/101/vm-101-disk-0.qcow2' (write zeros = 0)
[2019-08-22 15:00:40.302453] E [MSGID: 108006] [afr-common.c:5314:__afr_handle_child_down_event] 0-SHARED-replicate-0: All subvolumes are down. Going offline until at least one of them comes back up.
Formatting 'gluster://pve2/SHARED/images/101/vm-101-disk-1.qcow2', fmt=qcow2 size=322122547200 cluster_size=65536 preallocation=metadata lazy_refcounts=off refcount_bits=16
[2019-08-22 15:00:50.203964] E [MSGID: 108006] [afr-common.c:5314:__afr_handle_child_down_event] 0-SHARED-replicate-0: All subvolumes are down. Going offline until at least one of them comes back up.
new volume ID is 'GlusterFS:101/vm-101-disk-1.qcow2'
map 'drive-scsi1' to 'gluster://pve2/SHARED/images/101/vm-101-disk-1.qcow2' (write zeros = 0)
progress 1% (read 3564830720 bytes, duration 14 sec)
[snip]
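
That restore corresponds to roughly this CLI call (it may just as well be started from the GUI):

Code:
qmrestore /mnt/pve/SHAREDBACKUP/dump/vzdump-qemu-101-2019_08_22-00_00_01.vma.lzo 101 --storage GlusterFS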

Code:
root@pve1:~# gluster volume status

Status of volume: BACKUP
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick pve1:/BackVMs                         49152     0          Y       3260
Brick pve2:/BackVMs                         49152     0          Y       3470
Brick pve3:/BackVMs                         49152     0          Y       4359

Self-heal Daemon on localhost               N/A       N/A        Y       3350
Self-heal Daemon on 10.0.100.93             N/A       N/A        Y       4377
Self-heal Daemon on 10.0.100.92             N/A       N/A        Y       3548

Task Status of Volume BACKUP
------------------------------------------------------------------------------
There are no active volume tasks



Status of volume: SHARED
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick pve1:/RunVMs                          49153     0          Y       3295
Brick pve2:/RunVMs                          49153     0          Y       3486
Brick pve3:/RunVMs                          49153     0          Y       4368

Self-heal Daemon on localhost               N/A       N/A        Y       3350
Self-heal Daemon on 10.0.100.93             N/A       N/A        Y       4377
Self-heal Daemon on 10.0.100.92             N/A       N/A        Y       3548

Task Status of Volume SHARED
------------------------------------------------------------------------------
There are no active volume tasks


root@pve1:~# gluster volume info

Volume Name: BACKUP
Type: Replicate
Volume ID: a8c1f419-4445-423a-a4c7-fb3b82195eba
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp

Bricks:
Brick1: pve1:/BackVMs
Brick2: pve2:/BackVMs
Brick3: pve3:/BackVMs

Options Reconfigured:
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off

Volume Name: SHARED
Type: Replicate
Volume ID: 538d994b-2de3-43f8-b77c-7a9ba1143ebc
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp

Bricks:
Brick1: pve1:/RunVMs
Brick2: pve2:/RunVMs
Brick3: pve3:/RunVMs


Options Reconfigured:
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
 
What versions are you running?

For the PVE clients
Code:
pveversion -v

If your host has dpkg
Code:
dpkg -l | grep gluster

or if that is not available simply
Code:
gluster --version
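
It might also be worth checking whether the bricks actually see a connected client while such a task runs, for example (the volume name is just an example):

Code:
gluster volume status SHARED clients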
 
Code:
root@pve1:~# pveversion -v
proxmox-ve: 6.0-2 (running kernel: 5.0.15-1-pve)
pve-manager: 6.0-4 (running version: 6.0-4/2a719255)
pve-kernel-5.0: 6.0-5
pve-kernel-helper: 6.0-5
pve-kernel-5.0.15-1-pve: 5.0.15-1
ceph-fuse: 12.2.11+dfsg1-2.1
corosync: 3.0.2-pve2
criu: 3.11-3
glusterfs-client: 5.5-3
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.10-pve2~test1
libpve-access-control: 6.0-2
libpve-apiclient-perl: 3.0-2
libpve-common-perl: 6.0-2
libpve-guest-common-perl: 3.0-1
libpve-http-server-perl: 3.0-2
libpve-storage-perl: 6.0-5
libqb0: 1.0.5-1
lvm2: 2.03.02-pve3
lxc-pve: 3.1.0-61
lxcfs: 3.0.3-pve60
novnc-pve: 1.0.0-60
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.0-5
pve-cluster: 6.0-4
pve-container: 3.0-3
pve-docs: 6.0-4
pve-edk2-firmware: 2.20190614-1
pve-firewall: 4.0-5
pve-firmware: 3.0-2
pve-ha-manager: 3.0-2
pve-i18n: 2.0-2
pve-qemu-kvm: 4.0.0-3
pve-xtermjs: 3.13.2-1
qemu-server: 6.0-5
smartmontools: 7.0-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.1-pve1

Code:
root@pve1:~# dpkg -l |grep gluster
ii  glusterfs-client                     5.5-3                       amd64        clustered file-system (client package)
ii  glusterfs-common                     5.5-3                       amd64        GlusterFS common libraries and translator modules
ii  glusterfs-server                     5.5-3                       amd64        clustered file-system (server package)
ii  libglusterfs-dev                     5.5-3                       amd64        Development files for GlusterFS libraries
ii  libglusterfs0:amd64                  5.5-3                       amd64        GlusterFS shared library

Just realized there are updates (I hadn't set up the 6.0 no-subscription repo correctly); upgrading as we speak...
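
For anyone in the same spot, the PVE 6 (Buster) no-subscription repo can be added roughly like this (the file name is just a common choice):

Code:
echo "deb http://download.proxmox.com/debian/pve buster pve-no-subscription" > /etc/apt/sources.list.d/pve-no-subscription.list
apt update && apt full-upgrade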
 
Thank you all for bringing this to our attention!

I could reproduce most of your problems:
  • creating VMs
  • full clones of templates
  • moving disks
  • restoring VMs
with a Gluster storage as target all give the same error.

This happens with @pukkita's version (proxmox-ve: 6.0-2, running kernel: 5.0.15-1-pve) as well as with the updates from no-subscription (proxmox-ve: 6.0-2, running kernel: 5.0.18-1-pve), which is not a surprise, as there has only been a single unrelated patch concerning Gluster since.

So why did the clone job fail?
The only exception to that is that my clone task succeeded even though the error message appeared.


There is actually a similar bug already reported. I added more details to it. You are welcome to add yourself to the CC list. I assume this is related to this bug.
 
I have 4 nodes with Gluster and similar problems... I cannot restore any VM backed up on my Gluster volume.
Same error:
"All subvolumes are down. Going offline until at least one of them comes back up"

I'm on the latest Proxmox VE version.

The fact that I cannot restore ANY of my VMs from backup seems like a very serious issue to me.
Is there nothing I can do about it?

Thanks.
 
Hi, could you please post
Code:
pveversion -v
and also the exact log output of your failed restore?

For me with
Code:
pve-manager/7.0-10/d2f465d3 (running kernel: 5.11.22-2-pve)
glusterfs-client: 9.2-1
restoring backups from a GlusterFS storage works. It does show the "subvolumes down" messages, but they seem to be harmless.
 
Seems like it still exists

Bash:
[2024-03-22 09:46:49.729254 +0000] I [io-stats.c:3706:ios_sample_buf_size_configure] 0-gv0: Configure ios_sample_buf  size is 1024 because ios_sample_interval is 0
[2024-03-22 09:46:51.541665 +0000] E [MSGID: 108006] [afr-common.c:6140:__afr_handle_child_down_event] 0-gv0-replicate-0: All subvolumes are down. Going offline until at least one of them comes back up.
[2024-03-22 09:46:59.730966 +0000] I [io-stats.c:4038:fini] 0-gv0: io-stats translator unloaded

Bash:
$ gluster --version
glusterfs 9.2

Bash:
$ pveversion -v
proxmox-ve: 7.4-1 (running kernel: 5.15.102-1-pve)
pve-manager: 7.4-3 (running version: 7.4-3/9002ab8a)
pve-kernel-5.15: 7.3-3
pve-kernel-5.15.102-1-pve: 5.15.102-1
ceph-fuse: 15.2.17-pve1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.4
libproxmox-backup-qemu0: 1.3.1-1
libproxmox-rs-perl: 0.2.1
libpve-access-control: 7.4-1
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.3-3
libpve-guest-common-perl: 4.2-4
libpve-http-server-perl: 4.2-1
libpve-rs-perl: 0.7.5
libpve-storage-perl: 7.4-2
libqb0: not correctly installed
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.2-2
lxcfs: 5.0.3-pve1
novnc-pve: 1.4.0-1
proxmox-backup-client: 2.3.3-1
proxmox-backup-file-restore: 2.3.3-1
proxmox-kernel-helper: 7.4-1
proxmox-mail-forward: 0.1.1-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.6.3
pve-cluster: 7.3-3
pve-container: 4.4-3
pve-docs: 7.4-2
pve-edk2-firmware: 3.20221111-1
pve-firewall: 4.3-1
pve-firmware: 3.6-4
pve-ha-manager: 3.6.0
pve-i18n: 2.11-1
pve-qemu-kvm: 7.2.0-8
pve-xtermjs: 4.16.0-1
qemu-server: 7.4-2
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+3
vncterm: 1.7-1
zfsutils-linux: 2.1.9-pve1
 
