GlusterFS: all subvolumes down when creating a VM

chchang

Well-Known Member
Feb 6, 2018
34
4
48
47
I added a GlusterFS storage to my Proxmox cluster (4 nodes).
The GlusterFS volume mounts fine, but when I try to clone a template onto that storage, I always get an error like this:

Code:
root@pve:~# qm clone 200 189 --name test.glusterfs --full --storage testglusterfs
create full clone of drive virtio0 (seventeraonfreenas10gnetwork:200/base-200-disk-0.qcow2)
[2019-06-04 06:45:35.948617] E [MSGID: 108006] [afr-common.c:4404:afr_notify] 0-datavol-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
^C[2019-06-04 06:45:35.948675] E [MSGID: 108006] [afr-common.c:4404:afr_notify] 0-datavol-replicate-1: All subvolumes are down. Going offline until atleast one of them comes back up.
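For reference, the storage is a standard GlusterFS entry in /etc/pve/storage.cfg, roughly like the sketch below (the option lines are from memory, not copied verbatim from my config; server and volume match the logs):

Code:
glusterfs: testglusterfs
        server 192.168.11.185
        volume datavol
        content images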

But I can write files to the storage just fine:
Code:
root@pve:/mnt/pve/testglusterfs/images# dd if=/dev/zero of=testfile bs=1k count=1M status=progress
1050831872 bytes (1.1 GB, 1002 MiB) copied, 44 s, 23.9 MB/s   
1048576+0 records in
1048576+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 44.9724 s, 23.9 MB/s

I also tried mounting the storage from another workstation, and mounting plus read/write works fine there as well.
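The mount on the workstation was just the normal FUSE mount; as an extra check one could also try creating an image over libgfapi, which is the path QEMU itself uses (both commands below are a sketch, with server/volume/VM ID taken from the logs above):

Code:
# plain FUSE mount, same path the dd test above goes through
mount -t glusterfs 192.168.11.185:/datavol /mnt/glustertest

# optional: exercise the libgfapi path that qemu-img/QEMU use for gluster:// URLs
qemu-img create -f qcow2 gluster://192.168.11.185/datavol/images/189/test-gfapi.qcow2 1G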

Any suggestions?
 
I also tried creating a new VM on the GlusterFS storage. It shows the same "all subvolumes down" messages, but the create job completes.

Code:
[2019-06-04 08:28:45.414580] E [MSGID: 108006] [afr-common.c:4404:afr_notify] 0-datavol-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2019-06-04 08:28:45.414752] E [MSGID: 108006] [afr-common.c:4404:afr_notify] 0-datavol-replicate-1: All subvolumes are down. Going offline until atleast one of them comes back up.
[2019-06-04 08:29:25.838133] E [MSGID: 108006] [afr-common.c:4404:afr_notify] 0-datavol-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
Formatting 'gluster://192.168.11.185/datavol/images/146/vm-146-disk-0.qcow2', fmt=qcow2 size=34359738368 cluster_size=65536 preallocation=metadata lazy_refcounts=off refcount_bits=16
[2019-06-04 08:29:25.838314] E [MSGID: 108006] [afr-common.c:4404:afr_notify] 0-datavol-replicate-1: All subvolumes are down. Going offline until atleast one of them comes back up.
TASK OK
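The create itself was the ordinary one, something along these lines (the VM name and the memory/network options here are placeholders; VM ID, storage and the 32 GiB disk match the log above):

Code:
qm create 146 --name test-gluster --memory 2048 --net0 virtio,bridge=vmbr0 --virtio0 testglusterfs:32,format=qcow2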

And when I try to install an OS in that VM, it looks like it just works.

So why did the clone job fail?
 
I get the exact same message when trying to create a VM or migrate the storage of a VM to a GlusterFS volume.

Output:
Code:
create full clone of drive scsi0 (storage:vm-208-disk-0)
[2019-08-02 14:19:43.005478] E [MSGID: 108006] [afr-common.c:5318:__afr_handle_child_down_event] 0-backupvol-replicate-0: All subvolumes are down. Going offline until at least one of them comes back up.
[2019-08-02 14:19:43.005872] I [io-stats.c:4027:fini] 0-backupvol: io-stats translator unloaded
TASK ERROR: storage migration failed: error with cfs lock 'storage-backupvol': unable to create image: interrupted by signal
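The move itself is just the regular disk move, e.g. (VM ID, disk and target storage as in the output above):

Code:
qm move_disk 208 scsi0 backupvol --format qcow2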
 
Same here... while (successfully) restoring a VM:

Code:
restore vma archive: lzop -d -c /mnt/pve/SHAREDBACKUP/dump/vzdump-qemu-101-2019_08_22-00_00_01.vma.lzo | vma extract -v -r /var/tmp/vzdumptmp30653.fifo - /var/tmp/vzdumptmp30653
CFG: size: 609 name: qemu-server.conf
DEV: dev_id=1 size: 34359738368 devname: drive-scsi0
DEV: dev_id=2 size: 322122547200 devname: drive-scsi1
CTIME: Thu Aug 22 00:00:05 2019
[2019-08-22 15:00:37.251311] E [MSGID: 108006] [afr-common.c:5314:__afr_handle_child_down_event] 0-SHARED-replicate-0: All subvolumes are down. Going offline until at least one of them comes back up.
Formatting 'gluster://pve2/SHARED/images/101/vm-101-disk-0.qcow2', fmt=qcow2 size=34359738368 cluster_size=65536 preallocation=metadata lazy_refcounts=off refcount_bits=16
[2019-08-22 15:00:39.209448] E [MSGID: 108006] [afr-common.c:5314:__afr_handle_child_down_event] 0-SHARED-replicate-0: All subvolumes are down. Going offline until at least one of them comes back up.
new volume ID is 'GlusterFS:101/vm-101-disk-0.qcow2'
map 'drive-scsi0' to 'gluster://pve2/SHARED/images/101/vm-101-disk-0.qcow2' (write zeros = 0)
[2019-08-22 15:00:40.302453] E [MSGID: 108006] [afr-common.c:5314:__afr_handle_child_down_event] 0-SHARED-replicate-0: All subvolumes are down. Going offline until at least one of them comes back up.
Formatting 'gluster://pve2/SHARED/images/101/vm-101-disk-1.qcow2', fmt=qcow2 size=322122547200 cluster_size=65536 preallocation=metadata lazy_refcounts=off refcount_bits=16
[2019-08-22 15:00:50.203964] E [MSGID: 108006] [afr-common.c:5314:__afr_handle_child_down_event] 0-SHARED-replicate-0: All subvolumes are down. Going offline until at least one of them comes back up.
new volume ID is 'GlusterFS:101/vm-101-disk-1.qcow2'
map 'drive-scsi1' to 'gluster://pve2/SHARED/images/101/vm-101-disk-1.qcow2' (write zeros = 0)
progress 1% (read 3564830720 bytes, duration 14 sec)
[snip]
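The restore was started in the usual way, roughly equivalent to this (archive path from the log above; the target storage is the GlusterFS one, named 'GlusterFS' here):

Code:
qmrestore /mnt/pve/SHAREDBACKUP/dump/vzdump-qemu-101-2019_08_22-00_00_01.vma.lzo 101 --storage GlusterFS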

Code:
root@pve1:~# gluster volume status

Status of volume: BACKUP
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick pve1:/BackVMs                         49152     0          Y       3260
Brick pve2:/BackVMs                         49152     0          Y       3470
Brick pve3:/BackVMs                         49152     0          Y       4359

Self-heal Daemon on localhost               N/A       N/A        Y       3350
Self-heal Daemon on 10.0.100.93             N/A       N/A        Y       4377
Self-heal Daemon on 10.0.100.92             N/A       N/A        Y       3548

Task Status of Volume BACKUP
------------------------------------------------------------------------------
There are no active volume tasks



Status of volume: SHARED
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick pve1:/RunVMs                          49153     0          Y       3295
Brick pve2:/RunVMs                          49153     0          Y       3486
Brick pve3:/RunVMs                          49153     0          Y       4368

Self-heal Daemon on localhost               N/A       N/A        Y       3350
Self-heal Daemon on 10.0.100.93             N/A       N/A        Y       4377
Self-heal Daemon on 10.0.100.92             N/A       N/A        Y       3548

Task Status of Volume SHARED
------------------------------------------------------------------------------
There are no active volume tasks


root@pve1:~# gluster volume info

Volume Name: BACKUP
Type: Replicate
Volume ID: a8c1f419-4445-423a-a4c7-fb3b82195eba
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp

Bricks:
Brick1: pve1:/BackVMs
Brick2: pve2:/BackVMs
Brick3: pve3:/BackVMs

Options Reconfigured:
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off

Volume Name: SHARED
Type: Replicate
Volume ID: 538d994b-2de3-43f8-b77c-7a9ba1143ebc
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp

Bricks:
Brick1: pve1:/RunVMs
Brick2: pve2:/RunVMs
Brick3: pve3:/RunVMs


Options Reconfigured:
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
 
What versions are you running?

For the PVE clients
Code:
pveversion -v

If your host has dpkg
Code:
dpkg -l | grep gluster

or, if that is not available, simply
Code:
gluster --version
 
Code:
root@pve1:~# pveversion -v
proxmox-ve: 6.0-2 (running kernel: 5.0.15-1-pve)
pve-manager: 6.0-4 (running version: 6.0-4/2a719255)
pve-kernel-5.0: 6.0-5
pve-kernel-helper: 6.0-5
pve-kernel-5.0.15-1-pve: 5.0.15-1
ceph-fuse: 12.2.11+dfsg1-2.1
corosync: 3.0.2-pve2
criu: 3.11-3
glusterfs-client: 5.5-3
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.10-pve2~test1
libpve-access-control: 6.0-2
libpve-apiclient-perl: 3.0-2
libpve-common-perl: 6.0-2
libpve-guest-common-perl: 3.0-1
libpve-http-server-perl: 3.0-2
libpve-storage-perl: 6.0-5
libqb0: 1.0.5-1
lvm2: 2.03.02-pve3
lxc-pve: 3.1.0-61
lxcfs: 3.0.3-pve60
novnc-pve: 1.0.0-60
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.0-5
pve-cluster: 6.0-4
pve-container: 3.0-3
pve-docs: 6.0-4
pve-edk2-firmware: 2.20190614-1
pve-firewall: 4.0-5
pve-firmware: 3.0-2
pve-ha-manager: 3.0-2
pve-i18n: 2.0-2
pve-qemu-kvm: 4.0.0-3
pve-xtermjs: 3.13.2-1
qemu-server: 6.0-5
smartmontools: 7.0-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.1-pve1

Code:
root@pve1:~# dpkg -l |grep gluster
ii  glusterfs-client                     5.5-3                       amd64        clustered file-system (client package)
ii  glusterfs-common                     5.5-3                       amd64        GlusterFS common libraries and translator modules
ii  glusterfs-server                     5.5-3                       amd64        clustered file-system (server package)
ii  libglusterfs-dev                     5.5-3                       amd64        Development files for GlusterFS libraries
ii  libglusterfs0:amd64                  5.5-3                       amd64        GlusterFS shared library

Just realized there are updates (I hadn't set up the 6.0 no-subscription repo correctly); upgrading as we speak...
 
Thank you all for bringing this to our attention!

I could reproduce most of your problems:
  • creating VMs
  • full clones of templates
  • moving disks
  • restoring VMs
with a Gluster storage as the target, they all give the same error.

This happens with @pukkita's version (proxmox-ve: 6.0-2 (running kernel: 5.0.15-1-pve)) as well as with the updates from no-subscription (proxmox-ve: 6.0-2 (running kernel: 5.0.18-1-pve)), which is not a surprise, as there has only been a single unrelated patch concerning Gluster since then.

why did the clone job fail?
The only exception to that is that my clone task succeeded even though the error message appeared.


There is actually a similar bug already reported. I added more details to it, and you are welcome to add yourself to the CC list. I assume the behavior reported here is related to that bug.
 
I have 4 nodes with Gluster and similar problems... I cannot restore any VM backed up on my Gluster volume.
Same error:
"All subvolumes are down. Going offline until at least one of them comes back up"

I'm on the latest PVE / Proxmox version.

The fact that I cannot restore ANY of my VMs from backup seems like a very serious issue to me.
Is there nothing I can do about it?

Thanks.
 
Hi, could you please post
Code:
pveversion -v
and also the exact log output of your failed restore?

For me with
Code:
pve-manager/7.0-10/d2f465d3 (running kernel: 5.11.22-2-pve)
glusterfs-client: 9.2-1
restoring backups to a GlusterFS storage works. It does show the "subvolumes down" messages, but they seem to be harmless.
 
Seems like it still exists

Bash:
[2024-03-22 09:46:49.729254 +0000] I [io-stats.c:3706:ios_sample_buf_size_configure] 0-gv0: Configure ios_sample_buf  size is 1024 because ios_sample_interval is 0
[2024-03-22 09:46:51.541665 +0000] E [MSGID: 108006] [afr-common.c:6140:__afr_handle_child_down_event] 0-gv0-replicate-0: All subvolumes are down. Going offline until at least one of them comes back up.
[2024-03-22 09:46:59.730966 +0000] I [io-stats.c:4038:fini] 0-gv0: io-stats translator unloaded

Bash:
$ gluster --version
glusterfs 9.2

Bash:
$ pveversion -v
proxmox-ve: 7.4-1 (running kernel: 5.15.102-1-pve)
pve-manager: 7.4-3 (running version: 7.4-3/9002ab8a)
pve-kernel-5.15: 7.3-3
pve-kernel-5.15.102-1-pve: 5.15.102-1
ceph-fuse: 15.2.17-pve1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.4
libproxmox-backup-qemu0: 1.3.1-1
libproxmox-rs-perl: 0.2.1
libpve-access-control: 7.4-1
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.3-3
libpve-guest-common-perl: 4.2-4
libpve-http-server-perl: 4.2-1
libpve-rs-perl: 0.7.5
libpve-storage-perl: 7.4-2
libqb0: not correctly installed
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.2-2
lxcfs: 5.0.3-pve1
novnc-pve: 1.4.0-1
proxmox-backup-client: 2.3.3-1
proxmox-backup-file-restore: 2.3.3-1
proxmox-kernel-helper: 7.4-1
proxmox-mail-forward: 0.1.1-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.6.3
pve-cluster: 7.3-3
pve-container: 4.4-3
pve-docs: 7.4-2
pve-edk2-firmware: 3.20221111-1
pve-firewall: 4.3-1
pve-firmware: 3.6-4
pve-ha-manager: 3.6.0
pve-i18n: 2.11-1
pve-qemu-kvm: 7.2.0-8
pve-xtermjs: 4.16.0-1
qemu-server: 7.4-2
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+3
vncterm: 1.7-1
zfsutils-linux: 2.1.9-pve1
 