Hello,
I have a 2-node production cluster with Ceph, plus a third node for quorum that runs a Ceph monitor but no OSD; that third node is used for testing.
There are ~20 LXC containers and 5 VMs spread across these 2 nodes.
For a few months now, the LXC backup sometimes freezes on a container; it is not tied to a particular node or a particular container.
The backup job finishes successfully for a few containers and then freezes at the "creating archive" step.
Example:
INFO: Starting Backup of VM 126 (lxc)
INFO: status = running
INFO: CT Name: CT126
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: create storage snapshot 'vzdump'
/dev/rbd2
INFO: creating archive '/mnt/pve/Qnap2/dump/vzdump-lxc-126-2018_10_01-01_03_34.tar.gz'
The tar process is in D state (uninterruptible sleep), so I can't kill it. ps aux shows:
root 2855445 0.0 0.0 24804 9284 ? D oct.01 0:21 tar cpf - --totals --one-file-system -p --sparse --numeric-owner --acls --xattrs --xattrs-include=user.* --xattrs-include=security.capability --warning=no-file-ignored --warning=no-xattr-write --one-file-system --warning=no-file-ignored --directory=/mnt/pve/Qnap2/dump/vzdump-lxc-126-2018_10_01-01_03_34.tmp ./etc/vzdump/pct.conf --directory=/mnt/vzsnap0 --no-anchored --exclude=lost+found --anchored --exclude=./tmp/?* --exclude=./var/tmp/?* --exclude=./var/run/?*.pid ./
I can't stop the backup process and I can't remove the snapshot on this container, so the next backup fails; the only solution is to reboot the node.
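For clarity, by "remove the snapshot" I mean the storage-level 'vzdump' snapshot from the log above. A sketch of the Ceph-side cleanup I have in mind (assuming pool pveceph1 and image vm-126-disk-1, the same names as in the mount attempt further down):

# list the snapshots on the container's RBD image, then try to drop the vzdump one
rbd snap ls pveceph1/vm-126-disk-1
rbd snap rm pveceph1/vm-126-disk-1@vzdump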
I can't figure out what tar is waiting on...
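In case it helps to show what I mean by "what tar is waiting on", a sketch of the checks I know of, using the tar PID 2855445 from the ps output above:

# kernel wait channel and kernel stack of the stuck tar process
cat /proc/2855445/wchan; echo
cat /proc/2855445/stack
# ask the kernel to log all blocked (D state) tasks, then read the log
echo w > /proc/sysrq-trigger
dmesg | tail -n 80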
/mnt/vzsnap0/ is empty and not mounted; maybe that is the problem?
I tried to mount it manually, but nothing happens:
mount /dev/rbd/pveceph1/vm-126-disk-1@vzdump /mnt/vzsnap0/
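Maybe related: I am not sure the snapshot device is even mapped at that point. A sketch of the check I mean (pool and image names taken from the mount command above):

# which RBD images/snapshots are currently mapped on this node?
rbd showmapped
# is there a device node for the vzdump snapshot of this disk?
ls -l /dev/rbd/pveceph1/ | grep vm-126-disk-1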
The backup target is an NFS mount. I tried changing the NFS mount option from hard to soft, but I still hit this issue.
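For reference, the NFS change I mean is the options line of the storage definition in /etc/pve/storage.cfg, roughly like this (server address and export path are placeholders, not my real values):

nfs: Qnap2
        path /mnt/pve/Qnap2
        server 192.168.1.10
        export /share/backups
        content backup
        options vers=3,soft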
Does anyone know how to find out what tar is waiting on? Does anyone have an idea of what is happening?
Thanks for your help.
Eric
pveversion --verbose
proxmox-ve: 5.2-2 (running kernel: 4.15.18-4-pve)
pve-manager: 5.2-9 (running version: 5.2-9/4b30e8f9)
pve-kernel-4.15: 5.2-7
pve-kernel-4.15.18-4-pve: 4.15.18-23
pve-kernel-4.15.18-1-pve: 4.15.18-19
pve-kernel-4.15.17-3-pve: 4.15.17-14
ceph: 12.2.8-pve1
corosync: 2.4.2-pve5
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.0-8
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-38
libpve-guest-common-perl: 2.0-17
libpve-http-server-perl: 2.0-10
libpve-storage-perl: 5.0-28
libqb0: 1.0.1-1
lvm2: 2.02.168-pve6
lxc-pve: 3.0.2+pve1-2
lxcfs: 3.0.0-1
novnc-pve: 1.0.0-2
proxmox-widget-toolkit: 1.0-20
pve-cluster: 5.0-30
pve-container: 2.0-27
pve-docs: 5.2-8
pve-firewall: 3.0-14
pve-firmware: 2.0-5
pve-ha-manager: 2.0-5
pve-i18n: 1.0-6
pve-libspice-server1: 0.12.8-3
pve-qemu-kvm: 2.11.2-1
pve-xtermjs: 1.0-5
qemu-server: 5.0-34
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.9-pve1~bpo9