vzdump CT error "Too many levels of symbolic links" when one storage is involved..?

m.ardito

I am getting a strange error from vzdump. I searched past threads, and this is the only similar, yet unsolved, issue that happened to others before:
http://forum.proxmox.com/threads/9150-new-vzdump-behavior?p=51889#post51889

...sorry, this is a long, quite detailed post...

I did some investigation and created some test cases to try to find a solution myself (which, unfortunately, I have not been able to):

The issue is that vzdump fails to back up a fairly basic and absolutely idle CT, depending on the CT storage or the backup storage. Specifically, the backup fails if either:
1) the CT is created on a particular storage, or
2) the backup of the CT is written to a particular storage,
and the "failing" storage is always the same one. So I think it must be related to that storage, or to how it is mounted.

Before anything else: this is on PVE 3.1-24 (I will append the pveversion -v output, and other data, at the bottom of the post).

In more detail, I have:
a 2-node cluster (pve1 and pve2, identical nodes, IBM x3650 M2), and
2 NFS network storages:
- pve_ts809, an old QNAP NAS (Core 2 Duo CPU) with old QNAP firmware/kernel/software <-- this one seems to work well
- pve_ts879, a new QNAP NAS (Xeon CPU) with new QNAP firmware/kernel/software <-- this one seems to cause trouble

I can't figure out what is not working, or why. In the storage.cfg and mount output (see the bottom of the post) I can see some differences, but I didn't configure any of these parameters myself; they appeared when I created the storages from the PVE GUI, so I have no idea whether some of them could be wrong, or at least causing trouble, and why. I just noted some differences...

E.g., from what I can see:
in storage.cfg
pve_ts879 has "options vers=3,tcp,nolock,rsize=262144,wsize=262144"
while
pve_ts809 has "options vers=3"

and in the mount output, both mounted as NFS:
/mnt/pve/pve_ts879 has (rw,relatime,vers=3,rsize=32768,wsize=32768,namlen=255,hard,nolock,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=<ts879 IP>,mountvers=3,mountport=53850,mountproto=tcp,local_lock=all,addr=<ts879 IP>)
while
/mnt/pve/pve_ts809 has (rw,relatime,vers=3,rsize=262144,wsize=262144,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=<ts809 IP>,mountvers=3,mountport=905,mountproto=udp,local_lock=none,addr=<ts809 IP>)
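
One test I suppose I could try (just a sketch, since I don't actually know which option set is "right") would be to give pve_ts879 the same minimal options string that pve_ts809 has in /etc/pve/storage.cfg, i.e. drop the nolock and rsize/wsize overrides, then unmount the share so it can be remounted with the new options (if I understand correctly how the storage gets mounted):

Code:
# /etc/pve/storage.cfg -- hypothetical pve_ts879 entry with the same minimal options as pve_ts809
nfs: pve_ts879
        path /mnt/pve/pve_ts879
        server ts879
        export /PVE
        options vers=3
        content images,iso,vztmpl,rootdir,backup
        maxfiles 2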

That said, to test, I created 2 really basic, brand new CTs from the same template: ubuntu-12.04-standard_12.04-1_i386.tar.gz

CT 113's filesystem is on pve_ts879
CT 114's filesystem is on pve_ts809
Both seem to be running fine.

Backup from the GUI for CT 114 works both to local and to pve_ts809, but FAILS to pve_ts879 (tried with the CT running, LZO compression, snapshot mode).
Backup from the GUI for CT 113 always FAILS: to local, to pve_ts809, and to pve_ts879 (tried with the CT running or stopped, LZO/GZIP/no compression, all modes).
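
To take vzdump out of the picture, I guess the same readdir failure should be reproducible by just listing the CT private area directly on the NFS mount (the path below is the one that appears in the failing logs further down), something like:

Code:
# list the directory that tar/rsync choke on, directly on the NFS mount
ls -la /mnt/pve/pve_ts879/private/113/etc/alternatives | head
# or walk it with find
find /mnt/pve/pve_ts879/private/113/etc/alternatives -maxdepth 1 | head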

In the logs of the failing jobs there is always some recurring text (I can provide more examples, but from what I can see it always looks like the excerpts below):

If the CTs are down (or stop mode is used), e.g.:

Code:
INFO: starting new backup job: vzdump 113 --remove 0 --mode stop --compress lzo --storage local --node pve2
...
INFO: creating archive '/var/lib/vz/dump/vzdump-openvz-113-2014_03_13-12_10_03.tar.lzo'
INFO: tar: ./etc/alternatives/: Cannot savedir: Too many levels of symbolic links
INFO: Total bytes written: 459663360 (439MiB, 58MiB/s)
INFO: tar: Exiting with failure status due to previous errors
INFO: Total bytes written: 459683840 (439MiB, 38MiB/s)
INFO: tar: Exiting with failure status due to previous errors
ERROR: Backup of VM 113 failed - command '(cd /mnt/pve/pve_ts879/private/113;find . '(' -regex '^\.$' ')' -o '(' -type 's' -prune ')' -o -print0|sed 's/\\/\\\\/g'|tar cpf - --totals --sparse --numeric-owner --no-recursion --one-file-system --null -T -|lzop) >/var/lib/vz/dump/vzdump-openvz-113-2014_03_13-12_12_50.tar.dat' failed: exit code 2
INFO: Backup job finished with errors
TASK ERROR: job errors

When the CTs are running, e.g.:

Code:
INFO: starting new backup job: vzdump 113 --remove 0 --mode snapshot --storage local --node pve2
...
INFO: starting first sync /mnt/pve/pve_ts879/private/113/ to /var/lib/vz/dump/vzdump-openvz-113-2014_03_13-12_11_34.tmp
INFO: rsync: readdir("/mnt/pve/pve_ts879/private/113/etc/alternatives"): Too many levels of symbolic links (40)
INFO: IO error encountered -- skipping file deletion
...
INFO: total size is 441560783 speedup is 1.00
INFO: rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1070) [sender=3.0.9]
ERROR: Backup of VM 113 failed - command 'rsync --stats -x --numeric-ids -aH --delete --no-whole-file --inplace '/mnt/pve/pve_ts879/private/113/' '/var/lib/vz/dump/vzdump-openvz-113-2014_03_13-12_11_34.tmp'' failed: exit code 23
INFO: Backup job finished with errors
TASK ERROR: job errors

Code:
INFO: starting new backup job: vzdump 114 --remove 0 --mode snapshot --compress lzo --storage pve_ts879 --node pve1
...
INFO: starting final sync /mnt/pve/pve_ts809/private/114/ to /mnt/pve/pve_ts879/dump/vzdump-openvz-114-2014_03_13-11_24_12.tmp
INFO: rsync: readdir("/mnt/pve/pve_ts879/dump/vzdump-openvz-114-2014_03_13-11_24_12.tmp/etc/alternatives"): Too many levels of symbolic links (40)
INFO: IO error encountered -- skipping file deletion
...
INFO: total size is 441532565 speedup is 688.27
INFO: rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1070) [sender=3.0.9]
INFO: resume vm
INFO: Resuming...
INFO: vm is online again after 4 seconds
ERROR: Backup of VM 114 failed - command 'rsync --stats -x --numeric-ids -aH --delete --no-whole-file --inplace '/mnt/pve/pve_ts809/private/114/' '/mnt/pve/pve_ts879/dump/vzdump-openvz-114-2014_03_13-11_24_12.tmp'' failed: exit code 23
INFO: Backup job finished with errors
TASK ERROR: job errors

Code:
INFO: starting new backup job: vzdump 114 --remove 0 --mode snapshot --compress lzo --storage pve_ts809 --node pve1
...
INFO: starting final sync /mnt/pve/pve_ts809/private/114/ to /mnt/pve/pve_ts809/dump/vzdump-openvz-114-2014_03_13-11_34_15.tmp
...
INFO: total size is 441532565 speedup is 688.27
INFO: final sync finished (4 seconds)
INFO: resume vm
INFO: Resuming...
INFO: vm is online again after 4 seconds
INFO: creating archive '/mnt/pve/pve_ts809/dump/vzdump-openvz-114-2014_03_13-11_34_15.tar.lzo'
INFO: Total bytes written: 459786240 (439MiB, 31MiB/s)
INFO: archive file size: 227MB
INFO: Finished Backup of VM 114 (00:01:10)
INFO: Backup job finished successfully
TASK OK
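
A side test I can also try: re-run the same archive pipeline by hand, copied from the ERROR line of the stop-mode job for CT 113 above, with the output just thrown away, to see whether tar alone reports the same symlink error outside of vzdump (only a sketch):

Code:
# same find|tar|lzop pipeline as the failed stop-mode backup of CT 113,
# output discarded, just to see if tar prints the same symlink error
(cd /mnt/pve/pve_ts879/private/113; \
 find . '(' -regex '^\.$' ')' -o '(' -type 's' -prune ')' -o -print0 \
  | sed 's/\\/\\\\/g' \
  | tar cpf - --totals --sparse --numeric-owner --no-recursion --one-file-system --null -T - \
  | lzop) > /dev/null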

Can anyone help me to sort out this issue?

Marco

hosts/storage details
===========================================================================

Code:
#pveversion -v
proxmox-ve-2.6.32: 3.1-114 (running kernel: 2.6.32-26-pve)
pve-manager: 3.1-24 (running version: 3.1-24/060bd5a6)
pve-kernel-2.6.32-19-pve: 2.6.32-96
pve-kernel-2.6.32-26-pve: 2.6.32-114
pve-kernel-2.6.32-11-pve: 2.6.32-66
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.5-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.0-2
pve-cluster: 3.0-8
qemu-server: 3.1-8
pve-firmware: 1.0-23
libpve-common-perl: 3.0-9
libpve-access-control: 3.0-8
libpve-storage-perl: 3.0-18
pve-libspice-server1: 0.12.4-2
vncterm: 1.1-6
vzctl: 4.0-1pve4
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 1.4-17
ksm-control-daemon: 1.1-1
glusterfs-client: 3.4.1-1

Code:
#cat /etc/pve/storage.cfg:
...
nfs: pve_ts879
        path /mnt/pve/pve_ts879
        server ts879
        export /PVE
        options vers=3,tcp,nolock,rsize=262144,wsize=262144
        content images,iso,vztmpl,rootdir,backup
        maxfiles 2

nfs: pve_ts809
        path /mnt/pve/pve_ts809
        server <ts809 IP>
        export /PVE
        options vers=3
        content images,iso,vztmpl,rootdir,backup
        maxfiles 2
...

Code:
#mount
...
ts879:/PVE on /mnt/pve/pve_ts879 type nfs (rw,relatime,vers=3,rsize=32768,wsize=32768,namlen=255,hard,nolock,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=<ts879 IP>,mountvers=3,mountport=53850,mountproto=tcp,local_lock=all,addr=<ts879 IP>)
...
<ts809 IP>:/PVE on /mnt/pve/pve_ts809 type nfs (rw,relatime,vers=3,rsize=262144,wsize=262144,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=<ts809 IP>,mountvers=3,mountport=905,mountproto=udp,local_lock=none,addr=<ts809 IP>)
...
 
Re: vzdump CT error "Too many levels of symbolic links" when one storage is involved.

Well, I could not find out why, but it was definitely that template... I just created another CT from
debian-7.0-standard_7.0-2_i386.tar.gz

on the same node, storage and everything as CT 113 above,
and I could back it up everywhere; the infamous error did not happen.

I found that with that ubuntu template the problem was probably with the tar command, because

#tar -cf - > /dev/null

showed exactly the same message seen in the failed backup logs, both from the PVE shell and from the CT shell...
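
For the record, to check whether there really is a symlink loop under /etc/alternatives in that ubuntu CT, something like this should show where each link points (the path of CT 113 is assumed from the logs above; just a sketch):

Code:
# print the target of every symlink under etc/alternatives of CT 113
cd /mnt/pve/pve_ts879/private/113/etc/alternatives
for l in *; do [ -L "$l" ] && printf '%s -> %s\n' "$l" "$(readlink "$l")"; done | head -n 20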

To me, it remains a mystery...

Marco
 
