VZDump backup fails

gkovacs · Jan 29, 2012

Code:

Jan 28 23:00:03 INFO: Starting Backup of VM 102 (openvz)
Jan 28 23:00:03 INFO: CTID 102 exist mounted running
Jan 28 23:00:03 INFO: status = CTID 102 exist mounted running
Jan 28 23:00:04 INFO: backup mode: snapshot
Jan 28 23:00:04 INFO: ionice priority: 7
Jan 28 23:00:04 INFO: creating lvm snapshot of /dev/mapper/pve-root ('/dev/pve/vzsnap-proxmox2-0')
Jan 28 23:00:37 INFO:   Logical volume "vzsnap-proxmox2-0" created
Jan 28 23:00:40 INFO: creating archive '/mnt/pve/backup/vzdump-openvz-102-2012_01_28-23_00_03.tar'
Jan 29 00:41:39 INFO: Total bytes written: 54316113920 (51GiB, ?/s)
[COLOR=#ff0000]Jan 29 00:41:39 INFO: tar: -: Cannot write: Invalid argument[/COLOR]
Jan 29 00:41:39 INFO: tar: Error is not recoverable: exiting now
Jan 29 00:41:41 INFO:   Logical volume "vzsnap-proxmox2-0" successfully removed
Jan 29 00:41:46 ERROR: Backup of VM 102 failed - command '(cd /mnt/vzsnap0/var/lib/vz/private/102;find . '(' -regex '^\.$' ')' -o '(' -type 's' -prune ')' -o -print0|sed 's/\\/\\\\/g'|tar cpf - --totals --sparse --numeric-owner --no-recursion --ignore-failed-read --one-file-system --null -T -) >/mnt/pve/backup/vzdump-openvz-102-2012_01_28-23_00_03.dat' failed with exit code 2

This error keeps repeating itself for days now.
Anyone seen similar?

gkovacs · Mar 22, 2012

This problem keeps happening. We have a 70 GB OpenVZ VPS that VZDump is unable to backup for a week now, keeps stopping at 61GB.

Code:

Mar 22 00:05:02 INFO: Starting Backup of VM 102 (openvz)
Mar 22 00:05:02 INFO: CTID 102 exist mounted running
Mar 22 00:05:02 INFO: status = CTID 102 exist mounted running
Mar 22 00:05:03 INFO: backup mode: snapshot
Mar 22 00:05:03 INFO: ionice priority: 7
Mar 22 00:05:03 INFO: creating lvm snapshot of /dev/mapper/pve-root ('/dev/pve/vzsnap-proxmox2-0')
Mar 22 00:05:05 INFO:   Logical volume "vzsnap-proxmox2-0" created
Mar 22 00:05:06 INFO: creating archive '/mnt/pve/backup/vzdump-openvz-102-2012_03_22-00_05_02.tar'
[COLOR=#ff0000]Mar 22 02:51:57 INFO: Total bytes written: 64887572480 (61GiB, ?/s)
Mar 22 02:51:57 INFO: tar: -: Cannot write: Input/output error[/COLOR]
Mar 22 02:51:57 INFO: tar: Error is not recoverable: exiting now
Mar 22 02:51:59 INFO:   Logical volume "vzsnap-proxmox2-0" successfully removed
Mar 22 02:52:04 ERROR: Backup of VM 102 failed - command '(cd /mnt/vzsnap0/var/lib/vz/private/102;find . '(' -regex '^\.$' ')' -o '(' -type 's' -prune ')' -o -print0|sed 's/\\/\\\\/g'|tar cpf - --totals --sparse --numeric-owner --no-recursion --ignore-failed-read --one-file-system --null -T -) >/mnt/pve/backup/vzdump-openvz-102-2012_03_22-00_05_02.dat' failed with exit code 2

- Snapshot is set at 8GB, so it's not full
- There is ample free disk space on both the PVE volume and the NFS storage
- VZDump has no problem with smaller VPS's.

Any idea?

tom · Mar 22, 2012

how much space is used for the snapshot? check the size with 'lvdisplay'.

any details about the nfs server?

and make sure you run the latest Proxmox VE version, post 'pveversion -v'

gkovacs · Mar 23, 2012

tom said:
how much space is used for the snapshot? check the size with 'lvdisplay'.

I have manually set the snapshot to 8GB in /etc/vzdump.conf
This is the snapshot at around 56 GB of the backup... at 61 GB it failed again:

Code:

  --- Logical volume ---
  LV Name                /dev/pve/vzsnap-proxmox2-0
  VG Name                pve
  LV UUID                VUtjcY-BgHd-oUaf-YBX3-5iHz-WhQA-bEvzvV
  LV Write Access        read/write
  LV snapshot status     active destination for /dev/pve/root
  LV Status              available
  # open                 1
  LV Size                868.57 GB
  Current LE             222353
  COW-table size         8.00 GB
  COW-table LE           2048
  Allocated to snapshot  8.38%
  Snapshot chunk size    4.00 KB
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:2

tom said:
any details about the nfs server?

It's haneWIN NFS Server 1.2 running on a Windows box, there is 400 GB free space. It handles all our other VPS backups fine.

tom said:
and make sure you run the latest Proxmox VE version, post 'pveversion -v'

We tried to backup this VPS on 2 PVE servers, one running 2.6.32-6 kernel, the other 2.6.32-4.
Both are md raid, running deadline scheduler.

Code:

proxmox2:~# pveversion -v
pve-manager: 1.9-26 (pve-manager/1.9/6567)
running kernel: 2.6.32-6-pve
proxmox-ve-2.6.32: 1.9-55+ovzfix-2
pve-kernel-2.6.32-4-pve: 2.6.32-33
pve-kernel-2.6.32-6-pve: 2.6.32-55+ovzfix-1
pve-kernel-2.6.32-7-pve: 2.6.32-55+ovzfix-2
qemu-server: 1.1-32
pve-firmware: 1.0-15
libpve-storage-perl: 1.0-19
vncterm: 0.9-2
vzctl: 3.0.29-3pve1
vzdump: 1.2-16
vzprocps: 2.0.11-2
vzquota: 3.0.11-1
pve-qemu-kvm: 0.15.0-2
ksm-control-daemon: 1.0-6

Any chance of a manual run of vzdump with a verbose parameter?
Unfortunately the "tar: -: Cannot write: Invalid argument" is not enough to debug the situation.

dietmar · Mar 23, 2012

gkovacs said:
Any chance of a manual run of vzdump with a verbose parameter?
Unfortunately the "tar: -: Cannot write: Invalid argument" is not enough to debug the situation.

I also see:

Code:

Mar 22 02:51:57 INFO: tar: -: Cannot write: Input/output error

That look like a problem on the target storage (NFS storage full)? And hint in /var/log/syslog or the NFS server logs?

gkovacs · Apr 3, 2012

I have contacted the developer of the NFS software. He asked me if Proxmox VE uses NFS-2 or NFS-3, UDP or TCP and what blocksize?

dietmar · Apr 3, 2012

gkovacs said:
He asked me if Proxmox VE uses NFS-2 or NFS-3, UDP or TCP and what blocksize?

Proxmox VE is just a NFS client. So The question is what NFS server do you use, and what mount options? By default, we try to mount NFS-3

gkovacs · Apr 3, 2012

So I guess it's NFS-3 over TCP.
We use haneWIN NFS Server with the default options.

Is the mount command that PVE / VZDump uses accessible somewhere?
Can I modify it to check out different options, block sizes?

dietmar · Apr 3, 2012

gkovacs said:
Is the mount command that PVE / VZDump uses accessible somewhere?

You can see details with

# cat /proc/mounts

gkovacs said:
Can I modify it to check out different options, block sizes?

Yes, you can set the 'options' attribute in /etc/pve/storage.cfg (same options as described in 'man nfs')

gkovacs · Apr 4, 2012

During VZDump backups running on the server (kernel 2.6.32-6), I have found this in the syslog:

Code:

Apr  4 00:57:42 proxmox2 kernel: INFO: task tar:47725 blocked for more than 120 seconds.
Apr  4 00:57:42 proxmox2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr  4 00:57:42 proxmox2 kernel: tar           D ffff8801004e6ee0     0 47725  47722    0 0x00000000
Apr  4 00:57:42 proxmox2 kernel: ffff88022022ba18 0000000000000082 0000000000000000 ffffffff8105412c
Apr  4 00:57:42 proxmox2 kernel: ffff88022b0ca780 000000000000f788 ffff88022022bfd8 ffff8801004e6ee0
Apr  4 00:57:42 proxmox2 kernel: ffff88022f1ccce0 ffff8801004e74a8 ffff88022fc08440 ffff8801004e74a8
Apr  4 00:57:42 proxmox2 kernel: Call Trace:
Apr  4 00:57:42 proxmox2 kernel: [<ffffffff8105412c>] ? enqueue_task_fair+0x1c/0x60
Apr  4 00:57:42 proxmox2 kernel: [<ffffffff8107ca3b>] ? lock_timer_base+0x3b/0x70
Apr  4 00:57:42 proxmox2 kernel: [<ffffffff8107d70c>] ? try_to_del_timer_sync+0xac/0xe0
Apr  4 00:57:42 proxmox2 kernel: [<ffffffff8107d762>] ? del_timer_sync+0x22/0x30
Apr  4 00:57:42 proxmox2 kernel: [<ffffffff814fa029>] ? schedule_timeout+0x1d9/0x2d0
Apr  4 00:57:42 proxmox2 kernel: [<ffffffff814f9a92>] io_schedule+0xb2/0x120
Apr  4 00:57:42 proxmox2 kernel: [<ffffffffa04468ee>] nfs_wait_bit_uninterruptible+0xe/0x20 [nfs]
Apr  4 00:57:42 proxmox2 kernel: [<ffffffff814fa322>] __wait_on_bit+0x62/0x90
Apr  4 00:57:42 proxmox2 kernel: [<ffffffffa04468e0>] ? nfs_wait_bit_uninterruptible+0x0/0x20 [nfs]
Apr  4 00:57:42 proxmox2 kernel: [<ffffffffa04468e0>] ? nfs_wait_bit_uninterruptible+0x0/0x20 [nfs]
Apr  4 00:57:42 proxmox2 kernel: [<ffffffff814fa3c9>] out_of_line_wait_on_bit+0x79/0x90
Apr  4 00:57:42 proxmox2 kernel: [<ffffffff81093ea0>] ? wake_bit_function+0x0/0x50
Apr  4 00:57:42 proxmox2 kernel: [<ffffffffa04468cf>] nfs_wait_on_request+0x2f/0x40 [nfs]
Apr  4 00:57:42 proxmox2 kernel: [<ffffffffa044d198>] nfs_updatepage+0x2b8/0x540 [nfs]
Apr  4 00:57:42 proxmox2 kernel: [<ffffffffa043b4f1>] nfs_write_end+0x61/0x2c0 [nfs]
Apr  4 00:57:42 proxmox2 kernel: [<ffffffff81121843>] generic_file_buffered_write+0x193/0x2b0
Apr  4 00:57:42 proxmox2 kernel: [<ffffffff81120de0>] ? sync_page_killable+0x0/0x50
Apr  4 00:57:42 proxmox2 kernel: [<ffffffff81123249>] __generic_file_aio_write+0x259/0x470
Apr  4 00:57:42 proxmox2 kernel: [<ffffffff811a6310>] ? touch_atime+0x80/0x170
Apr  4 00:57:42 proxmox2 kernel: [<ffffffff811e92f2>] ? inode_reserved_space+0x22/0x30
Apr  4 00:57:42 proxmox2 kernel: [<ffffffff811234c2>] generic_file_aio_write+0x62/0xd0
Apr  4 00:57:42 proxmox2 kernel: [<ffffffffa043b1d4>] nfs_file_write+0x174/0x230 [nfs]
Apr  4 00:57:42 proxmox2 kernel: [<ffffffff8118b399>] do_sync_write+0xf9/0x140
Apr  4 00:57:42 proxmox2 kernel: [<ffffffff81093e60>] ? autoremove_wake_function+0x0/0x40
Apr  4 00:57:42 proxmox2 kernel: [<ffffffff8118b9eb>] vfs_write+0xcb/0x1a0
Apr  4 00:57:42 proxmox2 kernel: [<ffffffff8118bbb5>] sys_write+0x55/0x90
Apr  4 00:57:42 proxmox2 kernel: [<ffffffff8100b302>] system_call_fastpath+0x16/0x1b

According to the timestamp it is not connected to any backups failing, so probably unrelated.
Still, why does the kernel think tar is frozen?

dietmar · Apr 4, 2012

gkovacs said:
Still, why does the kernel think tar is frozen?

This also seems to be NFS related. What NFS server do you use exactly?

gkovacs · Apr 4, 2012

dietmar said:
This also seems to be NFS related. What NFS server do you use exactly?

We are using haneWIN NFS Server:
http://www.hanewin.net/nfs-e.htm

Worked flawlessly for 2 years.

gkovacs · Apr 5, 2012

We have an interesting discovery: when the backups are saved to a fast disk array, then they fail after 40-50 GB transferred:

Code:

[COLOR=#b22222]Apr 04 00:36:47 INFO: creating archive '/mnt/pve/nfs1/vzdump-openvz-106-2012_04_04-00_36_36.tar'
Apr 04 02:07:00 INFO: Total bytes written: 52308582400 (49GiB, ?/s)
Apr 04 02:07:00 INFO: tar: -: Cannot write: Invalid argument
Apr 04 02:07:00 INFO: tar: Error is not recoverable: exiting now[/COLOR]

But when backups are sent to a slow array, then there is no problem reported by tar / vzdump:

Code:

[COLOR=#006400]Apr 05 02:01:24 INFO: creating archive '/mnt/pve/nfs2/vzdump-openvz-106-2012_04_05-02_01_21.tar'
Apr 05 07:57:10 INFO: Total bytes written: 55236802560 (52GiB, 2.5MiB/s)
Apr 05 07:58:21 INFO: archive file size: 51.44GB[/COLOR]

Most likely this is a buffering problem on the receiving end (same Windows NFS server running on Hyper-V was used with 2 shares).

After that I immediately realized there is a bandwidth limit option in vzdump (--bwlimit).
What does bwlimit do? Does it affect network IO?

dietmar · Apr 5, 2012

gkovacs said:
What does bwlimit do? Does it affect network IO?

The number of bytes written (KiloBytes/Second).

Search

Search

VZDump backup fails

gkovacs

Renowned Member

gkovacs

Renowned Member

tom

Proxmox Staff Member

gkovacs

Renowned Member

dietmar

Proxmox Staff Member

gkovacs

Renowned Member

dietmar

Proxmox Staff Member

gkovacs

Renowned Member

dietmar

Proxmox Staff Member

gkovacs

Renowned Member

dietmar

Proxmox Staff Member

gkovacs

Renowned Member

gkovacs

Renowned Member

dietmar

Proxmox Staff Member