LVM Snapshot backup problems

rahman

Hi, I am having a problem with snapshot backups. The backup process stalls just before finishing. It is reproducible every time with large VMs. I started a snapshot backup of a 200 GB single-disk VM at 18:25:

Code:
INFO: starting new backup job: vzdump 121 --remove 0 --mode snapshot --compress lzo --storage NFS_NexentaStor --node kvm44
INFO: Starting Backup of VM 121 (qemu)
INFO: status = running
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: trying to remove stale snapshot '/dev/ST4x2000/vzsnap-kvm44-0'
INFO:   Logical volume "vzsnap-kvm44-0" created
INFO: creating archive '/mnt/pve/NFS_NexentaStor/dump/vzdump-qemu-121-2012_05_10-18_25_36.tar.lzo'
INFO: adding '/mnt/pve/NFS_NexentaStor/dump/vzdump-qemu-121-2012_05_10-18_25_36.tmp/qemu-server.conf' to archive ('qemu-server.conf')
INFO: adding '/dev/ST4x2000/vzsnap-kvm44-0' to archive ('vm-disk-ide0.raw')
INFO: Total bytes written: 215822109184 (126.66 MiB/s)

Within a few tens of minutes, as you can see, it was almost finished. On the NFS server there were the files vzdump-qemu-121-2012_05_10-18_25_36.tar.dat and vzdump-qemu-121-2012_05_10-18_25_36.tmp. Even though the Proxmox UI reported the line "INFO: Total bytes written: 215822109184 (126.66 MiB/s)", repeated "ls -al" commands on the NFS server showed that Proxmox was still writing, very slowly. The tar.dat file was still growing, but only at a few KB/s. It eventually stopped growing after a few hours, and then nothing happened. The vzdump process seems stalled and never completes, even a few days later. The "Stop" button in the GUI doesn't cancel the backup either.
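Instead of repeating "ls -al" by hand, the growth of the archive file can be sampled with a small script. This is only a minimal sketch: the function name `sample_growth` and its parameters are made up for illustration, and it assumes Python 3 is available on whichever machine can see the file.

```python
import os
import time


def sample_growth(path, samples=5, interval=10.0, sleep=time.sleep):
    """Return growth rates in bytes/second between consecutive size checks."""
    rates = []
    previous = os.stat(path).st_size
    for _ in range(samples):
        sleep(interval)  # wait before looking at the file size again
        current = os.stat(path).st_size
        rates.append((current - previous) / interval)
        previous = current
    return rates
```

Calling `sample_growth()` with the path of the .tar.dat file returns one rate per interval. A run that drops from around 100 MB/s to a few KB/s would show up as a sharp fall in the returned values.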


There are plenty of lzop errors (kernel hung-task warnings) in syslog. They start about 20 min after the backup started. This seems to be the cause of the very slow write speed just before it stalls:
Code:
May 10 18:54:58 kvm44 kernel: INFO: task lzop:11558 blocked for more than 120 seconds.
May 10 18:54:58 kvm44 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 10 18:54:58 kvm44 kernel: lzop          D ffff88063bab0480     0 11558  11556    0 0x00000000
May 10 18:54:58 kvm44 kernel: ffff88044fd23b68 0000000000000046 ffff88044fd23ae8 ffffffffa03e6ec0
May 10 18:54:58 kvm44 kernel: ffff8807932d99c0 ffff880634f9c9c0 ffff88044f000001 00000000000007a8
May 10 18:54:58 kvm44 kernel: ffffffffa03f51a8 ffff88063bab0a20 ffff88044fd23fd8 ffff88044fd23fd8
May 10 18:54:58 kvm44 kernel: Call Trace:
May 10 18:54:58 kvm44 kernel: [<ffffffff81120bf0>] ? sync_page+0x0/0x50
May 10 18:54:58 kvm44 kernel: [<ffffffff81512663>] io_schedule+0x73/0xc0
May 10 18:54:58 kvm44 kernel: [<ffffffff81120c2d>] sync_page+0x3d/0x50
May 10 18:54:58 kvm44 kernel: [<ffffffff8151302f>] __wait_on_bit+0x5f/0x90
May 10 18:54:58 kvm44 kernel: [<ffffffff81120de3>] wait_on_page_bit+0x73/0x80
May 10 18:54:58 kvm44 kernel: [<ffffffff810944f0>] ? wake_bit_function+0x0/0x40
May 10 18:54:58 kvm44 kernel: [<ffffffff811392f5>] ? pagevec_lookup_tag+0x25/0x40
May 10 18:54:58 kvm44 kernel: [<ffffffff811212eb>] wait_on_page_writeback_range+0xfb/0x190
May 10 18:54:58 kvm44 kernel: [<ffffffff811214b8>] filemap_write_and_wait_range+0x78/0x90
May 10 18:54:58 kvm44 kernel: [<ffffffff811bf75a>] vfs_fsync_range+0xba/0x190
May 10 18:54:58 kvm44 kernel: [<ffffffff811bf89d>] vfs_fsync+0x1d/0x20
May 10 18:54:58 kvm44 kernel: [<ffffffffa03aa920>] nfs_file_flush+0x70/0xa0 [nfs]
May 10 18:54:58 kvm44 kernel: [<ffffffff8118bd5c>] filp_close+0x3c/0x90
May 10 18:54:58 kvm44 kernel: [<ffffffff8106dddf>] put_files_struct+0x7f/0xf0
May 10 18:54:58 kvm44 kernel: [<ffffffff8106dea3>] exit_files+0x53/0x70
May 10 18:54:58 kvm44 kernel: [<ffffffff8106fa8d>] do_exit+0x1ad/0x920
May 10 18:54:58 kvm44 kernel: [<ffffffff81070258>] do_group_exit+0x58/0xd0
May 10 18:54:58 kvm44 kernel: [<ffffffff810702e7>] sys_exit_group+0x17/0x20
May 10 18:54:58 kvm44 kernel: [<ffffffff8100b182>] system_call_fastpath+0x16/0x1b
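In the trace above, lzop is in state D (uninterruptible sleep) and is waiting in nfs_file_flush / vfs_fsync while it exits, which looks like it is waiting for its dirty pages to be written back to the NFS server when the file is closed. A process in that state generally does not act on signals until the I/O completes, which would fit the "Stop" button having no effect. That is an interpretation of the trace and not something confirmed in this thread. The sketch below lists processes currently in state D on a Linux host by reading /proc/<pid>/stat. The function names are invented for illustration.

```python
import glob


def parse_stat_line(line):
    """Split one /proc/<pid>/stat line into (pid, command name, state letter)."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or parentheses, so locate the last ')' instead of splitting on spaces.
    open_paren = line.index("(")
    close_paren = line.rindex(")")
    pid = int(line[:open_paren])
    comm = line[open_paren + 1:close_paren]
    state = line[close_paren + 1:].split()[0]
    return pid, comm, state


def uninterruptible_tasks():
    """Return (pid, comm) for every process currently in state D."""
    found = []
    for stat_path in glob.glob("/proc/[0-9]*/stat"):
        try:
            with open(stat_path) as handle:
                pid, comm, state = parse_stat_line(handle.read())
        except (OSError, ValueError):
            continue  # the process exited (or was unreadable) during the scan
        if state == "D":
            found.append((pid, comm))
    return found
```

If lzop or vzdump appears in the result of `uninterruptible_tasks()` for hours, the process is stuck in the kernel waiting for I/O and is not just running slowly in user space.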

The only way to cancel the backup is to reboot the host :(

Code:
pve-manager: 2.1-1 (pve-manager/2.1/f9b0f63a)
running kernel: 2.6.32-11-pve
proxmox-ve-2.6.32: 2.0-66
pve-kernel-2.6.32-11-pve: 2.6.32-66
lvm2: 2.02.95-1pve2
clvm: 2.02.95-1pve2
corosync-pve: 1.4.3-1
openais-pve: 1.1.4-2
libqb: 0.10.1-2
redhat-cluster-pve: 3.1.8-3
resource-agents-pve: 3.9.2-3
fence-agents-pve: 3.1.7-2
pve-cluster: 1.0-26
qemu-server: 2.0-39
pve-firmware: 1.0-15
libpve-common-perl: 1.0-27
libpve-access-control: 1.0-21
libpve-storage-perl: 2.0-18
vncterm: 1.0-2
vzctl: 3.0.30-2pve5
vzprocps: 2.0.11-2
vzquota: 3.0.12-3
pve-qemu-kvm: 1.0-9
ksm-control-daemon: 1.1-1

01:00.0 RAID bus controller [0104]: LSI Logic / Symbios Logic MegaRAID SAS 2108 [Liberator] [1000:0079] (rev 04)

I also tried a backup in stop mode, and the same happens. From the old archives of this VM, I see that a successful lzo backup is about 18 GB. I watched the transfer rate on the NFS server with "ls -al" and saw that the tar.dat file grows at ~100-120 MB/s, but when it reaches 17 GB it slows down to a few KB/s, and some hours later it stops growing at ~18 GB.
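As a rough sanity check of the numbers quoted in this thread, the arithmetic can be written out as below. It assumes that the rate on the "Total bytes written" log line is an average over the whole run and that "18 GB" means decimal gigabytes. Neither assumption is stated in the thread.

```python
total_bytes = 215822109184   # "Total bytes written" from the vzdump log
read_rate_mib_s = 126.66     # rate reported on the same log line
archive_gb = 18              # approximate size of an earlier successful lzo archive

MIB = 1024 ** 2
GIB = 1024 ** 3

raw_gib = total_bytes / GIB
duration_min = total_bytes / (read_rate_mib_s * MIB) / 60
compression_ratio = total_bytes / (archive_gb * 10 ** 9)

print(f"raw data:    {raw_gib:.1f} GiB")
print(f"duration:    {duration_min:.0f} min at the reported rate")
print(f"compression: roughly {compression_ratio:.0f}:1")
```

Under those assumptions the raw disk data is about 201 GiB, reading it takes about 27 minutes, and the archive is roughly a twelfth of the raw size. So the file stalling at 17-18 GB means the stall comes when almost all of the compressed data has already been written.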

VMs with small HDDs work without a problem in both backup modes, btw. Oh, and there is no free-space problem either, as the VG has over 2 TB of free space.
 
Update: Backups to a local directory (backed by FC-SAN + ext3) work OK in both stop and snapshot modes. So it seems something is wrong with NFS. Any advice?

Update 2: I created a CIFS share on the NFS server and added it to Proxmox as directory storage. The LVM snapshot backup succeeded in about 30 minutes without any problem :) So it seems there is a problem with NFS?
 
Do you use NFSv4 or NFSv3? I use NFSv3 here. I haven't tried NFSv4 yet. This is really annoying. I will continue to use CIFS as the backup backend until this issue is solved.
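For anyone unsure which NFS version a node actually negotiated, the mount table can be checked. The sketch below parses text in the /proc/mounts format and reports the vers= option of each NFS mount. The function name `nfs_mounts` and the sample line (server address and option list) are hypothetical and only there to make the example runnable.

```python
def nfs_mounts(mounts_text):
    """Return (mount point, NFS version or None) for each NFS entry in a mount table."""
    results = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts columns: device, mount point, fs type, options, dump, pass
        if len(fields) < 4 or fields[2] not in ("nfs", "nfs4"):
            continue
        version = None
        for option in fields[3].split(","):
            if option.startswith("vers="):
                version = option[len("vers="):]
        results.append((fields[1], version))
    return results


# Hypothetical sample line; the server address and options are invented for illustration.
sample = "192.0.2.10:/volumes/backup /mnt/pve/NFS_NexentaStor nfs rw,relatime,vers=3,hard,proto=tcp 0 0"
print(nfs_mounts(sample))

# On a live Linux host, read the real table instead:
# with open("/proc/mounts") as handle:
#     print(nfs_mounts(handle.read()))
```

Comparing this output between a node where NFS backups work and one where they stall may show whether the two setups differ in NFS version or in other mount options.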
 
We are using NFS running on a Proxmox 2.1 system for backups without problems. The backup file system is ext3.

Our largest backups are 20 GB.

Is anyone else having issues with NFS on Proxmox?
 
