Hi, I am having a problem with snapshot backups. The backup process stalls just before finishing. It is reproducible every time with large VMs. I started a snapshot backup of a 200 GB single-disk VM at 18:25:
Code:
INFO: starting new backup job: vzdump 121 --remove 0 --mode snapshot --compress lzo --storage NFS_NexentaStor --node kvm44
INFO: Starting Backup of VM 121 (qemu)
INFO: status = running
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: trying to remove stale snapshot '/dev/ST4x2000/vzsnap-kvm44-0'
INFO: Logical volume "vzsnap-kvm44-0" created
INFO: creating archive '/mnt/pve/NFS_NexentaStor/dump/vzdump-qemu-121-2012_05_10-18_25_36.tar.lzo'
INFO: adding '/mnt/pve/NFS_NexentaStor/dump/vzdump-qemu-121-2012_05_10-18_25_36.tmp/qemu-server.conf' to archive ('qemu-server.conf')
INFO: adding '/dev/ST4x2000/vzsnap-kvm44-0' to archive ('vm-disk-ide0.raw')
INFO: Total bytes written: 215822109184 (126.66 MiB/s)
Within a few tens of minutes, as you can see, it was almost finished. On the NFS server there were vzdump-qemu-121-2012_05_10-18_25_36.tar.dat and vzdump-qemu-121-2012_05_10-18_25_36.tmp files. But even after the Proxmox UI reported the "INFO: Total bytes written: 215822109184 (126.66 MiB/s)" line, a few "ls -al" commands on the NFS server showed that Proxmox was still writing, very slowly. The tar.dat file kept growing, but only at a few KB/s. It eventually stopped growing after a few hours, and from then on nothing happened: the vzdump process seems stalled and never completes, even days later. The "Stop" button in the GUI doesn't cancel the backup either.
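For reference, this is roughly how I watch the growth rate, just a plain loop run in the dump directory on the NFS server (the filename is from this particular run):
Code:
# crude growth check, run on the NFS server side in the dump directory
while true; do
    ls -al vzdump-qemu-121-2012_05_10-18_25_36.tar.dat
    sleep 60
done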
There are plenty of lzop hung-task errors in syslog. They start about 20 minutes after the backup begins, and they seem to be the cause of the very slow write speed just before it stalls:
Code:
May 10 18:54:58 kvm44 kernel: INFO: task lzop:11558 blocked for more than 120 seconds.
May 10 18:54:58 kvm44 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 10 18:54:58 kvm44 kernel: lzop D ffff88063bab0480 0 11558 11556 0 0x00000000
May 10 18:54:58 kvm44 kernel: ffff88044fd23b68 0000000000000046 ffff88044fd23ae8 ffffffffa03e6ec0
May 10 18:54:58 kvm44 kernel: ffff8807932d99c0 ffff880634f9c9c0 ffff88044f000001 00000000000007a8
May 10 18:54:58 kvm44 kernel: ffffffffa03f51a8 ffff88063bab0a20 ffff88044fd23fd8 ffff88044fd23fd8
May 10 18:54:58 kvm44 kernel: Call Trace:
May 10 18:54:58 kvm44 kernel: [<ffffffff81120bf0>] ? sync_page+0x0/0x50
May 10 18:54:58 kvm44 kernel: [<ffffffff81512663>] io_schedule+0x73/0xc0
May 10 18:54:58 kvm44 kernel: [<ffffffff81120c2d>] sync_page+0x3d/0x50
May 10 18:54:58 kvm44 kernel: [<ffffffff8151302f>] __wait_on_bit+0x5f/0x90
May 10 18:54:58 kvm44 kernel: [<ffffffff81120de3>] wait_on_page_bit+0x73/0x80
May 10 18:54:58 kvm44 kernel: [<ffffffff810944f0>] ? wake_bit_function+0x0/0x40
May 10 18:54:58 kvm44 kernel: [<ffffffff811392f5>] ? pagevec_lookup_tag+0x25/0x40
May 10 18:54:58 kvm44 kernel: [<ffffffff811212eb>] wait_on_page_writeback_range+0xfb/0x190
May 10 18:54:58 kvm44 kernel: [<ffffffff811214b8>] filemap_write_and_wait_range+0x78/0x90
May 10 18:54:58 kvm44 kernel: [<ffffffff811bf75a>] vfs_fsync_range+0xba/0x190
May 10 18:54:58 kvm44 kernel: [<ffffffff811bf89d>] vfs_fsync+0x1d/0x20
May 10 18:54:58 kvm44 kernel: [<ffffffffa03aa920>] nfs_file_flush+0x70/0xa0 [nfs]
May 10 18:54:58 kvm44 kernel: [<ffffffff8118bd5c>] filp_close+0x3c/0x90
May 10 18:54:58 kvm44 kernel: [<ffffffff8106dddf>] put_files_struct+0x7f/0xf0
May 10 18:54:58 kvm44 kernel: [<ffffffff8106dea3>] exit_files+0x53/0x70
May 10 18:54:58 kvm44 kernel: [<ffffffff8106fa8d>] do_exit+0x1ad/0x920
May 10 18:54:58 kvm44 kernel: [<ffffffff81070258>] do_group_exit+0x58/0xd0
May 10 18:54:58 kvm44 kernel: [<ffffffff810702e7>] sys_exit_group+0x17/0x20
May 10 18:54:58 kvm44 kernel: [<ffffffff8100b182>] system_call_fastpath+0x16/0x1b
The only way to cancel the backup is to reboot the host.
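The hung tasks are in uninterruptible sleep (the "D" state in the trace above), so killing them by hand along these lines has no effect either:
Code:
# the blocked tasks show state D (uninterruptible sleep)
ps -eo pid,stat,comm | grep -E 'vzdump|lzop|tar'
kill -9 <pid>   # no visible effect while the task stays in D state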

pveversion -v output:
Code:
pve-manager: 2.1-1 (pve-manager/2.1/f9b0f63a)
running kernel: 2.6.32-11-pve
proxmox-ve-2.6.32: 2.0-66
pve-kernel-2.6.32-11-pve: 2.6.32-66
lvm2: 2.02.95-1pve2
clvm: 2.02.95-1pve2
corosync-pve: 1.4.3-1
openais-pve: 1.1.4-2
libqb: 0.10.1-2
redhat-cluster-pve: 3.1.8-3
resource-agents-pve: 3.9.2-3
fence-agents-pve: 3.1.7-2
pve-cluster: 1.0-26
qemu-server: 2.0-39
pve-firmware: 1.0-15
libpve-common-perl: 1.0-27
libpve-access-control: 1.0-21
libpve-storage-perl: 2.0-18
vncterm: 1.0-2
vzctl: 3.0.30-2pve5
vzprocps: 2.0.11-2
vzquota: 3.0.12-3
pve-qemu-kvm: 1.0-9
ksm-control-daemon: 1.1-1
The host's RAID controller, from lspci:
Code:
01:00.0 RAID bus controller [0104]: LSI Logic / Symbios Logic MegaRAID SAS 2108 [Liberator] [1000:0079] (rev 04)
I also tried a backup in stop mode and the same thing happens. From the old archives of this VM I can see that a successful lzo backup is about 18 GB. Watching the transfer rate on the NFS server with "ls -al", I see the tar.dat file grow at roughly 100-120 MB/s, but when it reaches 17 GB it slows down to a few KB/s, and some hours later it stops growing at around 18 GB.
VMs with small disks back up fine in both modes, by the way. And there is no free-space problem: the VG has over 2 TB free.
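I checked the free space along these lines (the VG name is the one from the backup log above):
Code:
# confirm the volume group has room for the snapshot LV
vgs ST4x2000
lvs ST4x2000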