LVM Snapshot Question

Lymond

I've got a shared storage device over iSCSI, and Proxmox stores a guest disk on it via LVM (/dev/proxvg/vm-106-disk-1). However, lvdisplay lists the LVM snapshot as inactive, and backups for that 20 GB guest take days:

Code:
prox1:# lvdisplay
  --- Logical volume ---
  LV Name                /dev/proxvg/vm-106-disk-1
  VG Name                proxvg
  LV UUID                3MVsAO-5yO4-Tiqe-oin1-5KjD-u460-COpRf3
  LV Write Access        read/write
  LV snapshot status     source of
                         /dev/proxvg/vzsnap-prox2-0 [INACTIVE]
  LV Status              available
  # open                 0
  LV Size                20.00 GB
  Current LE             5120
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           254:2

  --- Logical volume ---
  LV Name                /dev/proxvg/vzsnap-prox2-0
  VG Name                proxvg
  LV UUID                GM0xn6-2cBV-FiV8-hJ0B-eCfg-cC09-ELQTxE
  LV Write Access        read/write
  LV snapshot status     INACTIVE destination for /dev/proxvg/vm-106-disk-1
  LV Status              NOT available
  LV Size                20.00 GB
  Current LE             5120
  COW-table size         1.00 GB
  COW-table LE           256
  Snapshot chunk size    4.00 KB
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
What's the proper method for creating backups of LVM storage through Proxmox?
 
I've got a shared storage device over iSCSI, and Proxmox stores a guest disk on it via LVM (/dev/proxvg/vm-106-disk-1). However, lvdisplay lists the LVM snapshot as inactive

I guess you have a cluster setup with more than one node? The snapshot is only active on the node the VM runs on.
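
If you want to verify that, check the snapshot state on the node the VM runs on (prox2 in your case) rather than on prox1, e.g.:

Code:
# run on the node where VM 106 is running (prox2), not on prox1
lvscan            # the vzsnap volume should be listed as ACTIVE there
lvs -a proxvg     # per-LV attributes and snapshot usage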

and backups for that 20 GB guest take days:

Please can you post the backup logs?

What's the proper method for creating backups of LVM storage through Proxmox?

I guess you already use vzdump?
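
If you want to try a manual run from the shell, something like the following should be equivalent (a rough sketch only; the dump directory here is just an example, and the exact option names depend on your version, see 'vzdump --help'):

Code:
# rough example -- check 'vzdump --help' for the exact options on your version
vzdump --snapshot --dumpdir /mnt/kvm-backups 106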
 
I guess you have a cluster setup with more than one node? The snapshot is only active on the node the VM runs on.

Correct. Two clustered nodes. The VM in question is on an LVM volume over iSCSI shared storage on a 1 Gb switch, and the VM is running on the slave node.

Please can you post the backup logs?

Yep, here you go:

Code:
Nov 14 13:30:02 INFO: Starting Backup of VM 106 (qemu)
Nov 14 13:30:02 INFO: running
Nov 14 13:30:02 INFO: status = running
Nov 14 13:30:03 INFO: backup mode: snapshot
Nov 14 13:30:03 INFO: bandwidth limit: 10240 KB/s
Nov 14 13:30:03 INFO: trying to remove stale snapshot '/dev/proxvg/vzsnap-prox2-0'
Nov 14 13:30:03 INFO:   Logical volume "vzsnap-prox2-0" successfully removed
Nov 14 13:30:11 INFO:   Logical volume "vzsnap-prox2-0" created
Nov 14 13:30:11 INFO: creating archive '/mnt/kvm-backups/vzdump-qemu-106-2009_11_14-13_30_02.tgz'
Nov 14 13:30:11 INFO: adding '/mnt/kvm-backups/vzdump-qemu-106-2009_11_14-13_30_02.tmp/qemu-server.conf' to archive ('qemu-server.conf')
Nov 14 13:30:11 INFO: adding '/dev/proxvg/vzsnap-prox2-0' to archive ('vm-disk-ide0.raw')
Nov 16 13:25:43 INFO: Total bytes written: 10920847360 (0.06 MiB/s)
Nov 16 13:25:45 INFO: archive file size: 7.10GB
Nov 16 13:25:47 INFO:   Logical volume "vzsnap-prox2-0" successfully removed
Nov 16 13:25:47 INFO: Finished Backup of VM 106 (47:55:45)

I guess you already use vzdump?

I'm using the web-based backup options with Snapshot checked, backing up to a mounted CIFS share on the same Gb switch. Which, in retrospect, may be the problem: the VM's iSCSI traffic and the backup traffic both go over the same link between the slave node and the switch. Should the shared storage (iSCSI) use a separate private switch and NIC on the cluster nodes, while the backup storage uses a public switch?

The 30 GB backup of a VM local on the master node runs in about 2 hours versus 7 GB in 2 days for the VM on the shared storage.
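
One way I could confirm whether the link itself is the bottleneck (just a generic test, nothing Proxmox-specific; iperf has to be installed on both nodes first) would be:

Code:
# on the master node (server side)
apt-get install iperf && iperf -s
# on the slave node (client side); replace <master-ip> with the master's address
apt-get install iperf && iperf -c <master-ip> -t 30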
 
In Xen, the disk images that you see as the VPS filesystem are exported from the host server (the dom0) to the VPS (the domU).

There are a few different ways to do that, e.g. file: (a loopback-mounted file), tap:aio (tapdisk), and phy: (a physical LVM device).

We have run a few disk I/O benchmarks. From them we conclude that it is hard to do good disk I/O benchmarks on a VPS, i.e. it is hard to reconcile some of the test results with what we know happens on real live servers.

But the test results do tend to indicate that the physical LVM devices perform very well.

Plus they have the benefit that when we run VPS backups, instead of having to pause the VPS while we copy the whole filesystem, we only need to pause it while we create an LVM snapshot, which takes about a second. We then copy the data out of the snapshot, which takes the usual time, but during that copy the VPS is running as normal.
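
As a rough sketch of that workflow (the volume group, volume, and backup path names here are just examples):

Code:
# create a small copy-on-write snapshot of the running VPS volume (the pause is only for this step)
lvcreate --snapshot --size 1G --name vps-snap /dev/vg0/vps-disk
# copy the frozen image out of the snapshot while the VPS keeps running
dd if=/dev/vg0/vps-snap of=/backup/vps-disk.raw bs=1M
# remove the snapshot once the copy is finished
lvremove -f /dev/vg0/vps-snap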

On newer hosts (e.g. host706 and newer) we are using this LVM setup.

If the LVM devices continue to work out really well, then we will consider converting some of the older hosts to this method. That would require a bit of downtime, so we would try to balance the benefit of the upgrade against unnecessary or frequent service interruptions.
 
Code:
Nov 16 13:25:47 INFO: Finished Backup of VM 106 (47:55:45)
The 30 GB backup of a VM local on the master node runs in about 2 hours versus 7 GB in 2 days for the VM on the shared storage.

Please can you post the log from the backup which fails (takes 2 days)?
 
That was the failed (2 day) backup:

Code:
Nov 16 13:25:47 INFO: Finished Backup of VM 106 (47:55:45)
47 hours = 2 days.

Anyway, I rebooted the slave node and the network picked back up again. I'm not sure what causes it to keep slowing down -- maybe starting a particular VM (though no VMs were running at the time of the backup, so it must be a lingering effect). I've got an older post that refers to this:

http://proxmox.com/forum/showthread.php?t=2569

My latest log after the reboot shows the backup completed in around 30 minutes, which sounds about right. I just need to figure out why the network seems to slow down so much. Here's the latest backup log for the LVM-backed VM:

Code:
prox2:/mnt/kvm-backups# more vzdump-qemu-106-2009_11_18-09_20_02.log
Nov 18 09:20:02 INFO: Starting Backup of VM 106 (qemu)
Nov 18 09:20:02 INFO: stopped
Nov 18 09:20:02 INFO: status = stopped
Nov 18 09:20:02 INFO: backup mode: stop
Nov 18 09:20:02 INFO: bandwidth limit: 10240 KB/s
Nov 18 09:20:02 INFO: creating archive '/mnt/kvm-backups/vzdump-qemu-106-2009_11_18-09_20_02.tar'
Nov 18 09:20:02 INFO: adding '/mnt/kvm-backups/vzdump-qemu-106-2009_11_18-09_20_02.tmp/qemu-server.conf' to archive ('qemu-server.conf')
Nov 18 09:20:02 INFO: adding '/dev/proxvg/vm-106-disk-1' to archive ('vm-disk-ide0.raw')
Nov 18 09:52:42 INFO: Total bytes written: 10922061312 (5.31 MiB/s)
Nov 18 09:56:19 INFO: archive file size: 10.17GB
Nov 18 09:56:23 INFO: delete old backup '/mnt/kvm-backups/vzdump-qemu-106-2009_11_14-13_30_02.tgz'
Nov 18 09:56:31 INFO: Finished Backup of VM 106 (00:36:29)
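
For reference, when it slows down again I can at least check whether the NIC has renegotiated to a lower speed or duplex, since 0.06 MiB/s versus 5.31 MiB/s looks more like a link problem than an LVM problem (eth0 is just an example interface name):

Code:
# negotiated speed, duplex and link state of the interface
ethtool eth0
# error and drop counters for all interfaces
cat /proc/net/dev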
 
47 hours = 2 days.

Oh, sorry - I can see it now ;-)

Anyway, I rebooted the slave node and the network picked back up again. I'm not sure what causes it to keep slowing down

I thought everything was on local disk? So why should a slow network influence the vzdump backup time?
 
My setup is two clustered Proxmox nodes. I've just added some shared storage over iSCSI with LVM. When I was first trying to back up the VM stored on the shared storage, it took 2 days. I rebooted the slave node that the VM would be running from (and presumably backed up through). Network performance picked right up and 2 days became 30 minutes.

But as Dietmar says, I didn't do anything special. I just added a backup job (which actually uses a separate CIFS server -- a Windows server share -- on the same switch) for that particular VM and set it to run. It seems to work fine (though honestly, I haven't tried a restore yet).
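
For reference, a shell restore of one of these archives should look roughly like this (using the archive name from the log above and an unused VM ID, 116, purely as an example):

Code:
# restore to a new, unused VM ID so the original VM 106 is left untouched
qmrestore /mnt/kvm-backups/vzdump-qemu-106-2009_11_18-09_20_02.tar 116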