Backup errors

My only guess at this point about why your backups are slow is that you have a heavy I/O workload during the backup process. An 8-disk RAID-10 shouldn't be that slow to do backups. Maybe you don't have write caching on and your writes are really slow even though your reads are fast?

If you can use an SMB or NFS mount to another server and write the vzdumps to that, it would probably speed things up.
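For example, something along these lines should work (the server name, share, credentials and CTID are just placeholders, adjust them to your setup):

Code:
# mount a share from another box and point vzdump at it
mount -t cifs //backupserver/dumps /mnt/backup -o username=backup,password=secret
vzdump --snapshot --dumpdir /mnt/backup 105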

I don't understand exactly what you mean by your server being full with one 20GB container. Do you mean your disk I/O capacity is saturated, or that your free disk space is used up, or something else?

If your disks are being beaten on that much you may need to consider one of my earlier suggestions about setting up a faster I/O subsystem. You could throw faster 10k RPM Raptor drives at the problem (using the existing SATA RAID controller) or set up a fast NAS server to place your containers on.

Thanks Tog, I appreciate your help.

I've checked the cronjobs running in the container; they run later than the backups. I don't know about write caching, I'll have to check that on the RAID controller. But it would mean all three nodes I currently run are misconfigured. I can imagine I did something wrong with the 8-disk server, but the other two RAID-1 servers have standard Adaptec controllers, not much you can do wrong there. I'll check it out anyway.

I don't want to buy more expensive hardware because the current hardware is underperforming for some mysterious reason. Chances are the SAS disks will underperform as well, even if they are quicker than the current SATA disks. I'd rather solve the problem.
I can't imagine I'm the only one running into this.
 
Well, it seems the write cache on the 8-disk server is disabled:

Areca support have confirmed the following:
If you are set to Auto and you have a BBU, then the cache on the actual disk drives is disabled.
If you are set to Auto and you do not have a BBU, then the cache on the actual disk drives is enabled.

By default it is disabled because of the BBU... Does that make sense?

(The write cache on both RAID-1 servers is enabled. I wonder what speeds others reach with vzdump on two SATA 7200 rpm drives in RAID-1.)
 
Perhaps your write speeds are really terrible even though basic tests like pveperf are showing good read speeds. You definitely need to look into that. I can't tell you what performance is going to be like with a fancy controller like the Areca coupled with disabled write caching on the disks, but I can tell you that disks attached directly to the system with write caching disabled have, in the past, taken my write performance from 10-30MB/sec down to something like 2MB/sec when I was messing around with turning it off and on.

Might I suggest that you familiarize yourself with iozone. That will help you determine if your write speed is abnormally low.
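For example, something like this (the file size, record size and test file path are just an illustration; check the iozone man page for the details):

Code:
# sequential write and read, 2GB file, 1MB records, fsync included in the timing
iozone -e -i 0 -i 1 -s 2g -r 1m -f /var/lib/vz/iozone.tmp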

This problem is basic Linux stuff, not Proxmox VE specific, so you can google around for anybody voicing similar complaints about their Linux boxes in general with Adaptec and Areca controllers, and it should be more or less relevant reading.
 
Thanks, I will certainly follow your advice.

I should turn the write cache on even though it is disabled by default, shouldn't I?
 
vzdump performance results on my test system

hi,

I just did some vzdump tests here.

My hardware:

  • Quad Xeon 32xx
  • Adaptec 3508 with BBU (write cache on the controller enabled, cache on the disks disabled)
  • 4 WD5001ABYS (500GB SATA), RAID 10

I am using the default Proxmox VE installation.

pveperf performance results:

proxmox_ve:/# pveperf
CPU BOGOMIPS: 17027.42
REGEX/SECOND: 773887
HD SIZE: 94.49 GB (/dev/pve/root)
BUFFERED READS: 165.04 MB/sec
AVERAGE SEEK TIME: 12.73 ms
FSYNCS/SECOND: 1234.49
DNS EXT: 44.61 ms
DNS INT: 1.04 ms (proxmox.com)
proxmox_ve:/#


vzdump results (openVZ container, snapshot mode) to local disk:

105: Dec 09 10:52:02 INFO: Starting Backup of VM 105 (openvz)
105: Dec 09 10:52:02 INFO: status = CTID 105 exist mounted running
105: Dec 09 10:52:02 INFO: creating lvm snapshot of /dev/mapper/pve-data ('/dev/pve/vzsnap')
105: Dec 09 10:52:02 INFO: Logical volume "vzsnap" created
105: Dec 09 10:52:02 INFO: mounting lvm snapshot
105: Dec 09 10:52:15 INFO: creating archive '/backup/vzdump-105.dat' (/mnt/vzsnap/private/105)
105: Dec 09 10:56:42 INFO: Total bytes written: 3805020160 (3.6GiB, 16MiB/s)
105: Dec 09 10:56:42 INFO: file size 3.54GB
105: Dec 09 10:56:44 INFO: Logical volume "vzsnap" successfully removed
105: Dec 09 10:56:44 INFO: Finished Backup of VM 105 (00:04:42)

vzdump results (openVZ container, snapshot mode) to remote samba share, GBIT network:

105: Dec 09 10:58:01 INFO: Starting Backup of VM 105 (openvz)
105: Dec 09 10:58:01 INFO: status = CTID 105 exist mounted running
105: Dec 09 10:58:01 INFO: creating lvm snapshot of /dev/mapper/pve-data ('/dev/pve/vzsnap')
105: Dec 09 10:58:01 INFO: Logical volume "vzsnap" created
105: Dec 09 10:58:01 INFO: mounting lvm snapshot
105: Dec 09 10:58:16 INFO: creating archive '/mnt/backup/vzdump-105.dat' (/mnt/vzsnap/private/105)
105: Dec 09 11:01:33 INFO: Total bytes written: 3805030400 (3.6GiB, 22MiB/s)
105: Dec 09 11:01:37 INFO: file size 3.54GB
105: Dec 09 11:01:45 INFO: Logical volume "vzsnap" successfully removed
105: Dec 09 11:01:45 INFO: Finished Backup of VM 105 (00:03:44)

vzdump results (KVM VM, snapshot mode) to local disk:

126: Dec 09 11:23:01 INFO: Starting Backup of VM 126 (qemu)
126: Dec 09 11:23:01 INFO: status = running
126: Dec 09 11:23:01 INFO: creating lvm snapshot of /dev/mapper/pve-data ('/dev/pve/vzsnap')
126: Dec 09 11:23:01 INFO: Logical volume "vzsnap" created
126: Dec 09 11:23:01 INFO: mounting lvm snapshot
126: Dec 09 11:23:02 INFO: creating archive '/backup/vzdump-126.dat' (/mnt/vzsnap/images/126)
126: Dec 09 11:23:02 INFO: qemu-server.conf
126: Dec 09 11:23:02 INFO: vm-126-disk-1.qcow2
126: Dec 09 11:23:52 INFO: vm-126-disk.qcow2
126: Dec 09 11:33:14 INFO: Total bytes written: 23481856000 (22GiB, 37MiB/s)
126: Dec 09 11:33:14 INFO: file size 21.87GB
126: Dec 09 11:33:20 INFO: Logical volume "vzsnap" successfully removed
126: Dec 09 11:33:20 INFO: Finished Backup of VM 126 (00:10:19)


Conclusions:
  • a remote vzdump target speeds up performance
  • a lot of small files (containers) decreases performance due to bad disk access times (the backup of KVM disks is much faster)

How to improve performance:
  • get the fastest hard disks possible, SAS 15k rpm recommended
  • get the fastest hardware RAID controller
  • use a remote vzdump backup target
  • fast network, at least a single Gbit link, better two bonded Gbit links (see the sketch below)
  • make sure that no other intensive processes run during the backup window
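For the bonding part, a rough manual sketch (interface names and the address are just examples, and the round-robin mode is only one of several options; for a permanent setup put the equivalent into /etc/network/interfaces):

Code:
# quick manual test of a 2 x Gbit bond (round-robin)
modprobe bonding mode=balance-rr miimon=100
ifconfig bond0 192.168.0.10 netmask 255.255.255.0 up
ifenslave bond0 eth0 eth1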
 
I turned on the disk write cache.
The performance of the system was even lower than before because of these errors:

Dec 14 02:06:45 INFO: tar: ./var/qmail/mailnames/domain.net/finale/Maildir/new/1203621330.80758.host.domain.com: Unknown file type; file ignored
Dec 14 02:06:45 INFO: tar: ./var/qmail/mailnames/domain.net/finale/Maildir/new/1165843650.2049.host.domain.com: Warning: Cannot stat: No such file or directory
Dec 14 02:06:45 INFO: tar: ./var/qmail/mailnames/domain.net/finale/Maildir/new/1203324587.93056.host.domain.com: Unknown file type; file ignored
Dec 14 02:06:46 INFO: tar: ./var/qmail/mailnames/domain.net/finale/Maildir/new/1207596047.26913.host.domain.com: Unknown file type; file ignored
Dec 14 02:06:49 INFO: tar: ./var/qmail/mailnames/domain.net/finale/Maildir/new/1171365517.24186.host.domain.com: Warning: Cannot stat: No such file or director

The log is 48K and the errors go on for an hour.
Dec 14 03:06:46 INFO: Total bytes written: 21967626240 (21GiB, 2.8MiB/s)

I understand from Dietmar in a previous post that you had a similar error. Did you find out what caused it?
 
I understand from Dietmar in a previous post that you had a similar error. Did you find out what caused it?

The snapshot size was too small here. After increasing it the problem disappeared. We use similar disks, but performance is much better (see the posts from tom).
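If you start vzdump manually you can pass a larger snapshot size, for example (assuming your vzdump version has the --size option; the value is in MB, and 4096 and the CTID are just examples):

Code:
# use a 4GB LVM snapshot instead of the default
vzdump --snapshot --size 4096 --dumpdir /backup 107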

I guess you should test and optimize disk IO performance without vzdump/snapshots first.

- Dietmar
 
I will try to set up NFS using this guide: http://wiki.openvz.org/NFS
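The plan is roughly this (server address and export path are just examples):

Code:
# mount the NFS export on the Proxmox VE host and back up onto it
mount -t nfs 192.168.0.20:/export/backup /mnt/backup
vzdump --quiet --snapshot --dumpdir /mnt/backup 103 104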

But in the meantime I would like to hear from other people who run a Plesk server and back up using snapshot mode to a local disk, what their performance is like.
 
This is my Plesk OpenVZ container:

Code:
# cat /backup/vzdump-102.log
Dec 18 01:02:46 INFO: Starting Backup of VM 102 (openvz)
Dec 18 01:02:46 INFO: status = CTID 102 exist mounted running
Dec 18 01:02:47 INFO: creating lvm snapshot of /dev/mapper/pve-data ('/dev/pve/vzsnap')
Dec 18 01:02:47 INFO:   Logical volume "vzsnap" created
Dec 18 01:02:47 INFO: mounting lvm snapshot
Dec 18 01:02:51 INFO: creating archive '/backup/vzdump-102.dat' (/mnt/vzsnap/private/102)
Dec 18 02:03:56 INFO: Total bytes written: 25380177920 (24GiB, 7.1MiB/s)
Dec 18 02:03:56 INFO: file size 14.74GB
Dec 18 02:04:31 INFO:   Logical volume "vzsnap" successfully removed
Dec 18 02:04:31 INFO: Finished Backup of VM 102 (01:01:45)
 
This is my performance;
I still use a 100 Mbit switch for the backups.
But the snapshots are not that large on this node.

Backups are written to a NFS share.

101: Dec 18 01:30:02 INFO: Starting Backup of VM 101 (qemu)
101: Dec 18 01:30:02 INFO: status = running
101: Dec 18 01:30:02 INFO: creating lvm snapshot of /dev/mapper/pve-data ('/dev/pve/vzsnap')
101: Dec 18 01:30:02 INFO: Logical volume "vzsnap" created
101: Dec 18 01:30:02 INFO: mounting lvm snapshot
101: Dec 18 01:30:03 INFO: creating archive '/backup/vzdump-101.dat' (/mnt/vzsnap/images/101)
101: Dec 18 01:30:03 INFO: qemu-server.conf
101: Dec 18 01:30:03 INFO: vm-101-disk.qcow2
101: Dec 18 01:46:08 INFO: Total bytes written: 9179504640 (8.6GiB, 9.1MiB/s)
101: Dec 18 01:46:10 INFO: file size 4.92GB
101: Dec 18 01:46:19 INFO: Logical volume "vzsnap" successfully removed
101: Dec 18 01:46:20 INFO: Finished Backup of VM 101 (00:16:18)

102: Dec 18 01:46:20 INFO: Starting Backup of VM 102 (qemu)
102: Dec 18 01:46:20 INFO: status = running
102: Dec 18 01:46:21 INFO: creating lvm snapshot of /dev/mapper/pve-data ('/dev/pve/vzsnap')
102: Dec 18 01:46:21 INFO: Logical volume "vzsnap" created
102: Dec 18 01:46:21 INFO: mounting lvm snapshot
102: Dec 18 01:46:22 INFO: creating archive '/backup/vzdump-102.dat' (/mnt/vzsnap/images/102)
102: Dec 18 01:46:22 INFO: qemu-server.conf
102: Dec 18 01:46:22 INFO: vm-102-disk.qcow2
102: Dec 18 01:51:36 INFO: Total bytes written: 2750023680 (2.6GiB, 8.4MiB/s)
102: Dec 18 01:51:38 INFO: file size 1.30GB
102: Dec 18 01:51:41 INFO: Logical volume "vzsnap" successfully removed
102: Dec 18 01:51:41 INFO: Finished Backup of VM 102 (00:05:21)

107: Dec 18 01:51:41 INFO: Starting Backup of VM 107 (openvz)
107: Dec 18 01:51:41 INFO: status = CTID 107 exist mounted running
107: Dec 18 01:51:41 INFO: creating lvm snapshot of /dev/mapper/pve-data ('/dev/pve/vzsnap')
107: Dec 18 01:51:41 INFO: Logical volume "vzsnap" created
107: Dec 18 01:51:41 INFO: mounting lvm snapshot
107: Dec 18 01:51:45 INFO: creating archive '/backup/vzdump-107.dat' (/mnt/vzsnap/private/107)
107: Dec 18 01:53:32 INFO: Total bytes written: 528465920 (504MiB, 5.1MiB/s)
107: Dec 18 01:53:34 INFO: file size 248MB
107: Dec 18 01:53:35 INFO: Logical volume "vzsnap" successfully removed
107: Dec 18 01:53:35 INFO: Finished Backup of VM 107 (00:01:54)
 
Well, these are not all that different from mine. But of course I should have asked people to list the systems used.

This morning my backup file is suddenly 10GB smaller (no gzip used); vzdump does not list the speed but says the backup succeeded.
:confused:
 
Again, you should track down the problem step by step. Test without vzdump first.
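A simple raw write test without vzdump would be something like this (the size and path are just an example; /var/lib/vz lands on the pve-data volume):

Code:
# write 2GB sequentially and include the final flush in the timing
dd if=/dev/zero of=/var/lib/vz/ddtest bs=1M count=2048 conv=fdatasync
rm /var/lib/vz/ddtest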

Also post the logs (why does vzdump not list the speed?).

- Dietmar
 
Dec 19 01:00:02 INFO: Starting Backup of VM 107 (openvz)
Dec 19 01:00:02 INFO: status = CTID 107 exist mounted running
Dec 19 01:00:02 INFO: creating lvm snapshot of /dev/mapper/pve-data ('/dev/pve/vzsnap')
Dec 19 01:00:03 INFO: Logical volume "vzsnap" created
Dec 19 01:00:03 INFO: mounting lvm snapshot
Dec 19 01:00:04 INFO: creating archive '/backup/vzdump-107.dat' (/mnt/vzsnap/private/107)
Dec 19 01:18:14 INFO: tar: ./var/qmail/mailnames/domain.nl/finale/Maildir/new/1218359565.43191.host.server.com: Unknown file type; file ignored
Dec 19 01:18:25 INFO: tar: ./var/qmail/mailnames/domain.nl/finale/Maildir/new/1218364669.62665.host.server.com: Warning: Cannot stat: No such file or directory

---snip errors about filenames; these files do exist and have long names like "23442341.=s,qwef.w" which tar doesn't like------

Dec 19 01:58:23 INFO: file size 10.70GB
Dec 19 01:58:53 INFO: Logical volume "vzsnap" successfully removed
Dec 19 01:58:53 INFO: Finished Backup of VM 107 (00:58:51)

If I look at the speeds other people get with Plesk backups I think the speed is maybe not quite as low as it seemed at first. This Plesk container has 200 websites which sometimes have mailboxes with 500MB of mail. That's a lot of files...
But you're absolutely right, I should test that with a benchmark.
It's just that this erratic behaviour of the backups makes me nervous.

On the bright side, I tested a restore of this container on the same server and it was restored in less than 15 minutes. Before PMVE it would take a whole day!
 
If I look at the speeds other people get with Plesk backups I think the speed is maybe not quite as low as it seemed at first.

vzdump has a default bandwidth limit of 10MB/s, so the posted results may not show the maximum possible performance (see the --bwlimit option).
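The value is given in KB/s, so something like this raises the limit to roughly 50MB/s (the exact value and CTIDs are just an example):

Code:
# raise the bandwidth limit from the default 10MB/s to ~50MB/s
vzdump --quiet --node 1 --snapshot --dumpdir /backup --bwlimit 51200 103 104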

- Dietmar
 
Well, that explains why a test with a KVM backup did not exceed 10MB/s.
In the logs other people posted I see that the bigger the container/backup is, the slower it gets. Until today my backups were 21GB...

Any idea why the log did not say anything about the speed tonight?
The fact that it is only half the normal size gives me the impression that it was interrupted somehow.
 
I've set the backups to 'suspend' mode now.

This is the influence of size on speed.
Two Plesk containers on the same node (two SATA disks, RAID-1).

vzdump --quiet --node 1 --snapshot --dumpdir /backup --mailto support@mydomain.com 103 104

103: Dec 19 01:00:03 INFO: Starting Backup of VM 103 (openvz)
103: Dec 19 01:00:03 INFO: status = CTID 103 exist mounted running
103: Dec 19 01:00:03 INFO: creating lvm snapshot of /dev/mapper/pve-data ('/dev/pve/vzsnap')
103: Dec 19 01:00:04 INFO: Logical volume "vzsnap" created
103: Dec 19 01:00:04 INFO: mounting lvm snapshot
103: Dec 19 01:00:10 INFO: creating archive '/backup/vzdump-103.dat' (/mnt/vzsnap/private/103)
103: Dec 19 03:09:55 INFO: Total bytes written: 21053685760 (20GiB, 2.7MiB/s)
103: Dec 19 03:09:55 INFO: file size 19.61GB
103: Dec 19 03:10:38 INFO: Logical volume "vzsnap" successfully removed
103: Dec 19 03:10:38 INFO: Finished Backup of VM 103 (02:10:35)

104: Dec 19 03:10:38 INFO: Starting Backup of VM 104 (openvz)
104: Dec 19 03:10:38 INFO: status = CTID 104 exist mounted running
104: Dec 19 03:10:38 INFO: creating lvm snapshot of /dev/mapper/pve-data ('/dev/pve/vzsnap')
104: Dec 19 03:10:38 INFO: Logical volume "vzsnap" created
104: Dec 19 03:10:38 INFO: mounting lvm snapshot
104: Dec 19 03:10:44 INFO: creating archive '/backup/vzdump-104.dat' (/mnt/vzsnap/private/104)
104: Dec 19 03:15:10 INFO: Total bytes written: 2451630080 (2.3GiB, 9.6MiB/s)
104: Dec 19 03:15:10 INFO: file size 2.28GB
104: Dec 19 03:15:16 INFO: Logical volume "vzsnap" successfully removed
104: Dec 19 03:15:16 INFO: Finished Backup of VM 104 (00:04:38)

(20GiB, 2.7MiB/s)
(2.3GiB, 9.6MiB/s)

I guess I have fewer problems than I feared.
 
