OK, let's start off with some background. We have been using Proxmox since v1.x, and quite successfully I might add. We've been happy with the performance, usability, features, the whole nine yards.
Our Current "production" setup:
11 Nodes total
6x Nodes = 2x16-Core AMD Opteron w/ 256GB of memory - storage is 2x SSDs mirrored for System, and 6x SSDs in RAID5 for images, all connected to LSI RAID controllers
2x Nodes = 2x Intel E5649 CPUs w/ 96GB of memory (MB Limited) and 4x SSDs in RAID5 connected to LSI controller
2x Nodes = 2x Intel E5-2687W w/ 256GB of memory - storage is 2x SSDs mirrored for System, and 6x SSDs in RAID5 for images, all connected to LSI RAID controllers
1x Node = 1x Intel Xeon 5420 w 16GB of memory - storage is 4x 3TB WD Red drives with a highpoint RAID controller (This node is used only for testing and template cloning)
All servers have either 2x or 4x 1GB network connections (Depending on chassis), all bonded using LACP w/ mLAG across multiple Cisco devices (allowing for the best combo of speed and reliability).
All image storage is local on this setup.
As I mentioned, this has worked very well for us for multiple years. Currently, we are running 178 KVM machines on this setup.
As you might imagine, we're beginning to outgrow our hardware, so beginning last year, we embarked on an initiative to build the next iteration of our cloud.
The new hardware is as follows:
4x nodes = 2x Intel E5-2667v2 w/ 512GB of memory, 8x SSDs in RAID5 attached to an LSI RAID controller, and 6x Intel 10GbE interfaces
1x node = 2x Intel E5-2687W w/ 512GB of memory, 8x SSDs in RAID5 attached to an LSI RAID controller, and 4x Intel 10GbE interfaces
4x Ceph storage nodes = 1x Intel E5-2667v2 w/ 256GB of memory, 24x 1TB SSDs attached in pass-through mode for OSDs (plus 2x SSDs mirrored for the OS), and 10x Intel 10GbE interfaces
In addition, all of the previous nodes will be migrated into this infrastructure and upgraded to 10GbE networking (once all VMs have been moved off them).
Ceph is managed on its own: it uses dedicated MONs (5 of them) and is administered with Inktank's Calamari toolset. The Proxmox servers are NOT managing Ceph.
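For context, the Proxmox side just consumes the cluster as plain external RBD storage. The definition in /etc/pve/storage.cfg looks roughly like the sketch below (the storage name, pool, and monitor IPs are placeholders, not our real values):

rbd: ceph-ssd
        monhost 10.10.10.201 10.10.10.202 10.10.10.203
        pool rbd
        username admin
        content images

The client keyring sits in /etc/pve/priv/ceph/ceph-ssd.keyring to match the storage name, so Proxmox only ever acts as an RBD client.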
All networking is connected using LACP + mLAG bonding across multiple switches for performance and redundancy.
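The bonds themselves are nothing exotic. A minimal sketch of one hypervisor-side /etc/network/interfaces stanza is below (interface names, the address, and the hash policy are illustrative, not copied from our config):

auto bond0
iface bond0 inet manual
        slaves eth0 eth1
        bond_miimon 100
        bond_mode 802.3ad
        bond_xmit_hash_policy layer3+4

auto vmbr0
iface vmbr0 inet static
        address 10.10.10.11
        netmask 255.255.255.0
        bridge_ports bond0
        bridge_stp off
        bridge_fd 0

The switch side is simply a matching LACP port-channel spread across the mLAG pair.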
After about 3 weeks of testing and tweaking, we have the new cluster working at the performance levels we wanted. We've not only been able to average about 1.4GB/s on individual VMs (both read and write), but have also been able to sustain over 600MB/s on as many as 8 VMs simultaneously. Needless to say, we have the cluster in a very usable state.
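If anyone wants to compare against their own cluster, large-block sequential fio runs inside a test guest are a reasonable way to reproduce figures in this ballpark; something like the following, where the parameters are illustrative and /dev/vdb stands for an unused virtio scratch disk (the write test will destroy whatever is on it):

fio --name=seq-read --filename=/dev/vdb --direct=1 --rw=read --bs=4M --iodepth=32 --ioengine=libaio --runtime=60 --time_based --group_reporting
fio --name=seq-write --filename=/dev/vdb --direct=1 --rw=write --bs=4M --iodepth=32 --ioengine=libaio --runtime=60 --time_based --group_reporting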
However, and here's the rub, we've been unable to get anything even remotely close to acceptable performance when running backups or restores of VMs on this setup. We have a specific VM on current production that we back up every hour. On the 1GbE network, backing up to NFS storage, we get about 70MB/s, and the 64GB VM takes just over 14 minutes every time. However, when I put a copy of the same VM on the Ceph-backed infrastructure, the best we can get is 12MB/s, and the same backup takes almost 2 hours. It's not the network: we've tested the same VM using local image storage and were able to back up to 10GbE NFS shares at over 700MB/s.
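For reference, the backup itself is nothing special; the hourly job boils down to a plain vzdump call along the lines of the one below (the VMID and storage name are placeholders). The command is the same in every test; the only real variables are whether the VM's disk image lives on local RAID or on Ceph RBD, and which NFS export it writes to.

vzdump 100 --storage backup-nfs --mode snapshot --compress lzo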
I've done quite a bit of research over the past few weeks, and the issue seems to be isolated to how KVM reads data in 64KB blocks when performing its internal backup functionality. I understand that this is not easily changed (I even checked out the source myself and confirmed this would be a BIG thing to change), but the 64KB reads are definitely the culprit. Ceph stores data in 4MB objects, and each 64KB read ends up pulling a 4MB object. (In very rough worst-case math, that's 64x read amplification, so Ceph is reading on the order of 4TB to back up a 64GB VM.) This also hammers the IOPS, which in turn leads to the slowdown.
So, my question, after all the back-story, is this: is there any way to resolve this issue? One of the tricks used to speed up much of Ceph's own performance was to use read-ahead buffers; however, I cannot find any information on how to turn these on for vzdump. I've also seen Tom from Proxmox mention that he runs backups out of Ceph nightly; I'm curious, if he sees this, whether he has any pointers for this situation. We currently back up at least about 30% of our VMs on a nightly basis, but at the speeds our testing has shown, this would be impossible using Ceph. Also of note, we have tested Ceph's own tools for exporting and importing images, but since running them against a live VM can lead to data corruption, that is not a viable solution for us.
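To be specific about what I mean by read-ahead buffers: I'm referring to the librbd knobs below, set in ceph.conf on the client/hypervisor side (assuming librbd there picks up /etc/ceph/ceph.conf). The values are only examples, not a tuned recommendation, and whether the vzdump/KVM backup read path actually benefits from them is exactly the part I can't pin down.

[client]
        rbd cache = true
        rbd readahead trigger requests = 10
        rbd readahead max bytes = 4194304
        rbd readahead disable after bytes = 0

Setting rbd readahead disable after bytes = 0 is supposed to keep read-ahead active for the life of the image rather than only during guest boot, but I haven't been able to confirm it helps the backup case.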
Any tips or assistance would be greatly appreciated. I'm at the point where my Google searches are no longer turning up any new information, and I feel like I've read the entire internet by now ;-)
Our Current "production" setup:
11 Nodes total
6x Nodes = 2x16-Core AMD Opteron w/ 256GB of memory - storage is 2x SSDs mirrored for System, and 6x SSDs in RAID5 for images, all connected to LSI RAID controllers
2x Nodes = 2x Intel E5649 CPUs w/ 96GB of memory (MB Limited) and 4x SSDs in RAID5 connected to LSI controller
2x Nodes = 2x Intel E5-2687W w/ 256GB of memory - storage is 2x SSDs mirrored for System, and 6x SSDs in RAID5 for images, all connected to LSI RAID controllers
1x Node = 1x Intel Xeon 5420 w 16GB of memory - storage is 4x 3TB WD Red drives with a highpoint RAID controller (This node is used only for testing and template cloning)
All servers have either 2x or 4x 1GB network connections (Depending on chassis), all bonded using LACP w/ mLAG across multiple Cisco devices (allowing for the best combo of speed and reliability).
All image storage is local on this setup.
As I mentioned, this has worked very well for us for multiple years. Currently, we are running 178 KVM machines on this setup.
As you might imagine, we're beginning to outgrow our hardware, so beginning last year, we embarked on an initiative to build the next iteration of our cloud.
The new hardware is as follows:
4x Nodes = 2x Intel E5-2667v2 w/ 512 GB of memory and 8x SSDs in RAID5 attached to an LSI RAID controller and 6x Intel 10Gbe interfaces
1x Node = 2x Intel E5-2687W w/ 512GB of memory and 8x SSDs in RAID5 attached to an LSI RAID controller and 4x Intel 10Gbe interfaces
4x Ceph Storage Nodes = 1x Intel E5-2667v2 w/ 256GB of memory and 24x 1TB SSDs attached in pass-through mode for OSDs (2x SSDs mirrored for OS) and 10x Intel 10Gbe Interfaces
In addition, all of the previous Nodes will be migrated into this infrastructure and upgraded to 10Gbe networking (Once all VMs have been moved)
Ceph is managed on it's own, and uses dedicated MONs (5 of them) and is managed using Inktank's calamari toolset. Proxmox servers are NOT managing ceph.
All networking is connected using LACP + mLAG bonding across multiple switches for performance and redundancy.
After about 3 weeks of testing and tweaking, we have the new cluster working at the performance levels we wanted. We've not only been able to avg about 1.4GB/s performace on individual VMs (Both read and write), but have also been able to sustain over 600MB/s on as many as 8 VMs simultaneously. Needless to say, we have the cluster to a very useable state.
However, and here's the rub, we've been unable to get anything even remotely close to acceptable performance running backups or restores of VMs using this setup. We have a specific VM on current production that we backup every hour. On the 1Gig network, backing up to NFS storage, we get about 70MB/s and it takes just over 14 mins for the 64GB VM every time. However, when I put a copy of the same VM over on the Ceph-backed infrastructure, the best we can get is 12MB/s and it take almost 2 hours to run the same backup. It's not the network, as we've tested the same VM using local image storage, and were able to backup to 10Gbe NFS shares at over 700MB/s.
I've done quite a bit of research over the past few weeks, and have found that the issues seems to be isolated to how KVM reads 64kb blocks when performing it's internal backup functionality. I understand that this is not easily changed (even checked out the source myself and confirmed this is a BIG thing to change), but the 64kb reads are definitely the culprit. Ceph uses 4MB "blocks" to storage data, and each 64kb read is actually reading a 4MB block. (in very rough math, this means Ceph is reading 625GB to backup a 64GB VM). This is also decimating the IOPS situation, which in turn leads to the slowdown.
So, my question, after all the back-story, is this... Is there any way to resolve this issue? One of the tricks used to speed up much of the performance of ceph itself was to use read-ahead buffers. However, I cannot find any information as to how to turn these on for vzdump. I've also seen Tom from Proxmox themselves mention that he runs backups out of Ceph nightly. I'm curious, if he sees this, if he has any pointers that he uses to help with this situation. We currently backups about 30% of our VMs on a nightly basis at a minimum, however with the speed our testing has shown, this would be impossible using Ceph. Also to note, we have tested out using ceph's own tools for exporting and importing images, but since this can lead to data-corruption when run against running VMs, this is not a viable solution.
Any tips or assistance would be greatly appreciated. I'm at the point where my google searches are no longer turning up any new information and feel like I've read the entire internet by now ;-)