VZDump slow on ceph images, RBD export fast

@symmcom Of course they are stored on Ceph, connected with rbd and not krbd.
I just wanted to make the backup on the CephFS mount.
 
Is there any news on this?
I use CephFS from my datacenter provider and the backup is really slow.

Code:
INFO: status: 65% (81750065152/125627793408), sparse 33% (41490862080), duration 1458, read/write 28/25 MB/s
INFO: status: 66% (82932531200/125627793408), sparse 33% (41494974464), duration 1497, read/write 30/30 MB/s
INFO: status: 67% (84198359040/125627793408), sparse 33% (41635176448), duration 1542, read/write 28/25 MB/s
INFO: status: 68% (85440004096/125627793408), sparse 33% (41642319872), duration 1589, read/write 26/26 MB/s
INFO: status: 69% (86719594496/125627793408), sparse 33% (41780776960), duration 1630, read/write 31/27 MB/s
 
For all: yesterday I installed an update for pve-qemu-kvm and now the backup runs much better.
 

Yes, there were some improvements in pve-qemu-kvm (4.0.1-5) - it fixes a backup speed regression with disks on Ceph.
 
I don't know why, but now it's the same as before:
Code:
INFO: status: 40% (107453874176/268435456000), sparse 0% (1327009792), duration 3341, read/write 18/18 MB/s
INFO: status: 41% (110167588864/268435456000), sparse 0% (1346469888), duration 3410, read/write 39/39 MB/s
INFO: status: 42% (112784834560/268435456000), sparse 0% (1372262400), duration 3640, read/write 11/11 MB/s
INFO: status: 43% (115448217600/268435456000), sparse 0% (1411354624), duration 3738, read/write 27/26 MB/s
INFO: status: 44% (118229041152/268435456000), sparse 0% (1423388672), duration 3875, read/write 20/20 MB/s
INFO: status: 45% (120900812800/268435456000), sparse 0% (1440010240), duration 4048, read/write 15/15 MB/s
 
No, my backups are fast. What's your current 'pveversion -v'?
Code:
root@pve-n1:~# pveversion -v
proxmox-ve: 6.0-2 (running kernel: 5.0.21-5-pve)
pve-manager: 6.0-15 (running version: 6.0-15/52b91481)
pve-kernel-helper: 6.0-12
pve-kernel-5.0: 6.0-11
pve-kernel-5.0.21-5-pve: 5.0.21-10
pve-kernel-5.0.21-4-pve: 5.0.21-9
pve-kernel-5.0.21-3-pve: 5.0.21-7
pve-kernel-5.0.21-2-pve: 5.0.21-7
pve-kernel-5.0.15-1-pve: 5.0.15-1
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.2-pve4
criu: 3.11-3
glusterfs-client: 5.5-3
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.13-pve1
libpve-access-control: 6.0-5
libpve-apiclient-perl: 3.0-2
libpve-common-perl: 6.0-9
libpve-guest-common-perl: 3.0-3
libpve-http-server-perl: 3.0-3
libpve-storage-perl: 6.0-12
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve3
lxc-pve: 3.2.1-1
lxcfs: 3.0.3-pve60
novnc-pve: 1.1.0-1
openvswitch-switch: 2.10.0+2018.08.28+git.8ca7c82b7d+ds1-12+deb10u1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.1-1
pve-cluster: 6.0-9
pve-container: 3.0-14
pve-docs: 6.0-9
pve-edk2-firmware: 2.20190614-1
pve-firewall: 4.0-8
pve-firmware: 3.0-4
pve-ha-manager: 3.0-5
pve-i18n: 2.0-3
pve-qemu-kvm: 4.1.1-1
pve-xtermjs: 3.13.2-1
qemu-server: 6.1-1
smartmontools: 7.0-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.2-pve2
root@pve-n1:~#
 
ceph-fuse: 12.2.11+dfsg1-2.1+b1
It seems the host is using the stock Ceph packages from Debian. You can try to install ceph-fuse 14.2.4 from our repository (pveceph install). That might already bring some improvement. But this will move the client compatibility to Luminous, so a cluster older than Luminous may not be accessible anymore.

EDIT: The assumption is that the node is running as a client.
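A quick way to check is the client version on the node itself; the sketch below assumes PVE 6.x, where pveceph install pulls the Ceph Nautilus packages from the Proxmox repository:

```shell
ceph --version     # stock Debian Buster packages report 12.2.x (Luminous)
pveceph install    # installs the Ceph packages from the Proxmox repository
ceph --version     # should now report 14.2.x (Nautilus)
```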
 

Hi, I think this doesn't matter, or does it? I don't use Ceph on my Proxmox nodes.
I rent some Ceph space from my datacenter provider.
 

Your sparse value is different from the previous log (0% vs 33%).

Also, to have the last pve-qemu-kvm patch enabled, you need to stop/start the VM after the package upgrade.
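For example, assuming VMID 105, a full stop/start from the host; a reboot inside the guest is not enough, since the qemu process itself must be restarted to load the new binary:

```shell
qm stop 105      # terminates the running qemu process
qm start 105     # starts a new qemu process using the updated pve-qemu-kvm
```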
 

What does this mean with the sparse value?
All VMs were recently restarted (migration, etc.).
 
Hi everybody,

I installed pveceph and these are the results for a single backup:
Code:
INFO: starting new backup job: vzdump 105 --remove 0 --mode snapshot --storage storage01-backup --compress lzo --node pve-n2
INFO: Starting Backup of VM 105 (qemu)
INFO: Backup started at 2019-12-03 10:44:16
INFO: status = running
INFO: update VM 105: -lock backup
INFO: VM Name: prod-xyz
INFO: include disk 'scsi0' 'fs-ceph:vm-105-disk-0' 100G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating archive '/mnt/pve/storage01-backup/dump/vzdump-qemu-105-2019_12_03-10_44_16.vma.lzo'
INFO: started backup task 'c1867ad8-b12b-401b-9b6c-dcf64ab4311e'
INFO: status: 0% (545259520/107374182400), sparse 0% (203870208), duration 3, read/write 181/113 MB/s
INFO: status: 1% (1107296256/107374182400), sparse 0% (223576064), duration 7, read/write 140/135 MB/s
INFO: status: 2% (2155872256/107374182400), sparse 0% (436531200), duration 28, read/write 49/39 MB/s
INFO: status: 3% (3225419776/107374182400), sparse 0% (530411520), duration 52, read/write 44/40 MB/s
INFO: status: 4% (4399824896/107374182400), sparse 0% (671240192), duration 67, read/write 78/68 MB/s
INFO: status: 5% (5398069248/107374182400), sparse 0% (802390016), duration 96, read/write 34/29 MB/s
INFO: status: 6% (6534725632/107374182400), sparse 0% (835182592), duration 120, read/write 47/45 MB/s
INFO: status: 7% (7528775680/107374182400), sparse 0% (953671680), duration 136, read/write 62/54 MB/s
INFO: status: 8% (8589934592/107374182400), sparse 0% (954126336), duration 157, read/write 50/50 MB/s
INFO: status: 9% (9718202368/107374182400), sparse 1% (1095553024), duration 188, read/write 36/31 MB/s
INFO: status: 10% (10875830272/107374182400), sparse 1% (1242320896), duration 213, read/write 46/40 MB/s
INFO: status: 11% (11874074624/107374182400), sparse 1% (1442607104), duration 230, read/write 58/46 MB/s
INFO: status: 12% (13006536704/107374182400), sparse 1% (1531453440), duration 257, read/write 41/38 MB/s
INFO: status: 13% (14008975360/107374182400), sparse 1% (1721765888), duration 274, read/write 58/47 MB/s
INFO: status: 14% (15179186176/107374182400), sparse 1% (1870340096), duration 312, read/write 30/26 MB/s
INFO: status: 15% (16240345088/107374182400), sparse 1% (2115837952), duration 341, read/write 36/28 MB/s
INFO: status: 16% (17230200832/107374182400), sparse 2% (2203607040), duration 367, read/write 38/34 MB/s
INFO: status: 17% (18333302784/107374182400), sparse 2% (2379190272), duration 387, read/write 55/46 MB/s
INFO: status: 18% (19339935744/107374182400), sparse 2% (2499043328), duration 408, read/write 47/42 MB/s
INFO: status: 19% (20451426304/107374182400), sparse 2% (2728407040), duration 439, read/write 35/28 MB/s
INFO: status: 20% (21520973824/107374182400), sparse 2% (2844102656), duration 465, read/write 41/36 MB/s
INFO: status: 21% (22569549824/107374182400), sparse 2% (2996977664), duration 490, read/write 41/35 MB/s
INFO: status: 22% (23664263168/107374182400), sparse 2% (3003740160), duration 531, read/write 26/26 MB/s
INFO: status: 23% (24767365120/107374182400), sparse 2% (3136299008), duration 556, read/write 44/38 MB/s
INFO: status: 24% (25773998080/107374182400), sparse 3% (3250241536), duration 581, read/write 40/35 MB/s
INFO: status: 25% (26898071552/107374182400), sparse 3% (3473526784), duration 608, read/write 41/33 MB/s
INFO: status: 26% (28106031104/107374182400), sparse 3% (3635580928), duration 651, read/write 28/24 MB/s
INFO: status: 27% (29095886848/107374182400), sparse 3% (3842043904), duration 675, read/write 41/32 MB/s
INFO: status: 28% (30064771072/107374182400), sparse 3% (3955834880), duration 697, read/write 44/38 MB/s
INFO: status: 29% (31226593280/107374182400), sparse 3% (4177711104), duration 716, read/write 61/49 MB/s
INFO: status: 30% (32333889536/107374182400), sparse 3% (4282781696), duration 751, read/write 31/28 MB/s
INFO: status: 31% (33374076928/107374182400), sparse 4% (4488450048), duration 773, read/write 47/37 MB/s
INFO: status: 32% (34435235840/107374182400), sparse 4% (4698583040), duration 791, read/write 58/47 MB/s
INFO: status: 33% (35500589056/107374182400), sparse 4% (4832219136), duration 812, read/write 50/44 MB/s
INFO: status: 34% (37232836608/107374182400), sparse 5% (6364786688), duration 822, read/write 173/19 MB/s
INFO: status: 35% (38075891712/107374182400), sparse 6% (6957400064), duration 830, read/write 105/31 MB/s
INFO: status: 36% (39615201280/107374182400), sparse 7% (8241082368), duration 833, read/write 513/85 MB/s
INFO: status: 38% (41511026688/107374182400), sparse 9% (10110799872), duration 836, read/write 631/8 MB/s
INFO: status: 40% (43679481856/107374182400), sparse 11% (12031664128), duration 839, read/write 722/82 MB/s
INFO: status: 42% (45218791424/107374182400), sparse 12% (13524668416), duration 845, read/write 256/7 MB/s
INFO: status: 44% (47978643456/107374182400), sparse 14% (15944544256), duration 849, read/write 689/84 MB/s
INFO: status: 46% (50176458752/107374182400), sparse 16% (17882132480), duration 852, read/write 732/86 MB/s
INFO: status: 47% (50549751808/107374182400), sparse 16% (17882136576), duration 855, read/write 124/124 MB/s
INFO: status: 48% (52290387968/107374182400), sparse 17% (19297677312), duration 858, read/write 580/108 MB/s
INFO: status: 51% (55113154560/107374182400), sparse 20% (21874999296), duration 861, read/write 940/81 MB/s
INFO: status: 54% (58523123712/107374182400), sparse 23% (25162457088), duration 864, read/write 1136/40 MB/s
INFO: status: 57% (61664657408/107374182400), sparse 26% (28131336192), duration 867, read/write 1047/57 MB/s
INFO: status: 62% (67494739968/107374182400), sparse 31% (33956704256), duration 870, read/write 1943/1 MB/s
INFO: status: 64% (69503811584/107374182400), sparse 33% (35965771776), duration 884, read/write 143/0 MB/s
INFO: status: 70% (75208065024/107374182400), sparse 38% (41661628416), duration 887, read/write 1901/2 MB/s
INFO: status: 73% (78664171520/107374182400), sparse 41% (44952170496), duration 890, read/write 1152/55 MB/s
INFO: status: 76% (82304827392/107374182400), sparse 45% (48427651072), duration 893, read/write 1213/55 MB/s
INFO: status: 77% (82715869184/107374182400), sparse 45% (48431181824), duration 896, read/write 137/135 MB/s
INFO: status: 78% (83823165440/107374182400), sparse 45% (48434089984), duration 904, read/write 138/138 MB/s
INFO: status: 79% (84850769920/107374182400), sparse 45% (48553353216), duration 911, read/write 146/129 MB/s
INFO: status: 80% (85911928832/107374182400), sparse 45% (48554516480), duration 942, read/write 34/34 MB/s
INFO: status: 81% (87069556736/107374182400), sparse 45% (48664289280), duration 988, read/write 25/22 MB/s
INFO: status: 82% (88080384000/107374182400), sparse 45% (48667541504), duration 1018, read/write 33/33 MB/s
INFO: status: 83% (89128960000/107374182400), sparse 45% (48791355392), duration 1058, read/write 26/23 MB/s
INFO: status: 84% (90282393600/107374182400), sparse 45% (48797483008), duration 1135, read/write 14/14 MB/s
INFO: status: 85% (91297415168/107374182400), sparse 45% (48953266176), duration 1159, read/write 42/35 MB/s
INFO: status: 86% (92417294336/107374182400), sparse 45% (48970891264), duration 1214, read/write 20/20 MB/s
INFO: status: 87% (93444898816/107374182400), sparse 45% (49145905152), duration 1242, read/write 36/30 MB/s
INFO: status: 88% (94556389376/107374182400), sparse 45% (49244459008), duration 1285, read/write 25/23 MB/s
INFO: status: 89% (95667879936/107374182400), sparse 46% (49496989696), duration 1325, read/write 27/21 MB/s
INFO: status: 90% (96762593280/107374182400), sparse 46% (49644474368), duration 1379, read/write 20/17 MB/s
INFO: status: 91% (97748254720/107374182400), sparse 46% (49804165120), duration 1391, read/write 82/68 MB/s
INFO: status: 92% (98863939584/107374182400), sparse 46% (49950924800), duration 1431, read/write 27/24 MB/s
INFO: status: 93% (99946070016/107374182400), sparse 46% (50186657792), duration 1462, read/write 34/27 MB/s
INFO: status: 94% (101019811840/107374182400), sparse 46% (50333724672), duration 1489, read/write 39/34 MB/s
INFO: status: 95% (102076776448/107374182400), sparse 47% (50546696192), duration 1534, read/write 23/18 MB/s
INFO: status: 96% (103091798016/107374182400), sparse 47% (50653143040), duration 1581, read/write 21/19 MB/s
INFO: status: 97% (104173928448/107374182400), sparse 47% (50809360384), duration 1600, read/write 56/48 MB/s
INFO: status: 98% (105251864576/107374182400), sparse 47% (50855858176), duration 1623, read/write 46/44 MB/s
INFO: status: 99% (106388520960/107374182400), sparse 47% (50904518656), duration 1648, read/write 45/43 MB/s
INFO: status: 100% (107374182400/107374182400), sparse 47% (50943602688), duration 1665, read/write 57/55 MB/s
INFO: transferred 107374 MB in 1665 seconds (64 MB/s)
INFO: archive file size: 38.78GB
INFO: Finished Backup of VM 105 (00:27:55)
INFO: Backup finished at 2019-12-03 11:12:11
INFO: Backup job finished successfully
TASK OK

What do you think about it?
 
I installed pveceph and these are the results for a single backup:
Do you mean pveceph install to get a newer Ceph version? If so, then you either need to migrate the VM off and back, or shut it down and start it again. After that Qemu will use the newer library.

What do you think about it?
You showed us a backup report before. In my previous post I talked about a rados bench, to benchmark the Ceph pool.
https://www.proxmox.com/en/downloads/item/proxmox-ve-ceph-benchmark

And yet again, ask your provider.
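For reference, a basic rados bench run as used in that paper might look like this; 'testpool' is a made-up name, and it should be a scratch pool, since the bench writes real objects:

```shell
# 60 s of 4 MiB writes, 16 in flight; keep the objects for the read test
rados bench -p testpool 60 write -b 4M -t 16 --no-cleanup

# read them back sequentially
rados bench -p testpool 60 seq -t 16

# remove the benchmark objects
rados -p testpool cleanup
```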
 
Do you mean pveceph install to get a newer Ceph version? If so, then you either need to migrate the VM off and back, or shut it down and start it again. After that Qemu will use the newer library.
Yes, I did a pveceph install and migrated my machines to the node with the new library.


You showed us a backup report before. In my previous post I talked about a rados bench, to benchmark the Ceph pool.
https://www.proxmox.com/en/downloads/item/proxmox-ve-ceph-benchmark
This is the log output from the backup task last night:
backup log

How can I perform a rados bench on external storage?

And yet again, ask your provider.
What should I ask them?
 
My few cents on this: I'm getting around 300 MB/s with rbd export, and 10-30 MB/s with vzdump. Snapshot backups even bring my guests to their knees. Total disaster. What am I missing here? (Latest dist-upgrade, 3-node Epyc cluster, all 10G)
 
rbd export is faster because it reads bigger blocks with parallelism (bigger queue depth; fewer I/Os, but bigger ones).
The Proxmox backup can only back up small blocks one by one (I don't remember exactly, 4k or 64k max). That means more I/Os, but smaller ones, and queue depth = 1.

The difference between the two is latency (network latency + CPU speed on the Ceph host + CPU speed on the client).
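As a back-of-the-envelope illustration (the numbers are hypothetical): with queue depth 1, throughput is roughly block size divided by round-trip latency:

```shell
# 64 KiB blocks, one request in flight, 2 ms round trip (made-up values):
block_kib=64
latency_ms=2
# blocks per second = 1000 / latency_ms; MB/s ≈ block_kib * blocks/s / 1024
echo "$(( block_kib * 1000 / latency_ms / 1024 )) MB/s"   # prints "31 MB/s"
```

That lands right in the 20-30 MB/s range seen in the logs above, while large parallel reads hide the latency almost entirely.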


(BTW, I'm full Ceph; I'm using rbd export/import for backups to another Ceph cluster.
The code is available here:
https://github.com/JackSlateur/backurne
)
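The underlying mechanism (not backurne itself, just a hand-rolled sketch of the same idea; the pool, image, and host names are made up) is rbd's snapshot-diff export:

```shell
# One-time seed: create a matching image on the backup cluster
ssh backup-host "rbd create rbd/vm-105-disk-0 --size 100G"

# Take a snapshot and ship the full diff (from empty) to the backup cluster
rbd snap create rbd/vm-105-disk-0@base
rbd export-diff rbd/vm-105-disk-0@base - \
    | ssh backup-host "rbd import-diff - rbd/vm-105-disk-0"

# Incrementals afterwards only transfer blocks changed since the last snapshot
rbd snap create rbd/vm-105-disk-0@daily1
rbd export-diff --from-snap base rbd/vm-105-disk-0@daily1 - \
    | ssh backup-host "rbd import-diff - rbd/vm-105-disk-0"
```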
 
Hey @spirit, my comment related to the "bringing his VM guests to their knees" part, which is the problem we are seeing with backups to NFS shares. CPU in the VMs goes through the roof and they drop off the network while they are being backed up.

With all the trouble we've been seeing backing up to NFS and CIFS volumes during our Proxmox evaluation, I started looking at rbd exports today. Does your code only work if it's writing to another Ceph volume? I'd like to export to local storage.
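(For reference, plain rbd export can also write straight to local storage; the image, snapshot, and path below are made up:)

```shell
# Snapshot first so the export is crash-consistent, then dump to a local file
rbd snap create rbd/vm-105-disk-0@backup
rbd export rbd/vm-105-disk-0@backup /mnt/local-backup/vm-105-disk-0.raw
rbd snap rm rbd/vm-105-disk-0@backup
```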
 

About

The Proxmox community has been around for many years and offers help and support for Proxmox VE, Proxmox Backup Server, and Proxmox Mail Gateway.
We think our community is one of the best thanks to people like you!

Get your subscription!

The Proxmox team works very hard to make sure you are running the best software and getting stable updates and security enhancements, as well as quick enterprise support. Tens of thousands of happy customers have a Proxmox subscription. Get yours easily in our online shop.

Buy now!