Backup slowing down to a crawl

writethrough
This mode causes the hypervisor to interact with the disk image file or block device with O_DSYNC semantics. Writes are reported as completed only when the data has been committed to the storage device. The host page cache is used in what can be termed a writethrough caching mode. The guest's virtual storage adapter is informed that there is no writeback cache, so the guest would not need to send down flush commands to manage data integrity. The storage behaves as if there is a writethrough cache.
Syncs after write, holds writethrough cache.

directsync
This mode causes the hypervisor to interact with the disk image file or block device with both O_DSYNC and O_DIRECT semantics. This means, writes are reported as completed only when the data has been committed to the storage device, and when it is also desirable to bypass the host page cache. Like writethrough, it is helpful to guests that do not send flushes when needed. It was the last cache mode added, completing the possible combinations of caching and direct access semantics.
Syncs after each block written, no cache.

VMA is the archive format, vzdump is the tool. It uses qemu to backup, details can be found here.
https://git.proxmox.com/?p=pve-qemu.git;a=blob_plain;f=backup.txt;hb=HEAD
 
Thank you Alwin. It's not that I can't read. In fact, I had read the things you quote and link already months ago. Nowhere does it state that directsync does not work with the Proxmox backup process.

As I understand it, PVE has an enhanced version qemu to implement the backup process as described. Is that correct? Where is it documented that this does not work with cache=directsync?

Stefan
 
It is completely simple,
if you have a slow system(slow is relative) and use a mode what increase the overhead
and then do it twice you can't image it works fast.

It is normal that the backup performance is slow.
 
But my system is not slow. All my backups where the cache mode is not directsync have >50MB/s speed.

This backup is not "slow", it stalls. Absolutely no activity and no load for hours. Of course I know that any bottleneck will slow down the backup process. The problem with that theory is that there is no bottleneck, at least not on the hardware. No CPU usage, no disk usage, no network usage.

Do you really mean waiting 3 days for 30% of a backup that usually completes in less than one hour is "working as intended"? Yes, I can certainly try again to see if it ever completes, but how many days/weeks of waiting for it to finish do you think is acceptable until it's a bug?

For comparison with the log in my post #17, here's a log with cache=writethrough:
Code:
INFO: Starting Backup of VM 111 (qemu)
INFO: status = running
INFO: update VM 111: -lock backup
INFO: include disk 'scsi0' 'rbd_ct:vm-111-disk-1' 140G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating archive '/mnt/pve/backup01/dump/vzdump-qemu-111-2018_07_23-22_00_01.vma.gz'
INFO: started backup task 'b3c9ba78-2a87-4884-92a1-49be54194c4c'
INFO: status: 0% (152043520/150323855360), sparse 0% (22716416), duration 3, read/write 50/43 MB/s
INFO: status: 1% (1512833024/150323855360), sparse 0% (42422272), duration 36, read/write 41/40 MB/s
INFO: status: 2% (3063676928/150323855360), sparse 0% (235909120), duration 87, read/write 30/26 MB/s
INFO: status: 3% (4558159872/150323855360), sparse 0% (377311232), duration 126, read/write 38/34 MB/s
INFO: status: 4% (6024724480/150323855360), sparse 0% (392949760), duration 177, read/write 28/28 MB/s
INFO: status: 5% (7579369472/150323855360), sparse 0% (559370240), duration 226, read/write 31/28 MB/s
INFO: status: 6% (9076998144/150323855360), sparse 0% (692879360), duration 266, read/write 37/34 MB/s
INFO: status: 7% (10531766272/150323855360), sparse 0% (697065472), duration 309, read/write 33/33 MB/s
INFO: status: 8% (12060721152/150323855360), sparse 0% (831627264), duration 364, read/write 27/25 MB/s
INFO: status: 9% (13577486336/150323855360), sparse 0% (1013186560), duration 417, read/write 28/25 MB/s
INFO: status: 10% (15134228480/150323855360), sparse 0% (1147375616), duration 463, read/write 33/30 MB/s
INFO: status: 11% (16538533888/150323855360), sparse 0% (1180848128), duration 495, read/write 43/42 MB/s
INFO: status: 12% (18081775616/150323855360), sparse 0% (1342926848), duration 534, read/write 39/35 MB/s
INFO: status: 13% (19568001024/150323855360), sparse 0% (1483931648), duration 558, read/write 61/56 MB/s
INFO: status: 14% (21066809344/150323855360), sparse 1% (2129547264), duration 587, read/write 51/29 MB/s
INFO: status: 15% (22604742656/150323855360), sparse 2% (3084271616), duration 596, read/write 170/64 MB/s
INFO: status: 16% (25506545664/150323855360), sparse 3% (5979791360), duration 599, read/write 967/2 MB/s
INFO: status: 19% (28610199552/150323855360), sparse 6% (9083416576), duration 602, read/write 1034/0 MB/s
INFO: status: 20% (30224285696/150323855360), sparse 6% (10100731904), duration 613, read/write 146/54 MB/s
INFO: status: 21% (31621251072/150323855360), sparse 6% (10106527744), duration 662, read/write 28/28 MB/s
INFO: status: 22% (33092468736/150323855360), sparse 6% (10246127616), duration 703, read/write 35/32 MB/s
INFO: status: 23% (34616508416/150323855360), sparse 6% (10419507200), duration 751, read/write 31/28 MB/s
INFO: status: 24% (36081762304/150323855360), sparse 6% (10424381440), duration 783, read/write 45/45 MB/s
INFO: status: 25% (37615501312/150323855360), sparse 7% (10816188416), duration 820, read/write 41/30 MB/s
INFO: status: 26% (39116996608/150323855360), sparse 7% (10970587136), duration 857, read/write 40/36 MB/s
INFO: status: 27% (40595619840/150323855360), sparse 7% (11057876992), duration 880, read/write 64/60 MB/s
INFO: status: 28% (42121297920/150323855360), sparse 7% (11220770816), duration 920, read/write 38/34 MB/s
INFO: status: 29% (43606081536/150323855360), sparse 7% (11354951680), duration 965, read/write 32/30 MB/s
INFO: status: 30% (45137920000/150323855360), sparse 7% (11988578304), duration 1000, read/write 43/25 MB/s
INFO: status: 31% (46631747584/150323855360), sparse 8% (12852830208), duration 1013, read/write 114/48 MB/s
INFO: status: 32% (48181608448/150323855360), sparse 9% (14293516288), duration 1016, read/write 516/36 MB/s
INFO: status: 33% (49857757184/150323855360), sparse 10% (15871234048), duration 1023, read/write 239/14 MB/s
INFO: status: 34% (51246268416/150323855360), sparse 10% (16395005952), duration 1045, read/write 63/39 MB/s
INFO: status: 35% (52683079680/150323855360), sparse 11% (16992624640), duration 1071, read/write 55/32 MB/s
INFO: status: 36% (54940663808/150323855360), sparse 12% (19049906176), duration 1076, read/write 451/40 MB/s
INFO: status: 37% (55708745728/150323855360), sparse 13% (19677921280), duration 1084, read/write 96/17 MB/s
INFO: status: 38% (57131073536/150323855360), sparse 13% (20075155456), duration 1122, read/write 37/26 MB/s
INFO: status: 39% (58673594368/150323855360), sparse 13% (20578545664), duration 1162, read/write 38/25 MB/s
INFO: status: 40% (60285452288/150323855360), sparse 14% (21390450688), duration 1183, read/write 76/38 MB/s
INFO: status: 41% (61661970432/150323855360), sparse 14% (21424959488), duration 1227, read/write 31/30 MB/s
INFO: status: 42% (63196889088/150323855360), sparse 14% (21598060544), duration 1263, read/write 42/37 MB/s
INFO: status: 43% (64690061312/150323855360), sparse 14% (22137143296), duration 1286, read/write 64/41 MB/s
INFO: status: 44% (66187296768/150323855360), sparse 15% (23135789056), duration 1293, read/write 213/71 MB/s
INFO: status: 45% (67671425024/150323855360), sparse 15% (23441285120), duration 1330, read/write 40/31 MB/s
INFO: status: 46% (69179801600/150323855360), sparse 15% (23575478272), duration 1367, read/write 40/37 MB/s
INFO: status: 47% (70681231360/150323855360), sparse 15% (23575621632), duration 1415, read/write 31/31 MB/s
INFO: status: 48% (72532361216/150323855360), sparse 16% (24123793408), duration 1440, read/write 74/52 MB/s
INFO: status: 49% (73710698496/150323855360), sparse 16% (24408797184), duration 1467, read/write 43/33 MB/s
INFO: status: 50% (75288150016/150323855360), sparse 16% (24546140160), duration 1514, read/write 33/30 MB/s
INFO: status: 51% (76686950400/150323855360), sparse 16% (24551235584), duration 1538, read/write 58/58 MB/s
INFO: status: 52% (78188380160/150323855360), sparse 16% (24684228608), duration 1576, read/write 39/36 MB/s
INFO: status: 53% (79724019712/150323855360), sparse 16% (24917360640), duration 1605, read/write 52/44 MB/s
INFO: status: 54% (81233051648/150323855360), sparse 16% (25179627520), duration 1638, read/write 45/37 MB/s
INFO: status: 55% (83030966272/150323855360), sparse 17% (26290647040), duration 1658, read/write 89/34 MB/s
INFO: status: 56% (84183482368/150323855360), sparse 17% (26475999232), duration 1694, read/write 32/26 MB/s
INFO: status: 57% (85979693056/150323855360), sparse 18% (27589394432), duration 1720, read/write 69/26 MB/s
INFO: status: 58% (87204560896/150323855360), sparse 19% (28769161216), duration 1723, read/write 408/15 MB/s
INFO: status: 59% (88762679296/150323855360), sparse 19% (29163438080), duration 1788, read/write 23/17 MB/s
INFO: status: 60% (90303365120/150323855360), sparse 19% (29514854400), duration 1830, read/write 36/28 MB/s
INFO: status: 61% (91704918016/150323855360), sparse 20% (30495232000), duration 1837, read/write 200/60 MB/s
INFO: status: 62% (93881368576/150323855360), sparse 21% (31828111360), duration 1854, read/write 128/49 MB/s
INFO: status: 63% (94704107520/150323855360), sparse 21% (32472068096), duration 1858, read/write 205/44 MB/s
INFO: status: 64% (96216285184/150323855360), sparse 21% (32481275904), duration 1898, read/write 37/37 MB/s
INFO: status: 65% (97743405056/150323855360), sparse 21% (32628011008), duration 1925, read/write 56/51 MB/s
INFO: status: 66% (99215802368/150323855360), sparse 21% (32763043840), duration 1954, read/write 50/46 MB/s
INFO: status: 67% (100777132032/150323855360), sparse 22% (33260335104), duration 1972, read/write 86/59 MB/s
INFO: status: 68% (102232686592/150323855360), sparse 22% (33620480000), duration 1997, read/write 58/43 MB/s
INFO: status: 69% (103777304576/150323855360), sparse 22% (34514657280), duration 2008, read/write 140/59 MB/s
INFO: status: 70% (105309143040/150323855360), sparse 23% (35034693632), duration 2025, read/write 90/59 MB/s
INFO: status: 71% (106780164096/150323855360), sparse 23% (35101949952), duration 2052, read/write 54/51 MB/s
INFO: status: 72% (108519555072/150323855360), sparse 24% (36191748096), duration 2069, read/write 102/38 MB/s
INFO: status: 73% (109770047488/150323855360), sparse 24% (36766814208), duration 2081, read/write 104/56 MB/s
INFO: status: 74% (111273050112/150323855360), sparse 25% (37979205632), duration 2087, read/write 250/48 MB/s
INFO: status: 75% (112777691136/150323855360), sparse 25% (38565281792), duration 2106, read/write 79/48 MB/s
INFO: status: 76% (114263785472/150323855360), sparse 25% (38727147520), duration 2134, read/write 53/47 MB/s
INFO: status: 77% (115781140480/150323855360), sparse 25% (38744125440), duration 2158, read/write 63/62 MB/s
INFO: status: 78% (117297774592/150323855360), sparse 26% (39179644928), duration 2181, read/write 65/47 MB/s
INFO: status: 79% (118795403264/150323855360), sparse 26% (39325007872), duration 2205, read/write 62/56 MB/s
INFO: status: 80% (120914903040/150323855360), sparse 26% (40380039168), duration 2223, read/write 117/59 MB/s
INFO: status: 81% (121916096512/150323855360), sparse 27% (41001185280), duration 2231, read/write 125/47 MB/s
INFO: status: 82% (123266269184/150323855360), sparse 27% (41387909120), duration 2250, read/write 71/50 MB/s
INFO: status: 83% (124770713600/150323855360), sparse 27% (41517772800), duration 2274, read/write 62/57 MB/s
INFO: status: 84% (126325358592/150323855360), sparse 27% (41529110528), duration 2305, read/write 50/49 MB/s
INFO: status: 85% (127792578560/150323855360), sparse 27% (41817145344), duration 2328, read/write 63/51 MB/s
INFO: status: 86% (129297809408/150323855360), sparse 27% (42048147456), duration 2349, read/write 71/60 MB/s
INFO: status: 87% (130817458176/150323855360), sparse 28% (42104209408), duration 2376, read/write 56/54 MB/s
INFO: status: 88% (132798611456/150323855360), sparse 28% (43540082688), duration 2388, read/write 165/45 MB/s
INFO: status: 89% (133866979328/150323855360), sparse 29% (43950030848), duration 2398, read/write 106/65 MB/s
INFO: status: 90% (135383351296/150323855360), sparse 29% (44218912768), duration 2418, read/write 75/62 MB/s
INFO: status: 91% (136798732288/150323855360), sparse 29% (44218912768), duration 2449, read/write 45/45 MB/s
INFO: status: 92% (138311434240/150323855360), sparse 29% (44353064960), duration 2473, read/write 63/57 MB/s
INFO: status: 93% (139864834048/150323855360), sparse 29% (44526440448), duration 2500, read/write 57/51 MB/s
INFO: status: 94% (141323730944/150323855360), sparse 29% (44617838592), duration 2522, read/write 66/62 MB/s
INFO: status: 95% (142837284864/150323855360), sparse 29% (44866715648), duration 2546, read/write 63/52 MB/s
INFO: status: 96% (144352542720/150323855360), sparse 29% (45001084928), duration 2573, read/write 56/51 MB/s
INFO: status: 97% (145824940032/150323855360), sparse 30% (45271322624), duration 2596, read/write 64/52 MB/s
INFO: status: 98% (147360579584/150323855360), sparse 30% (45432238080), duration 2620, read/write 63/57 MB/s
INFO: status: 99% (148839923712/150323855360), sparse 30% (45576331264), duration 2651, read/write 47/43 MB/s
INFO: status: 100% (150323855360/150323855360), sparse 30% (45853716480), duration 2675, read/write 61/50 MB/s
INFO: transferred 150323 MB in 2675 seconds (56 MB/s)
INFO: archive file size: 61.01GB
INFO: delete old backup '/mnt/pve/backup01/dump/vzdump-qemu-111-2018_07_20-22_00_01.vma.gz'
INFO: Finished Backup of VM 111 (00:44:50)
 
Hello,

I have a similar (or same) problem. In my case the storage is HPE MSA 2050 (attached by SAS). There's 4 LUNs where VMs located as LVM, shared between multiple nodes. Backup (snapshot mode) started weekly, and at first performance is great, it's 3-digit numbers. Pure read of sparse data can go up to 400 MB/s.

At some point the speed goes down, but not everywhere at once, only on particular LUN(s). Read speed of sparse data slowed to 20 Mb/s. It happens almost simultaneously on multiple nodes, if backup is going on more than one node at once. But only for particular LUNs, and I think it's most used LUNs (there more VMs that on others).

Somehow restart of controller on the storage correct the problem, and backup return to full speed. Because of multipath I can do that safely, but it needs to be done manually every time after a few hours, beacause read speed go down again.

The problem not likely in the storage, because only read speed of vzdump is affected, everything else work fine. I can copy data from the same LUN at high speed when backup already slowed. I'm not sure when this problem started, and I appreciate every suggestion for finding the cause.
 
Last edited:
@avn, may you please open up a new thread for it. Even though it seems similar to OPs question, the setup is different and it might get lost (not visible) in this thread.
 

About

The Proxmox community has been around for many years and offers help and support for Proxmox VE, Proxmox Backup Server, and Proxmox Mail Gateway.
We think our community is one of the best thanks to people like you!

Get your subscription!

The Proxmox team works very hard to make sure you are running the best software and getting stable updates and security enhancements, as well as quick enterprise support. Tens of thousands of happy customers have a Proxmox subscription. Get yours easily in our online shop.

Buy now!