I have this problem with a backup one one VM: the backup starts "ok" (not as fast as it could, but reasonable speed) but at some point slows down to almost no progress. Here are two logs:
As you can see, there are sometimes "gaps" like in Log 1 between 59% and 60% it took more than 6000 seconds. When the backup is in this state (slow), the system is completely idle.
No CPU activity.
iostat -kx 2 shows almost no disk activity.
Network is mostly idle.
When I log in to the VM and do a dd from the disk, it reads with >100 MB/s.
When I do a dd on the RBD device on the host (we use Ceph with KRBD), I get >100MB/s.
When I write to the backup storage (GlusterFS) I get > 100 MB/s, also sustained for a long time.
What else can I test/change? Other VMs (also similar in size) do not show this behaviour. This VM backed up until last week with no problem. I did have this problem before on other VMs, too. It's quite random.
Thanks,
Stefan
Code:
INFO: Starting Backup of VM 111 (qemu)
INFO: status = running
INFO: update VM 111: -lock backup
INFO: include disk 'scsi0' 'rbd_ct:vm-111-disk-1' 140G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating archive '/mnt/pve/backup01/dump/vzdump-qemu-111-2018_06_23-21_30_02.vma.gz'
INFO: started backup task 'e1795f14-3f61-4c01-85d8-4ff1d371579c'
INFO: status: 0% (160432128/150323855360), sparse 0% (7118848), duration 3, read/write 53/51 MB/s
INFO: status: 1% (1543241728/150323855360), sparse 0% (204550144), duration 34, read/write 44/38 MB/s
INFO: status: 2% (3028484096/150323855360), sparse 0% (527478784), duration 68, read/write 43/34 MB/s
INFO: status: 3% (4515430400/150323855360), sparse 0% (639668224), duration 105, read/write 40/37 MB/s
INFO: status: 4% (6023806976/150323855360), sparse 0% (655355904), duration 152, read/write 32/31 MB/s
INFO: status: 5% (7571767296/150323855360), sparse 0% (813867008), duration 189, read/write 41/37 MB/s
INFO: status: 6% (9069592576/150323855360), sparse 0% (945946624), duration 224, read/write 42/39 MB/s
INFO: status: 7% (10555293696/150323855360), sparse 0% (950132736), duration 253, read/write 51/51 MB/s
INFO: status: 8% (12089229312/150323855360), sparse 0% (1085124608), duration 292, read/write 39/35 MB/s
INFO: status: 9% (13573160960/150323855360), sparse 0% (1259769856), duration 334, read/write 35/31 MB/s
INFO: status: 10% (15121907712/150323855360), sparse 0% (1383661568), duration 380, read/write 33/30 MB/s
INFO: status: 11% (16557539328/150323855360), sparse 0% (1411604480), duration 425, read/write 31/31 MB/s
INFO: status: 12% (18057658368/150323855360), sparse 1% (1567252480), duration 480, read/write 27/24 MB/s
INFO: status: 13% (19542573056/150323855360), sparse 1% (1711927296), duration 530, read/write 29/26 MB/s
INFO: status: 14% (21064908800/150323855360), sparse 1% (1724657664), duration 562, read/write 47/47 MB/s
INFO: status: 15% (22555197440/150323855360), sparse 1% (1872805888), duration 622, read/write 24/22 MB/s
INFO: status: 16% (24084348928/150323855360), sparse 1% (1996996608), duration 674, read/write 29/27 MB/s
INFO: status: 17% (25559367680/150323855360), sparse 1% (2018693120), duration 3169, read/write 0/0 MB/s
INFO: status: 18% (27088584704/150323855360), sparse 1% (2235613184), duration 3409, read/write 6/5 MB/s
INFO: status: 19% (28577103872/150323855360), sparse 1% (2392182784), duration 3439, read/write 49/44 MB/s
INFO: status: 20% (30070669312/150323855360), sparse 1% (2542043136), duration 3464, read/write 59/53 MB/s
INFO: status: 21% (31578128384/150323855360), sparse 1% (2701664256), duration 3495, read/write 48/43 MB/s
INFO: status: 22% (33075363840/150323855360), sparse 1% (2715508736), duration 3538, read/write 34/34 MB/s
INFO: status: 23% (34611200000/150323855360), sparse 1% (2863001600), duration 3568, read/write 51/46 MB/s
INFO: status: 24% (36127047680/150323855360), sparse 2% (3041021952), duration 3595, read/write 56/49 MB/s
INFO: status: 25% (37624610816/150323855360), sparse 2% (3100639232), duration 3628, read/write 45/43 MB/s
INFO: status: 26% (39113195520/150323855360), sparse 2% (3423080448), duration 3653, read/write 59/46 MB/s
INFO: status: 27% (40614952960/150323855360), sparse 2% (3566981120), duration 3684, read/write 48/43 MB/s
INFO: status: 28% (42146136064/150323855360), sparse 2% (3687231488), duration 3714, read/write 51/47 MB/s
INFO: status: 29% (43623120896/150323855360), sparse 2% (3714830336), duration 3745, read/write 47/46 MB/s
INFO: status: 30% (45119176704/150323855360), sparse 2% (3849728000), duration 3772, read/write 55/50 MB/s
INFO: status: 31% (46631878656/150323855360), sparse 2% (3980304384), duration 3808, read/write 42/38 MB/s
INFO: status: 32% (48129703936/150323855360), sparse 2% (4001726464), duration 3834, read/write 57/56 MB/s
INFO: status: 33% (49622089728/150323855360), sparse 2% (4138037248), duration 3870, read/write 41/37 MB/s
INFO: status: 34% (51130269696/150323855360), sparse 2% (4212781056), duration 3899, read/write 52/49 MB/s
INFO: status: 35% (52621737984/150323855360), sparse 2% (4348125184), duration 3935, read/write 41/37 MB/s
INFO: status: 36% (54156656640/150323855360), sparse 2% (4509302784), duration 3965, read/write 51/45 MB/s
INFO: status: 37% (55641440256/150323855360), sparse 3% (4591120384), duration 3997, read/write 46/43 MB/s
INFO: status: 38% (57130156032/150323855360), sparse 3% (4816203776), duration 4027, read/write 49/42 MB/s
INFO: status: 39% (58682245120/150323855360), sparse 3% (4968542208), duration 4056, read/write 53/48 MB/s
INFO: status: 40% (60205367296/150323855360), sparse 3% (5023625216), duration 4087, read/write 49/47 MB/s
INFO: status: 41% (61672587264/150323855360), sparse 3% (5185970176), duration 4116, read/write 50/44 MB/s
INFO: status: 42% (63137841152/150323855360), sparse 3% (5373067264), duration 4143, read/write 54/47 MB/s
INFO: status: 43% (64650412032/150323855360), sparse 3% (5526986752), duration 4179, read/write 42/37 MB/s
INFO: status: 44% (66161737728/150323855360), sparse 3% (5537914880), duration 4210, read/write 48/48 MB/s
INFO: status: 45% (67707994112/150323855360), sparse 3% (5670223872), duration 4243, read/write 46/42 MB/s
INFO: status: 46% (69182095360/150323855360), sparse 3% (5804797952), duration 4290, read/write 31/28 MB/s
INFO: status: 47% (70708887552/150323855360), sparse 3% (5817257984), duration 4319, read/write 52/52 MB/s
INFO: status: 48% (72157298688/150323855360), sparse 3% (5940146176), duration 4349, read/write 48/44 MB/s
INFO: status: 49% (73661284352/150323855360), sparse 4% (6074302464), duration 4377, read/write 53/48 MB/s
INFO: status: 50% (75180015616/150323855360), sparse 4% (6164926464), duration 4403, read/write 58/54 MB/s
INFO: status: 51% (76709560320/150323855360), sparse 4% (6395613184), duration 4436, read/write 46/39 MB/s
INFO: status: 52% (78199914496/150323855360), sparse 4% (6560391168), duration 4469, read/write 45/40 MB/s
INFO: status: 53% (79700033536/150323855360), sparse 4% (6689472512), duration 4499, read/write 50/45 MB/s
INFO: status: 54% (81214046208/150323855360), sparse 4% (6704939008), duration 5162, read/write 2/2 MB/s
INFO: status: 55% (82687688704/150323855360), sparse 4% (6818947072), duration 5189, read/write 54/50 MB/s
INFO: status: 56% (84224507904/150323855360), sparse 4% (6971334656), duration 5220, read/write 49/44 MB/s
INFO: status: 57% (85702606848/150323855360), sparse 4% (7118102528), duration 5251, read/write 47/42 MB/s
INFO: status: 58% (87223631872/150323855360), sparse 4% (7305052160), duration 5280, read/write 52/46 MB/s
INFO: status: 59% (88717918208/150323855360), sparse 4% (7443169280), duration 5313, read/write 45/41 MB/s
INFO: status: 60% (90210435072/150323855360), sparse 4% (7454748672), duration 11384, read/write 0/0 MB/s
INFO: status: 61% (91735457792/150323855360), sparse 4% (7471779840), duration 11707, read/write 4/4 MB/s
INFO: status: 62% (93207396352/150323855360), sparse 4% (7490129920), duration 12014, read/write 4/4 MB/s
INFO: status: 63% (94771609600/150323855360), sparse 4% (7505657856), duration 12334, read/write 4/4 MB/s
INFO: status: 64% (96271925248/150323855360), sparse 5% (7545704448), duration 12624, read/write 5/5 MB/s
INFO: status: 65% (97735147520/150323855360), sparse 5% (7562711040), duration 12914, read/write 5/4 MB/s
INFO: status: 66% (99223470080/150323855360), sparse 5% (7588704256), duration 17714, read/write 0/0 MB/s
INFO: status: 67% (100719984640/150323855360), sparse 5% (7613255680), duration 105541, read/write 0/0 MB/s
INFO: status: 68% (102227705856/150323855360), sparse 5% (7629295616), duration 105770, read/write 6/6 MB/s
INFO: status: 69% (103729004544/150323855360), sparse 5% (7629561856), duration 106032, read/write 5/5 MB/s
INFO: status: 70% (105229123584/150323855360), sparse 5% (7632371712), duration 106412, read/write 3/3 MB/s
INFO: status: 71% (106735927296/150323855360), sparse 5% (7811944448), duration 106672, read/write 5/5 MB/s
INFO: status: 72% (108233687040/150323855360), sparse 5% (7822983168), duration 106946, read/write 5/5 MB/s
INFO: status: 73% (109748355072/150323855360), sparse 5% (7836348416), duration 107162, read/write 7/6 MB/s
INFO: status: 74% (111242838016/150323855360), sparse 5% (7843069952), duration 107383, read/write 6/6 MB/s
INFO: status: 75% (112789225472/150323855360), sparse 5% (7914864640), duration 107543, read/write 9/9 MB/s
INFO: status: 76% (114267979776/150323855360), sparse 5% (8054267904), duration 107761, read/write 6/6 MB/s
INFO: status: 77% (115761676288/150323855360), sparse 5% (8169975808), duration 124941, read/write 0/0 MB/s
INFO: status: 78% (117255962624/150323855360), sparse 5% (8304369664), duration 124979, read/write 39/35 MB/s
INFO: status: 79% (118781509632/150323855360), sparse 5% (8423411712), duration 125018, read/write 39/36 MB/s
INFO: status: 80% (120286019584/150323855360), sparse 5% (8544096256), duration 125057, read/write 38/35 MB/s
Code:
INFO: Starting Backup of VM 111 (qemu)
INFO: status = running
INFO: update VM 111: -lock backup
INFO: include disk 'scsi0' 'rbd_ct:vm-111-disk-1' 140G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating archive '/mnt/pve/backup01/dump/vzdump-qemu-111-2018_06_25-21_30_02.vma.gz'
INFO: started backup task '40e230ba-c332-4f45-95d8-25fde55f3210'
INFO: status: 0% (133038080/150323855360), sparse 0% (7147520), duration 3, read/write 44/41 MB/s
INFO: status: 1% (1535639552/150323855360), sparse 0% (204435456), duration 29, read/write 53/46 MB/s
INFO: status: 2% (3046703104/150323855360), sparse 0% (528662528), duration 64, read/write 43/33 MB/s
INFO: status: 3% (4556390400/150323855360), sparse 0% (642711552), duration 122, read/write 26/24 MB/s
INFO: status: 4% (6028394496/150323855360), sparse 0% (658427904), duration 171, read/write 30/29 MB/s
INFO: status: 5% (7574847488/150323855360), sparse 0% (816939008), duration 210, read/write 39/35 MB/s
INFO: status: 6% (9019916288/150323855360), sparse 0% (835088384), duration 3481, read/write 0/0 MB/s
INFO: status: 7% (10540482560/150323855360), sparse 0% (844193792), duration 4734, read/write 1/1 MB/s
INFO: status: 8% (12067667968/150323855360), sparse 0% (980254720), duration 10253, read/write 0/0 MB/s
INFO: status: 9% (13531021312/150323855360), sparse 0% (990515200), duration 18648, read/write 0/0 MB/s
INFO: status: 10% (15044706304/150323855360), sparse 0% (1018007552), duration 19170, read/write 2/2 MB/s
INFO: status: 11% (16538664960/150323855360), sparse 0% (1031659520), duration 19469, read/write 4/4 MB/s
INFO: status: 12% (18044354560/150323855360), sparse 0% (1055346688), duration 21679, read/write 0/0 MB/s
INFO: status: 13% (19545063424/150323855360), sparse 0% (1078284288), duration 22382, read/write 2/2 MB/s
INFO: status: 14% (21065629696/150323855360), sparse 0% (1090793472), duration 23589, read/write 1/1 MB/s
INFO: status: 15% (22583640064/150323855360), sparse 0% (1100865536), duration 23860, read/write 5/5 MB/s
INFO: status: 16% (24122556416/150323855360), sparse 0% (1113579520), duration 28958, read/write 0/0 MB/s
INFO: status: 17% (25558908928/150323855360), sparse 0% (1129041920), duration 29286, read/write 4/4 MB/s
INFO: status: 18% (27067154432/150323855360), sparse 0% (1141547008), duration 31973, read/write 0/0 MB/s
INFO: status: 19% (28564258816/150323855360), sparse 0% (1161191424), duration 32290, read/write 4/4 MB/s
As you can see, there are sometimes "gaps" like in Log 1 between 59% and 60% it took more than 6000 seconds. When the backup is in this state (slow), the system is completely idle.
No CPU activity.
iostat -kx 2 shows almost no disk activity.
Network is mostly idle.
When I log in to the VM and do a dd from the disk, it reads with >100 MB/s.
When I do a dd on the RBD device on the host (we use Ceph with KRBD), I get >100MB/s.
When I write to the backup storage (GlusterFS) I get > 100 MB/s, also sustained for a long time.
What else can I test/change? Other VMs (also similar in size) do not show this behaviour. This VM backed up until last week with no problem. I did have this problem before on other VMs, too. It's quite random.
Thanks,
Stefan