Nightly backup causes KVM to die, requiring qm unlock to restart

andydills

Active Member
Jan 5, 2013
60
1
28
Some nights when our backups run, one of the HA KVMs will die, and to make things worse it isn't brought up on the other node.

Here's an example:

109: Jun 23 05:20:42 INFO: Starting Backup of VM 109 (qemu)
109: Jun 23 05:20:42 INFO: status = running
109: Jun 23 05:20:44 INFO: backup mode: snapshot
109: Jun 23 05:20:44 INFO: ionice priority: 7
109: Jun 23 05:20:44 INFO: creating archive '/mnt/pve/backups/dump/vzdump-qemu-109-2013_06_23-05_20_42.vma.lzo'
109: Jun 23 05:20:44 INFO: started backup task '078b2fae-37f0-489b-9910-7ecb9d2474d1'
109: Jun 23 05:20:47 INFO: status: 0% (387710976/42949672960), sparse 0% (74457088), duration 3, 129/104 MB/s
109: Jun 23 05:20:50 INFO: status: 1% (640548864/42949672960), sparse 0% (114208768), duration 6, 84/71 MB/s

snip


109: Jun 23 05:24:29 INFO: status: 16% (6906576896/42949672960), sparse 0% (282591232), duration 225, 45/45 MB/s
109: Jun 23 05:24:33 INFO: status: 17% (7336099840/42949672960), sparse 0% (318234624), duration 229, 107/98 MB/s
109: Jun 23 05:26:56 ERROR: VM 109 not running
109: Jun 23 05:26:56 INFO: aborting backup job
109: Jun 23 05:26:56 ERROR: unable to find configuration file for VM 109 - no such machine
109: Jun 23 05:26:56 INFO: no such VM ('109')
109: Jun 23 05:26:57 ERROR: Backup of VM 109 failed - VM 109 not running

Then, I see it's not running on the other node either. So, I have to log into the backup node, do a "qm unlock 109", and then I can start it and migrate it back to the primary.

We backup to an NFS share, which happens to be served by the same FreeNAS box serving the quorum disk (via iscsi), and I'm wondering if that could be part of the problem.

We haven't upgraded to 3.0 yet, but are running the most current 2.3:

pve-manager: 2.3-13 (pve-manager/2.3/7946f1f1)
running kernel: 2.6.32-19-pve
proxmox-ve-2.6.32: 2.3-96
pve-kernel-2.6.32-17-pve: 2.6.32-83
pve-kernel-2.6.32-16-pve: 2.6.32-82
pve-kernel-2.6.32-19-pve: 2.6.32-96
lvm2: 2.02.95-1pve2
clvm: 2.02.95-1pve2
corosync-pve: 1.4.4-4
openais-pve: 1.1.4-2
libqb: 0.10.1-2
redhat-cluster-pve: 3.1.93-2
resource-agents-pve: 3.9.2-3
fence-agents-pve: 3.1.9-1
pve-cluster: 1.0-36
qemu-server: 2.3-20
pve-firmware: 1.0-21
libpve-common-perl: 1.0-49
libpve-access-control: 1.0-26
libpve-storage-perl: 2.3-7
vncterm: 1.0-4
vzctl: 4.0-1pve2
vzprocps: 2.0.11-2
vzquota: 3.1-1
pve-qemu-kvm: 1.4-10
ksm-control-daemon: 1.1-1


Suggestions?
 
Maybe the disk has errors - do you use qcow2? If so, try to check the image using 'qemu-img check'
 
I have a similar problem with one of my Proxmox servers. During the nightly backup the complete system hangs. After a reboot I've to unlock the vm and the backup did not complete.
Already changed memory, harddisks. Still same problem, only during nightly backup.
The server has two SATA 600 WD Black drives, one with Proxmox, the other only for Backups (LZO, Snapshot).

The VM has a RAW type of harddisk (Windows XP) and no sparse space.

Proxmox version:

pve-manager: 2.3-13 (pve-manager/2.3/7946f1f1)
running kernel: 2.6.32-19-pve
proxmox-ve-2.6.32: 2.3-96
pve-kernel-2.6.32-10-pve: 2.6.32-63
pve-kernel-2.6.32-19-pve: 2.6.32-96
lvm2: 2.02.95-1pve2
clvm: 2.02.95-1pve2
corosync-pve: 1.4.4-4
openais-pve: 1.1.4-2
libqb: 0.10.1-2
redhat-cluster-pve: 3.1.93-2
resource-agents-pve: 3.9.2-3
fence-agents-pve: 3.1.9-1
pve-cluster: 1.0-36
qemu-server: 2.3-20
pve-firmware: 1.0-21
libpve-common-perl: 1.0-49
libpve-access-control: 1.0-26
libpve-storage-perl: 2.3-7
vncterm: 1.0-4
vzctl: 4.0-1pve2
vzprocps: 2.0.11-2
vzquota: 3.1-1
pve-qemu-kvm: 1.4-10
ksm-control-daemon: 1.1-1
 
One thing to add...if I disable the HA management on a VM exhibiting this problem, the problem stops occurring. Would it be helpful to disable HA on this VM for an evening to see a successful backup log?

I should also add, there are HA VMs which never have this problem.
 
Well, I spoke too soon, disabling HA on that VM did not fix it like it had fixed one other VM that had exhibited this behavior. Here is tonight's log:

Code:
  109: Jun 25 05:16:55 INFO: Starting Backup of VM 109 (qemu)  109: Jun 25 05:16:55 INFO: status = running
  109: Jun 25 05:16:56 INFO: backup mode: snapshot
  109: Jun 25 05:16:56 INFO: ionice priority: 7
  109: Jun 25 05:16:56 INFO: creating archive '/mnt/pve/backups/dump/vzdump-qemu-109-2013_06_25-05_16_55.vma.lzo'
  109: Jun 25 05:16:56 INFO: started backup task 'a3a75060-688c-43ab-9131-add9bace87d8'
  109: Jun 25 05:16:59 INFO: status: 0% (387710976/42949672960), sparse 0% (74457088), duration 3, 129/104 MB/s
  109: Jun 25 05:17:02 INFO: status: 1% (642383872/42949672960), sparse 0% (114208768), duration 6, 84/71 MB/s
  109: Jun 25 05:17:05 INFO: status: 2% (904658944/42949672960), sparse 0% (114253824), duration 9, 87/87 MB/s
  109: Jun 25 05:17:10 INFO: status: 3% (1315176448/42949672960), sparse 0% (133500928), duration 14, 82/78 MB/s
  109: Jun 25 05:17:17 INFO: status: 4% (1718091776/42949672960), sparse 0% (137486336), duration 21, 57/56 MB/s
  109: Jun 25 05:17:26 INFO: status: 5% (2199126016/42949672960), sparse 0% (138883072), duration 30, 53/53 MB/s
  109: Jun 25 05:17:32 INFO: status: 6% (2615148544/42949672960), sparse 0% (146427904), duration 36, 69/68 MB/s
  109: Jun 25 05:17:38 INFO: status: 7% (3078881280/42949672960), sparse 0% (185921536), duration 42, 77/70 MB/s
  109: Jun 25 05:17:42 INFO: status: 8% (3523608576/42949672960), sparse 0% (195747840), duration 46, 111/108 MB/s
  109: Jun 25 05:17:47 INFO: status: 9% (3880517632/42949672960), sparse 0% (210866176), duration 51, 71/68 MB/s
  109: Jun 25 05:17:53 INFO: status: 10% (4317773824/42949672960), sparse 0% (216883200), duration 57, 72/71 MB/s
  109: Jun 25 05:17:58 INFO: status: 11% (4755161088/42949672960), sparse 0% (219131904), duration 62, 87/87 MB/s
  109: Jun 25 05:18:05 INFO: status: 12% (5169479680/42949672960), sparse 0% (254619648), duration 69, 59/54 MB/s
  109: Jun 25 05:18:12 INFO: status: 13% (5633212416/42949672960), sparse 0% (258510848), duration 76, 66/65 MB/s
  109: Jun 25 05:18:18 INFO: status: 14% (6043729920/42949672960), sparse 0% (264867840), duration 82, 68/67 MB/s
  109: Jun 25 05:18:24 INFO: status: 15% (6488457216/42949672960), sparse 0% (280535040), duration 88, 74/71 MB/s
  109: Jun 25 05:22:02 ERROR: VM 109 not running
  109: Jun 25 05:22:02 INFO: aborting backup job
  109: Jun 25 05:22:02 ERROR: VM 109 not running
  109: Jun 25 05:22:03 ERROR: Backup of VM 109 failed - VM 109 not running

So, I decided to check the other logs for anything that corresponded to this, and I found this:

Code:
Jun 25 05:20:12 proxmox1 kernel: vmbr0: port 8(tap109i0) entering disabled state
Jun 25 05:20:12 proxmox1 kernel: vmbr0: port 8(tap109i0) entering disabled state
Jun 25 05:21:33 proxmox1 kernel: lzop          D ffff88086d2e2580     0 54699  24026    0 0x00000000
Jun 25 05:21:33 proxmox1 kernel: ffff8802b6c2b9c8 0000000000000082 0000000000000000 00000000000b3521
Jun 25 05:21:33 proxmox1 kernel: 0000000000c2b978 ffff8808a065c540 ffff88107c646010 ffff8808a065e9c0
Jun 25 05:21:33 proxmox1 kernel: ffff88107c646000 000000010e8c50ac ffff88086d2e2b48 000000000001e9c0
Jun 25 05:21:33 proxmox1 kernel: Call Trace:
Jun 25 05:21:33 proxmox1 kernel: [<ffffffff811247d0>] ? sync_page+0x0/0x50
Jun 25 05:21:33 proxmox1 kernel: [<ffffffff8151e543>] io_schedule+0x73/0xc0
Jun 25 05:21:33 proxmox1 kernel: [<ffffffff8112480d>] sync_page+0x3d/0x50
Jun 25 05:21:33 proxmox1 kernel: [<ffffffff8151f45f>] __wait_on_bit+0x5f/0x90
Jun 25 05:21:33 proxmox1 kernel: [<ffffffff81124a43>] wait_on_page_bit+0x73/0x80
Jun 25 05:21:33 proxmox1 kernel: [<ffffffff81097290>] ? wake_bit_function+0x0/0x40
Jun 25 05:21:33 proxmox1 kernel: [<ffffffff81126d40>] grab_cache_page_write_begin+0xc0/0x150
Jun 25 05:21:33 proxmox1 kernel: [<ffffffffa0600477>] nfs_write_begin+0x77/0x230 [nfs]
Jun 25 05:21:33 proxmox1 kernel: [<ffffffffa06016cd>] ? nfs_write_end+0x6d/0x290 [nfs]
Jun 25 05:21:33 proxmox1 kernel: [<ffffffff8112537b>] generic_file_buffered_write_iter+0x10b/0x2b0
Jun 25 05:21:33 proxmox1 kernel: [<ffffffff811369d6>] ? free_pcppages_bulk+0x296/0x370
Jun 25 05:21:33 proxmox1 kernel: [<ffffffff811271c2>] __generic_file_write_iter+0x1a2/0x3b0
Jun 25 05:21:33 proxmox1 kernel: [<ffffffff81127455>] __generic_file_aio_write+0x85/0xa0
Jun 25 05:21:33 proxmox1 kernel: [<ffffffff811274f8>] generic_file_aio_write+0x88/0x100
Jun 25 05:21:33 proxmox1 kernel: [<ffffffffa06011fc>] nfs_file_write+0x10c/0x210 [nfs]
Jun 25 05:21:33 proxmox1 kernel: [<ffffffff811988ca>] do_sync_write+0xfa/0x140
Jun 25 05:21:33 proxmox1 kernel: [<ffffffff81097250>] ? autoremove_wake_function+0x0/0x40
Jun 25 05:21:33 proxmox1 kernel: [<ffffffff8100984c>] ? __switch_to+0x1ac/0x320
Jun 25 05:21:33 proxmox1 kernel: [<ffffffff81198ba8>] vfs_write+0xb8/0x1a0
Jun 25 05:21:33 proxmox1 kernel: [<ffffffff811994a1>] sys_write+0x51/0x90
Jun 25 05:21:33 proxmox1 kernel: [<ffffffff8100b102>] system_call_fastpath+0x16/0x1b

Anybody have any ideas what that could indicate, or things I could do to further troubleshoot this?
 
Worked to local storage:

Code:
[COLOR=#000000][FONT=tahoma]INFO: starting new backup job: vzdump 109 --remove 0 --mode snapshot --compress lzo --storage local --node proxmox1[/FONT][/COLOR]
[COLOR=#000000][FONT=tahoma]INFO: Starting Backup of VM 109 (qemu)[/FONT][/COLOR]
[COLOR=#000000][FONT=tahoma]INFO: status = running[/FONT][/COLOR]
[COLOR=#000000][FONT=tahoma]INFO: backup mode: snapshot[/FONT][/COLOR]
[COLOR=#000000][FONT=tahoma]INFO: ionice priority: 7[/FONT][/COLOR]
[COLOR=#000000][FONT=tahoma]INFO: creating archive '/var/lib/vz/dump/vzdump-qemu-109-2013_06_25-07_36_58.vma.lzo'[/FONT][/COLOR]
[COLOR=#000000][FONT=tahoma]INFO: started backup task '9ba8506e-c6d0-4236-8105-97133e361a1b'[/FONT][/COLOR]
[COLOR=#000000][FONT=tahoma]INFO: status: 0% (363003904/42949672960), sparse 0% (74375168), duration 3, 121/96 MB/s[/FONT][/COLOR]
[COLOR=#000000][FONT=tahoma]INFO: status: 1% (535953408/42949672960), sparse 0% (85663744), duration 6, 57/53 MB/s[/FONT][/COLOR]
[COLOR=#000000][FONT=tahoma]INFO: status: 2% (862846976/42949672960), sparse 0% (114266112), duration 11, 65/59 MB/s[/FONT][/COLOR]
[COLOR=#000000][FONT=tahoma]INFO: status: 3% (1318977536/42949672960), sparse 0% (135368704), duration 18, 65/62 MB/s[/FONT][/COLOR]
[COLOR=#000000][FONT=tahoma]INFO: status: 4% (1727660032/42949672960), sparse 0% (137461760), duration 28, 40/40 MB/s[/FONT][/COLOR]
[COLOR=#000000][FONT=tahoma]INFO: status: 5% (2161311744/42949672960), sparse 0% (137555968), duration 40, 36/36 MB/s[/FONT][/COLOR]
[COLOR=#000000][FONT=tahoma]INFO: status: 6% (2580938752/42949672960), sparse 0% (137592832), duration 50, 41/41 MB/s[/FONT][/COLOR]
[COLOR=#000000][FONT=tahoma]INFO: status: 7% (3018063872/42949672960), sparse 0% (174706688), duration 57, 62/57 MB/s[/FONT][/COLOR]
[COLOR=#000000][FONT=tahoma]INFO: status: 8% (3446603776/42949672960), sparse 0% (175095808), duration 62, 85/85 MB/s[/FONT][/COLOR]
[COLOR=#000000][FONT=tahoma]INFO: status: 9% (3873308672/42949672960), sparse 0% (199667712), duration 69, 60/57 MB/s[/FONT][/COLOR]
[COLOR=#000000][FONT=tahoma]INFO: status: 10% (4356046848/42949672960), sparse 0% (205754368), duration 76, 68/68 MB/s[/FONT][/COLOR]
[COLOR=#000000][FONT=tahoma]INFO: status: 11% (4729798656/42949672960), sparse 0% (207818752), duration 80, 93/92 MB/s[/FONT][/COLOR]
[COLOR=#000000][FONT=tahoma]INFO: status: 12% (5192286208/42949672960), sparse 0% (243310592), duration 89, 51/47 MB/s[/FONT][/COLOR]
[COLOR=#000000][FONT=tahoma]INFO: status: 13% (5610405888/42949672960), sparse 0% (247083008), duration 96, 59/59 MB/s[/FONT][/COLOR]
[COLOR=#000000][FONT=tahoma]INFO: status: 14% (6051332096/42949672960), sparse 0% (253480960), duration 103, 62/62 MB/s[/FONT][/COLOR]
[COLOR=#000000][FONT=tahoma]INFO: status: 15% (6507134976/42949672960), sparse 0% (269135872), duration 110, 65/62 MB/s[/FONT][/COLOR]
[COLOR=#000000][FONT=tahoma]INFO: status: 16% (6872367104/42949672960), sparse 0% (269623296), duration 119, 40/40 MB/s[/FONT][/COLOR]
[COLOR=#000000][FONT=tahoma]INFO: status: 17% (7305363456/42949672960), sparse 0% (306753536), duration 124, 86/79 MB/s[/FONT][/COLOR]
[COLOR=#000000][FONT=tahoma]INFO: status: 18% (7788429312/42949672960), sparse 0% (311832576), duration 133, 53/53 MB/s[/FONT][/COLOR]
[COLOR=#000000][FONT=tahoma]INFO: status: 19% (8244559872/42949672960), sparse 0% (318144512), duration 136, 152/149 MB/s[/FONT][/COLOR]
[COLOR=#000000][FONT=tahoma]INFO: status: 20% (8675459072/42949672960), sparse 0% (323969024), duration 139, 143/141 MB/s[/FONT][/COLOR]
[COLOR=#000000][FONT=tahoma]INFO: status: 21% (9192407040/42949672960), sparse 1% (792154112), duration 142, 172/16 MB/s[/FONT][/COLOR]
[COLOR=#000000][FONT=tahoma]INFO: status: 23% (9882828800/42949672960), sparse 3% (1402327040), duration 145, 230/26 MB/s[/FONT][/COLOR]
[COLOR=#000000][FONT=tahoma]INFO: status: 24% (10319953920/42949672960), sparse 3% (1408507904), duration 151, 72/71 MB/s[/FONT][/COLOR]
[COLOR=#000000][FONT=tahoma]INFO: status: 25% (10768482304/42949672960), sparse 3% (1410940928), duration 159, 56/55 MB/s[/FONT][/COLOR]
[COLOR=#000000][FONT=tahoma]INFO: status: 26% (11198005248/42949672960), sparse 3% (1414635520), duration 167, 53/53 MB/s[/FONT][/COLOR]
[COLOR=#000000][FONT=tahoma]INFO: status: 27% (11608522752/42949672960), sparse 3% (1453998080), duration 175, 51/46 MB/s[/FONT][/COLOR]
[COLOR=#000000][FONT=tahoma]INFO: status: 28% (12029788160/42949672960), sparse 3% (1457807360), duration 182, 60/59 MB/s[/FONT][/COLOR]
[COLOR=#000000][FONT=tahoma]INFO: status: 29% (12456165376/42949672960), sparse 3% (1459040256), duration 187, 85/85 MB/s[/FONT][/COLOR]
[COLOR=#000000][FONT=tahoma]INFO: status: 30% (12921077760/42949672960), sparse 3% (1461473280), duration 192, 92/92 MB/s[/FONT][/COLOR]
[COLOR=#000000][FONT=tahoma]INFO: status: 31% (13322813440/42949672960), sparse 3% (1497993216), duration 199, 57/52 MB/s[/FONT][/COLOR]
[COLOR=#000000][FONT=tahoma]INFO: status: 32% (13809352704/42949672960), sparse 3% (1543004160), duration 205, 81/73 MB/s[/FONT][/COLOR]
[COLOR=#000000][FONT=tahoma]INFO: status: 33% (14216069120/42949672960), sparse 3% (1546063872), duration 214, 45/44 MB/s[/FONT][/COLOR]
[COLOR=#000000][FONT=tahoma]INFO: status: 34% (14607581184/42949672960), sparse 3% (1546383360), duration 222, 48/48 MB/s[/FONT][/COLOR]
[COLOR=#000000][FONT=tahoma]INFO: status: 35% (15044706304/42949672960), sparse 3% (1548894208), duration 229, 62/62 MB/s[/FONT][/COLOR]
[COLOR=#000000][FONT=tahoma]INFO: status: 36% (15516041216/42949672960), sparse 3% (1558892544), duration 239, 47/46 MB/s[/FONT][/COLOR]
[COLOR=#000000][FONT=tahoma]INFO: status: 37% (15918956544/42949672960), sparse 3% (1605459968), duration 246, 57/50 MB/s[/FONT][/COLOR]
[COLOR=#000000][FONT=tahoma]INFO: status: 38% (16337076224/42949672960), sparse 3% (1606545408), duration 253, 59/59 MB/s[/FONT][/COLOR]
[COLOR=#000000][FONT=tahoma]INFO: status: 39% (16804610048/42949672960), sparse 3% (1614110720), duration 262, 51/51 MB/s[/FONT][/COLOR]
[COLOR=#000000][FONT=tahoma]INFO: status: 40% (17249337344/42949672960), sparse 3% (1616482304), duration 269, 63/63 MB/s[/FONT][/COLOR]
[COLOR=#000000][FONT=tahoma]INFO: status: 41% (17644650496/42949672960), sparse 3% (1624510464), duration 275, 65/64 MB/s[/FONT][/COLOR]
[COLOR=#000000][FONT=tahoma]INFO: status: 42% (18070372352/42949672960), sparse 3% (1658314752), duration 281, 70/65 MB/s[/FONT][/COLOR]
[COLOR=#000000][FONT=tahoma]INFO: status: 43% (18564513792/42949672960), sparse 3% (1669783552), duration 287, 82/80 MB/s[/FONT][/COLOR]
[COLOR=#000000][FONT=tahoma]INFO: status: 44% (18948423680/42949672960), sparse 3% (1687044096), duration 291, 95/91 MB/s[/FONT][/COLOR]
[COLOR=#000000][FONT=tahoma]INFO: status: 45% (19377946624/42949672960), sparse 3% (1689374720), duration 297, 71/71 MB/s[/FONT][/COLOR]
[COLOR=#000000][FONT=tahoma]INFO: status: 46% (19773259776/42949672960), sparse 3% (1691226112), duration 303, 65/65 MB/s[/FONT][/COLOR]
[COLOR=#000000][FONT=tahoma]INFO: status: 47% (20232863744/42949672960), sparse 4% (1737117696), duration 311, 57/51 MB/s[/FONT][/COLOR]
[COLOR=#000000][FONT=tahoma]INFO: status: 48% (20617101312/42949672960), sparse 4% (1737494528), duration 320, 42/42 MB/s[/FONT][/COLOR]
[COLOR=#000000][FONT=tahoma]INFO: status: 49% (21058027520/42949672960), sparse 4% (1738141696), duration 330, 44/44 MB/s[/FONT][/COLOR]
[COLOR=#000000][FONT=tahoma]INFO: status: 50% (21495152640/42949672960), sparse 4% (1739886592), duration 339, 48/48 MB/s[/FONT][/COLOR]
[COLOR=#000000][FONT=tahoma]INFO: status: 51% (21951283200/42949672960), sparse 4% (1742155776), duration 346, 65/64 MB/s[/FONT][/COLOR]
[COLOR=#000000][FONT=tahoma]INFO: status: 52% (22357475328/42949672960), sparse 4% (1775972352), duration 354, 50/46 MB/s[/FONT][/COLOR]
[COLOR=#000000][FONT=tahoma]INFO: status: 53% (22791323648/42949672960), sparse 4% (1776074752), duration 368, 30/30 MB/s[/FONT][/COLOR]
[COLOR=#000000][FONT=tahoma]INFO: status: 54% (23224647680/42949672960), sparse 4% (1776152576), duration 381, 33/33 MB/s[/FONT][/COLOR]
[COLOR=#000000][FONT=tahoma]INFO: status: 55% (23623761920/42949672960), sparse 4% (1776222208), duration 392, 36/36 MB/s[/FONT][/COLOR]
[COLOR=#000000][FONT=tahoma]INFO: status: 56% (24060887040/42949672960), sparse 4% (1776513024), duration 405, 33/33 MB/s[/FONT][/COLOR]
[COLOR=#000000][FONT=tahoma]INFO: status: 57% (24574033920/42949672960), sparse 4% (1814392832), duration 414, 57/52 MB/s[/FONT][/COLOR]
[COLOR=#000000][FONT=tahoma]INFO: status: 58% (24992612352/42949672960), sparse 4% (1837174784), duration 418, 104/98 MB/s[/FONT][/COLOR]
[COLOR=#000000][FONT=tahoma]INFO: status: 59% (25368461312/42949672960), sparse 4% (1852342272), duration 423, 75/72 MB/s[/FONT][/COLOR]
[COLOR=#000000][FONT=tahoma]INFO: status: 60% (25794183168/42949672960), sparse 4% (1854590976), duration 430, 60/60 MB/s[/FONT][/COLOR]
[COLOR=#000000][FONT=tahoma]INFO: status: 61% (26288324608/42949672960), sparse 4% (1955930112), duration 438, 61/49 MB/s[/FONT][/COLOR]
[COLOR=#000000][FONT=tahoma]INFO: status: 62% (26668433408/42949672960), sparse 4% (2057367552), duration 443, 76/55 MB/s[/FONT][/COLOR]
[COLOR=#000000][FONT=tahoma]INFO: status: 63% (27061911552/42949672960), sparse 4% (2058715136), duration 449, 65/65 MB/s[/FONT][/COLOR]
[COLOR=#000000][FONT=tahoma]INFO: status: 64% (27550285824/42949672960), sparse 4% (2072748032), duration 459, 48/47 MB/s[/FONT][/COLOR]
[COLOR=#000000][FONT=tahoma]INFO: status: 65% (27980922880/42949672960), sparse 4% (2089648128), duration 465, 71/68 MB/s[/FONT][/COLOR]
[COLOR=#000000][FONT=tahoma]INFO: status: 66% (28394127360/42949672960), sparse 4% (2093621248), duration 470, 82/81 MB/s[/FONT][/COLOR]
[COLOR=#000000][FONT=tahoma]INFO: status: 67% (28827451392/42949672960), sparse 4% (2143277056), duration 474, 108/95 MB/s[/FONT][/COLOR]
[COLOR=#000000][FONT=tahoma]INFO: status: 68% (29291184128/42949672960), sparse 5% (2156883968), duration 479, 92/90 MB/s[/FONT][/COLOR]
[COLOR=#000000][FONT=tahoma]INFO: status: 69% (29812457472/42949672960), sparse 5% (2242846720), duration 482, 173/145 MB/s[/FONT][/COLOR]
[COLOR=#000000][FONT=tahoma]INFO: status: 71% (30670585856/42949672960), sparse 7% (3094245376), duration 485, 286/2 MB/s[/FONT][/COLOR]
[COLOR=#000000][FONT=tahoma]INFO: status: 72% (30967463936/42949672960), sparse 7% (3202224128), duration 488, 98/62 MB/s[/FONT][/COLOR]
[COLOR=#000000][FONT=tahoma]INFO: status: 73% (31358976000/42949672960), sparse 7% (3291291648), duration 491, 130/100 MB/s[/FONT][/COLOR]
[COLOR=#000000][FONT=tahoma]INFO: status: 74% (31830310912/42949672960), sparse 7% (3317846016), duration 495, 117/111 MB/s[/FONT][/COLOR]
[COLOR=#000000][FONT=tahoma]INFO: status: 75% (32260292608/42949672960), sparse 7% (3331362816), duration 499, 107/104 MB/s[/FONT][/COLOR]
[COLOR=#000000][FONT=tahoma]INFO: status: 76% (32689356800/42949672960), sparse 7% (3347087360), duration 503, 107/103 MB/s[/FONT][/COLOR]
[COLOR=#000000][FONT=tahoma]INFO: status: 77% (33096466432/42949672960), sparse 7% (3385843712), duration 508, 81/73 MB/s[/FONT][/COLOR]
[COLOR=#000000][FONT=tahoma]INFO: status: 78% (33662435328/42949672960), sparse 8% (3705765888), duration 515, 80/35 MB/s[/FONT][/COLOR]
[COLOR=#000000][FONT=tahoma]INFO: status: 80% (34501033984/42949672960), sparse 10% (4543053824), duration 518, 279/0 MB/s[/FONT][/COLOR]
[COLOR=#000000][FONT=tahoma]INFO: status: 81% (35156262912/42949672960), sparse 11% (5056090112), duration 521, 218/47 MB/s[/FONT][/COLOR]
[COLOR=#000000][FONT=tahoma]INFO: status: 83% (35855990784/42949672960), sparse 13% (5727858688), duration 524, 233/9 MB/s[/FONT][/COLOR]
[COLOR=#000000][FONT=tahoma]INFO: status: 85% (36727095296/42949672960), sparse 15% (6597603328), duration 527, 290/0 MB/s[/FONT][/COLOR]
[COLOR=#000000][FONT=tahoma]INFO: status: 86% (37300076544/42949672960), sparse 16% (7032471552), duration 530, 190/46 MB/s[/FONT][/COLOR]
[COLOR=#000000][FONT=tahoma]INFO: status: 87% (37474926592/42949672960), sparse 16% (7039660032), duration 533, 58/55 MB/s[/FONT][/COLOR]
[COLOR=#000000][FONT=tahoma]INFO: status: 88% (37977718784/42949672960), sparse 17% (7510609920), duration 536, 167/10 MB/s[/FONT][/COLOR]
[COLOR=#000000][FONT=tahoma]INFO: status: 90% (38851837952/42949672960), sparse 19% (8383361024), duration 539, 291/0 MB/s[/FONT][/COLOR]
[COLOR=#000000][FONT=tahoma]INFO: status: 92% (39692861440/42949672960), sparse 21% (9223069696), duration 542, 280/0 MB/s[/FONT][/COLOR]
[COLOR=#000000][FONT=tahoma]INFO: status: 94% (40540897280/42949672960), sparse 23% (10069778432), duration 545, 282/0 MB/s[/FONT][/COLOR]
[COLOR=#000000][FONT=tahoma]INFO: status: 96% (41407610880/42949672960), sparse 25% (10935140352), duration 548, 288/0 MB/s[/FONT][/COLOR]
[COLOR=#000000][FONT=tahoma]INFO: status: 98% (42236182528/42949672960), sparse 27% (11762417664), duration 551, 276/0 MB/s[/FONT][/COLOR]
[COLOR=#000000][FONT=tahoma]INFO: status: 100% (42949672960/42949672960), sparse 29% (12474793984), duration 554, 237/0 MB/s[/FONT][/COLOR]
[COLOR=#000000][FONT=tahoma]INFO: transferred 42949 MB in 554 seconds (77 MB/s)[/FONT][/COLOR]
[COLOR=#000000][FONT=tahoma]INFO: archive file size: 14.78GB[/FONT][/COLOR]
[COLOR=#000000][FONT=tahoma]INFO: Finished Backup of VM 109 (00:09:15)[/FONT][/COLOR]
[COLOR=#000000][FONT=tahoma]INFO: Backup job finished successfully[/FONT][/COLOR]
[COLOR=#000000][FONT=tahoma]TASK OK[/FONT][/COLOR]


I have adjusted vzdump.conf to specify a bwlimit a few mbps lower than the interface speed, to make sure the NFS traffic doesn't stomp on the iscsi quorum disk. So far the backup is running well, this may be the solution.

EDIT: The backup finished successfully, so hopefully this is resolved.

I guess my primary confusion is how an NFS backup failure can cause the KVM to completely exit.


 
Last edited:
Well, unfortunately it died again last night, even with the limit in place for vzdump.

Is there a way to specify, for a specific VM, which backup pool is used (for the automatic nightly backup)?

I'm still very confused as to how an NFS problem can cause the KVM being backed up to exit.

EDIT: Can somebody help me understand what could possibly be going on here? What possible reasons could almost all of my KVMs never have this problem, but this particular KVM exits every night during backup? Surely it can't be an NFS issue, other KVMs would have problems. And there haven't been any of those odd errors since. Here's what I see in the messages from this morning:

Code:
Jun 26 05:29:29 proxmox1 kernel: vmbr0: port 6(tap109i0) entering disabled stateJun 26 05:29:29 proxmox1 kernel: vmbr0: port 6(tap109i0) entering disabled state
Jun 26 05:29:37 proxmox1 rgmanager[3269]: status on pvevm "109" returned 7 (unspecified)
Jun 26 05:29:38 proxmox1 rgmanager[3269]: Stopping service pvevm:109
Jun 26 05:29:38 proxmox1 rgmanager[388098]: [pvevm] VM 109 is already stopped
Jun 26 05:29:38 proxmox1 rgmanager[3269]: Service pvevm:109 is recovering
Jun 26 05:29:39 proxmox1 rgmanager[3269]: Recovering failed service pvevm:109
Jun 26 05:29:39 proxmox1 pvevm: <root@pam> starting task UPID:proxmox1:0005EC1E:006625F2:51CAB483:qmstart:109:root@pam:
Jun 26 05:29:39 proxmox1 task UPID:proxmox1:0005EC1E:006625F2:51CAB483:qmstart:109:root@pam:: start VM 109: UPID:proxmox1:0005EC1E:0
06625F2:51CAB483:qmstart:109:root@pam:
Jun 26 05:29:39 proxmox1 rgmanager[3269]: start on pvevm "109" returned 1 (generic error)
Jun 26 05:29:39 proxmox1 rgmanager[3269]: #68: Failed to start pvevm:109; return value: 1
Jun 26 05:29:39 proxmox1 rgmanager[3269]: Stopping service pvevm:109
Jun 26 05:29:40 proxmox1 rgmanager[388128]: [pvevm] VM 109 is already stopped
Jun 26 05:29:40 proxmox1 rgmanager[3269]: Service pvevm:109 is recovering
Jun 26 05:29:40 proxmox1 rgmanager[3269]: #71: Relocating failed service pvevm:109
Jun 26 05:29:43 proxmox1 rgmanager[3269]: Service pvevm:109 is now running on member 2

The KVM in question is running CentOS 6.4. I'm simply using e1000 and scsi...and now that I think about it, the other linux VMs are virtio (I didn't set this one up).

Is it worth my time to convert the KVM to virtio to see if that resolves the problem?
 
Last edited:

About

The Proxmox community has been around for many years and offers help and support for Proxmox VE, Proxmox Backup Server, and Proxmox Mail Gateway.
We think our community is one of the best thanks to people like you!

Get your subscription!

The Proxmox team works very hard to make sure you are running the best software and getting stable updates and security enhancements, as well as quick enterprise support. Tens of thousands of happy customers have a Proxmox subscription. Get yours easily in our online shop.

Buy now!