KVM backup error since migrating to 2.3

Gardouille
Hello,

I have had a small problem since I migrated one of my servers to Proxmox VE 2.3 last Friday.

An example log from before the migration:
Code:
402: mars 08 00:30:02 INFO: Starting Backup of VM 402 (qemu)
402: mars 08 00:30:02 INFO: status = running
402: mars 08 00:30:02 INFO: backup mode: suspend
402: mars 08 00:30:02 INFO: ionice priority: 7
402: mars 08 00:30:02 INFO: suspend vm
402: mars 08 00:30:03 INFO: creating archive '/mnt/backup_lancelot/05_vendredi/dump/vzdump-qemu-402-2013_03_08-00_30_02.tar.lzo'
402: mars 08 00:30:03 INFO: adding '/mnt/backup_lancelot/05_vendredi/dump/vzdump-qemu-402-2013_03_08-00_30_02.tmp/qemu-server.conf' to archive ('qemu-server.conf')
402: mars 08 00:30:03 INFO: adding '/var/lib/vz/images/402/vm-402-disk-1.raw' to archive ('vm-disk-ide0.raw')
402: mars 08 00:35:54 INFO: Total bytes written: 34359740928 (93.36 MiB/s)
402: mars 08 00:35:54 INFO: archive file size: 1.67GB
402: mars 08 00:35:54 INFO: delete old backup '/mnt/backup_lancelot/05_vendredi/dump/vzdump-qemu-402-2013_03_01-00_30_03.tar.lzo'
402: mars 08 00:35:59 INFO: resume vm
402: mars 08 00:36:00 INFO: vm is online again after 358 seconds
402: mars 08 00:36:00 INFO: Finished Backup of VM 402 (00:05:58)

405: mars 08 00:36:00 INFO: Starting Backup of VM 405 (qemu)
405: mars 08 00:36:00 INFO: status = running
405: mars 08 00:36:00 INFO: backup mode: suspend
405: mars 08 00:36:00 INFO: ionice priority: 7
405: mars 08 00:36:01 INFO: suspend vm
405: mars 08 00:36:01 INFO: creating archive '/mnt/backup_lancelot/05_vendredi/dump/vzdump-qemu-405-2013_03_08-00_36_00.tar.lzo'
405: mars 08 00:36:01 INFO: adding '/mnt/backup_lancelot/05_vendredi/dump/vzdump-qemu-405-2013_03_08-00_36_00.tmp/qemu-server.conf' to archive ('qemu-server.conf')
405: mars 08 00:36:01 INFO: adding '/var/lib/vz/images/405/vm-405-disk-1.qcow2' to archive ('vm-disk-ide0.qcow2')
405: mars 08 00:48:33 INFO: Total bytes written: 22356363776 (28.35 MiB/s)
405: mars 08 00:48:33 INFO: archive file size: 13.70GB
405: mars 08 00:48:33 INFO: delete old backup '/mnt/backup_lancelot/05_vendredi/dump/vzdump-qemu-405-2013_03_01-00_36_09.tar.lzo'
405: mars 08 00:49:14 INFO: resume vm
405: mars 08 00:49:15 INFO: vm is online again after 794 seconds
405: mars 08 00:49:15 INFO: Finished Backup of VM 405 (00:13:15)

412: mars 08 00:49:15 INFO: Starting Backup of VM 412 (qemu)
412: mars 08 00:49:15 INFO: status = running
412: mars 08 00:49:15 INFO: backup mode: suspend
412: mars 08 00:49:15 INFO: ionice priority: 7
412: mars 08 00:49:15 INFO: suspend vm
412: mars 08 00:49:16 INFO: creating archive '/mnt/backup_lancelot/05_vendredi/dump/vzdump-qemu-412-2013_03_08-00_49_15.tar.lzo'
412: mars 08 00:49:16 INFO: adding '/mnt/backup_lancelot/05_vendredi/dump/vzdump-qemu-412-2013_03_08-00_49_15.tmp/qemu-server.conf' to archive ('qemu-server.conf')
412: mars 08 00:49:16 INFO: adding '/var/lib/vz/images/412/vm-412-disk-1.qcow2' to archive ('vm-disk-ide0.qcow2')
412: mars 08 00:51:02 INFO: Total bytes written: 5721229824 (51.47 MiB/s)
412: mars 08 00:51:02 INFO: archive file size: 1.35GB
412: mars 08 00:51:02 INFO: delete old backup '/mnt/backup_lancelot/05_vendredi/dump/vzdump-qemu-412-2013_03_01-00_49_29.tar.lzo'
412: mars 08 00:51:06 INFO: resume vm
412: mars 08 00:51:07 INFO: vm is online again after 112 seconds
412: mars 08 00:51:07 INFO: Finished Backup of VM 412 (00:01:52)

And now the problem, shown in Saturday's log (the same happens for Monday's backup):
Code:
405: mars 09 00:35:13 INFO: Starting Backup of VM 405 (qemu)
405: mars 09 00:35:13 INFO: status = running
405: mars 09 00:35:14 INFO: backup mode: suspend
405: mars 09 00:35:14 INFO: ionice priority: 7
405: mars 09 00:35:14 INFO: suspend vm
405: mars 09 00:35:15 INFO: creating archive '/mnt/backup_lancelot/06_samedi/dump/vzdump-qemu-405-2013_03_09-00_35_13.vma.lzo'
405: mars 09 00:35:15 INFO: started backup task 'a86cc475-cf77-4dda-82cb-5ead7a659db5'
405: mars 09 00:35:18 INFO: status: 0% (98238464/34359738368), sparse 0% (389120), duration 3, 32/32 MB/s
405: mars 09 00:35:31 INFO: status: 1% (380239872/34359738368), sparse 0% (2678784), duration 16, 21/21 MB/s
405: mars 09 00:35:42 INFO: status: 2% (719126528/34359738368), sparse 0% (6250496), duration 27, 30/30 MB/s
405: mars 09 00:35:53 INFO: status: 3% (1060700160/34359738368), sparse 0% (13131776), duration 38, 31/30 MB/s
405: mars 09 00:35:55 INFO: status: 3% (1073741824/34359738368), sparse 0% (14180352), duration 40, 6/5 MB/s
405: mars 09 00:35:55 ERROR: backup_complete_cb -5
405: mars 09 00:35:55 INFO: aborting backup job
405: mars 09 00:35:56 INFO: resume vm
405: mars 09 00:35:57 INFO: vm is online again after 43 seconds
405: mars 09 00:35:57 ERROR: Backup of VM 405 failed - backup_complete_cb -5

I only have this problem with VM 405 ... The system doesn't create the archive vzdump-qemu-405-2013_03_09-00_35_13.vma.lzo at all.

On Sunday, the backup error was different:
Code:
405: mars 10 00:35:25 INFO: Starting Backup of VM 405 (qemu)
405: mars 10 00:35:25 INFO: status = running
405: mars 10 00:35:25 INFO: backup mode: suspend
405: mars 10 00:35:25 INFO: ionice priority: 7
405: mars 10 00:35:26 INFO: suspend vm
405: mars 10 00:35:27 INFO: creating archive '/mnt/backup_lancelot/07_dimanche/dump/vzdump-qemu-405-2013_03_10-00_35_25.vma.lzo'
405: mars 10 00:35:27 INFO: started backup task 'a52e5694-c4c4-4fbc-a927-37688a4b339d'
405: mars 10 00:35:30 INFO: status: 0% (58261504/34359738368), sparse 0% (434176), duration 3, 19/19 MB/s
405: mars 10 00:35:43 INFO: status: 1% (355205120/34359738368), sparse 0% (2232320), duration 16, 22/22 MB/s
405: mars 10 00:35:55 INFO: status: 2% (700448768/34359738368), sparse 0% (5857280), duration 28, 28/28 MB/s
405: mars 10 00:36:07 INFO: status: 3% (1073741824/34359738368), sparse 0% (14241792), duration 40, 31/30 MB/s
405: mars 10 00:36:07 INFO: transferred 1073 MB in 40 seconds (26 MB/s)
405: mars 10 00:36:07 INFO: archive file size: 675MB
405: mars 10 00:36:07 INFO: delete old backup '/mnt/backup_lancelot/07_dimanche/dump/vzdump-qemu-405-2013_03_03-00_36_00.tar.lzo'
405: mars 10 00:36:47 INFO: resume vm
405: mars 10 00:36:47 INFO: vm is online again after 81 seconds
405: mars 10 00:36:47 INFO: Finished Backup of VM 405 (00:01:22)

And now there is an archive of 675MB instead of 13.70GB ... OK, maybe the new format is more efficient, but ... ^^

More information about my system:
Code:
uname -a
Linux lancelot 2.6.32-18-pve #1 SMP Mon Jan 21 12:09:05 CET 2013 x86_64 GNU/Linux
Code:
pveversion -v
pve-manager: 2.3-13 (pve-manager/2.3/7946f1f1)
running kernel: 2.6.32-18-pve
proxmox-ve-2.6.32: 2.3-88
pve-kernel-2.6.32-16-pve: 2.6.32-82
pve-kernel-2.6.32-18-pve: 2.6.32-88
pve-kernel-2.6.32-17-pve: 2.6.32-83
lvm2: 2.02.95-1pve2
clvm: 2.02.95-1pve2
corosync-pve: 1.4.4-4
openais-pve: 1.1.4-2
libqb: 0.10.1-2
redhat-cluster-pve: 3.1.93-2
resource-agents-pve: 3.9.2-3
fence-agents-pve: 3.1.9-1
pve-cluster: 1.0-36
qemu-server: 2.3-17
pve-firmware: 1.0-21
libpve-common-perl: 1.0-48
libpve-access-control: 1.0-26
libpve-storage-perl: 2.3-6
vncterm: 1.0-3
vzctl: 4.0-1pve2
vzprocps: 2.0.11-2
vzquota: 3.1-1
pve-qemu-kvm: 1.4-6
ksm-control-daemon: 1.1-1

Any idea or solution? I can give more information if necessary =)

Sorry for my English ... I'm a French frog (but I don't like to eat those little green things! :þ)
 
OK, a restore test:

Code:
root@lancelot:/mnt/backup_lancelot# qmrestore to_restore/vzdump-qemu-405-2013_03_10-00_35_25.vma.lzo 666
restore vma archive: lzop -d -c /mnt/backup_lancelot/to_restore/vzdump-qemu-405-2013_03_10-00_35_25.vma.lzo|vma extract -v -r /var/tmp/vzdumptmp420079.fifo - /var/tmp/vzdumptmp420079
CFG: size: 254 name: qemu-server.conf
DEV: dev_id=1 size: 34359738368 devname: drive-ide0
CTIME: Sun Mar 10 00:35:27 2013
Formatting '/var/lib/vz/images/666/vm-666-disk-1.qcow2', fmt=qcow2 size=34359738368 encryption=off cluster_size=65536 preallocation='metadata' lazy_refcounts=off 
new volume ID is 'local:666/vm-666-disk-1.qcow2'
map 'drive-ide0' to '/var/lib/vz/images/666/vm-666-disk-1.qcow2' (write zeros = 0)
progress 1% (read 343605248 bytes, duration 3 sec)
progress 2% (read 687210496 bytes, duration 13 sec)
progress 3% (read 1030815744 bytes, duration 24 sec)

** ERROR **: restore failed - detected missing cluster 16356 for stream drive-ide0
aborting...
/bin/bash: line 1: 420081 Done                    lzop -d -c /mnt/backup_lancelot/to_restore/vzdump-qemu-405-2013_03_10-00_35_25.vma.lzo
     420082 Aborted                 | vma extract -v -r /var/tmp/vzdumptmp420079.fifo - /var/tmp/vzdumptmp420079
temporary volume 'local:666/vm-666-disk-1.qcow2' sucessfuly removed
command 'lzop -d -c /mnt/backup_lancelot/to_restore/vzdump-qemu-405-2013_03_10-00_35_25.vma.lzo|vma extract -v -r /var/tmp/vzdumptmp420079.fifo - /var/tmp/vzdumptmp420079' failed: exit code 134
 
Same problem here.

Only one KVM VM will not back up. Backups of all other VMs on the same Proxmox node (and on the same local storage) finish successfully.

Code:
Mar 12 02:00:01 INFO: Starting Backup of VM 120 (qemu)
Mar 12 02:00:01 INFO: status = running
Mar 12 02:00:02 INFO: backup mode: snapshot
Mar 12 02:00:02 INFO: bandwidth limit: 51200 KB/s
Mar 12 02:00:02 INFO: ionice priority: 2
Mar 12 02:00:02 INFO: creating archive '/mnt/vzbackup/dump/vzdump-qemu-120-2013_03_12-02_00_01.vma.lzo'
Mar 12 02:00:02 INFO: started backup task 'b77eba29-021a-43ba-8820-c57aaf4b88b5'
Mar 12 02:00:05 INFO: status: 0% (82509824/171798691840), sparse 0% (11001856), duration 3, 27/23 MB/s
Mar 12 02:01:13 INFO: status: 1% (1722744832/171798691840), sparse 0% (225296384), duration 71, 24/20 MB/s
Mar 12 02:01:57 INFO: status: 2% (3464429568/171798691840), sparse 0% (736137216), duration 115, 39/27 MB/s
Mar 12 02:02:34 INFO: status: 3% (5159124992/171798691840), sparse 1% (1895231488), duration 152, 45/14 MB/s
Mar 12 02:03:09 INFO: status: 4% (6924533760/171798691840), sparse 1% (3309723648), duration 187, 50/10 MB/s
Mar 12 02:03:48 INFO: status: 5% (8642428928/171798691840), sparse 2% (4462587904), duration 226, 44/14 MB/s
Mar 12 02:04:28 INFO: status: 6% (10358816768/171798691840), sparse 3% (5696008192), duration 266, 42/12 MB/s
Mar 12 02:05:11 INFO: status: 7% (12075532288/171798691840), sparse 4% (6969344000), duration 309, 39/10 MB/s
Mar 12 02:05:44 INFO: status: 8% (13750435840/171798691840), sparse 5% (8601968640), duration 342, 50/1 MB/s
Mar 12 02:06:25 INFO: status: 9% (15504703488/171798691840), sparse 5% (10197975040), duration 383, 42/3 MB/s
Mar 12 02:07:18 INFO: status: 10% (17192321024/171798691840), sparse 6% (10572009472), duration 436, 31/24 MB/s
Mar 12 02:08:21 INFO: status: 11% (18910412800/171798691840), sparse 6% (10705747968), duration 499, 27/25 MB/s
Mar 12 02:09:19 INFO: status: 12% (20632305664/171798691840), sparse 6% (10878922752), duration 557, 29/26 MB/s
Mar 12 02:09:56 INFO: status: 13% (22342664192/171798691840), sparse 7% (12340969472), duration 594, 46/6 MB/s
Mar 12 02:10:30 INFO: status: 14% (24094441472/171798691840), sparse 8% (14087176192), duration 628, 51/0 MB/s
Mar 12 02:11:03 INFO: status: 15% (25820069888/171798691840), sparse 9% (15701647360), duration 661, 52/3 MB/s
Mar 12 02:11:35 INFO: status: 16% (27518763008/171798691840), sparse 10% (17399386112), duration 693, 53/0 MB/s
Mar 12 02:12:07 INFO: status: 17% (29222764544/171798691840), sparse 11% (19102642176), duration 725, 53/0 MB/s
Mar 12 02:12:40 INFO: status: 18% (30974541824/171798691840), sparse 12% (20852645888), duration 758, 53/0 MB/s
Mar 12 02:13:12 INFO: status: 19% (32678543360/171798691840), sparse 13% (22555832320), duration 790, 53/0 MB/s
Mar 12 02:13:44 INFO: status: 20% (34377236480/171798691840), sparse 14% (24252821504), duration 822, 53/0 MB/s
Mar 12 02:14:17 INFO: status: 21% (36078682112/171798691840), sparse 14% (25611038720), duration 855, 51/10 MB/s
Mar 12 02:14:51 INFO: status: 22% (37838782464/171798691840), sparse 15% (26465779712), duration 889, 51/26 MB/s
Mar 12 02:15:25 INFO: status: 23% (39548157952/171798691840), sparse 16% (28000038912), duration 923, 50/5 MB/s
Mar 12 02:16:01 INFO: status: 24% (41234333696/171798691840), sparse 17% (29291626496), duration 959, 46/10 MB/s
Mar 12 02:16:53 INFO: status: 25% (42973200384/171798691840), sparse 17% (29383454720), duration 1011, 33/31 MB/s
Mar 12 02:18:14 INFO: status: 26% (44671959040/171798691840), sparse 17% (29421592576), duration 1092, 20/20 MB/s
Mar 12 02:19:09 INFO: status: 27% (46392279040/171798691840), sparse 17% (29532053504), duration 1147, 31/29 MB/s
Mar 12 02:20:02 INFO: status: 28% (48115417088/171798691840), sparse 17% (29664505856), duration 1200, 32/30 MB/s
Mar 12 02:21:00 INFO: status: 29% (49826955264/171798691840), sparse 17% (29780123648), duration 1258, 29/27 MB/s
Mar 12 02:21:55 INFO: status: 30% (51555598336/171798691840), sparse 17% (29838086144), duration 1313, 31/30 MB/s
Mar 12 02:22:55 INFO: status: 31% (53265563648/171798691840), sparse 17% (29890441216), duration 1373, 28/27 MB/s
Mar 12 02:23:44 INFO: status: 32% (54982737920/171798691840), sparse 17% (30024392704), duration 1422, 35/32 MB/s
Mar 12 02:24:33 INFO: status: 33% (56697028608/171798691840), sparse 17% (30152245248), duration 1471, 34/32 MB/s
Mar 12 02:24:43 INFO: status: 33% (57132580864/171798691840), sparse 17% (30359838720), duration 1481, 43/22 MB/s
Mar 12 02:24:43 ERROR: backup_complete_cb -5
Mar 12 02:24:43 INFO: aborting backup job
Mar 12 02:24:47 ERROR: Backup of VM 120 failed - backup_complete_cb -5

The backup job always stops (across many attempts) at 33%.

Code:
Mar 12 10:10:12 INFO: status: 33% (56760139776/171798691840), sparse 17% (30356566016), duration 1224, 44/10 MB/s
Mar 12 10:10:12 ERROR: backup_complete_cb -5
Mar 12 10:10:12 INFO: aborting backup job
Mar 12 10:10:15 ERROR: Backup of VM 120 failed - backup_complete_cb -5

Code:
pveversion -v
pve-manager: 2.3-13 (pve-manager/2.3/7946f1f1)
running kernel: 2.6.32-18-pve
proxmox-ve-2.6.32: 2.3-88
pve-kernel-2.6.32-10-pve: 2.6.32-63
pve-kernel-2.6.32-18-pve: 2.6.32-88
lvm2: 2.02.95-1pve2
clvm: 2.02.95-1pve2
corosync-pve: 1.4.4-4
openais-pve: 1.1.4-2
libqb: 0.10.1-2
redhat-cluster-pve: 3.1.93-2
resource-agents-pve: 3.9.2-3
fence-agents-pve: 3.1.9-1
pve-cluster: 1.0-36
qemu-server: 2.3-17
pve-firmware: 1.0-21
libpve-common-perl: 1.0-48
libpve-access-control: 1.0-26
libpve-storage-perl: 2.3-6
vncterm: 1.0-3
vzctl: 4.0-1pve2
vzprocps: 2.0.11-2
vzquota: 3.1-1
pve-qemu-kvm: 1.4-6
ksm-control-daemon: 1.1-1

Code:
uname -a
Linux proxmox1 2.6.32-18-pve #1 SMP Mon Jan 21 12:09:05 CET 2013 x86_64 GNU/Linux
 
Only one KVM VM will not back up. Backups of all other VMs on the same Proxmox node (and on the same local storage) finish successfully.

Please can you post the VM config? Is there something special with this VM? Are there any IO errors in /var/log/syslog?
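For reference, one way to scan the host's syslog for I/O errors (just an illustrative grep, not a command from this thread):
Code:
# look for kernel-level disk error messages on the Proxmox host
grep -iE "i/o error|buffer i/o" /var/log/syslog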
 
Hi Dietmar,

Here is the VM 120 config:

Code:
balloon: 3072
boot: cd
bootdisk: virtio0
cores: 3
ide2: none,media=cdrom
memory: 6144
name: web.somename.ge
net0: e1000=XX:XX:XX:XX:XX:XX,bridge=vmbr1
onboot: 1
ostype: l26
sockets: 2
startup: order=1
tablet: 0
virtio0: local:120/vm-120-disk-1.qcow2,size=160G

Config of VM 107 (whose backup works):

Code:
balloon: 3072
bootdisk: virtio0
cores: 2
ide2: none,media=cdrom
memory: 6144
name: changedname.ge
net0: e1000=XX:XX:XX:XX:XX:XX,bridge=vmbr1
onboot: 1
ostype: l26
sockets: 2
tablet: 0
virtio0: local:107/vm-107-disk-1.qcow2,size=300G

There are no IO errors in syslog.

Best Regards,
Nugzar
 
The VM is in production. I can't give you a copy of the qcow2 image. Sorry.

Best Regards,
Nugzar
 
Please can you post the VM config? Is there something special with this VM? Are there any IO errors in /var/log/syslog?

The same information for my problematic VM:
Code:
boot: cdn
bootdisk: ide0
cores: 1
ide0: local:405/vm-405-disk-1.qcow2
ide2: none,media=cdrom
keyboard: fr
memory: 1024
name: admin-win
net0: rtl8139=XX:XX:XX:XX:XX:XX,bridge=vmbr0
onboot: 1
ostype: wxp
sockets: 1
 
Maybe it is related to the qcow2 format. For testing, can you please convert the disk to 'raw' using qemu-img and test again with that copy?
 
Code:
# qemu-img convert -O raw vm-666-disk-1.qcow2 vm-666-disk-1.raw 
qemu-img: error while reading sector 2097152: Input/output error 
zsh: exit 1     qemu-img convert -O raw vm-666-disk-1.qcow2  vm-666-disk-1.raw

If I modify the VM configuration file to use:
ide0: local:666/vm-666-disk-1.raw

the VM starts without problems as far as Proxmox is concerned, but in the VM's console there is an "Err. Read disk error".


For my other VMs (no issues):
* one with the raw format: ide0: local:402/vm-402-disk-1.raw (32G)
* one with the qcow2 format: ide0: local:412/vm-412-disk-1.qcow2 (5.4G)

Edit:
:'( It seems my qcow2 file is damaged. I will try a few things to get a new VM without errors.

Thanks for the help, dietmar ;)
 
I also uploaded a fix (qemu-server_2.3-18_amd64.deb) so that we always detect those errors (instead of reporting a successful backup).
 
The qcow2 image is corrupt.
Code:
qemu-img check vm-120-disk-1.qcow2 
Leaked cluster 5013 refcount=1 reference=0
Leaked cluster 5014 refcount=1 reference=0
Leaked cluster 21903 refcount=1 reference=0
...

1027 leaked clusters were found on the image.
This means waste of disk space, but no harm to data.

What now?
 
Code:
qemu-img convert -p -f qcow2 vm-120-disk-1.qcow2 -O raw vm-120-disk-1.raw
qemu-img: error while reading sector 110780416: Input/output error
 
Code:
qemu-img convert -p -f qcow2 vm-120-disk-1.qcow2 -O raw vm-120-disk-1.raw
qemu-img: error while reading sector 110780416: Input/output error

So your image is also damaged (it seems qemu-img check does not show all errors).
 
As in my case, if "qemu-img check" reports errors like:

Code:
1027 leaked clusters were found on the image. This means waste of disk space, but no harm to data.

the qcow2 image is damaged.

The image can be repaired with the following steps:

1. Create a new VM (for example VM 121).
2. Shut down the production VM with the damaged image (for example VM 120).
3. Copy the image of VM 120 over the image of the new VM 121 with:
cp /var/lib/vz/images/120/vm-120-disk-1.qcow2 /var/lib/vz/images/121/vm-121-disk-1.qcow2
4. Run:
qemu-img check -r leaks /var/lib/vz/images/121/vm-121-disk-1.qcow2
5. Start the new VM.
6. Fill up all partitions in the new VM (overwrite free sectors with zeros; see the sketch after this list):
http://tuxnetworks.blogspot.com/2011/08/shrink-kvm-disk-image.html
7. Shut down the new VM.
8. Convert the qcow2 image:
qemu-img convert -p -f qcow2 /var/lib/vz/images/121/vm-121-disk-1.qcow2 -O qcow2 /var/lib/vz/images/120/vm-120-disk-1_new.qcow2
9. If the conversion ends successfully, make a backup of the old image
(mv /var/lib/vz/images/120/vm-120-disk-1.qcow2 /var/lib/vz/images/120/vm-120-disk-1_bak.qcow2) and then rename the new
image with:
mv /var/lib/vz/images/120/vm-120-disk-1_new.qcow2 /var/lib/vz/images/120/vm-120-disk-1.qcow2
10. Start the old (production) VM and test a live backup. If the VM and its snapshot backup work, remove the backup image:
rm /var/lib/vz/images/120/vm-120-disk-1_bak.qcow2
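For convenience, a consolidated sketch of the host-side commands from these steps (same paths as above; the zero-fill in step 6 has to be run inside the new VM and is shown only as an example comment):
Code:
# steps 3-4: clone the damaged image onto the scratch VM 121 and drop the leaked clusters
cp /var/lib/vz/images/120/vm-120-disk-1.qcow2 /var/lib/vz/images/121/vm-121-disk-1.qcow2
qemu-img check -r leaks /var/lib/vz/images/121/vm-121-disk-1.qcow2

# step 6 (run inside the booted VM 121, not on the host), for example:
#   dd if=/dev/zero of=/zerofile bs=1M ; rm /zerofile

# steps 8-9: rewrite the image and swap it in once the conversion succeeds
qemu-img convert -p -f qcow2 /var/lib/vz/images/121/vm-121-disk-1.qcow2 -O qcow2 /var/lib/vz/images/120/vm-120-disk-1_new.qcow2
mv /var/lib/vz/images/120/vm-120-disk-1.qcow2 /var/lib/vz/images/120/vm-120-disk-1_bak.qcow2
mv /var/lib/vz/images/120/vm-120-disk-1_new.qcow2 /var/lib/vz/images/120/vm-120-disk-1.qcow2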

Regards,
Nugzar
 
Hi all,

I am experiencing exactly the same problem as you after moving to 2.3. I am attaching the vzdump log:

Code:
INFO: starting new backup job: vzdump 1550 --remove 0 --mode stop --compress lzo --storage sv-NFS --node pmox11
INFO: Starting Backup of VM 1550 (qemu)
INFO: status = stopped
INFO: backup mode: stop
INFO: ionice priority: 7
INFO: creating archive '/mnt/pve/sv-NFS/dump/vzdump-qemu-1550-2013_05_14-08_12_58.vma.lzo'
INFO: starting kvm to execute backup task
INFO: started backup task 'e4c527c7-282a-4c0b-92ea-f4dadc8e5583'
INFO: status: 0% (162660352/751619276800), sparse 0% (11141120), duration 3, 54/50 MB/s
INFO: status: 1% (7830568960/751619276800), sparse 0% (6686650368), duration 43, 191/24 MB/s
INFO: status: 2% (15060828160/751619276800), sparse 1% (13916708864), duration 59, 451/0 MB/s
INFO: status: 3% (22771990528/751619276800), sparse 2% (21627666432), duration 76, 453/0 MB/s
INFO: status: 4% (30087184384/751619276800), sparse 3% (28942848000), duration 92, 457/0 MB/s
INFO: status: 5% (37815779328/751619276800), sparse 4% (36671238144), duration 109, 454/0 MB/s
INFO: status: 6% (45460029440/751619276800), sparse 5% (44315467776), duration 126, 449/0 MB/s
INFO: status: 7% (52836892672/751619276800), sparse 6% (51666464768), duration 143, 433/1 MB/s
INFO: status: 8% (60479832064/751619276800), sparse 7% (59309387776), duration 160, 449/0 MB/s
INFO: status: 9% (67669721088/751619276800), sparse 8% (66499264512), duration 176, 449/0 MB/s
INFO: status: 10% (75515953152/751619276800), sparse 9% (74319040512), duration 194, 435/1 MB/s
INFO: status: 11% (82784550912/751619276800), sparse 10% (81587625984), duration 210, 454/0 MB/s
INFO: status: 12% (90385416192/751619276800), sparse 11% (89188290560), duration 228, 422/0 MB/s
INFO: status: 13% (98034188288/751619276800), sparse 12% (96837042176), duration 245, 449/0 MB/s
INFO: status: 14% (105469771776/751619276800), sparse 13% (101310144512), duration 311, 112/44 MB/s
INFO: status: 15% (112771923968/751619276800), sparse 14% (106794164224), duration 365, 135/33 MB/s
INFO: status: 16% (120345067520/751619276800), sparse 14% (109380796416), duration 469, 72/47 MB/s
INFO: status: 17% (128021299200/751619276800), sparse 15% (117057011712), duration 486, 451/0 MB/s
INFO: status: 18% (135584612352/751619276800), sparse 16% (124500131840), duration 506, 378/6 MB/s
INFO: status: 19% (142856683520/751619276800), sparse 17% (131620876288), duration 526, 363/7 MB/s
INFO: status: 20% (150534225920/751619276800), sparse 18% (139249954816), duration 544, 426/2 MB/s
INFO: status: 21% (158164582400/751619276800), sparse 19% (146435358720), duration 569, 305/17 MB/s
INFO: status: 22% (165708693504/751619276800), sparse 20% (153878147072), duration 587, 419/5 MB/s
INFO: status: 23% (173190086656/751619276800), sparse 21% (161338171392), duration 604, 440/1 MB/s
INFO: status: 24% (180586217472/751619276800), sparse 22% (168680669184), duration 621, 435/3 MB/s
INFO: status: 25% (188155494400/751619276800), sparse 23% (176243597312), duration 638, 445/0 MB/s
INFO: status: 26% (195421011968/751619276800), sparse 24% (183509102592), duration 654, 454/0 MB/s
INFO: status: 27% (203136630784/751619276800), sparse 25% (191224692736), duration 671, 453/0 MB/s
INFO: status: 28% (210913329152/751619276800), sparse 26% (199001362432), duration 688, 457/0 MB/s
INFO: status: 29% (218209189888/751619276800), sparse 27% (206297210880), duration 704, 455/0 MB/s
INFO: status: 30% (225485783040/751619276800), sparse 28% (213573791744), duration 720, 454/0 MB/s
ERROR: backup_complete_cb -5
INFO: aborting backup job
INFO: stopping kvm after backup task
ERROR: Backup of VM 1550 failed - backup_complete_cb -5
INFO: Backup job finished with errors
TASK ERROR: job errors

I have several KVM machines (all qcow2) and they all back up correctly except for one (see the log above). It always stops at 30%. My tests:

- From the Proxmox web UI, I tried vzdump with Snapshot, Suspend and Stop modes, with NO success.
- From the Linux command line, I tried vzdump, with NO success.
- From the Linux command line, I copied the corresponding qcow2 file to a different folder, with NO errors.
- From the Linux command line, I compressed it with lzop, with NO errors.

That makes me believe it is not a "physical" problem with the qcow2 file, to the extent that Linux can handle it normally. On the other hand, it makes me feel it's a vzdump issue.

I would appreciate any help.

Thanks in advance.
 
- From the Linux command line, I copied the corresponding qcow2 file to a different folder, with NO errors.

A reasonable test is to use 'qemu-img convert ...' to copy the whole qcow2 file to another location. Does that work without any errors?
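For example (just a sketch; the source path is the one posted later in this thread, and the destination filename is arbitrary):
Code:
# copying the image forces every allocated cluster to be read, so a corrupt qcow2 file shows up as an I/O error here
qemu-img convert -p -f qcow2 -O qcow2 /var/lib/vz/images/1550/vm-1550-disk-1.qcow2 /var/lib/vz/images/1550/vm-1550-test-copy.qcow2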
 
Thanks Dietmar.

After running:

Code:
 qemu-img check /var/lib/vz/images/1550/vm-1550-disk-1.qcow2

I got:

Code:
ERROR: cluster 211458634712447: copied flag must never be set for compressed clusters
Warning: cluster offset=0x120cce857f0000 is after the end of the image file, can't properly check refcounts.
ERROR: cluster 264071044498815: copied flag must never be set for compressed clusters
ERROR: invalid cluster offset=0x2bd4cd857f0000
ERROR: invalid cluster offset=0x2bd4cd85800000

4 errors were found on the image.
Data may be corrupted, or further writes to the image may corrupt it.

1 internal errors have occurred during the check.

An error has occurred during the check: Success
The check is not complete and may have missed error.

Could you please help me fix this issue?

Thanks.
 
Could you please help me fix this issue?

It seems the file has some errors. Before you do anything, make a copy of the original file as a backup.

Then you can try to repair the file with:

# qemu-img check -r all /var/lib/vz/images/1550/vm-1550-disk-1.qcow2

Or try to copy the whole file using "qemu-img convert ...":

# qemu-img convert /var/lib/vz/images/1550/vm-1550-disk-1.qcow2 /var/lib/vz/images/1550/vm-1550-new-disk-1.qcow2
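
A hedged follow-up sketch (same filenames as above; the _bak name is only an example): after either attempt, re-check the result before swapping it back into production.
Code:
# verify the repaired/converted copy first
qemu-img check /var/lib/vz/images/1550/vm-1550-new-disk-1.qcow2

# if the check is clean, keep the original as a backup and swap the new file in
mv /var/lib/vz/images/1550/vm-1550-disk-1.qcow2 /var/lib/vz/images/1550/vm-1550-disk-1_bak.qcow2
mv /var/lib/vz/images/1550/vm-1550-new-disk-1.qcow2 /var/lib/vz/images/1550/vm-1550-disk-1.qcow2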
 
