Backup issues with Proxmox 2.3?

scontin

Hi all,
I have read about some problems with backups in Proxmox 2.3.

Is it a frequent issue?

How many of you are having this trouble?

No problems for me during the last week. Am I just lucky?

Thanks.
 
Which issue do you mean? Give details.

A few comments:

A side effect of the new KVM backup is that it also checks your virtual disks.
This means that if you have corrupted virtual disks, you will see it now. Before 2.3, most people did not notice this. A lot of people still run on single, unreliable disk drives (and mdraid), and these days disks quite frequently die slowly. As a consequence, data gets corrupted without you noticing. Also, the contents of hard disk caches are lost if you have a power failure.
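The new backup appears to read each virtual disk from start to end (the status lines later in this thread count up to the full disk size), which is why an unreadable region that went unnoticed before now makes the backup fail. Following that reasoning, a minimal Python sketch can do a similar full read by hand and report the offsets that raise I/O errors. The function name `find_unreadable_chunks` and the 1 MiB chunk size are illustrative assumptions, not part of Proxmox. Reads may be served from the page cache, so an empty result is not proof that the underlying disk is healthy, and for an image file it checks the file itself, not the guest's view of the disk.

```python
import os

def find_unreadable_chunks(path, chunk_size=1024 * 1024):
    """Read `path` (an image file or block device) from start to end and
    return the byte offsets of chunks whose read raised an OSError (e.g. EIO)."""
    bad_offsets = []
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.lseek(fd, 0, os.SEEK_END)  # works for files and block devices
        offset = 0
        while offset < size:
            try:
                data = os.pread(fd, chunk_size, offset)
            except OSError:
                # Record the failing chunk and skip past it instead of aborting.
                bad_offsets.append(offset)
                offset += chunk_size
                continue
            if not data:  # unexpected end of file
                break
            offset += len(data)
    finally:
        os.close(fd)
    return bad_offsets
```

Called on a disk image or device path (any path you pass is your own placeholder), an empty list means every chunk could be read at least once.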

Use a reliable hardware RAID controller with BBU and cache and you are on the right track (and don't forget to turn off the hard drive cache).

Some users forgot to power off and power on their VMs after upgrading; always follow our upgrade guides.

Overall, the new backup works great, and so far there is no open bug in our Bugzilla. Our user base is currently more than 35,000 servers, and only a very small number of issues have been reported so far (see above).
 
Since I updated to Proxmox 2.3 / pve-kernel-2.6.32-18, I get errors on some of my CT snapshot backups.
Not on all of them, just a few, and manual snapshot backups of those containers work fine.
I just upgraded to pve-kernel-2.6.32-19 and the issue persists.
I have already increased the LVM snapshot size to 4096 MB in /etc/vzdump.conf.

I never had an issue before the upgrade. The error message is:

command '(cd /mnt/vzsnap0/private/109;find . '(' -regex '^\.$' ')' -o '(' -type 's' -prune ')' -o -print0|sed 's/\\/\\\\/g'|tar cpf - --totals --sparse --numeric-owner --no-recursion --one-file-system --null -T -|gzip) >/media/backup/dump/vzdump-openvz-109-2013_03_22-02_31_27.tar.dat' failed: exit code 2

along with a lot of "Read error at byte 0, while reading 1024 bytes: Input/output error" messages.

Thanks,
Sven
 
Since upgrading to 2.3 about 10% of our backups fail on any given night (there doesn't seem to be any pattern to which ones fail). They usually fail with an error like this:

ERROR: Backup of VM 193 failed - got timeout

Manual backups the next morning always seem to work.
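To check a figure like "about 10% of backups fail" against the logs rather than by eye, a small Python sketch can tally the `Finished Backup of VM <id>` and `ERROR: Backup of VM <id> failed - <reason>` lines and group the failures by reason. The patterns are based only on the log lines quoted in this thread, and `summarize_backup_log` is a hypothetical helper name.

```python
import re
from collections import Counter

# Patterns based on the vzdump log lines quoted in this thread.
FINISHED = re.compile(r"INFO: Finished Backup of VM (\d+)")
FAILED = re.compile(r"ERROR: Backup of VM (\d+) failed - (.+)$")

def summarize_backup_log(lines):
    """Count finished and failed VM backups and group failures by reason."""
    finished = 0
    failed_vms = []
    reasons = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if FINISHED.search(line):
            finished += 1
            continue
        m = FAILED.search(line)
        if m:
            failed_vms.append(m.group(1))
            reasons[m.group(2)] += 1
    total = finished + len(failed_vms)
    return {
        "finished": finished,
        "failed": len(failed_vms),
        "failed_vms": failed_vms,
        "failure_rate": len(failed_vms) / total if total else 0.0,
        "reasons": reasons,
    }
```

Feeding it one night's log lines at a time makes it easier to see whether the failures cluster on particular VMs or storages, or are spread randomly as described above.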
 
Here's one (complete .log file):

Mar 15 03:50:03 INFO: Starting Backup of VM 109 (qemu)
Mar 15 03:50:03 INFO: status = running
Mar 15 03:50:06 INFO: backup mode: snapshot
Mar 15 03:50:06 INFO: ionice priority: 7
Mar 15 03:50:06 INFO: creating archive '/mnt/pve/adhoc/dump/vzdump-qemu-109-2013_03_15-03_50_03.vma.lzo'
Mar 15 03:50:09 ERROR: got timeout
Mar 15 03:50:09 INFO: aborting backup job
Mar 15 03:50:38 ERROR: Backup of VM 109 failed - got timeout
 
pve-manager: 2.3-13 (pve-manager/2.3/7946f1f1)
running kernel: 2.6.32-18-pve
proxmox-ve-2.6.32: 2.3-93
pve-kernel-2.6.32-19-pve: 2.6.32-93
pve-kernel-2.6.32-18-pve: 2.6.32-88
lvm2: 2.02.95-1pve2
clvm: 2.02.95-1pve2
corosync-pve: 1.4.4-4
openais-pve: 1.1.4-2
libqb: 0.10.1-2
redhat-cluster-pve: 3.1.93-2
resource-agents-pve: 3.9.2-3
fence-agents-pve: 3.1.9-1
pve-cluster: 1.0-36
qemu-server: 2.3-18
pve-firmware: 1.0-21
libpve-common-perl: 1.0-49
libpve-access-control: 1.0-26
libpve-storage-perl: 2.3-6
vncterm: 1.0-3
vzctl: 4.0-1pve2
vzprocps: 2.0.11-2
vzquota: 3.1-1
pve-qemu-kvm: 1.4-8
ksm-control-daemon: 1.1-1
 
Here is another slightly different log:

Mar 22 03:27:28 INFO: Starting Backup of VM 186 (qemu)
Mar 22 03:27:28 INFO: status = running
Mar 22 03:27:28 INFO: backup mode: snapshot
Mar 22 03:27:28 INFO: ionice priority: 7
Mar 22 03:27:28 INFO: creating archive '/mnt/pve/proxsnaps2/dump/vzdump-qemu-186-2013_03_22-03_27_28.vma.lzo'
Mar 22 03:27:30 INFO: started backup task '2c1a4b8e-f9e0-43c9-a0a2-228974caa322'
Mar 22 03:27:33 INFO: status: 0% (25165824/3221225472), sparse 0% (23953408), duration 3, 8/0 MB/s
Mar 22 05:20:34 INFO: status: 1% (32636928/3221225472), sparse 0% (26722304), duration 6784, 0/0 MB/s
Mar 23 04:56:17 ERROR: VM 186 qmp command 'query-backup' failed - got timeout
Mar 23 04:56:17 INFO: aborting backup job
Mar 23 04:56:17 ERROR: Backup of VM 186 failed - VM 186 qmp command 'query-backup' failed - got timeout
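In this log the job did start, but progress crawled from 0% to 1% between duration 3 and duration 6784 (about 7.5 MB in roughly 6,800 seconds) before `query-backup` timed out. A minimal Python sketch for spotting such stalls parses consecutive status lines and flags intervals whose throughput falls below a threshold. The name `find_stalls` and the 1,000,000 bytes/s default are arbitrary assumptions for illustration.

```python
import re

# Matches e.g. "status: 0% (25165824/3221225472), sparse 0% (23953408), duration 3"
STATUS = re.compile(r"status: (\d+)% \((\d+)/(\d+)\).*?duration (\d+)")

def find_stalls(lines, min_bytes_per_sec=1_000_000):
    """Return (start_duration, end_duration, bytes_per_sec) for each interval
    between consecutive status lines whose throughput is below the threshold."""
    samples = []  # (duration in seconds, bytes processed so far)
    for line in lines:
        m = STATUS.search(line)
        if m:
            samples.append((int(m.group(4)), int(m.group(2))))
    stalls = []
    for (t0, b0), (t1, b1) in zip(samples, samples[1:]):
        if t1 <= t0:
            continue  # skip malformed or out-of-order samples
        rate = (b1 - b0) / (t1 - t0)
        if rate < min_bytes_per_sec:
            stalls.append((t0, t1, rate))
    return stalls
```

Applied to the two status lines above, it reports a single interval from duration 3 to 6784 at roughly 1,100 bytes/s.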
 
Only the ones that fail. Here are a few of the transfer times from last night:

vzdump-qemu-160-2013_03_23-02_49_40.log:Mar 23 03:18:38 INFO: transferred 21474 MB in 1737 seconds (12 MB/s)
vzdump-qemu-161-2013_03_23-03_18_42.log:Mar 23 03:29:36 INFO: transferred 19327 MB in 650 seconds (29 MB/s)
vzdump-qemu-162-2013_03_23-01_37_21.log:Mar 23 02:11:22 INFO: transferred 21474 MB in 2030 seconds (10 MB/s)
vzdump-qemu-163-2013_03_23-06_32_28.log:Mar 23 06:37:51 INFO: transferred 17611 MB in 323 seconds (54 MB/s)
vzdump-qemu-164-2013_03_23-03_29_42.log:Mar 23 03:44:18 INFO: transferred 21474 MB in 866 seconds (24 MB/s)
vzdump-qemu-165-2013_03_23-02_11_30.log:Mar 23 02:17:09 INFO: transferred 8589 MB in 333 seconds (25 MB/s)
vzdump-qemu-166-2013_03_23-06_37_55.log:Mar 23 06:50:55 INFO: transferred 10737 MB in 778 seconds (13 MB/s)
vzdump-qemu-167-2013_03_23-06_50_57.log:Mar 23 07:02:26 INFO: transferred 10200 MB in 688 seconds (14 MB/s)
vzdump-qemu-168-2013_03_23-01_04_58.log:Mar 23 01:35:24 INFO: transferred 17179 MB in 1825 seconds (9 MB/s)
vzdump-qemu-169-2013_03_23-02_17_12.log:Mar 23 02:27:23 INFO: transferred 8589 MB in 608 seconds (14 MB/s)

Part of the reason it's not faster is that we have 6 Proxmox 2 nodes backing up at the same time (along with 2 Proxmox 1.9 nodes).
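The MB/s figure in each of these lines is the transferred size divided by the elapsed time, truncated to an integer (for VM 162, 21474 MB / 2030 s ≈ 10.6 MB/s, logged as 10). A Python sketch that recomputes the per-VM rate and a combined rate makes the spread (9 to 54 MB/s here) easier to compare across nights. The helper name `transfer_rates` and the regex are assumptions based on the `file:line` output above; the combined figure assumes the jobs ran one after another on this node, as the timestamps suggest.

```python
import re

# Matches lines like:
# vzdump-qemu-162-...log:Mar 23 02:11:22 INFO: transferred 21474 MB in 2030 seconds (10 MB/s)
TRANSFER = re.compile(r"vzdump-qemu-(\d+)-\S*?\.log:.*transferred (\d+) MB in (\d+) seconds")

def transfer_rates(lines):
    """Return ({vmid: MB/s}, combined MB/s over all matched backups)."""
    rates = {}
    total_mb = 0
    total_s = 0
    for line in lines:
        m = TRANSFER.search(line)
        if not m:
            continue
        vmid, mb, secs = m.group(1), int(m.group(2)), int(m.group(3))
        rates[vmid] = mb / secs
        total_mb += mb
        total_s += secs
    combined = total_mb / total_s if total_s else 0.0
    return rates, combined
```

With several nodes writing to the same backup storage at once, a night-to-night drop in these per-VM rates would be consistent with the shared-bandwidth explanation above.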
 
We got this one last night:

Mar 25 01:12:39 INFO: Starting Backup of VM 188 (qemu)
Mar 25 01:12:39 INFO: status = running
Mar 25 01:12:40 INFO: backup mode: snapshot
Mar 25 01:12:40 INFO: ionice priority: 7
Mar 25 01:12:40 INFO: creating archive '/mnt/pve/proxsnaps2/dump/vzdump-qemu-188-2013_03_25-01_12_39.vma.lzo'
Mar 25 01:12:43 ERROR: got timeout
Mar 25 01:12:43 INFO: aborting backup job
Mar 25 01:12:48 ERROR: Backup of VM 188 failed - got timeout

The load average was just under 1.2 (that blade has two Xeon 5148 processors, 4 cores in total).
 
Hello all!

Something seems to be wrong with this new version of Proxmox.
Before this version, all backups worked perfectly.
Now I have the same problem as Cayuga:

When I make a manual backup, this error sometimes appears:

INFO: starting new backup job: vzdump 103 --remove 0 --mode snapshot --compress gzip --storage backups --node proxmox1
INFO: Starting Backup of VM 103 (qemu)
INFO: status = running
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating archive '/backups/dump/vzdump-qemu-103-2013_04_11-09_14_23.vma.gz'
ERROR: got timeout
INFO: aborting backup job
ERROR: Backup of VM 103 failed - got timeout
INFO: Backup job finished with errors
TASK ERROR: job errors

And sometimes it works!

INFO: starting new backup job: vzdump 103 --remove 0 --mode snapshot --compress gzip --storage backups --node proxmox1
INFO: Starting Backup of VM 103 (qemu)
INFO: status = running
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating archive '/backups/dump/vzdump-qemu-103-2013_04_11-09_22_43.vma.gz'
INFO: started backup task '1e2fca71-47e3-48be-96eb-62a3e73ab420'
INFO: status: 0% (72220672/85899345920), sparse 0% (33574912), duration 3, 24/12 MB/s
INFO: status: 1% (870449152/85899345920), sparse 0% (416231424), duration 34, 25/13 MB/s
INFO: status: 2% (1725693952/85899345920), sparse 0% (469909504), duration 91, 15/14 MB/s
INFO: status: 3% (2599944192/85899345920), sparse 0% (606945280), duration 141, 17/14 MB/s
INFO: status: 4% (3443785728/85899345920), sparse 0% (613629952), duration 189, 17/17 MB/s
INFO: status: 5% (4316921856/85899345920), sparse 0% (630640640), duration 233, 19/19 MB/s
INFO: status: 6% (5161877504/85899345920), sparse 0% (767893504), duration 273, 21/17 MB/s
INFO: status: 7% (6013321216/85899345920), sparse 0% (830566400), duration 330, 14/13 MB/s
INFO: status: 8% (6879969280/85899345920), sparse 1% (987348992), duration 364, 25/20 MB/s
INFO: status: 9% (7731412992/85899345920), sparse 1% (992608256), duration 415, 16/16 MB/s
INFO: status: 10% (8596750336/85899345920), sparse 1% (997015552), duration 469, 16/15 MB/s
INFO: status: 11% (9453305856/85899345920), sparse 1% (1133707264), duration 514, 19/15 MB/s
INFO: status: 12% (10319953920/85899345920), sparse 1% (1133969408), duration 553, 22/22 MB/s
INFO: status: 13% (11182800896/85899345920), sparse 1% (1295372288), duration 597, 19/15 MB/s
INFO: status: 14% (12838109184/85899345920), sparse 3% (2871119872), duration 607, 165/7 MB/s
INFO: status: 20% (17599037440/85899345920), sparse 8% (7597400064), duration 610, 1586/11 MB/s
INFO: status: 22% (19530842112/85899345920), sparse 10% (9440317440), duration 617, 275/12 MB/s
INFO: status: 23% (19788464128/85899345920), sparse 11% (9625419776), duration 620, 85/24 MB/s
INFO: status: 25% (21736390656/85899345920), sparse 13% (11546398720), duration 624, 486/6 MB/s
INFO: status: 27% (23683465216/85899345920), sparse 15% (13445898240), duration 627, 649/15 MB/s
INFO: status: 30% (26189496320/85899345920), sparse 18% (15901470720), duration 630, 835/16 MB/s
INFO: status: 33% (28736552960/85899345920), sparse 21% (18424328192), duration 633, 849/8 MB/s
INFO: status: 37% (32632340480/85899345920), sparse 25% (22272999424), duration 636, 1298/15 MB/s
INFO: status: 40% (34776154112/85899345920), sparse 28% (24385843200), duration 639, 714/10 MB/s
INFO: status: 42% (36927569920/85899345920), sparse 30% (26466701312), duration 643, 537/17 MB/s
INFO: status: 43% (36976984064/85899345920), sparse 30% (26473779200), duration 646, 16/14 MB/s
INFO: status: 45% (39094190080/85899345920), sparse 33% (28541853696), duration 649, 705/16 MB/s
INFO: status: 46% (39519387648/85899345920), sparse 33% (28561670144), duration 673, 17/16 MB/s
INFO: status: 47% (40382758912/85899345920), sparse 33% (28684378112), duration 733, 14/12 MB/s
INFO: status: 48% (41234202624/85899345920), sparse 33% (28862681088), duration 788, 15/12 MB/s
INFO: status: 49% (42100064256/85899345920), sparse 33% (28868403200), duration 847, 14/14 MB/s
INFO: status: 50% (42967498752/85899345920), sparse 33% (28871475200), duration 903, 15/15 MB/s
INFO: status: 51% (43818942464/85899345920), sparse 33% (29003927552), duration 954, 16/14 MB/s
INFO: status: 52% (44669468672/85899345920), sparse 33% (29009190912), duration 1006, 16/16 MB/s
INFO: status: 53% (45533233152/85899345920), sparse 33% (29140414464), duration 1055, 17/14 MB/s
INFO: status: 54% (46399881216/85899345920), sparse 33% (29160984576), duration 1104, 17/17 MB/s
INFO: status: 55% (47255126016/85899345920), sparse 33% (29161488384), duration 1143, 21/21 MB/s
INFO: status: 56% (48106569728/85899345920), sparse 34% (29290717184), duration 1188, 18/16 MB/s
INFO: status: 57% (48969416704/85899345920), sparse 34% (29295337472), duration 1249, 14/14 MB/s
INFO: status: 58% (50171609088/85899345920), sparse 35% (30437126144), duration 1254, 240/12 MB/s
INFO: status: 62% (54108487680/85899345920), sparse 39% (34306777088), duration 1257, 1312/22 MB/s
INFO: status: 65% (56093638656/85899345920), sparse 42% (36266090496), duration 1260, 661/8 MB/s
INFO: status: 66% (56741986304/85899345920), sparse 42% (36390076416), duration 1274, 46/37 MB/s
INFO: status: 68% (58917322752/85899345920), sparse 44% (38234574848), duration 1287, 167/25 MB/s
INFO: status: 75% (64840073216/85899345920), sparse 51% (44132986880), duration 1290, 1974/8 MB/s
INFO: status: 79% (67908993024/85899345920), sparse 54% (47191646208), duration 1293, 1022/3 MB/s
INFO: status: 82% (71285604352/85899345920), sparse 58% (50532331520), duration 1296, 1125/11 MB/s
INFO: status: 83% (71327416320/85899345920), sparse 58% (50532343808), duration 1299, 13/13 MB/s
INFO: status: 84% (72306262016/85899345920), sparse 59% (51455795200), duration 1304, 195/11 MB/s
INFO: status: 89% (76856098816/85899345920), sparse 65% (55981826048), duration 1307, 1516/7 MB/s
INFO: status: 98% (84806074368/85899345920), sparse 74% (63907758080), duration 1310, 2649/8 MB/s
INFO: status: 100% (85899345920/85899345920), sparse 75% (64842768384), duration 1319, 121/17 MB/s
INFO: transferred 85899 MB in 1319 seconds (65 MB/s)
INFO: archive file size: 7.37GB
INFO: Finished Backup of VM 103 (00:22:02)
INFO: Backup job finished successfully
TASK OK

I don't know what I can do to resolve this problem.

Many thanks in advance for your help!
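A note on reading the successful log above: working through consecutive status lines, the pair of figures before "MB/s" appears to be the read rate and the rate of non-zero (non-sparse) data over the last interval, in decimal MB/s and truncated to integers. For example, between duration 607 and 610 about 4.76 GB were read but only about 35 MB of it was non-zero, which gives the "1586/11" line. This is our reconstruction from the numbers, not something taken from the vzdump source; the Python sketch below reproduces the logged pairs under that assumption, and `interval_rates` is a hypothetical helper.

```python
import re

# Matches e.g. "status: 14% (12838109184/85899345920), sparse 3% (2871119872), duration 607"
STATUS_LINE = re.compile(
    r"status: \d+% \((\d+)/\d+\), sparse \d+% \((\d+)\), duration (\d+)"
)

def parse_status(line):
    """Extract (bytes_read, sparse_bytes, duration_s) from a status line."""
    m = STATUS_LINE.search(line)
    if m is None:
        raise ValueError(f"not a status line: {line!r}")
    return int(m.group(1)), int(m.group(2)), int(m.group(3))

def interval_rates(prev_line, cur_line):
    """(read MB/s, non-zero MB/s) between two status lines, using decimal MB
    (1,000,000 bytes) and truncating to integers as the log appears to do."""
    r0, z0, t0 = parse_status(prev_line)
    r1, z1, t1 = parse_status(cur_line)
    secs = t1 - t0
    read = r1 - r0
    nonzero = read - (z1 - z0)  # bytes that were not detected as sparse
    return int(read / secs / 1_000_000), int(nonzero / secs / 1_000_000)
```

Read this way, the very high rates in that log are regions of zeros that are skipped quickly, while the real data moves at roughly 10 to 20 MB/s.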
 
Got the same here. I back up two VMs to the same NAS as destination. Last night one ran successfully while the other failed with:

INFO: Starting Backup of VM 101 (qemu)
INFO: status = running
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating archive '/mnt/pve/nas/dump/vzdump-qemu-101-2013_04_10-01_00_01.vma.lzo'
ERROR: got timeout
INFO: aborting backup job
ERROR: Backup of VM 101 failed - got timeout
INFO: Backup job finished with errors

The previous day both backups ran successfully.
 
I think I have resolved this problem by doing a manual backup of my VM.
At the beginning, I tried 2 or 3 times to back up my VM manually without success (ERROR: got timeout).
But I kept trying, and finally the backup started.
Now my backup succeeds every day.

I think you should try doing a manual backup; it may resolve your problem.
 
In the end, it's not resolved.
Doing a manual backup fixed it only for a short time: now I get this message every day, even after doing a manual backup.
There are no other topics about this on the web, and I have no idea where to focus my research.

Any help would be very much appreciated!
Thanks in advance!
 
I realize we are on an old version and should upgrade (I am planning to upgrade soon), but I thought I would contribute my information in case it helps someone else. I also only recently started having this problem, even though my backups have been reliable since I upgraded to 2.3 at the beginning of the year.

Historically I have used my iSCSI connector for the live disks and NFS for backups. A couple of weeks ago I ran low on space on the iSCSI connector, so I spun up a couple of VMs using the NFS connector, which is when I started getting these:

INFO: starting new backup job: vzdump 220 --remove 0 --mode snapshot --compress lzo --storage NetappNFS --node proxmox2
INFO: Starting Backup of VM 220 (qemu)
INFO: status = running
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating archive '/mnt/pve/NetappNFS/dump/vzdump-qemu-220-2013_09_16-07_40_05.vma.lzo'
ERROR: got timeout
INFO: aborting backup job
ERROR: Backup of VM 220 failed - got timeout
INFO: Backup job finished with errors
TASK ERROR: job errors

The above error is for a server that uses iSCSI for its disk storage and backs up to the NFS dump storage. It has a 32 GB disk, and I had to run the backup manually two more times before it actually started. The error points to a timeout: could it be that, now that I have disks running on NFS, the storage is just slow enough to respond that it triggers a timeout? Perhaps there is a timeout value that could be adjusted?

I get alerts for failed backups, so for now I manually rerun the failed ones, and they have worked on the first or second try.
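Until the cause is found, the manual reruns described above could be automated. This is a minimal Python sketch, assuming `vzdump` exits with a non-zero status when the job finishes with errors (as the `TASK ERROR: job errors` line suggests). The attempt count and 60-second delay are arbitrary choices, and `run_with_retries` is a hypothetical helper, not a Proxmox tool. It does not change any timeout inside vzdump; it only repeats the job, so it hides the symptom rather than fixing it.

```python
import subprocess
import time

def default_runner(cmd):
    """Run the command and return its exit status."""
    return subprocess.run(cmd).returncode

def run_with_retries(cmd, attempts=3, delay_s=60.0,
                     runner=default_runner, sleep=time.sleep):
    """Run `cmd` up to `attempts` times, waiting `delay_s` seconds after each
    failed try. Returns 0 on success, else the last non-zero exit status."""
    code = 1
    for attempt in range(1, attempts + 1):
        code = runner(cmd)
        if code == 0:
            return 0
        if attempt < attempts:
            sleep(delay_s)
    return code
```

For example, `run_with_retries(["vzdump", "220", "--mode", "snapshot", "--compress", "lzo", "--storage", "NetappNFS"])` reuses the options from the job log above; the `runner` and `sleep` parameters exist only so the retry logic can be tested without calling vzdump.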


 
