Another VZDump backup problem: Input/Output error

gkovacs

We have 2 Proxmox hosts (kernel 2.6.24), each hosting several OpenVZ VMs. Host1 has an Adaptec RAID controller with 6 disks in RAID10 and backs up over NFS, while Host2 is a single-disk installation backing up to a dedicated backup HDD. Most of our large VMs are on Host1, due to its larger capacity.

The VM in question is called 102. When it was on Host2 (the single-disk, local-backup system), it backed up fine. A couple of days ago I migrated it to Host1 (the RAID system). Backups were fine for a day or two, then this started happening (30MB error log):

Code:
Aug 23 00:05:03 INFO: Starting Backup of VM 102 (openvz)
Aug 23 00:05:03 INFO: CTID 102 exist mounted running
Aug 23 00:05:03 INFO: status = CTID 102 exist mounted running
Aug 23 00:05:04 INFO: backup mode: snapshot
Aug 23 00:05:04 INFO: bandwidth limit: 10240 KB/s
Aug 23 00:05:04 INFO: creating lvm snapshot of /dev/mapper/pve-data ('/dev/pve/vzsnap-proxmox-0')
Aug 23 00:05:09 INFO:   Logical volume "vzsnap-proxmox-0" created
Aug 23 00:05:12 INFO: creating archive '/mnt/pve/pimpnfs-daily/vzdump-openvz-102-2010_08_23-00_05_03.tar'
Aug 23 01:22:30 INFO: tar: ./var/clients/client7/web77/web/images/phocagallery/img_166.jpg: File shrank by 13613 bytes; padding with zeros
Aug 23 01:22:30 INFO: tar: ./var/clients/client7/web77/web/images/phocagallery/img_058.jpg: Warning: Read error at byte 0, while reading 9216 bytes: Input/output error
Aug 23 01:22:30 INFO: tar: ./var/clients/client7/web77/web/images/phocagallery/img_267.jpg: Warning: Read error at byte 0, while reading 7168 bytes: Input/output error
Aug 23 01:22:30 INFO: tar: ./var/clients/client7/web77/web/images/phocagallery/img_024.jpg: Warning: Read error at byte 0, while reading 3584 bytes: Input/output error
.....
Aug 23 01:22:40 INFO: tar: ./lib/libatm.so.1.0.0: Warning: Cannot stat: No such file or directory
Aug 23 01:22:40 INFO: tar: ./lib/libconsole.so.0.0.0: Warning: Cannot stat: No such file or directory
Aug 23 01:22:40 INFO: tar: ./lib/libcidn.so.1: Warning: Cannot stat: No such file or directory
Aug 23 01:22:40 INFO: tar: ./lib/libgpg-error.so.0: Warning: Cannot stat: No such file or directory
Aug 23 01:22:40 INFO: tar: ./media: Warning: Cannot stat: No such file or directory
Aug 23 01:22:40 INFO: Total bytes written: 33166540800 (31GiB, 8.1MiB/s)
Aug 23 01:22:40 INFO: archive file size: 30.89GB
Aug 23 01:22:40 INFO: delete old backup '/mnt/pve/nfs-daily/vzdump-openvz-102-2010_08_21-00_05_02.tar'
Aug 23 01:22:43 INFO:   Logical volume "vzsnap-proxmox-0" successfully removed
Aug 23 01:22:43 INFO: Finished Backup of VM 102 (01:17:40)
- Other VMs back up fine on Host1.
- The error is not related to the backup destination (NFS or local), since it happened before NFS backup was enabled.
- Again: VM 102 backs up fine on Host2.
- VM 102 has the most files of all our VMs: WinRAR counts over 1 million when opening it, but shows only 740k in INFO.
- The errors start relatively late in the backup process. The error log has roughly 200 thousand lines (so that many files had a problem).
- Some errors are "Input/output error", but most are "No such file or directory".

It looks like it has something to do with the number/size of files on Host1 (many 30-50GB VEs with millions of files), since Host2 has fewer and smaller VEs, each with far fewer files.

Code:
host1:~# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/pve-root   20G  2.0G   17G  11% /
tmpfs                 3.9G     0  3.9G   0% /lib/init/rw
udev                   10M  2.7M  7.4M  27% /dev
tmpfs                 3.9G  4.0K  3.9G   1% /dev/shm
/dev/mapper/pve-data  660G  199G  461G  31% /var/lib/vz
/dev/sda1             504M   43M  436M   9% /boot
10.10.10.10:/d/backup/nfs/weekly
                      1.9T  905G  959G  49% /mnt/pve/nfs-weekly
10.10.10.10:/d/backup/nfs/daily
                      1.9T  905G  959G  49% /mnt/pve/nfs-daily
Code:
host2:~# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/pve-root   20G  1.4G   18G   8% /
tmpfs                 3.9G     0  3.9G   0% /lib/init/rw
udev                   10M  2.7M  7.4M  27% /dev
tmpfs                 3.9G     0  3.9G   0% /dev/shm
/dev/mapper/pve-data  115G   20G   95G  18% /var/lib/vz
/dev/sda1             504M   69M  410M  15% /boot
/dev/sdb1             147G   39G  101G  28% /backup

Is it possible that the default LVM snapshot size is not enough when there are that many files?
Any other ideas?
 
Is it possible that the default LVM snapshot size is not enough when there are that many files?

Snapshot size depends on the number of changed blocks (not on the file count).
But it is likely that there are many changes on such a large VM.

So simply increase the snapshot size and test again.
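
For reference, a minimal sketch of how to do that (assuming this vzdump version reads a 'size' option, in MB, from /etc/vzdump.conf, and that the --size flag behaves the same way):

Code:
# raise the snapshot LV size vzdump allocates (value in MB)
echo "size: 4096" >> /etc/vzdump.conf
# or test a single backup with a larger snapshot
vzdump 102 --snapshot --size 4096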
 
We are experiencing the very same problem, so we created an /etc/vzdump.conf containing the following line:

size: 2048 (first try)

then

size: 4096 (second try)

In both cases vzdump yields errors such as:

INFO: tar: ./etc/hostname: Warning: Cannot stat: No such file or directory

The node is running 8 containers for a total of about 1.1TB of space. Errors appear even in the first dump, a container of less than 1TB with few changes per day on the file system.

Any suggestion greatly appreciated.
Andrea
 
The node is running 8 containers for a total of about 1.1TB of space. Errors appear even in the first dump, a container of less than 1TB with few changes per day on the file system.

All containers are on the same LV, so you need enough snapshot space to store the changes of all containers during the backup.
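
To see how much room the volume group actually has left for the snapshot, you can check with 'vgs' (a quick sketch, assuming the default 'pve' volume group name):

Code:
# VFree is the unallocated space the vzsnap-* snapshot LV is carved from
vgs pve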

Any suggestion greatly appreciated.
Andrea

What is the output of

# pveperf

(please run when there is no load on the servers)

And which version are you running?

# pveversion -v
 
Sorry, the first container is 1GB, not 1TB... The LV has 300GB of space available.

Code:
# pveperf
CPU BOGOMIPS: 9199.52
REGEX/SECOND: 425544
HD SIZE: 18.21 GB (/dev/mapper/pve-root)
BUFFERED READS: 63.74 MB/sec
AVERAGE SEEK TIME: 10.14 ms
FSYNCS/SECOND: 705.90
DNS EXT: 58.48 ms
DNS INT: 1.14 ms

Code:
# pveversion -v
pve-manager: 1.6-2 (pve-manager/1.6/5087)
running kernel: 2.6.24-11-pve
proxmox-ve-2.6.24: 1.5-23
pve-kernel-2.6.24-11-pve: 2.6.24-23
pve-kernel-2.6.18-2-pve: 2.6.18-5
qemu-server: 1.1-18
pve-firmware: 1.0-7
libpve-storage-perl: 1.0-13
vncterm: 0.9-2
vzctl: 3.0.24-1pve4
vzdump: 1.2-7
vzprocps: 2.0.11-1dso2
vzquota: 3.0.11-1
pve-qemu-kvm: 0.12.5-1
 
The thing that ultimately solved all backup problems for us was adding the following to /etc/updatedb.conf on the Proxmox host:

Code:
PRUNEPATHS="/tmp /var/spool /media /mnt"
Since this modification, we have not had any backup problems.
We have also increased snapshot size to 2GB, but that turned out to be irrelevant.
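
My guess is that updatedb was crawling vzdump's temporary snapshot mount under /mnt (and our NFS backup mounts under /mnt/pve), hitting it while the nightly cron job ran. A quick way to check while a backup is running (just a sketch; the exact vzsnap mountpoint name may differ on your version):

Code:
# vzdump mounts the LVM snapshot somewhere under /mnt (e.g. /mnt/vzsnap0);
# without pruning, the nightly updatedb cron job walks it and /mnt/pve
mount | grep vzsnap
grep PRUNEPATHS /etc/updatedb.conf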
 
Please try to observe the size of the snapshot volume during backup (using 'lvs'). Does it still run out of space?
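
Something along these lines (a sketch; on these LVM2 versions the fill level shows up in the 'Snap%' column of 'lvs'):

Code:
# poll the snapshot's copy-on-write usage while the backup runs;
# if it reaches 100% the snapshot is invalidated and reads return I/O errors
watch -n 30 "lvs /dev/pve/vzsnap-proxmox-0"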
 
The thing that ultimately solved all backup problems for us was adding the following to /etc/updatedb.conf on the Proxmox host:

Code:
PRUNEPATHS="/tmp /var/spool /media /mnt"

Since this modification, we have not had any backup problems.
We have also increased snapshot size to 2GB, but that turned out to be irrelevant.

It seems to have resolved my backup issues as well. It happened very often that one VM (a big one, in fact) would not finish its backup. I had to kill the process, and then vzdump resumed with the next VM. Since I made the change three days ago, I have not seen any failures in my backups.

I don't know how you came to the conclusion that it was updatedb causing the failure, but it is a fine catch. If I understand this correctly, the failure occurs because updatedb touches the backup files in /tmp; it runs at 06:25 each day for me.

Shouldn't this change to updatedb.conf be made the default on Proxmox installs?
 
Dietmar,

I had another backup "hang" yesterday; that is, a .dat file was created, but the tar archive never was. And I had already added '/tmp' to PRUNEPATHS in updatedb.conf, so I think that is not the whole solution anyway. It hung on a rather big VM: three disks of 40 GB, 5 GB, and 6 GB. It is almost always the same VM. Today, the same backup completed successfully...
I also don't think it is a good idea to add '/tmp' permanently to PRUNEPATHS.

I did not see obvious errors in the logs, but perhaps I should take a closer look...

Alain
 
