We have two Proxmox hosts (kernel 2.6.24), each hosting several OpenVZ VMs. Host1 has an Adaptec RAID controller with 6 disks in RAID10 and backs up over NFS, while Host2 is a single-disk installation that backs up to a dedicated backup HDD. Most of our large VMs are on Host1 because of its larger capacity.
The VM in question is called 102. When it was on Host2 (the single-disk, local-backup system), it backed up fine. A couple of days ago I migrated it to Host1 (the RAID system). Backups were fine for a day or two, then the backup started throwing errors (30MB error log, excerpt in the vzdump log further down).
- Other VMs back up fine on Host1.
- The error is not related to the backup destination (NFS or local), since it already happened before NFS backup was enabled.
- Again: VM 102 backs up fine on Host2.
- VM 102 has the most files of all our VMs: WinRAR counts over 1 million when opening the archive, but shows only 740k under Info.
- The errors start near the end of the backup run: the error log has roughly 200 thousand lines, so about that many files had a problem.
- Some errors are "Input/output error", but most are "No such file or directory".
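To put real numbers on the last two points instead of eyeballing a 30MB log, the tar warnings can be tallied by type. This is only a minimal sketch: the function name and the log file path are made up for illustration, and the patterns cover just the three message types that show up in the log excerpt.

```python
import re
from collections import Counter

# Patterns for the three kinds of tar warnings seen in the vzdump log excerpt.
PATTERNS = {
    "io_error": re.compile(r"Input/output error"),
    "missing": re.compile(r"Cannot stat: No such file or directory"),
    "shrank": re.compile(r"File shrank by \d+ bytes"),
}


def tally_tar_warnings(lines):
    """Count tar warnings by kind and return the first log line that carried one."""
    counts = Counter()
    first = None
    for line in lines:
        # Only lines that vzdump relayed from tar are of interest.
        if "INFO: tar:" not in line:
            continue
        for kind, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[kind] += 1
                if first is None:
                    first = line.rstrip("\n")
                break
    return counts, first


# Usage (hypothetical path to the saved error log):
# with open("vzdump-102.log") as log:
#     counts, first = tally_tar_warnings(log)
# print(counts, first)
```

The timestamp on the first matching line shows how far into the run the trouble began, which helps to tell whether the snapshot was readable for most of the backup and only failed late.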
It looks like it has something to do with the number/size of files on Host1 (many 30-50GB VEs with millions of files), since Host2 has far fewer and smaller VEs with far fewer files each.
Is it possible that the default LVM snapshot size is not enough when there are that many files?
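A rough way to sanity-check this theory: as far as I understand, an LVM snapshot only has to hold the blocks that change on the origin volume while the snapshot exists, and once it fills up LVM invalidates it and reads from it fail. So what matters would be how much gets written to pve-data during the backup, not the file count as such. The sketch below assumes a snapshot size of 1024 MB (I believe that is the vzdump default, but I have not verified it for this version) and uses the 01:17:40 run time from the log. The function names are made up for illustration.

```python
def hms_to_seconds(hms):
    """Convert an 'HH:MM:SS' duration string to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def max_write_rate_mb_s(snapshot_mb, backup_seconds):
    """Average write rate (MB/s) to the origin volume that would fill
    the snapshot exactly at the end of the backup run."""
    if backup_seconds <= 0:
        raise ValueError("backup_seconds must be positive")
    return snapshot_mb / backup_seconds


# Assumed snapshot size: 1024 MB. Duration taken from the vzdump log.
duration = hms_to_seconds("01:17:40")
rate = max_write_rate_mb_s(1024, duration)
print(f"{rate:.2f} MB/s")  # prints "0.22 MB/s"
```

Under these assumptions, an average of only about 0.22 MB/s of writes would be enough to exhaust the snapshot before the backup finishes. All containers on Host1 share pve-data, so writes from every VE count against it. The estimate ignores copy-on-write chunk overhead, so the real limit is probably lower.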
Any other ideas?
Code:
Aug 23 00:05:03 INFO: Starting Backup of VM 102 (openvz)
Aug 23 00:05:03 INFO: CTID 102 exist mounted running
Aug 23 00:05:03 INFO: status = CTID 102 exist mounted running
Aug 23 00:05:04 INFO: backup mode: snapshot
Aug 23 00:05:04 INFO: bandwidth limit: 10240 KB/s
Aug 23 00:05:04 INFO: creating lvm snapshot of /dev/mapper/pve-data ('/dev/pve/vzsnap-proxmox-0')
Aug 23 00:05:09 INFO: Logical volume "vzsnap-proxmox-0" created
Aug 23 00:05:12 INFO: creating archive '/mnt/pve/pimpnfs-daily/vzdump-openvz-102-2010_08_23-00_05_03.tar'
Aug 23 01:22:30 INFO: tar: ./var/clients/client7/web77/web/images/phocagallery/img_166.jpg: File shrank by 13613 bytes; padding with zeros
Aug 23 01:22:30 INFO: tar: ./var/clients/client7/web77/web/images/phocagallery/img_058.jpg: Warning: Read error at byte 0, while reading 9216 bytes: Input/output error
Aug 23 01:22:30 INFO: tar: ./var/clients/client7/web77/web/images/phocagallery/img_267.jpg: Warning: Read error at byte 0, while reading 7168 bytes: Input/output error
Aug 23 01:22:30 INFO: tar: ./var/clients/client7/web77/web/images/phocagallery/img_024.jpg: Warning: Read error at byte 0, while reading 3584 bytes: Input/output error
.....
Aug 23 01:22:40 INFO: tar: ./lib/libatm.so.1.0.0: Warning: Cannot stat: No such file or directory
Aug 23 01:22:40 INFO: tar: ./lib/libconsole.so.0.0.0: Warning: Cannot stat: No such file or directory
Aug 23 01:22:40 INFO: tar: ./lib/libcidn.so.1: Warning: Cannot stat: No such file or directory
Aug 23 01:22:40 INFO: tar: ./lib/libgpg-error.so.0: Warning: Cannot stat: No such file or directory
Aug 23 01:22:40 INFO: tar: ./media: Warning: Cannot stat: No such file or directory
Aug 23 01:22:40 INFO: Total bytes written: 33166540800 (31GiB, 8.1MiB/s)
Aug 23 01:22:40 INFO: archive file size: 30.89GB
Aug 23 01:22:40 INFO: delete old backup '/mnt/pve/nfs-daily/vzdump-openvz-102-2010_08_21-00_05_02.tar'
Aug 23 01:22:43 INFO: Logical volume "vzsnap-proxmox-0" successfully removed
Aug 23 01:22:43 INFO: Finished Backup of VM 102 (01:17:40)
Code:
host1:~# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/pve-root   20G  2.0G   17G  11% /
tmpfs                 3.9G     0  3.9G   0% /lib/init/rw
udev                   10M  2.7M  7.4M  27% /dev
tmpfs                 3.9G  4.0K  3.9G   1% /dev/shm
/dev/mapper/pve-data  660G  199G  461G  31% /var/lib/vz
/dev/sda1             504M   43M  436M   9% /boot
10.10.10.10:/d/backup/nfs/weekly
                      1.9T  905G  959G  49% /mnt/pve/nfs-weekly
10.10.10.10:/d/backup/nfs/daily
                      1.9T  905G  959G  49% /mnt/pve/nfs-daily
Code:
host2:~# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/pve-root   20G  1.4G   18G   8% /
tmpfs                 3.9G     0  3.9G   0% /lib/init/rw
udev                   10M  2.7M  7.4M  27% /dev
tmpfs                 3.9G     0  3.9G   0% /dev/shm
/dev/mapper/pve-data  115G   20G   95G  18% /var/lib/vz
/dev/sda1             504M   69M  410M  15% /boot
/dev/sdb1             147G   39G  101G  28% /backup