Problem during CT backup with I/O error on LVM snapshot

stef1777

Hello!


I'm pretty lost. Does anyone have an idea about this problem?


I have a Proxmox VE 1.8 cluster of 3 nodes (not updated, sorry). The servers are HP DL360 G7s with SAS disks.


On one node ONLY, the OpenVZ CT backup generates errors during the scheduled backup (every time). The OpenVZ CT is 40 GB.


The VM backup log generates megabytes of output like this:


Oct 23 13:16:02 INFO: tar: ./lib32/libresolv-2.11.3.so: Warning: Cannot stat: Input/output error
Oct 23 13:16:02 INFO: tar: ./lib32/libanl.so.1: Warning: Cannot stat: Input/output error
Oct 23 13:16:02 INFO: tar: ./lib32/ld-2.11.3.so: Warning: Cannot stat: Input/output error
Oct 23 13:16:02 INFO: tar: ./lib32/librt.so.1: Warning: Cannot stat: Input/output error
Oct 23 13:16:02 INFO: tar: ./lib32/libnss_dns-2.11.3.so: Warning: Cannot stat: Input/output error
Oct 23 13:16:02 INFO: tar: ./lib32/libnss_nis-2.11.3.so: Warning: Cannot stat: Input/output error
Oct 23 13:16:02 INFO: tar: ./lib32/libmemusage.so: Warning: Cannot stat: Input/output error
Oct 23 13:16:02 INFO: tar: ./lib32/libSegFault.so: Warning: Cannot stat: Input/output error
Oct 23 13:16:02 INFO: tar: ./lib32/libthread_db-1.0.so: Warning: Cannot stat: Input/output error
Oct 23 13:16:02 INFO: tar: ./lib32/libnss_hesiod-2.11.3.so: Warning: Cannot stat: Input/output error
Oct 23 13:16:02 INFO: tar: ./lib32/libnss_nisplus.so.2: Warning: Cannot stat: Input/output error
Oct 23 13:16:02 INFO: tar: ./lib32/libcrypt-2.11.3.so: Warning: Cannot stat: Input/output error
Oct 23 13:16:02 INFO: tar: ./proc/: Warning: Cannot savedir: Input/output error
Oct 23 13:16:02 INFO: tar: ./proc: Warning: Cannot close: Bad file descriptor
Oct 23 13:16:02 INFO: Total bytes written: 40080384000 (38GiB, 7.1MiB/s)
Oct 23 13:16:02 INFO: archive file size: 25.95GB
Oct 23 13:16:02 INFO: delete old backup '/mnt/pve/Backup-VZDump/vzdump-openvz-115-2012_10_23-06_12_27.tgz'
Oct 23 13:17:07 INFO: Logical volume "vzsnap-mama-vs004-0" successfully removed
Oct 23 13:17:07 INFO: Finished Backup of VM 115 (01:45:05)


I checked the file system with fsck /dev/mapper/pve-data and the files seem OK. I can access the files listed in the log and copy them to another system using rsync.
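For the record, roughly what I ran (a sketch; /var/lib/vz/private/115 is the standard OpenVZ private area for this CT, and "backuphost" is just a placeholder):

# read-only filesystem check (pve-data stays mounted, so -n answers "no" to all fixes)
fsck -n /dev/mapper/pve-data
# copy some of the files tar complained about to another system; this works fine
rsync -av /var/lib/vz/private/115/lib32/ backuphost:/tmp/ct115-lib32/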


I found this in /var/log/syslog:


Oct 23 13:15:43 cerimes-vs004 kernel: device-mapper: snapshots: Invalidating snapshot: Unable to allocate exception.
Oct 23 13:15:45 cerimes-vs004 kernel: EXT3-fs error (device dm-3): ext3_find_entry: reading directory #2624111 offset 0
Oct 23 13:15:45 cerimes-vs004 kernel: Buffer I/O error on device dm-3, logical block 0
Oct 23 13:15:45 cerimes-vs004 kernel: lost page write due to I/O error on dm-3
Oct 23 13:15:45 cerimes-vs004 kernel: EXT3-fs error (device dm-3): ext3_find_entry: reading directory #2624111 offset 0
Oct 23 13:15:45 cerimes-vs004 kernel: ------------[ cut here ]------------
Oct 23 13:15:45 cerimes-vs004 kernel: WARNING: at fs/buffer.c:1164 mark_buffer_dirty+0x23/0x80()
Oct 23 13:15:45 cerimes-vs004 kernel: Hardware name: ProLiant DL360 G7
Oct 23 13:15:45 cerimes-vs004 kernel: Modules linked in: nfs lockd fscache nfs_acl auth_rpcgss sunrpc kvm_intel kvm vzethdev vznetdev simfs vzrst vzcpt vzdquota vzmon vzdev ip6t_REJECT ip6table_mangle ip6table_filter ip6_tables xt_tcpudp xt_length xt_hl xt_tcpmss xt_TCPMSS iptable_mangle iptable_filter xt_multiport xt_limit xt_dscp ipt_REJECT ip_tables x_tables vzevent ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi bridge stp snd_pcm snd_timer snd soundcore snd_page_alloc psmouse evdev pcspkr serio_raw joydev hpilo container power_meter button processor ext3 jbd mbcache dm_mirror dm_region_hash dm_log dm_snapshot usbhid hid ata_piix ehci_hcd ata_generic uhci_hcd libata usbcore nls_base bnx2 cciss thermal fan thermal_sys [last unloaded: scsi_wait_scan]
Oct 23 13:15:45 cerimes-vs004 kernel: Pid: 2483, comm: tar Not tainted 2.6.32-4-pve #1
Oct 23 13:15:45 cerimes-vs004 kernel: Call Trace:
Oct 23 13:15:45 cerimes-vs004 kernel: [<ffffffff8111149d>] ? mark_buffer_dirty+0x23/0x80
Oct 23 13:15:45 cerimes-vs004 kernel: [<ffffffff8111149d>] ? mark_buffer_dirty+0x23/0x80
Oct 23 13:15:45 cerimes-vs004 kernel: [<ffffffff8104e21c>] ? warn_slowpath_common+0x77/0xa3
Oct 23 13:15:45 cerimes-vs004 kernel: [<ffffffff8111149d>] ? mark_buffer_dirty+0x23/0x80
Oct 23 13:15:45 cerimes-vs004 kernel: [<ffffffffa01352de>] ? ext3_commit_super+0x4f/0x6f [ext3]
Oct 23 13:15:45 cerimes-vs004 kernel: [<ffffffffa0136b55>] ? ext3_handle_error+0x83/0xaa [ext3]
Oct 23 13:15:45 cerimes-vs004 kernel: [<ffffffffa0136c85>] ? ext3_error+0x83/0x90 [ext3]
Oct 23 13:15:45 cerimes-vs004 kernel: [<ffffffff81110a0e>] ? submit_bh+0x11c/0x123
Oct 23 13:15:45 cerimes-vs004 kernel: [<ffffffff811120ae>] ? ll_rw_block+0xb4/0xf8
Oct 23 13:15:45 cerimes-vs004 kernel: [<ffffffffa0133119>] ? ext3_find_entry+0x3e1/0x560 [ext3]
Oct 23 13:15:45 cerimes-vs004 kernel: [<ffffffff81073da6>] ? charge_dcache+0x61/0xb9
Oct 23 13:15:45 cerimes-vs004 kernel: [<ffffffffa0133ae2>] ? ext3_lookup+0x30/0xe4 [ext3]
Oct 23 13:15:45 cerimes-vs004 kernel: [<ffffffff810f9412>] ? do_lookup+0xf1/0x178
Oct 23 13:15:45 cerimes-vs004 kernel: [<ffffffff810f9eab>] ? __link_path_walk+0x689/0x811
Oct 23 13:15:45 cerimes-vs004 kernel: [<ffffffff810fa1bb>] ? path_walk+0x44/0x85
Oct 23 13:15:45 cerimes-vs004 kernel: [<ffffffff810fb4db>] ? do_path_lookup+0x20/0x77
Oct 23 13:15:45 cerimes-vs004 kernel: [<ffffffff810fc81f>] ? user_path_at+0x48/0x79
Oct 23 13:15:45 cerimes-vs004 kernel: [<ffffffff81066a16>] ? autoremove_wake_function+0x0/0x2e
Oct 23 13:15:45 cerimes-vs004 kernel: [<ffffffff81073d07>] ? do_uncharge_dcache+0x3d/0x51
Oct 23 13:15:45 cerimes-vs004 kernel: [<ffffffff810f4d50>] ? vfs_fstatat+0x2c/0x57
Oct 23 13:15:45 cerimes-vs004 kernel: [<ffffffff810f4dd1>] ? sys_newlstat+0x11/0x30
Oct 23 13:15:45 cerimes-vs004 kernel: [<ffffffff810f1ee8>] ? vfs_write+0xcd/0x102
Oct 23 13:15:45 cerimes-vs004 kernel: [<ffffffff810f2030>] ? sys_write+0x49/0xc1
Oct 23 13:15:45 cerimes-vs004 kernel: [<ffffffff81010c12>] ? system_call_fastpath+0x16/0x1b
Oct 23 13:15:45 cerimes-vs004 kernel: ---[ end trace 4c43195544452298 ]---
Oct 23 13:15:45 cerimes-vs004 kernel: Buffer I/O error on device dm-3, logical block 0
Oct 23 13:15:45 cerimes-vs004 kernel: lost page write due to I/O error on dm-3
Oct 23 13:15:45 cerimes-vs004 kernel: EXT3-fs error (device dm-3): ext3_find_entry: reading directory #2624111 offset 0
Oct 23 13:15:45 cerimes-vs004 kernel: Buffer I/O error on device dm-3, logical block 0
Oct 23 13:15:45 cerimes-vs004 kernel: lost page write due to I/O error on dm-3




# pveversion -v
pve-manager: 1.8-18 (pve-manager/1.8/6070)
running kernel: 2.6.32-4-pve
proxmox-ve-2.6.32: 1.8-33
pve-kernel-2.6.32-3-pve: 2.6.32-13
pve-kernel-2.6.32-4-pve: 2.6.32-33
qemu-server: 1.1-30
pve-firmware: 1.0-11
libpve-storage-perl: 1.0-17
vncterm: 0.9-2
vzctl: 3.0.28-1pve1
vzdump: 1.2-14
vzprocps: 2.0.11-2
vzquota: 3.0.11-1
pve-qemu-kvm: 0.14.1-1
ksm-control-daemon: 1.0-6
 
You run a VERY outdated system, so you should upgrade. If you cannot move to 2.x, move at least to the latest 1.9.
 
> You run a VERY outdated system, so you should upgrade. If you cannot move to 2.x, move at least to the latest 1.9.

Hi Tom!

Yes, I know. My other servers are running 2.1, but these 3 I can't migrate before January for various reasons.

Do you have an idea for my problem?
 
Most likely you ran out of snapshot space. Try increasing the snapshot size in /etc/vzdump.conf
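For example, a minimal sketch of what the file could contain (size is in MB and must fit into the volume group's free space; 8192 is just an illustrative value):

# /etc/vzdump.conf
# LVM snapshot size in MB (illustrative value)
size: 8192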
 
Yes, create the file, but monitor your snapshot size during the backup (lvdisplay).
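For instance, from a second shell while vzdump is running (a sketch; the LV name is the vzsnap volume that vzdump creates, as seen in the logs above, and the 30-second interval is arbitrary):

# show how full the snapshot's copy-on-write area is, every 30 seconds
watch -n 30 "lvdisplay /dev/pve/vzsnap-cerimes-vs004-0 | grep 'Allocated to snapshot'"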
 
Hello!

Unfortunately, it doesn't work.

Oct 23 19:30:02 cerimes-vs004 vzdump[19427]: INFO: Starting Backup of VM 115 (openvz)
Oct 23 19:30:03 cerimes-vs004 vzdump[19427]: ERROR: Backup of VM 115 failed - command 'lvcreate --size 8192M --snapshot --name vzsnap-cerimes-vs004-0 /dev/pve/data' failed with exit code 5

I tried with 6144, same error.

Oct 23 20:00:02 cerimes-vs004 vzdump[27263]: INFO: Starting Backup of VM 115 (openvz)
Oct 23 20:00:02 cerimes-vs004 vzdump[27263]: ERROR: Backup of VM 115 failed - command 'lvcreate --size 6144M --snapshot --name vzsnap-cerimes-vs004-0 /dev/pve/data' failed with exit code 5

I ran another test, this time without tgz compression and with the default vzdump values. This time everything works fine: the backup completes and there are no I/O errors.
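For reference, the two runs compare roughly like this (a sketch assuming the standard vzdump 1.x flags; the dump directory is the one from the log above):

# tgz backup (fails on this node with I/O errors):
vzdump --snapshot --compress --dumpdir /mnt/pve/Backup-VZDump 115
# plain tar backup (works):
vzdump --snapshot --dumpdir /mnt/pve/Backup-VZDump 115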

I could leave it like this, but I'm trying to figure out the problem, because on the other servers compression works well even with bigger CTs (65 GB).

For testing, I restored the CT tar backup on a PVE 2 node. The tar backup works. The tgz backup did not: it contains corrupted and missing files.
 
Did you monitor your snapshot size during the backup (lvdisplay)?

And do you have enough free space for the specified snapshot size?

output of:

> pvs
 
No, I didn't check the snapshot size during the tgz backup, as the backup aborted due to the failed snapshot creation. I'll try tgz again later with the default value and will check the snapshot size.

Current disk usage:

vs004:~# pvs
  PV               VG  Fmt  Attr PSize   PFree
  /dev/block/104:2 pve lvm2 a-   558.23G 4.00G


vs004:~# pvdisplay
  --- Physical volume ---
  PV Name               /dev/block/104:2
  VG Name               pve
  PV Size               558.23 GB / not usable 288.00 KB
  Allocatable           yes
  PE Size (KByte)       4096
  Total PE              142907
  Free PE               1023
  Allocated PE          141884
  PV UUID               DGa5zL-VBII-zgEW-WfYB-OMVm-rT4u-rnKKbm

vs004:~# lvdisplay
  --- Logical volume ---
  LV Name                /dev/pve/swap
  VG Name                pve
  LV UUID                Urc6h0-dlq5-sw6W-qQzk-5jE1-zybJ-ixrsKs
  LV Write Access        read/write
  LV Status              available
  # open                 1
  LV Size                23.00 GB
  Current LE             5888
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           254:0

  --- Logical volume ---
  LV Name                /dev/pve/root
  VG Name                pve
  LV UUID                Fi0FwM-n3we-Fbo5-MLTT-xs0i-iTXK-OqM7nw
  LV Write Access        read/write
  LV Status              available
  # open                 1
  LV Size                96.00 GB
  Current LE             24576
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           254:1

  --- Logical volume ---
  LV Name                /dev/pve/data
  VG Name                pve
  LV UUID                3J5wlF-SW32-8nAi-CwXd-L2N2-0Qgc-EsfTwH
  LV Write Access        read/write
  LV Status              available
  # open                 1
  LV Size                435.23 GB
  Current LE             111420
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           254:2
 
If you have only 4 GB of free space, you cannot allocate more than 4 GB (see 'pvs').
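A quick way to check before picking a snapshot size (standard LVM tools):

# free space in the volume group, in MB
vgs --units m -o vg_free pve
# ~4096 MB free here, so 'lvcreate --size 8192M' (or 6144M) has to fail with exit code 5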
 
You're right. I'm such an idiot!

On another 1.8 node with the same 4 GB of free space, a tgz backup of a 65 GB CT works fine...

I hope to upgrade this cluster to 2.1 as soon as possible.

Thanks,
 
