system made unusable during endless vzdump

bread-baker

Member
Mar 6, 2010
432
0
16
a cron vzdump started 2pm yesterday, and that was the beginning of making the containers go into a coma.

I've been trying to remotely reboot the system but it will not occur.

the system is a 2010 dual cpu Supermicro with a raid-10 3ware card. there is also a 500gb disk attached to on board sata for /bkup


here is part of vzdump log openvz-8002.log which was causing i think the problem:
Code:
Nov 22 14:10:37 INFO: Starting Backup of VM 8002 (openvz)
Nov 22 14:10:37 INFO: CTID 8002 exist mounted running
Nov 22 14:10:37 INFO: status = running
Nov 22 14:10:38 INFO: backup mode: snapshot
Nov 22 14:10:38 INFO: ionice priority: 7
Nov 22 14:10:38 INFO: creating lvm snapshot of /dev/mapper/pve-data ('/dev/pve/vzsnap-fbc8-0')
Nov 22 14:11:36 INFO:   Logical volume "vzsnap-fbc8-0" created
Nov 22 14:11:38 INFO: creating archive '/bkup/pvebkp-fbc8//dump/vzdump-openvz-8002-2011_11_22-14_10_37.tgz'
Nov 22 14:11:43 INFO: find: `./usr/share/doc/lynx-cur': Input/output error
Nov 22 14:11:43 INFO: find: `./usr/share/doc/telnet': Input/output error
Nov 22 14:11:43 INFO: find: `./usr/share/doc/tofrodos': Input/output error
Nov 22 14:11:43 INFO: find: `./usr/share/doc/hostname': Input/output error
Nov 22 14:11:43 INFO: find: `./usr/share/doc/libkeyutils1': Input/output error
....
Nov 22 14:11:52 INFO: tar: ./etc/pm: Warning: Cannot stat: Input/output error
Nov 22 14:11:52 INFO: tar: ./etc/zm: Warning: Cannot stat: Input/output error
Nov 22 14:11:52 INFO: tar: ./aquota.user: Warning: Cannot stat: Input/output error
Nov 22 14:11:52 INFO: tar: ./var: Warning: Cannot stat: Input/output error
Nov 22 14:11:52 INFO: Total bytes written: 445767680 (426MiB, 47MiB/s)
Nov 22 14:11:52 INFO: archive file size: 13MB
Nov 22 14:11:53 INFO:   /dev/pve/vzsnap-fbc8-0: read failed after 0 of 4096 at 140391677952: Input/output error
Nov 22 14:11:53 INFO:   /dev/pve/vzsnap-fbc8-0: read failed after 0 of 4096 at 140391735296: Input/output error
Nov 22 14:11:53 INFO:   /dev/pve/vzsnap-fbc8-0: read failed after 0 of 4096 at 0: Input/output error
Nov 22 14:11:53 INFO:   /dev/pve/vzsnap-fbc8-0: read failed after 0 of 4096 at 4096: Input/output error
Nov 22 14:12:57 INFO:   Unable to deactivate open pve-vzsnap--fbc8--0-cow (253:7)
Nov 22 14:12:57 INFO:   Failed to resume vzsnap-fbc8-0.
Nov 22 14:12:57 INFO:   libdevmapper exiting with 1 device(s) still suspended.
Nov 22 14:12:57 INFO: lvremove failed - trying again in 1 seconds

...
openvz-8002.log lines 3856-3885/3885 byte 464939/464939 (END)   (press RETURN)

syslog
Code:
Nov 22 14:10:01 fbc8 pvestatd[2351]: status update time (12.976 seconds)
Nov 22 14:10:31 fbc8 rrdcached[1615]: flushing old values
Nov 22 14:10:31 fbc8 rrdcached[1615]: rotating journals
Nov 22 14:10:33 fbc8 rrdcached[1615]: started new journal /var/lib/rrdcached/journal//rrd.journal.1321989033.354893
Nov 22 14:10:33 fbc8 rrdcached[1615]: removing old journal /var/lib/rrdcached/journal//rrd.journal.1321981831.406289
Nov 22 14:10:37 fbc8 vzdump[311835]: INFO: Finished Backup of VM 8001 (00:10:27)
Nov 22 14:10:37 fbc8 vzdump[311835]: INFO: Starting Backup of VM 8002 (openvz)
Nov 22 14:11:37 fbc8 pvestatd[2351]: status update time (56.325 seconds)
Nov 22 14:11:37 fbc8 kernel: EXT3-fs: barriers disabled
Nov 22 14:11:37 fbc8 kernel: kjournald starting.  Commit interval 5 seconds
Nov 22 14:11:37 fbc8 kernel: EXT3-fs (dm-5): using internal journal
Nov 22 14:11:37 fbc8 kernel: ext3_orphan_cleanup: deleting unreferenced inode 617291
Nov 22 14:11:37 fbc8 kernel: ext3_orphan_cleanup: deleting unreferenced inode 1935533
Nov 22 14:11:37 fbc8 kernel: ext3_orphan_cleanup: deleting unreferenced inode 1936537
Nov 22 14:11:37 fbc8 kernel: ext3_orphan_cleanup: deleting unreferenced inode 1935499
Nov 22 14:11:37 fbc8 kernel: ext3_orphan_cleanup: deleting unreferenced inode 1935499
Nov 22 14:11:37 fbc8 kernel: ext3_orphan_cleanup: deleting unreferenced inode 1936535
Nov 22 14:11:37 fbc8 kernel: ext3_orphan_cleanup: deleting unreferenced inode 1936570
Nov 22 14:11:37 fbc8 kernel: ext3_orphan_cleanup: deleting unreferenced inode 1936454
Nov 22 14:11:37 fbc8 kernel: ext3_orphan_cleanup: deleting unreferenced inode 617349
Nov 22 14:11:37 fbc8 kernel: ext3_orphan_cleanup: deleting unreferenced inode 666177
Nov 22 14:11:37 fbc8 kernel: ext3_orphan_cleanup: deleting unreferenced inode 666176
Nov 22 14:11:37 fbc8 kernel: ext3_orphan_cleanup: deleting unreferenced inode 666175
Nov 22 14:11:37 fbc8 kernel: ext3_orphan_cleanup: deleting unreferenced inode 666084
Nov 22 14:11:37 fbc8 kernel: ext3_orphan_cleanup: deleting unreferenced inode 666083
Nov 22 14:11:37 fbc8 kernel: ext3_orphan_cleanup: deleting unreferenced inode 2189185
Nov 22 14:11:37 fbc8 kernel: ext3_orphan_cleanup: deleting unreferenced inode 2189182
Nov 22 14:11:37 fbc8 kernel: ext3_orphan_cleanup: deleting unreferenced inode 2189181
Nov 22 14:11:37 fbc8 kernel: ext3_orphan_cleanup: deleting unreferenced inode 2189180
Nov 22 14:11:37 fbc8 kernel: ext3_orphan_cleanup: deleting unreferenced inode 2189179
Nov 22 14:11:37 fbc8 kernel: ext3_orphan_cleanup: deleting unreferenced inode 1254925
Nov 22 14:11:37 fbc8 kernel: ext3_orphan_cleanup: deleting unreferenced inode 1254924
Nov 22 14:11:37 fbc8 kernel: ext3_orphan_cleanup: deleting unreferenced inode 1254921
Nov 22 14:11:37 fbc8 kernel: ext3_orphan_cleanup: deleting unreferenced inode 1254917
Nov 22 14:11:37 fbc8 kernel: ext3_orphan_cleanup: deleting unreferenced inode 1254913
Nov 22 14:11:37 fbc8 kernel: EXT3-fs (dm-5): 23 orphan inodes deleted
Nov 22 14:11:37 fbc8 kernel: EXT3-fs (dm-5): recovery complete
Nov 22 14:11:38 fbc8 kernel: EXT3-fs (dm-5): mounted filesystem with ordered data mode
Nov Nov 22 14:11:43 fbc8 kernel: EXT3-fs error (device dm-5): ext3_get_inode_loc: unable to read inode block - inode=1722390, block=6881411
Nov 22 14:11:43 fbc8 kernel: Buffer I/O error on device dm-5, logical block 0
Nov 22 14:11:43 fbc8 kernel: lost page write due to I/O error on dm-5
Nov 22 14:11:43 fbc8 kernel: EXT3-fs (dm-5): I/O error while writing superblock
Nov 22 14:11:43 fbc8 kernel: EXT3-fs error (device dm-5): ext3_get_inode_loc: unable to read inode block - inode=1722390, block=6881411
Nov 22 14:11:43 fbc8 kernel: Buffer I/O error on device dm-5, logical block 0
Nov 22 14:11:43 fbc8 kernel: lost page write due to I/O error on dm-5
Nov 22 14:11:43 fbc8 kernel: EXT3-fs (dm-5): I/O error while writing superblock
Nov 22 14:11:43 fbc8 kernel: EXT3-fs error (device dm-5): ext3_get_inode_loc: unable to read inode block - inode=1722390, block=6881411
Nov 22 14:11:43 fbc8 kernel: Buffer I/O error on device dm-5, logical block 0
Nov 22 14:11:43 fbc8 kernel: lost page write due to I/O error on dm-5
Nov 22 14:11:43 fbc8 kernel: EXT3-fs (dm-5): I/O error while writing superblock
Nov 22 14:11:43 fbc8 kernel: EXT3-fs error (device dm-5): ext3_get_inode_loc: unable to read inode block - inode=1721392, block=6881348
22 14:11:42 fbc8 kernel: device-mapper: snapshots: Invalidating snapshot: Unable to allocate exception

....
thousands more
....
Nov 22 14:11:52 fbc8 kernel: EXT3-fs error (device dm-5): ext3_get_inode_loc: unable to read inode block - inode=894794, block=3571830
Nov 22 14:11:52 fbc8 kernel: EXT3-fs error (device dm-5): ext3_get_inode_loc: unable to read inode block - inode=1246633, block=498082
8
Nov 22 14:11:52 fbc8 kernel: EXT3-fs (dm-5): error: ext3_put_super: Couldn't clean up the journal
Nov 22 14:14:21 fbc8 kernel: INFO: task kjournald:1056 blocked for more than 120 seconds.
Nov 22 14:14:21 fbc8 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 22 14:14:21 fbc8 kernel: kjournald     D ffff88043d5e2b20     0  1056      2 0x00000000
Nov 22 14:14:21 fbc8 kernel: ffff88043dbd7c30 0000000000000046 0000000000000000 ffff880411490f00
Nov 22 14:14:21 fbc8 kernel: ffff88043dbd7c00 ffffffff813f22ec ffff88043dbd7be0 0000000104b59c5a
Nov 22 14:14:21 fbc8 kernel: ffff88043d5e30e8 ffff88043dbd7fd8 000000000000f788 ffff88043d5e30e8
Nov 22 14:14:21 fbc8 kernel: Call Trace:
Nov 22 14:14:21 fbc8 kernel: [<ffffffff813f22ec>] ? dm_table_unplug_all+0x5c/0xd0
Nov 22 14:14:21 fbc8 kernel: [<ffffffff814e5223>] io_schedule+0xa3/0x110
Nov 22 14:14:21 fbc8 kernel: [<ffffffff811b9460>] ? sync_buffer+0x0/0x50
Nov 22 14:14:21 fbc8 kernel: [<ffffffff811b94a5>] sync_buffer+0x45/0x50
Nov 22 14:14:21 fbc8 kernel: [<ffffffff814e5abf>] __wait_on_bit+0x5f/0x90
Nov 22 14:14:21 fbc8 kernel: [<ffffffff811b9460>] ? sync_buffer+0x0/0x50
Nov 22 14:14:21 fbc8 kernel: [<ffffffff814e5b68>] out_of_line_wait_on_bit+0x78/0x90
Nov 22 14:14:21 fbc8 kernel: [<ffffffff81092390>] ? wake_bit_function+0x0/0x40
Nov 22 14:14:21 fbc8 kernel: [<ffffffffa00997a6>] ? __journal_remove_journal_head+0xc6/0x150 [jbd]
Nov 22 14:14:21 fbc8 kernel: [<ffffffff811b9456>] __wait_on_buffer+0x26/0x30
Nov 22 14:14:21 fbc8 kernel: [<ffffffffa0096eee>] journal_commit_transaction+0x9ce/0x1130 [jbd]
Nov 22 14:14:21 fbc8 kernel: [<ffffffff8107b02c>] ? lock_timer_base+0x3c/0x70
Nov 22 14:14:21 fbc8 kernel: [<ffffffff8107bc8b>] ? try_to_del_timer_sync+0x7b/0xe0
Nov 22 14:14:21 fbc8 kernel: [<ffffffffa0099fc8>] kjournald+0xe8/0x250 [jbd]
Nov 22 14:14:21 fbc8 kernel: [<ffffffff81092350>] ? autoremove_wake_function+0x0/0x40
Nov 22 14:14:21 fbc8 kernel: [<ffffffffa0099ee0>] ? kjournald+0x0/0x250 [jbd]
Nov 22 14:14:21 fbc8 kernel: [<ffffffff81091d76>] kthread+0x96/0xa0

let me know if you want syslog and other info .

the computer needs to be manually reset. reboot , init 6 , killing most f the processes does not work.

any quick suggestions to save a trip?
 
ok, let me make sure I understand
lvs:
Code:
root@fbc8 ~ # lvs
  LV             VG      Attr   LSize   Origin Snap%  Move Log Copy%  Convert
  data           pve     -wi-ao 130.75g                                      
  root           pve     -wi-ao  49.75g                                      
  swap           pve     -wi-ao  15.00g                                      
  vzsnap-fbc8-0  pve     -wi-a-   1.00g                                      
  vm-8033-disk-1 vg-fbc8 -wi-a-  12.00g                                      
  vm-8033-disk-2 vg-fbc8 -wi-a-  32.00g

df:
Code:
root@fbc8 /fbc/adm/lvm # df
Filesystem    Type    Size  Used Avail Use% Mounted on
/dev/mapper/pve-root
              ext3     49G  978M   46G   3% /
tmpfs        tmpfs    7.9G     0  7.9G   0% /lib/init/rw
udev         tmpfs     10M  248K  9.8M   3% /dev
tmpfs        tmpfs    7.9G   19M  7.9G   1% /dev/shm
/dev/mapper/pve-data
              ext3    129G   42G   88G  33% /var/lib/vz
/dev/sda1     ext3    495M   35M  435M   8% /boot
/dev/sdc1      xfs    466G  109G  357G  24% /bkup
/dev/fuse     fuse     30M   20K   30M   1% /etc/pve

I assume vzsnap-fbc8-0 needs to be increased. so i'll do this:
use lvreduce to take away from data , and add that to vzsnap-fbc8-0 using lvextend ?

btw overall proxmox 2.0 is great!
 
on another prox 2.0 system there is not a vsnap logical volume . So I assume that came from the failed vzdump ?

if so then probably all I need to do make more snapshot area is reduct the data or swap?

and then remove the hung over vzdump ?
 
so i did this:
Code:
lvremove /dev/mapper/pve-vzsnap--fbc8--0
Do you really want to remove active logical volume vzsnap-fbc8-0? [y/n]: y
  Logical volume "vzsnap-fbc8-0" successfully removed

Code:
swapoff /dev/mapper/pve-swap
Code:
lvreduce -L-1G /dev/mapper/pve-swap
  WARNING: Reducing active logical volume to 14.00 GiB
  THIS MAY DESTROY YOUR DATA (filesystem etc.)
Do you really want to reduce swap? [y/n]: y
  Reducing logical volume swap to 14.00 GiB
  Logical volume swap successfully resized
Code:
swapon /dev/pve/swap
swapon: /dev/pve/swap: swapon failed: Invalid argument

mkswap /dev/pve/swap
mkswap: /dev/pve/swap: warning: don't erase bootbits sectors
        on whole disk. Use -f to force.
Setting up swapspace version 1, size = 14680060 KiB
no label, UUID=09f56f43-e0ac-4fc6-b317-489fa2310e4d
Code:
swapon /dev/pve/swap

I assume vzdump will auto use the extra space?
Code:
 vgs
  VG      #PV #LV #SN Attr   VSize   VFree
  pve       1   3   0 wz--n- 199.50g 5.00g
  vg-fbc8   1   2   0 wz--n-   1.17t 1.13t

there is 5g extra, so there must have been 4 g extra before, as i decreased swap by 1g.
So how does vzdump determine how much space to yse for a snapshot?

edir: just tried a snapshot backup and ran into this:
Code:
[code]
INFO: starting new backup job: vzdump 8002 --mode snapshot --compress 1 --storage bkup8 --node fbc8
INFO: Starting Backup of VM 8002 (openvz)
INFO: CTID 8002 exist mounted running
INFO: status = running
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating lvm snapshot of /dev/mapper/pve-data ('/dev/pve/vzsnap-fbc8-0')
INFO: Logical volume "vzsnap-fbc8-0" created
INFO: creating archive '/bkup/pvebkp-fbc8//dump/vzdump-openvz-8002-2011_11_23-14_50_44.tgz'
INFO: find: `./var/cache/zoneminder/events/13/11/11/22/03/56/59/035-capture.jpg': Input/output error
INFO: find: `./var/cache/zoneminder/events/13/11/11/22/03/56/59/015-analyse.jpg': Input/output error
INFO: find: `./var/cache/zoneminder/events/13/11/11/22/03/56/59/017-capture.jpg': Input/output error
INFO: find: `./var/cache/zoneminder/events/13/11/11/22/03/56/59/015-capture.jpg': Input/output error
INFO: find: `./var/cache/zoneminder/events/13/11/11/22/03/56/59/012-analyse.jpg': Input/output error
INFO: find: `./var/cache/zoneminder/events/13/11/11/22/03/56/59/024-capture.jpg': Input/output error
INFO: find: `./var/cache/zoneminder/events/13/11/11/22/03/56/59/041-capture.jpg': Input/output error

next I'll try a backup in stop mode
 
Last edited: