system made unusable during endless vzdump

bread-baker

Member
Mar 6, 2010
432
0
16
a cron vzdump started 2pm yesterday, and that was the beginning of making the containers go into a coma.

I've been trying to remotely reboot the system but it will not occur.

the system is a 2010 dual cpu Supermicro with a raid-10 3ware card. there is also a 500gb disk attached to on board sata for /bkup


here is part of vzdump log openvz-8002.log which was causing i think the problem:
Code:
Nov 22 14:10:37 INFO: Starting Backup of VM 8002 (openvz)
Nov 22 14:10:37 INFO: CTID 8002 exist mounted running
Nov 22 14:10:37 INFO: status = running
Nov 22 14:10:38 INFO: backup mode: snapshot
Nov 22 14:10:38 INFO: ionice priority: 7
Nov 22 14:10:38 INFO: creating lvm snapshot of /dev/mapper/pve-data ('/dev/pve/vzsnap-fbc8-0')
Nov 22 14:11:36 INFO:   Logical volume "vzsnap-fbc8-0" created
Nov 22 14:11:38 INFO: creating archive '/bkup/pvebkp-fbc8//dump/vzdump-openvz-8002-2011_11_22-14_10_37.tgz'
Nov 22 14:11:43 INFO: find: `./usr/share/doc/lynx-cur': Input/output error
Nov 22 14:11:43 INFO: find: `./usr/share/doc/telnet': Input/output error
Nov 22 14:11:43 INFO: find: `./usr/share/doc/tofrodos': Input/output error
Nov 22 14:11:43 INFO: find: `./usr/share/doc/hostname': Input/output error
Nov 22 14:11:43 INFO: find: `./usr/share/doc/libkeyutils1': Input/output error
....
Nov 22 14:11:52 INFO: tar: ./etc/pm: Warning: Cannot stat: Input/output error
Nov 22 14:11:52 INFO: tar: ./etc/zm: Warning: Cannot stat: Input/output error
Nov 22 14:11:52 INFO: tar: ./aquota.user: Warning: Cannot stat: Input/output error
Nov 22 14:11:52 INFO: tar: ./var: Warning: Cannot stat: Input/output error
Nov 22 14:11:52 INFO: Total bytes written: 445767680 (426MiB, 47MiB/s)
Nov 22 14:11:52 INFO: archive file size: 13MB
Nov 22 14:11:53 INFO:   /dev/pve/vzsnap-fbc8-0: read failed after 0 of 4096 at 140391677952: Input/output error
Nov 22 14:11:53 INFO:   /dev/pve/vzsnap-fbc8-0: read failed after 0 of 4096 at 140391735296: Input/output error
Nov 22 14:11:53 INFO:   /dev/pve/vzsnap-fbc8-0: read failed after 0 of 4096 at 0: Input/output error
Nov 22 14:11:53 INFO:   /dev/pve/vzsnap-fbc8-0: read failed after 0 of 4096 at 4096: Input/output error
Nov 22 14:12:57 INFO:   Unable to deactivate open pve-vzsnap--fbc8--0-cow (253:7)
Nov 22 14:12:57 INFO:   Failed to resume vzsnap-fbc8-0.
Nov 22 14:12:57 INFO:   libdevmapper exiting with 1 device(s) still suspended.
Nov 22 14:12:57 INFO: lvremove failed - trying again in 1 seconds

...
openvz-8002.log lines 3856-3885/3885 byte 464939/464939 (END)   (press RETURN)

syslog
Code:
Nov 22 14:10:01 fbc8 pvestatd[2351]: status update time (12.976 seconds)
Nov 22 14:10:31 fbc8 rrdcached[1615]: flushing old values
Nov 22 14:10:31 fbc8 rrdcached[1615]: rotating journals
Nov 22 14:10:33 fbc8 rrdcached[1615]: started new journal /var/lib/rrdcached/journal//rrd.journal.1321989033.354893
Nov 22 14:10:33 fbc8 rrdcached[1615]: removing old journal /var/lib/rrdcached/journal//rrd.journal.1321981831.406289
Nov 22 14:10:37 fbc8 vzdump[311835]: INFO: Finished Backup of VM 8001 (00:10:27)
Nov 22 14:10:37 fbc8 vzdump[311835]: INFO: Starting Backup of VM 8002 (openvz)
Nov 22 14:11:37 fbc8 pvestatd[2351]: status update time (56.325 seconds)
Nov 22 14:11:37 fbc8 kernel: EXT3-fs: barriers disabled
Nov 22 14:11:37 fbc8 kernel: kjournald starting.  Commit interval 5 seconds
Nov 22 14:11:37 fbc8 kernel: EXT3-fs (dm-5): using internal journal
Nov 22 14:11:37 fbc8 kernel: ext3_orphan_cleanup: deleting unreferenced inode 617291
Nov 22 14:11:37 fbc8 kernel: ext3_orphan_cleanup: deleting unreferenced inode 1935533
Nov 22 14:11:37 fbc8 kernel: ext3_orphan_cleanup: deleting unreferenced inode 1936537
Nov 22 14:11:37 fbc8 kernel: ext3_orphan_cleanup: deleting unreferenced inode 1935499
Nov 22 14:11:37 fbc8 kernel: ext3_orphan_cleanup: deleting unreferenced inode 1935499
Nov 22 14:11:37 fbc8 kernel: ext3_orphan_cleanup: deleting unreferenced inode 1936535
Nov 22 14:11:37 fbc8 kernel: ext3_orphan_cleanup: deleting unreferenced inode 1936570
Nov 22 14:11:37 fbc8 kernel: ext3_orphan_cleanup: deleting unreferenced inode 1936454
Nov 22 14:11:37 fbc8 kernel: ext3_orphan_cleanup: deleting unreferenced inode 617349
Nov 22 14:11:37 fbc8 kernel: ext3_orphan_cleanup: deleting unreferenced inode 666177
Nov 22 14:11:37 fbc8 kernel: ext3_orphan_cleanup: deleting unreferenced inode 666176
Nov 22 14:11:37 fbc8 kernel: ext3_orphan_cleanup: deleting unreferenced inode 666175
Nov 22 14:11:37 fbc8 kernel: ext3_orphan_cleanup: deleting unreferenced inode 666084
Nov 22 14:11:37 fbc8 kernel: ext3_orphan_cleanup: deleting unreferenced inode 666083
Nov 22 14:11:37 fbc8 kernel: ext3_orphan_cleanup: deleting unreferenced inode 2189185
Nov 22 14:11:37 fbc8 kernel: ext3_orphan_cleanup: deleting unreferenced inode 2189182
Nov 22 14:11:37 fbc8 kernel: ext3_orphan_cleanup: deleting unreferenced inode 2189181
Nov 22 14:11:37 fbc8 kernel: ext3_orphan_cleanup: deleting unreferenced inode 2189180
Nov 22 14:11:37 fbc8 kernel: ext3_orphan_cleanup: deleting unreferenced inode 2189179
Nov 22 14:11:37 fbc8 kernel: ext3_orphan_cleanup: deleting unreferenced inode 1254925
Nov 22 14:11:37 fbc8 kernel: ext3_orphan_cleanup: deleting unreferenced inode 1254924
Nov 22 14:11:37 fbc8 kernel: ext3_orphan_cleanup: deleting unreferenced inode 1254921
Nov 22 14:11:37 fbc8 kernel: ext3_orphan_cleanup: deleting unreferenced inode 1254917
Nov 22 14:11:37 fbc8 kernel: ext3_orphan_cleanup: deleting unreferenced inode 1254913
Nov 22 14:11:37 fbc8 kernel: EXT3-fs (dm-5): 23 orphan inodes deleted
Nov 22 14:11:37 fbc8 kernel: EXT3-fs (dm-5): recovery complete
Nov 22 14:11:38 fbc8 kernel: EXT3-fs (dm-5): mounted filesystem with ordered data mode
Nov Nov 22 14:11:43 fbc8 kernel: EXT3-fs error (device dm-5): ext3_get_inode_loc: unable to read inode block - inode=1722390, block=6881411
Nov 22 14:11:43 fbc8 kernel: Buffer I/O error on device dm-5, logical block 0
Nov 22 14:11:43 fbc8 kernel: lost page write due to I/O error on dm-5
Nov 22 14:11:43 fbc8 kernel: EXT3-fs (dm-5): I/O error while writing superblock
Nov 22 14:11:43 fbc8 kernel: EXT3-fs error (device dm-5): ext3_get_inode_loc: unable to read inode block - inode=1722390, block=6881411
Nov 22 14:11:43 fbc8 kernel: Buffer I/O error on device dm-5, logical block 0
Nov 22 14:11:43 fbc8 kernel: lost page write due to I/O error on dm-5
Nov 22 14:11:43 fbc8 kernel: EXT3-fs (dm-5): I/O error while writing superblock
Nov 22 14:11:43 fbc8 kernel: EXT3-fs error (device dm-5): ext3_get_inode_loc: unable to read inode block - inode=1722390, block=6881411
Nov 22 14:11:43 fbc8 kernel: Buffer I/O error on device dm-5, logical block 0
Nov 22 14:11:43 fbc8 kernel: lost page write due to I/O error on dm-5
Nov 22 14:11:43 fbc8 kernel: EXT3-fs (dm-5): I/O error while writing superblock
Nov 22 14:11:43 fbc8 kernel: EXT3-fs error (device dm-5): ext3_get_inode_loc: unable to read inode block - inode=1721392, block=6881348
22 14:11:42 fbc8 kernel: device-mapper: snapshots: Invalidating snapshot: Unable to allocate exception

....
thousands more
....
Nov 22 14:11:52 fbc8 kernel: EXT3-fs error (device dm-5): ext3_get_inode_loc: unable to read inode block - inode=894794, block=3571830
Nov 22 14:11:52 fbc8 kernel: EXT3-fs error (device dm-5): ext3_get_inode_loc: unable to read inode block - inode=1246633, block=498082
8
Nov 22 14:11:52 fbc8 kernel: EXT3-fs (dm-5): error: ext3_put_super: Couldn't clean up the journal
Nov 22 14:14:21 fbc8 kernel: INFO: task kjournald:1056 blocked for more than 120 seconds.
Nov 22 14:14:21 fbc8 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 22 14:14:21 fbc8 kernel: kjournald     D ffff88043d5e2b20     0  1056      2 0x00000000
Nov 22 14:14:21 fbc8 kernel: ffff88043dbd7c30 0000000000000046 0000000000000000 ffff880411490f00
Nov 22 14:14:21 fbc8 kernel: ffff88043dbd7c00 ffffffff813f22ec ffff88043dbd7be0 0000000104b59c5a
Nov 22 14:14:21 fbc8 kernel: ffff88043d5e30e8 ffff88043dbd7fd8 000000000000f788 ffff88043d5e30e8
Nov 22 14:14:21 fbc8 kernel: Call Trace:
Nov 22 14:14:21 fbc8 kernel: [<ffffffff813f22ec>] ? dm_table_unplug_all+0x5c/0xd0
Nov 22 14:14:21 fbc8 kernel: [<ffffffff814e5223>] io_schedule+0xa3/0x110
Nov 22 14:14:21 fbc8 kernel: [<ffffffff811b9460>] ? sync_buffer+0x0/0x50
Nov 22 14:14:21 fbc8 kernel: [<ffffffff811b94a5>] sync_buffer+0x45/0x50
Nov 22 14:14:21 fbc8 kernel: [<ffffffff814e5abf>] __wait_on_bit+0x5f/0x90
Nov 22 14:14:21 fbc8 kernel: [<ffffffff811b9460>] ? sync_buffer+0x0/0x50
Nov 22 14:14:21 fbc8 kernel: [<ffffffff814e5b68>] out_of_line_wait_on_bit+0x78/0x90
Nov 22 14:14:21 fbc8 kernel: [<ffffffff81092390>] ? wake_bit_function+0x0/0x40
Nov 22 14:14:21 fbc8 kernel: [<ffffffffa00997a6>] ? __journal_remove_journal_head+0xc6/0x150 [jbd]
Nov 22 14:14:21 fbc8 kernel: [<ffffffff811b9456>] __wait_on_buffer+0x26/0x30
Nov 22 14:14:21 fbc8 kernel: [<ffffffffa0096eee>] journal_commit_transaction+0x9ce/0x1130 [jbd]
Nov 22 14:14:21 fbc8 kernel: [<ffffffff8107b02c>] ? lock_timer_base+0x3c/0x70
Nov 22 14:14:21 fbc8 kernel: [<ffffffff8107bc8b>] ? try_to_del_timer_sync+0x7b/0xe0
Nov 22 14:14:21 fbc8 kernel: [<ffffffffa0099fc8>] kjournald+0xe8/0x250 [jbd]
Nov 22 14:14:21 fbc8 kernel: [<ffffffff81092350>] ? autoremove_wake_function+0x0/0x40
Nov 22 14:14:21 fbc8 kernel: [<ffffffffa0099ee0>] ? kjournald+0x0/0x250 [jbd]
Nov 22 14:14:21 fbc8 kernel: [<ffffffff81091d76>] kthread+0x96/0xa0

let me know if you want syslog and other info .

the computer needs to be manually reset. reboot , init 6 , killing most f the processes does not work.

any quick suggestions to save a trip?
 
ok, let me make sure I understand
lvs:
Code:
root@fbc8 ~ # lvs
  LV             VG      Attr   LSize   Origin Snap%  Move Log Copy%  Convert
  data           pve     -wi-ao 130.75g                                      
  root           pve     -wi-ao  49.75g                                      
  swap           pve     -wi-ao  15.00g                                      
  vzsnap-fbc8-0  pve     -wi-a-   1.00g                                      
  vm-8033-disk-1 vg-fbc8 -wi-a-  12.00g                                      
  vm-8033-disk-2 vg-fbc8 -wi-a-  32.00g

df:
Code:
root@fbc8 /fbc/adm/lvm # df
Filesystem    Type    Size  Used Avail Use% Mounted on
/dev/mapper/pve-root
              ext3     49G  978M   46G   3% /
tmpfs        tmpfs    7.9G     0  7.9G   0% /lib/init/rw
udev         tmpfs     10M  248K  9.8M   3% /dev
tmpfs        tmpfs    7.9G   19M  7.9G   1% /dev/shm
/dev/mapper/pve-data
              ext3    129G   42G   88G  33% /var/lib/vz
/dev/sda1     ext3    495M   35M  435M   8% /boot
/dev/sdc1      xfs    466G  109G  357G  24% /bkup
/dev/fuse     fuse     30M   20K   30M   1% /etc/pve

I assume vzsnap-fbc8-0 needs to be increased. so i'll do this:
use lvreduce to take away from data , and add that to vzsnap-fbc8-0 using lvextend ?

btw overall proxmox 2.0 is great!
 
on another prox 2.0 system there is not a vsnap logical volume . So I assume that came from the failed vzdump ?

if so then probably all I need to do make more snapshot area is reduct the data or swap?

and then remove the hung over vzdump ?
 
so i did this:
Code:
lvremove /dev/mapper/pve-vzsnap--fbc8--0
Do you really want to remove active logical volume vzsnap-fbc8-0? [y/n]: y
  Logical volume "vzsnap-fbc8-0" successfully removed

Code:
swapoff /dev/mapper/pve-swap
Code:
lvreduce -L-1G /dev/mapper/pve-swap
  WARNING: Reducing active logical volume to 14.00 GiB
  THIS MAY DESTROY YOUR DATA (filesystem etc.)
Do you really want to reduce swap? [y/n]: y
  Reducing logical volume swap to 14.00 GiB
  Logical volume swap successfully resized
Code:
swapon /dev/pve/swap
swapon: /dev/pve/swap: swapon failed: Invalid argument

mkswap /dev/pve/swap
mkswap: /dev/pve/swap: warning: don't erase bootbits sectors
        on whole disk. Use -f to force.
Setting up swapspace version 1, size = 14680060 KiB
no label, UUID=09f56f43-e0ac-4fc6-b317-489fa2310e4d
Code:
swapon /dev/pve/swap

I assume vzdump will auto use the extra space?
Code:
 vgs
  VG      #PV #LV #SN Attr   VSize   VFree
  pve       1   3   0 wz--n- 199.50g 5.00g
  vg-fbc8   1   2   0 wz--n-   1.17t 1.13t

there is 5g extra, so there must have been 4 g extra before, as i decreased swap by 1g.
So how does vzdump determine how much space to yse for a snapshot?

edir: just tried a snapshot backup and ran into this:
Code:
[code]
INFO: starting new backup job: vzdump 8002 --mode snapshot --compress 1 --storage bkup8 --node fbc8
INFO: Starting Backup of VM 8002 (openvz)
INFO: CTID 8002 exist mounted running
INFO: status = running
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating lvm snapshot of /dev/mapper/pve-data ('/dev/pve/vzsnap-fbc8-0')
INFO: Logical volume "vzsnap-fbc8-0" created
INFO: creating archive '/bkup/pvebkp-fbc8//dump/vzdump-openvz-8002-2011_11_23-14_50_44.tgz'
INFO: find: `./var/cache/zoneminder/events/13/11/11/22/03/56/59/035-capture.jpg': Input/output error
INFO: find: `./var/cache/zoneminder/events/13/11/11/22/03/56/59/015-analyse.jpg': Input/output error
INFO: find: `./var/cache/zoneminder/events/13/11/11/22/03/56/59/017-capture.jpg': Input/output error
INFO: find: `./var/cache/zoneminder/events/13/11/11/22/03/56/59/015-capture.jpg': Input/output error
INFO: find: `./var/cache/zoneminder/events/13/11/11/22/03/56/59/012-analyse.jpg': Input/output error
INFO: find: `./var/cache/zoneminder/events/13/11/11/22/03/56/59/024-capture.jpg': Input/output error
INFO: find: `./var/cache/zoneminder/events/13/11/11/22/03/56/59/041-capture.jpg': Input/output error

next I'll try a backup in stop mode
 
Last edited:

About

The Proxmox community has been around for many years and offers help and support for Proxmox VE, Proxmox Backup Server, and Proxmox Mail Gateway.
We think our community is one of the best thanks to people like you!

Get your subscription!

The Proxmox team works very hard to make sure you are running the best software and getting stable updates and security enhancements, as well as quick enterprise support. Tens of thousands of happy customers have a Proxmox subscription. Get yours easily in our online shop.

Buy now!