vzdump using lvm snapshot - kills the box

Well, thank you, but today the ServeRAID 8k system killed itself during the backup run. After the power reset the vzsnap is still there:

Code:
LV Name                /dev/pve/vzsnap-k81-0
VG Name                pve
LV UUID                krLMDp-ZGkt-ybWA-26FE-ZpyR-t3c9-VCONRD
LV Write Access        read/write
LV Status              available

though not active.

The last entry in the vzdump log (also the first VE):

Code:
cat openvz-11102.log
Apr 17 22:00:02 INFO: Starting Backup of VM 11102 (openvz)
Apr 17 22:00:02 INFO: CTID 11102 exist mounted running
Apr 17 22:00:02 INFO: status = running
Apr 17 22:00:02 INFO: backup mode: snapshot
Apr 17 22:00:02 INFO: ionice priority: 7
Apr 17 22:00:02 INFO: creating lvm snapshot of /dev/mapper/pve-data ('/dev/pve/vzsnap-k81-0')
Apr 17 22:00:02 INFO: Logical volume "vzsnap-k81-0" created
Apr 17 22:00:05 INFO: creating archive '/mnt/pve/backup/dump/vzdump-openvz-11102-2012_04_17-22_00_02.tar.lzo'
Apr 17 22:04:39 INFO: Total bytes written: 6196131840 (5.8GiB, 24MiB/s)
Apr 17 22:04:55 INFO: archive file size: 4.11GB
Apr 17 22:04:55 INFO: delete old backup '/mnt/pve/backup/dump/vzdump-openvz-11102-2012_04_15-22_00_02.tar.lzo'

Somewhere between this point and "Finished Backup" all hell breaks loose, and syslog goes like this again:

Code:
Apr 17 22:08:04 k81 kernel: INFO: task flush-253:2:893 blocked for more than 120 seconds.
Apr 17 22:08:04 k81 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 17 22:08:04 k81 kernel: flush-253:2 D ffff8806b7d165c0 0 893 2 0 0x00000000
Apr 17 22:08:04 k81 kernel: ffff8806af975570 0000000000000046 0000000000000000 ffffffff8108e4ff
Apr 17 22:08:04 k81 kernel: ffff8806af9754f0 ffffffff81412a25 ffff8806b08b5dc8 ffff8806af975580
Apr 17 22:08:04 k81 kernel: ffff8806af975540 ffff8806b7d16b60 ffff8806af975fd8 ffff8806af975fd8
Apr 17 22:08:04 k81 kernel: Call Trace:
Apr 17 22:08:04 k81 kernel: [] ? queue_work+0x1f/0x30
Apr 17 22:08:04 k81 kernel: [] ? dispatch_job+0x45/0x80
Apr 17 22:08:04 k81 kernel: [] do_get_write_access+0x29d/0x510 [jbd2]
Apr 17 22:08:04 k81 kernel: [] ? wake_bit_function+0x0/0x40
Apr 17 22:08:04 k81 kernel: [] jbd2_journal_get_write_access+0x31/0x50 [jbd2]
Apr 17 22:08:05 k81 kernel: [] __ext4_journal_get_write_access+0x38/0x70 [ext4]
Apr 17 22:08:05 k81 kernel: [] ext4_mb_mark_diskspace_used+0x7a/0x2f0 [ext4]
Apr 17 22:08:05 k81 kernel: [] ? ext4_mb_initialize_context+0x82/0x1d0 [ext4]
Apr 17 22:08:05 k81 kernel: [] ext4_mb_new_blocks+0x3a9/0x670 [ext4]
Apr 17 22:08:05 k81 kernel: [] ? ext4_ext_find_extent+0x2ab/0x310 [ext4]
Apr 17 22:08:05 k81 kernel: [] ext4_ext_get_blocks+0x1109/0x1a20 [ext4]
Apr 17 22:08:05 k81 kernel: [] ? mempool_alloc_slab+0x15/0x20
Apr 17 22:08:05 k81 kernel: [] ? mempool_alloc+0x65/0x150
Apr 17 22:08:05 k81 kernel: [] ? submit_bio+0xf5/0x1a0
...

After a few hours the cron jobs have piled up so badly that nothing works anymore:

Code:
Apr 18 02:40:27 k81 rrdcached[1543]: removing old journal /var/lib/rrdcached/journal//rrd.journal.1334702427.721924
Apr 18 03:01:09 k81 pvestatd[699189]: command '/usr/sbin/vzctl exec 11104 /bin/cat /proc/net/dev' failed: exit code 8
Apr 18 03:01:19 k81 pvestatd[699189]: command '/usr/sbin/vzctl exec 11104 /bin/cat /proc/net/dev' failed: exit code 8
Apr 18 03:01:29 k81 pvestatd[699189]: command '/usr/sbin/vzctl exec 11104 /bin/cat /proc/net/dev' failed: exit code 8
(the same pvestatd line repeats every 10 seconds until 03:03:09)

All went well for a week, but I definitely believe LVM snapshots are broken. We do quite similar vzdumps using snapshots on nearly the same system (different CPU, fewer HDDs and less RAM) that is still running PVE 1.9, and it runs without any system problems (doing daily dumps too). Therefore I can only guess that LVM in PVE 2.0 carries some serious trouble within.

kindest regards
hk
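For what it's worth, a leftover vzsnap volume like this has to be removed (lvremove -f /dev/pve/vzsnap-k81-0) before vzdump can create a fresh one. A small sketch, not from the thread itself, for finding such stale snapshots by parsing `lvs` output; the here-doc sample stands in for the live command, and the vzsnap- prefix is simply the naming vzdump uses:

```shell
#!/bin/sh
# Sketch: print device paths of leftover vzdump snapshot LVs.
# Each printed path can then be removed with: lvremove -f <path>
find_stale_snapshots() {
    # expects `lvs --noheadings -o lv_name,vg_name` style lines on stdin
    awk '$1 ~ /^vzsnap-/ { print "/dev/" $2 "/" $1 }'
}

# Example; in real use pipe in: lvs --noheadings -o lv_name,vg_name
find_stale_snapshots <<'EOF'
  data            pve
  root            pve
  vzsnap-k81-0    pve
EOF
# → /dev/pve/vzsnap-k81-0
```

Running lvremove on the stale snapshot frees its COW space back to the VG; the next vzdump run then recreates the snapshot from scratch.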
 
Hi,
I have the same problems with the snapshots. I have an md RAID(1), and the problem occurs only with OpenVZ; KVM runs correctly. The FS is ext3.

~ # pveversion -v
pve-manager: 2.0-59 (pve-manager/2.0/18400f07)
running kernel: 2.6.32-11-pve
proxmox-ve-2.6.32: 2.0-66
pve-kernel-2.6.32-10-pve: 2.6.32-64
pve-kernel-2.6.32-11-pve: 2.6.32-66
lvm2: 2.02.88-2pve2
clvm: 2.02.88-2pve2
corosync-pve: 1.4.3-1
openais-pve: 1.1.4-2
libqb: 0.10.1-2
redhat-cluster-pve: 3.1.8-3
resource-agents-pve: 3.9.2-3
fence-agents-pve: 3.1.7-2
pve-cluster: 1.0-26
qemu-server: 2.0-38
pve-firmware: 1.0-15
libpve-common-perl: 1.0-26
libpve-access-control: 1.0-18
libpve-storage-perl: 2.0-17
vncterm: 1.0-2
vzctl: 3.0.30-2pve2
vzprocps: 2.0.11-2
vzquota: 3.0.12-3
pve-qemu-kvm: 1.0-9
ksm-control-daemon: 1.1-1

lvdisplay after Hard reset:
--- Logical volume ---
LV Name /dev/vg0/data
VG Name vg0
LV UUID OH2ADq-VGTt-GyU6-LF8M-W039-6P9I-FMNRYY
LV Write Access read/write
LV Status available
# open 1
LV Size 1.47 TiB
Current LE 385280
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 256
Block device 253:3

--- Logical volume ---
LV Name /dev/vg0/vzsnap-vm-node02-0
VG Name vg0
LV UUID 3dHTVM-HQO8-thyA-UNgg-RXmx-cvii-FQ8zlO
LV Write Access read/write
LV Status available
# open 0
LV Size 2.00 GiB
Current LE 512
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 256
Block device 253:4


regards gunforce
 
Still a problem with new Proxmox 3.0.
- HP Smart Array P400 controller
- OpenVZ containers on a hardware RAID5 volume, set up as an additional data2 partition

Could this be because our data2 LVM setup is partition-less?

root@czc63519z7:~# vzctl stop 109
Stopping container ...
Child 828215 exited with status 7
Killing container ...
Child 828217 exited with status 7
Unable to stop container

root@czc63519z7:~# tail /var/log/syslog
Jun 11 10:22:13 czc63519z7 pvestatd[3036]: status update time (6.105 seconds)
Jun 11 10:22:20 czc63519z7 pvestatd[3036]: command '/usr/sbin/vzctl exec 109 /bin/cat /proc/net/dev' failed: exit code 8
Jun 11 10:22:23 czc63519z7 pvestatd[3036]: command '/usr/sbin/vzctl exec 109 /bin/cat /proc/net/dev' failed: exit code 8
Jun 11 10:22:23 czc63519z7 pvestatd[3036]: status update time (6.105 seconds)
 
Last backup failed. ioctl failed.

INFO: CTID 109 exist mounted running
INFO: status = running
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating lvm snapshot of /dev/mapper/v_c0d1-data2 ('/dev/v_c0d1/vzsnap-czc63519z7-0')
INFO: device-mapper: suspend ioctl on failed: Unknown error 524
INFO: Unable to suspend v_c0d1-data2 (253:1)
INFO: Failed to suspend origin data2
INFO: libdevmapper exiting with 1 device(s) still suspended.
ERROR: Backup of VM 109 failed - command 'lvcreate --size 1024M --snapshot --name vzsnap-czc63519z7-0 /dev/v_c0d1/data2' failed: exit code 5
INFO: Backup job finished with errors

Migration still works even after this, though. I'm going to move all the VMs away from this volume and re-install it.
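The "libdevmapper exiting with 1 device(s) still suspended" line suggests the origin LV was left suspended, which freezes all I/O on it. As a sketch (my assumption, not a confirmed fix for this bug), the suspended maps can be spotted by parsing `dmsetup info`, and each name found could be tried with `dmsetup resume <name>` before resorting to a hard reset. The here-doc stands in for real output:

```shell
#!/bin/sh
# Sketch: list device-mapper devices stuck in SUSPENDED state,
# parsed from `dmsetup info` output (Name:/State: pairs).
list_suspended() {
    awk '/^Name:/ { name = $2 } /^State:/ && /SUSPENDED/ { print name }'
}

# Example; in real use: dmsetup info | list_suspended
list_suspended <<'EOF'
Name:              v_c0d1-data2
State:             SUSPENDED
Open count:        1
Name:              pve-root
State:             ACTIVE
Open count:        1
EOF
# → v_c0d1-data2
```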
 
Hi, we need your help, please.

@dietmar, you provide good support for the nice Proxmox software; can you help us, please?

All our HP servers with Smart Array P400 controllers die on backup/snapshot with the new version 3. With version 2 it worked fine.

Here are screenshots of top and of the error in the web interface. Every time we have to restart the servers with a hard reset. Our customers are very angry.

Please help us. Many, many thanks.

webinterface.png

top_on_this_time.png
regards
 

@dietmar, you provide good support for the nice Proxmox software; can you help us, please?

You need to understand that I am unable to directly support 50,000 users for free, so please stop requesting direct support from me.

Instead, you should consider buying a subscription (if forum support is not good enough for you).
 
hi dietmar,

I understand you, but more users are hit by this bug; we have created a bug entry, as it is a critical issue. I understand that you can't support 50k users directly, but we hope you can help us and the other users towards a solution for this bug/problem. It came with the new version 3; before that, it worked fine. As a developer you have the internal information to check what changed in the new version, and you can help us and the other users find the solution. I hope you, or another member of your team, can help with this problem.

Many thanks for this and for the nice work on Proxmox.

regards
 
Good Morning

I'm running several HP ProLiant servers (with HP SA P400) with PVE 3 on them without any issue. We store all backups on a file server with LVM. The performance is exactly what we expect on a Gigabit network.
I just have some suggestions which may help you get rid of performance issues:
1. Try to fine-tune your HP Smart Array P400 controller. >> google
2. Check the write/read ratio in your controller settings.
3. If possible, avoid running the PVE containers on a RAID5 volume. The P400 controller is known as a "bad performer" when it comes to RAID5. Use RAID1+0 at least for the KVM/OpenVZ containers.
4. If you have a budget, try to purchase a better RAID controller.

cheers
tom
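To make suggestion 2 concrete: on the P400 the cache read/write split can be inspected and changed with hpacucli. A hedged sketch; slot=1 and the 25/75 split are example values (check them against `ctrl all show` output on your own box), not settings taken from this thread:

```shell
# Show controller, cache and battery status, plus current settings
hpacucli ctrl all show status
hpacucli ctrl slot=1 show

# Shift the accelerator cache toward writes, e.g. 25% read / 75% write
hpacucli ctrl slot=1 modify cacheratio=25/75
```

Whether this helps depends on the workload; with write-heavy OpenVZ containers on RAID5, a larger write cache fraction usually softens the parity-write penalty.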
 
hi sumsum,

thanks for your post, but performance is not the problem. We get a transfer rate of over 800 MB/s on our internal storage, and 119 MB/s over Gigabit to external storage; that is normal and fine for this controller.
The problem came with the update from v2.3 to v3.0. Before that, it worked very well and without errors. The P400 is not the best, but we do get the expected performance from it and it runs normally.

We think it is a different problem, because servers still on v2.3 work normally. We tested the following: upgrade a normally running server from version 2.3 to 3.0; the problem appears directly after the upgrade. And we are not alone with this problem, as later posts in this thread show.

regards
 
Same problem here.
After upgrading to 3.0 snapshots are killing the server.
It looks like an LVM issue. When I turn off snapshots and use suspend mode for openvz backups it works fine.

Best,
Sven
 
Hi all,

does anyone have a solution for this bug/error? In the last 6 days we had to reset 32 nodes because of it. We have no solution ourselves and hope the Proxmox team or another user has an idea, because we are out of ideas.

Please help us. Many thanks.

regards
 
hi Proxmox team,

can you please check whether the bug is this one:
-> http://www.thomas-krenn.com/en/wiki...en_activating_LVM_volume_group_with_snapshots
(http://www.redhat.com/archives/linux-lvm/2012-February/msg00043.html)
(https://bugzilla.redhat.com/show_bug.cgi?id=490245)

And here are the logs of the last server kill.

At the start of an OpenVZ snapshot:
Code:
Jun 20 07:35:40 pegasus pmxcfs[5487]: [dcdb] notice: data verification successful
Jun 20 07:53:34 pegasus kernel: device-mapper: snapshots: Invalidating snapshot: Unable to allocate exception.
Jun 20 07:53:34 pegasus kernel: EXT3-fs error (device dm-3): ext3_find_entry: reading directory #189809698 offset 0
Jun 20 07:53:34 pegasus kernel: Buffer I/O error on device dm-3, logical block 0
Jun 20 07:53:34 pegasus kernel: lost page write due to I/O error on dm-3
Jun 20 07:53:34 pegasus kernel: EXT3-fs (dm-3): I/O error while writing superblock
Jun 20 07:53:34 pegasus kernel: EXT3-fs error (device dm-3): ext3_find_entry: reading directory #189809698 offset 0
Jun 20 07:53:34 pegasus kernel: Buffer I/O error on device dm-3, logical block 0
Jun 20 07:53:34 pegasus kernel: lost page write due to I/O error on dm-3
Jun 20 07:53:34 pegasus kernel: EXT3-fs (dm-3): I/O error while writing superblock
Jun 20 07:53:34 pegasus kernel: EXT3-fs error (device dm-3): ext3_find_entry: reading directory #189809698 offset 0
Jun 20 07:53:34 pegasus kernel: Buffer I/O error on device dm-3, logical block 0
Jun 20 07:53:34 pegasus kernel: lost page write due to I/O error on dm-3
Jun 20 07:53:34 pegasus kernel: EXT3-fs (dm-3): I/O error while writing superblock
Jun 20 07:53:34 pegasus kernel: EXT3-fs error (device dm-3): ext3_find_entry: reading directory #190194372 offset 0
Jun 20 07:53:34 pegasus kernel: Buffer I/O error on device dm-3, logical block 0
Jun 20 07:53:34 pegasus kernel: lost page write due to I/O error on dm-3
Jun 20 07:53:34 pegasus kernel: EXT3-fs (dm-3): I/O error while writing superblock
Jun 20 07:53:34 pegasus kernel: EXT3-fs error (device dm-3): ext3_find_entry: reading directory #190194372 offset 0
Jun 20 07:53:34 pegasus kernel: Buffer I/O error on device dm-3, logical block 0
Jun 20 07:53:34 pegasus kernel: lost page write due to I/O error on dm-3
Jun 20 07:53:34 pegasus kernel: EXT3-fs (dm-3): I/O error while writing superblock
Jun 20 07:53:34 pegasus kernel: EXT3-fs error (device dm-3): ext3_find_entry: reading directory #190194372 offset 0
Jun 20 07:53:34 pegasus kernel: Buffer I/O error on device dm-3, logical block 0
Jun 20 07:53:34 pegasus kernel: lost page write due to I/O error on dm-3
Jun 20 07:53:34 pegasus kernel: EXT3-fs (dm-3): I/O error while writing superblock
Jun 20 07:53:34 pegasus kernel: EXT3-fs error (device dm-3): ext3_find_entry: reading directory #189809162 offset 0
Jun 20 07:53:34 pegasus kernel: Buffer I/O error on device dm-3, logical block 0
Jun 20 07:53:34 pegasus kernel: lost page write due to I/O error on dm-3
Jun 20 07:53:34 pegasus kernel: EXT3-fs (dm-3): I/O error while writing superblock
Jun 20 07:53:34 pegasus kernel: EXT3-fs error (device dm-3): ext3_find_entry: reading directory #189809162 offset 0
Jun 20 07:53:34 pegasus kernel: Buffer I/O error on device dm-3, logical block 0
Jun 20 07:53:34 pegasus kernel: lost page write due to I/O error on dm-3
Jun 20 07:53:34 pegasus kernel: EXT3-fs (dm-3): I/O error while writing superblock
Jun 20 07:53:34 pegasus kernel: EXT3-fs error (device dm-3): ext3_find_entry: reading directory #189809162 offset 0
Jun 20 07:53:34 pegasus kernel: Buffer I/O error on device dm-3, logical block 0
Jun 20 07:53:34 pegasus kernel: lost page write due to I/O error on dm-3
Jun 20 07:53:34 pegasus kernel: EXT3-fs (dm-3): I/O error while writing superblock
Jun 20 07:53:34 pegasus kernel: EXT3-fs error (device dm-3): ext3_find_entry: reading directory #189809162 offset 0
Jun 20 07:53:34 pegasus kernel: Buffer I/O error on device dm-3, logical block 0
Jun 20 07:53:34 pegasus kernel: lost page write due to I/O error on dm-3
Jun 20 07:53:34 pegasus kernel: EXT3-fs (dm-3): I/O error while writing superblock
Jun 20 07:53:34 pegasus kernel: EXT3-fs error (device dm-3): ext3_find_entry: reading directory #189809162 offset 0
Jun 20 07:53:34 pegasus kernel: EXT3-fs (dm-3): I/O error while writing superblock
Jun 20 07:53:34 pegasus kernel: EXT3-fs error (device dm-3): ext3_find_entry: reading directory #189809162 offset 0
Jun 20 07:53:34 pegasus kernel: EXT3-fs (dm-3): I/O error while writing superblock
Jun 20 07:53:34 pegasus kernel: EXT3-fs error (device dm-3): ext3_find_entry: reading directory #189809162 offset 0
Jun 20 07:53:34 pegasus kernel: EXT3-fs (dm-3): I/O error while writing superblock

This is the first error message we have found since the problem started.

@dietmar, in this post http://forum.proxmox.com/threads/68...le-Server-(with-kernel-bug)?p=38867#post38867 you wrote that you had added a check for free space before a snapshot starts. Is that check still implemented in v3? In v2 this problem did not exist.

Please check it and give us an answer. Many thanks and regards
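The "Unable to allocate exception" line in the quoted log is what LVM prints when a snapshot's COW area fills up and the snapshot is invalidated, which would explain the EXT3 I/O errors on dm-3 that follow. A sketch (an assumption about the cause, not a confirmed diagnosis) for watching snapshot fill level via the data percentage `lvs` reports; the here-doc stands in for live output:

```shell
#!/bin/sh
# Sketch: warn when a vzdump snapshot passes a fill threshold.
# Input format: `lvs --noheadings -o lv_name,data_percent` lines.
check_snapshots() {
    threshold=${1:-80}
    awk -v t="$threshold" '$1 ~ /^vzsnap-/ && $2 + 0 >= t {
        print "WARNING: snapshot " $1 " is " $2 "% full"
    }'
}

# Example; in real use:
#   lvs --noheadings -o lv_name,data_percent | check_snapshots 80
check_snapshots 80 <<'EOF'
  vzsnap-pegasus-0  95.12
  vzsnap-node02-0   12.40
EOF
```

If the snapshot really does overflow during backups, raising `size:` in /etc/vzdump.conf (the COW space vzdump reserves for the snapshot, in MB) would be the knob to try.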
 
Hi, sorry for the spam, but we have the same issue again:

- hp dl 380 g5
- hp smart array p400 in raid 5 with bbu and 512mb cache
Code:
hpacucli ctrl all show status

Smart Array P400 in Slot 1
   Controller Status: OK
   Cache Status: OK
   Battery/Capacitor Status: OK

The load average goes very high and kills the machine.

modinfo megaraid_sas:
Code:
filename:       /lib/modules/2.6.32-20-pve/kernel/drivers/scsi/megaraid/megaraid_sas.ko
description:    LSI MegaRAID SAS Driver
author:         megaraidlinux@lsi.com
version:        00.00.06.14-rh1
license:        GPL
srcversion:     B5718A893E029F0BEDF21A6
alias:          pci:v00001000d0000005Dsv*sd*bc*sc*i*
alias:          pci:v00001000d0000005Bsv*sd*bc*sc*i*
alias:          pci:v00001028d00000015sv*sd*bc*sc*i*
alias:          pci:v00001000d00000413sv*sd*bc*sc*i*
alias:          pci:v00001000d00000071sv*sd*bc*sc*i*
alias:          pci:v00001000d00000073sv*sd*bc*sc*i*
alias:          pci:v00001000d00000079sv*sd*bc*sc*i*
alias:          pci:v00001000d00000078sv*sd*bc*sc*i*
alias:          pci:v00001000d0000007Csv*sd*bc*sc*i*
alias:          pci:v00001000d00000060sv*sd*bc*sc*i*
alias:          pci:v00001000d00000411sv*sd*bc*sc*i*
depends:
vermagic:       2.6.32-20-pve SMP mod_unload modversions
parm:           max_sectors:Maximum number of sectors per IO command (int)
parm:           msix_disable:Disable MSI-X interrupt handling. Default: 0 (int)

lvs
Code:
LV               VG   Attr     LSize  Pool Origin Data%  Move Log Copy%  Convert
  data             pve  -wi-ao--  6,21t
  root             pve  -wi-ao-- 96,00g
  swap             pve  -wi-ao-- 47,00g
  vzsnap-pegasus-0 pve  -wi-a---  8,00g

pveperf
Code:
CPU BOGOMIPS:      47999.36
REGEX/SECOND:      1160111
HD SIZE:           94.49 GB (/dev/mapper/pve-root)
BUFFERED READS:    29.09 MB/sec
AVERAGE SEEK TIME: 14.49 ms
FSYNCS/SECOND:     2438.76
DNS EXT:           66.94 ms
DNS INT:           32.86 ms (domain.de)

/etc/vzdump.conf
Code:
bwlimit: 0
size: 8192

snapshot/suspend on this host:
Code:
Jun 25 00:30:01 pegasus /USR/SBIN/CRON[106986]: (root) CMD (vzdump 5133 --quiet 1 --mailto info@domain.de --mode snapshot --node pegasus --compress gzip --storage SLS-001)
Jun 25 00:30:01 pegasus vzdump[106987]: <root@pam> starting task UPID:pegasus:0001A225:02204A7C:51C8C869:vzdump::root@pam:
Jun 25 00:30:01 pegasus vzdump[107045]: INFO: starting new backup job: vzdump 5133 --quiet 1 --mode snapshot --mailto info@speedloc.de --compress gzip --storage SLS-001 --node pegasus
Jun 25 00:30:01 pegasus vzdump[107045]: INFO: Starting Backup of VM 5133 (openvz)
Jun 25 00:30:02 pegasus kernel: EXT3-fs: barriers disabled
Jun 25 00:30:02 pegasus kernel: kjournald starting. Commit interval 5 seconds
Jun 25 00:30:02 pegasus kernel: EXT3-fs (dm-3): using internal journal
Jun 25 00:30:02 pegasus kernel: ext3_orphan_cleanup: deleting unreferenced inode 177561770
Jun 25 00:30:02 pegasus kernel: ext3_orphan_cleanup: deleting unreferenced inode 177561769
Jun 25 00:30:02 pegasus kernel: ext3_orphan_cleanup: deleting unreferenced inode 158942572
Jun 25 00:30:02 pegasus kernel: ext3_orphan_cleanup: deleting unreferenced inode 158942570
Jun 25 00:30:02 pegasus kernel: ext3_orphan_cleanup: deleting unreferenced inode 158942569
Jun 25 00:30:02 pegasus kernel: ext3_orphan_cleanup: deleting unreferenced inode 158942557
Jun 25 00:30:02 pegasus kernel: ext3_orphan_cleanup: deleting unreferenced inode 158942562
Jun 25 00:30:02 pegasus kernel: ext3_orphan_cleanup: deleting unreferenced inode 158942561
Jun 25 00:30:02 pegasus kernel: ext3_orphan_cleanup: deleting unreferenced inode 164773993
Jun 25 00:30:02 pegasus kernel: ext3_orphan_cleanup: deleting unreferenced inode 156370811
Jun 25 00:30:02 pegasus kernel: ext3_orphan_cleanup: deleting unreferenced inode 158942265
Jun 25 00:30:02 pegasus kernel: ext3_orphan_cleanup: deleting unreferenced inode 158942264
Jun 25 00:30:02 pegasus kernel: ext3_orphan_cleanup: deleting unreferenced inode 158941725
Jun 25 00:30:02 pegasus kernel: ext3_orphan_cleanup: deleting unreferenced inode 158941723
Jun 25 00:30:02 pegasus kernel: ext3_orphan_cleanup: deleting unreferenced inode 158941230
Jun 25 00:30:02 pegasus kernel: ext3_orphan_cleanup: deleting unreferenced inode 155992343
Jun 25 00:30:02 pegasus kernel: ext3_orphan_cleanup: deleting unreferenced inode 177561626
Jun 25 00:30:02 pegasus kernel: ext3_orphan_cleanup: deleting unreferenced inode 177561625
Jun 25 00:30:02 pegasus kernel: ext3_orphan_cleanup: deleting unreferenced inode 177561623
Jun 25 00:30:02 pegasus kernel: ext3_orphan_cleanup: deleting unreferenced inode 177561622
Jun 25 00:30:02 pegasus kernel: ext3_orphan_cleanup: deleting unreferenced inode 177561617
Jun 25 00:30:02 pegasus kernel: ext3_orphan_cleanup: deleting unreferenced inode 177766479
Jun 25 00:30:02 pegasus kernel: ext3_orphan_cleanup: deleting unreferenced inode 164773977
Jun 25 00:30:02 pegasus kernel: ext3_orphan_cleanup: deleting unreferenced inode 164773974
Jun 25 00:30:02 pegasus kernel: ext3_orphan_cleanup: deleting unreferenced inode 164773973
Jun 25 00:30:02 pegasus kernel: ext3_orphan_cleanup: deleting unreferenced inode 164773972
Jun 25 00:30:02 pegasus kernel: ext3_orphan_cleanup: deleting unreferenced inode 164773969
Jun 25 00:30:02 pegasus kernel: ext3_orphan_cleanup: deleting unreferenced inode 165339401
Jun 25 00:30:02 pegasus kernel: ext3_orphan_cleanup: deleting unreferenced inode 156370793
Jun 25 00:30:02 pegasus kernel: ext3_orphan_cleanup: deleting unreferenced inode 156370491
Jun 25 00:30:02 pegasus kernel: ext3_orphan_cleanup: deleting unreferenced inode 156370488
Jun 25 00:30:02 pegasus kernel: ext3_orphan_cleanup: deleting unreferenced inode 156370482
Jun 25 00:30:02 pegasus kernel: ext3_orphan_cleanup: deleting unreferenced inode 156370470
Jun 25 00:30:02 pegasus kernel: ext3_orphan_cleanup: deleting unreferenced inode 155287563
Jun 25 00:30:02 pegasus kernel: EXT3-fs (dm-3): 34 orphan inodes deleted
Jun 25 00:30:02 pegasus kernel: EXT3-fs (dm-3): recovery complete
Jun 25 00:30:02 pegasus kernel: EXT3-fs (dm-3): mounted filesystem with ordered data mode
Jun 25 00:33:30 pegasus rrdcached[5432]: flushing old values
Jun 25 00:33:30 pegasus rrdcached[5432]: rotating journals
Jun 25 00:33:30 pegasus rrdcached[5432]: started new journal /var/lib/rrdcached/journal/rrd.journal.1372113210.224938
Jun 25 00:33:30 pegasus rrdcached[5432]: removing old journal /var/lib/rrdcached/journal/rrd.journal.1372106010.224929
Jun 25 00:33:30 pegasus pmxcfs[5447]: [dcdb] notice: data verification successful
Jun 25 00:41:44 pegasus pmxcfs[5447]: [status] notice: received log
Jun 25 00:45:01 pegasus /USR/SBIN/CRON[112379]: (root) CMD (vzdump 5130 --quiet 1 --mode suspend --mailto info@domain.de --node pegasus --compress gzip --storage SLS-001)
Jun 25 00:45:01 pegasus vzdump[112380]: <root@pam> starting task UPID:pegasus:0001B700:0221AA12:51C8CBED:vzdump::root@pam:
Jun 25 00:45:01 pegasus vzdump[112384]: INFO: trying to get global lock - waiting...
Jun 25 00:54:55 pegasus kernel: INFO: task flush-253:2:3640 blocked for more than 120 seconds.
Jun 25 00:54:55 pegasus kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 25 00:54:55 pegasus kernel: flush-253:2 D ffff880e24b8b2e0 0 3640 2 0 0x00000000
Jun 25 00:54:55 pegasus kernel: ffff880dd0e8f7b0 0000000000000046 0000000000000000 ffff880b02424340
Jun 25 00:54:55 pegasus kernel: ffff880dd0e8f770 ffffffff8140b5ec ffff880dd0e8f740 ffff880e23ce5000
Jun 25 00:54:55 pegasus kernel: ffff8808913c4f00 000000011551a8b2 ffff880dd0e8ffd8 ffff880dd0e8ffd8
Jun 25 00:54:55 pegasus kernel: Call Trace:
Jun 25 00:54:55 pegasus kernel: [<ffffffff8140b5ec>] ? dm_table_unplug_all+0x5c/0x100
Jun 25 00:54:55 pegasus kernel: [<ffffffff811c7760>] ? sync_buffer+0x0/0x50
Jun 25 00:54:55 pegasus kernel: [<ffffffff8151b563>] io_schedule+0x73/0xc0
Jun 25 00:54:55 pegasus kernel: [<ffffffff811c77a0>] sync_buffer+0x40/0x50
Jun 25 00:54:55 pegasus kernel: [<ffffffff8151c31b>] __wait_on_bit_lock+0x5b/0xc0
Jun 25 00:54:55 pegasus kernel: [<ffffffff811c7760>] ? sync_buffer+0x0/0x50
Jun 25 00:54:55 pegasus kernel: [<ffffffff8151c3fc>] out_of_line_wait_on_bit_lock+0x7c/0x90
Jun 25 00:54:55 pegasus kernel: [<ffffffff81095340>] ? wake_bit_function+0x0/0x50
Jun 25 00:54:55 pegasus kernel: [<ffffffff811c92f0>] ? end_buffer_async_write+0x0/0x180
Jun 25 00:54:55 pegasus kernel: [<ffffffff811c8136>] __lock_buffer+0x36/0x40
Jun 25 00:54:55 pegasus kernel: [<ffffffff811c9e85>] __block_write_full_page+0x185/0x4e0
Jun 25 00:54:55 pegasus kernel: [<ffffffff8113addc>] ? release_pages+0x38c/0x420
Jun 25 00:54:55 pegasus kernel: [<ffffffff811c92f0>] ? end_buffer_async_write+0x0/0x180
Jun 25 00:54:55 pegasus kernel: [<ffffffff811c76f0>] ? generic_submit_bh_handler+0x0/0x10
Jun 25 00:54:55 pegasus kernel: [<ffffffff811ca686>] generic_block_write_full_page+0x106/0x110
Jun 25 00:54:55 pegasus kernel: [<ffffffff811ca6a8>] block_write_full_page_endio+0x18/0x20
Jun 25 00:54:55 pegasus kernel: [<ffffffff811ca6c5>] block_write_full_page+0x15/0x20
Jun 25 00:54:55 pegasus kernel: [<ffffffffa00b83dd>] ext3_ordered_writepage+0x1ed/0x240 [ext3]
Jun 25 00:54:55 pegasus kernel: [<ffffffff81137887>] __writepage+0x17/0x40
Jun 25 00:54:55 pegasus kernel: [<ffffffff811380b6>] write_cache_pages+0x1f6/0x440
Jun 25 00:54:55 pegasus kernel: [<ffffffff81056a7a>] ? update_curr+0xba/0x160
Jun 25 00:54:55 pegasus kernel: [<ffffffff81137870>] ? __writepage+0x0/0x40
Jun 25 00:54:55 pegasus kernel: [<ffffffff81138324>] generic_writepages+0x24/0x30
Jun 25 00:54:55 pegasus kernel: [<ffffffff81138dbd>] do_writepages+0x3d/0x50
Jun 25 00:54:55 pegasus kernel: [<ffffffff811be472>] __writeback_single_inode+0xb2/0x280
Jun 25 00:54:55 pegasus kernel: [<ffffffff811be6c3>] writeback_single_inode+0x83/0xc0
Jun 25 00:54:55 pegasus kernel: [<ffffffff811bf233>] writeback_sb_inodes+0x103/0x1e0
Jun 25 00:54:55 pegasus kernel: [<ffffffff811bf45f>] writeback_inodes_wb+0xff/0x170
Jun 25 00:54:55 pegasus kernel: [<ffffffff811bf78b>] wb_writeback+0x2bb/0x3f0
Jun 25 00:54:55 pegasus kernel: [<ffffffff811bfa46>] wb_do_writeback+0x186/0x240
Jun 25 00:54:55 pegasus kernel: [<ffffffff8107b900>] ? process_timeout+0x0/0x10
Jun 25 00:54:55 pegasus kernel: [<ffffffff811bfb5b>] bdi_writeback_task+0x5b/0x1b0
Jun 25 00:54:55 pegasus kernel: [<ffffffff81094fa7>] ? bit_waitqueue+0x17/0xc0
Jun 25 00:54:55 pegasus kernel: [<ffffffff8114d4d2>] bdi_start_fn+0x92/0x100
Jun 25 00:54:55 pegasus kernel: [<ffffffff8114d440>] ? bdi_start_fn+0x0/0x100
Jun 25 00:54:55 pegasus kernel: [<ffffffff81094d68>] kthread+0x88/0x90
Jun 25 00:54:55 pegasus kernel: [<ffffffff810096d2>] ? __switch_to+0xc2/0x2f0
Jun 25 00:54:55 pegasus kernel: [<ffffffff8100c22a>] child_rip+0xa/0x20
Jun 25 00:54:55 pegasus kernel: [<ffffffff81094ce0>] ? kthread+0x0/0x90
Jun 25 00:54:55 pegasus kernel: [<ffffffff8100c220>] ? child_rip+0x0/0x20
Jun 25 00:54:55 pegasus kernel: INFO: task kjournald:4520 blocked for more than 120 seconds.
Jun 25 00:54:55 pegasus kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 25 00:54:55 pegasus kernel: kjournald D ffff880e24b8a800 0 4520 2 0 0x00000000
Jun 25 00:54:55 pegasus kernel: ffff880dd0f5fc70 0000000000000046 0000000000000000 ffff880b02424340
Jun 25 00:54:55 pegasus kernel: ffff880dd0f5fc30 ffffffff8140b5ec ffff880dd0f5fc20 ffffffff8109f5b3
Jun 25 00:54:55 pegasus kernel: ffff880dd0f5fc20 0000000000000282 ffff880dd0f5ffd8 ffff880dd0f5ffd8
Jun 25 00:54:55 pegasus kernel: Call Trace:
Jun 25 00:54:55 pegasus kernel: [<ffffffff8140b5ec>] ? dm_table_unplug_all+0x5c/0x100
Jun 25 00:54:55 pegasus kernel: [<ffffffff8109f5b3>] ? ktime_get_ts+0xb3/0xf0
Jun 25 00:54:55 pegasus kernel: [<ffffffff811c7760>] ? sync_buffer+0x0/0x50
Jun 25 00:54:55 pegasus kernel: [<ffffffff8151b563>] io_schedule+0x73/0xc0
Jun 25 00:54:55 pegasus kernel: [<ffffffff811c77a0>] sync_buffer+0x40/0x50
Jun 25 00:54:55 pegasus kernel: [<ffffffff8151c470>] __wait_on_bit+0x60/0x90

Attached are our backup jobs (backup-jobs.png). We start 2 jobs at the same time; one job must wait for the 1st job before the next can start:
Code:
INFO: trying to get global lock - waiting...

Another log entry:
Code:
INFO: starting new backup job: vzdump 5131 --quiet 1 --mailto info@domain.de --mode snapshot --compress gzip --storage SLS-001 --node pegasus
INFO: Starting Backup of VM 5131 (openvz)
INFO: CTID 5131 exist mounted running
INFO: status = running
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating lvm snapshot of /dev/mapper/pve-data ('/dev/pve/vzsnap-pegasus-0')
INFO: Logical volume "vzsnap-pegasus-0" created
INFO: creating archive '/mnt/pve/SLS-001/dump/vzdump-openvz-5131-2013_06_19-23_00_01.tar.gz'
INFO: tar: ./home/ghg/html/m/data/37/22.dsk: File shrank by 64387 bytes; padding with zeros
INFO: tar: ./home/ghg/html/m/data/37/40.dsk: Read error at byte 0, while reading 4608 bytes: Input/output error
INFO: tar: ./home/ghg/html/m/data/37/94.dsk: Read error at byte 0, while reading 1536 bytes: Input/output error
INFO: tar: ./home/ghg/html/m/data/37/63/48/5.msg: Read error at byte 0, while reading 2048 bytes: Input/output error
...
INFO: Total bytes written: 270657372160 (253GiB, 9.1MiB/s)
INFO: tar: Exiting with failure status due to previous errors
ERROR: Backup of VM 5131 failed - command '(cd /mnt/vzsnap0/private/5131;find . '(' -regex '^\.$' ')' -o '(' -type 's' -prune ')' -o -print0|sed 's/\\/\\\\/g'|tar cpf - --totals --sparse --numeric-owner --no-recursion --one-file-system --null -T -|gzip) >/mnt/pve/SLS-001/dump/vzdump-openvz-5131-2013_06_19-23_00_01.tar.dat' failed: exit code 2
INFO: Backup job finished with errors
postdrop: warning: uid=0: File too large
sendmail: fatal: root(0): message file too big

But only ~50 MB of data changed during the backup window, so why exit code 2?

What can we do here, and do you need more information to help?

Many thanks. Regards.
 
We also have a similar issue - Dell PowerEdge 815 servers, 64GB RAM, RAID5 array for ProxMox - all VMs and related are stored on iSCSI targets. Our backups were going to an NFS mount set up in ProxMox

ProxMox 3.0 - backups of about 15 OpenVZ containers and 3 KVM VMs set to run after midnight. The server would go to an astronomically high load (300-700), but I was still able to log into it. Running 'top' did not show which process was creating the load. Running 'ps aux' would hang and I would have to ssh in again to do anything on the system. I tried htop, atop and a few other utils, all of which froze and would not report any information.

Removed the NFS mount for backups as I suspected this was to blame. Created a local directory on an iSCSI target for backups, formatted it ext4, same issue last night.

I have had to physically reboot the server every night for the last 4 nights. Today I disabled the backups as I cannot have this server going up and down every day.

I ran the updates and am now at the following version of ProxMox (I noticed a test version of 3.0-30 has been announced but I will have to wait to see if this works for anyone else)

root@xx5:~# pveversion -v
pve-manager: 3.0-23 (pve-manager/3.0/957f0862)
running kernel: 2.6.32-20-pve
proxmox-ve-2.6.32: 3.0-100
pve-kernel-2.6.32-20-pve: 2.6.32-100
lvm2: 2.02.95-pve3
clvm: 2.02.95-pve3
corosync-pve: 1.4.5-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.0-1
pve-cluster: 3.0-4
qemu-server: 3.0-20
pve-firmware: 1.0-22
libpve-common-perl: 3.0-4
libpve-access-control: 3.0-4
libpve-storage-perl: 3.0-8
vncterm: 1.1-4
vzctl: 4.0-1pve3
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 1.4-13
ksm-control-daemon: 1.1-1
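Since the backups had to be disabled entirely, a possible stopgap while waiting for a fix (my assumption, not advice given in this thread) is to throttle vzdump's I/O so the flush threads are not starved; vzdump's documented bwlimit option takes KB/s, and the value below is only an example:

```shell
# One-off throttled run (bwlimit in KB/s; 40000 is roughly 40 MB/s)
vzdump 101 --mode snapshot --bwlimit 40000 --compress gzip

# or set it globally in /etc/vzdump.conf:
#   bwlimit: 40000
```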
 
