vzdump using lvm snapshot - kills the box

Hello, some more information on this; it may be a useful lead.

I have 2 Dell R420 servers with the same config. The first one contains 3 large OpenVZ CTs and was crashing. I removed the nightly backup script and there have been no more crashes. The 3 CTs contain tens of thousands of files (small jpg and text cache files). The second Dell R420 server hosts 7 small OpenVZ CTs and is not crashing.

We have had very similar symptoms (though the current -107 kernel has not crashed so far).
We have a CT with folders holding thousands of small jpg files. If that CT is present on any of our nodes, the node crashes during the snapshot backup; if not, everything is fine.

Are you using ext3 or ext4?
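For anyone comparing notes, a quick way to see which filesystem the container storage sits on (assuming the default /var/lib/vz layout):

Code:
# Show the filesystem type backing the OpenVZ storage path
df -T /var/lib/vz
# Or check the mount table (prints nothing if /var/lib/vz is not a separate mount)
mount | grep ' /var/lib/vz '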
 
Since Dell RAID cards are LSI MegaRAID cards, and other threads reported that updating the kernel module for the RAID card fixed the issue, I installed the latest megaraid kernel module yesterday.
Last night was the first time my vzdump snapshots ran without crashing my server. I am on the following driver version now:
modinfo megaraid_sas
filename:       /lib/modules/2.6.32-22-pve/kernel/drivers/scsi/megaraid/megaraid_sas.ko
description:    LSI MegaRAID SAS Driver
author:         megaraidlinux@lsi.com
version:        06.600.18.00
license:        GPL
srcversion:     D7C72B1660AC8DB45B7443A
alias:          pci:v00001000d0000005Fsv*sd*bc*sc*i*
alias:          pci:v00001000d0000005Dsv*sd*bc*sc*i*
alias:          pci:v00001000d0000005Bsv*sd*bc*sc*i*
alias:          pci:v00001028d00000015sv*sd*bc*sc*i*
alias:          pci:v00001000d00000413sv*sd*bc*sc*i*
alias:          pci:v00001000d00000071sv*sd*bc*sc*i*
alias:          pci:v00001000d00000073sv*sd*bc*sc*i*
alias:          pci:v00001000d00000079sv*sd*bc*sc*i*
alias:          pci:v00001000d00000078sv*sd*bc*sc*i*
alias:          pci:v00001000d0000007Csv*sd*bc*sc*i*
alias:          pci:v00001000d00000060sv*sd*bc*sc*i*
alias:          pci:v00001000d00000411sv*sd*bc*sc*i*
depends:
vermagic:       2.6.32-22-pve SMP mod_unload modversions
parm:           max_sectors:Maximum number of sectors per IO command (int)
parm:           msix_disable:Disable MSI-X interrupt handling. Default: 0 (int)
parm:           msix_vectors:MSI-X max vector count. Default: Set by FW (int)
parm:           throttlequeuedepth:Adapter queue depth when throttled due to I/O timeout. Default: 16 (int)
parm:           resetwaittime:Wait time in seconds after I/O timeout before resetting adapter. Default: 180 (int)
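As a sanity check after rebooting into the new module, the driver version actually loaded by the running kernel can be compared against the module file on disk (a quick check, assuming the module is loaded):

Code:
# Version of the megaraid_sas module currently loaded by the kernel
cat /sys/module/megaraid_sas/version
# Version of the module file installed on disk
modinfo -F version megaraid_sas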

I will monitor the situation for a week. If the problems are gone, could someone from the Proxmox team add the latest megaraid driver to the Proxmox kernel?

Thanks
Sven
 
Hello,

Here I have the same problem with 3 HP DL360p G8 servers with the P420i Smart Array controller with battery-backed write cache and RAID 10.

Is the problem now fixed in the new kernel for version 3 released three days ago?
 
Since Dell RAID cards are LSI MegaRAID cards, and other threads reported that updating the kernel module for the RAID card fixed the issue, I installed the latest megaraid kernel module yesterday.

I have also uploaded a new kernel including that driver to the pvetest repository (pve-kernel-2.6.32-23-pve_2.6.32-108_amd64.deb).
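For anyone who wants to test it, the new kernel can be pulled from pvetest; a sketch, assuming a standard Proxmox VE 3.x install on Debian wheezy:

Code:
# Enable the pvetest repository (Proxmox VE 3.x / wheezy)
echo "deb http://download.proxmox.com/debian wheezy pvetest" > /etc/apt/sources.list.d/pvetest.list
apt-get update
# Install the new kernel and reboot into it
apt-get install pve-kernel-2.6.32-23-pve
reboot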
 
Hi,

I'm having the same issue. This is on Supermicro with an Areca controller, so it doesn't seem to be tied to LSI controllers, or at least not only to them.

Until 2 days ago I was still on Proxmox 2 and had no issues (I was already using ext4 at that time).

I still have the previous kernel. Should I try booting into it?
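A hedged sketch of how to check which PVE kernels are still installed before trying that (the older kernel can then be picked from the GRUB menu at boot):

Code:
# List the PVE kernel packages currently installed
dpkg -l 'pve-kernel-*'
# Confirm which kernel is running right now
uname -r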
 
@marco44

Have you rebooted since the upgrade? If not, please do so and test again.

regards
 
I also have Areca cards and am seeing much higher load during backups on Proxmox 3.x than on 2.x.
Some Asus boards, some Supermicro, various models of both.

I am backing up to a dedicated backup disk encrypted with LUKS; the backup filesystem is ext4.
Looking at this graph you can clearly see when we upgraded:

loadup.png
 
Unfortunately we had another occurrence of the problem, with the -107 kernel and the newest Adaptec firmware.

We are now almost positive that it is connected to VEs having thousands of files in one or more folders.
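For anyone who wants to check whether a CT falls into that category, a rough file count per container (assuming the default /var/lib/vz/private layout; 101 is just an example CTID):

Code:
# Count regular files inside a container's private area
find /var/lib/vz/private/101 -type f | wc -l
# File counts per top-level directory, largest first
for d in /var/lib/vz/private/101/*/; do echo "$(find "$d" -type f | wc -l) $d"; done | sort -rn | head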
 
Hi all,

As promised, I upgraded the kernel as recommended (version info below), and the same problem is still rearing its ugly head. The box locks up tight when backups run, and the only way to recover is to power down and reboot. This is a Dell R815 with a Dell PERC H700 controller.

In my previous post I stated that I had 3.0-23 working on some new Dell R620 servers with PERC H310 controllers. This is no longer the case: two of those servers are now doing the same thing, locking up during backups with very high loads (3000!). I have upgraded that entire cluster (6 machines) to kernel 2.6.32-107 as indicated below and disabled backups on all but two of the servers. These are very lightly used Proxmox 3 servers with 32 GB RAM, 12 CPU cores, and only 4-6 VMs per server. I don't think any of the VMs are special beyond running 64-bit Ubuntu OpenVZ containers; there is no unusually high number of files or load on any of them.

I will continue to follow similar threads here until we can come to a solution. I have 4 other servers I'd like to bring current from Proxmox 1.9 to 3.0.x, but I am waiting until this issue is resolved.

Thanks everyone!
Joe Jenkins

root@xx5:~# pveversion -v
pve-manager: 3.0-23 (pve-manager/3.0/957f0862)
running kernel: 2.6.32-22-pve
proxmox-ve-2.6.32: 3.0-107
pve-kernel-2.6.32-22-pve: 2.6.32-107
pve-kernel-2.6.32-20-pve: 2.6.32-100
lvm2: 2.02.95-pve3
clvm: 2.02.95-pve3
corosync-pve: 1.4.5-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.0-1
pve-cluster: 3.0-4
qemu-server: 3.0-20
pve-firmware: 1.0-23
libpve-common-perl: 3.0-4
libpve-access-control: 3.0-4
libpve-storage-perl: 3.0-8
vncterm: 1.1-4
vzctl: 4.0-1pve3
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 1.4-13
ksm-control-daemon: 1.1-1
 
I've tried downgrading to 2.6.32-20-pve, and the kernel hung just the same. This kernel had been stable for months before the upgrade to Proxmox 3, so it seems it isn't only the kernel...

While I was at it, I also checked that this error wasn't triggered by the snapshot filling up. It wasn't: when the problem occurred tonight, only 3% of the snapshot was in use.
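For reference, the snapshot fill level can be watched while the backup runs; a simple check using LVM's Data% column (the vzsnap name follows the pattern vzdump uses, yours may differ):

Code:
# Show allocated percentage of all LVs, including the vzdump snapshot (Data% column)
lvs
# Or poll just the snapshot every 10 seconds during the backup
watch -n 10 'lvs | grep vzsnap'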
 
Hi dietmar,

I have the same problem on Dell R610 and R710 with snapshot vzdump (CT and VM) on pve-kernel-2.6.32-23-pve_2.6.32-108_amd64.

These problems appeared after the upgrade to Wheezy.

Code:
$ uname -a
Linux skywalker 2.6.32-23-pve #1 SMP Tue Jul 23 07:58:26 CEST 2013 x86_64 GNU/Linux

Code:
$ modinfo megaraid_sas
filename:       /lib/modules/2.6.32-23-pve/kernel/drivers/scsi/megaraid/megaraid_sas.ko
description:    LSI MegaRAID SAS Driver
author:         megaraidlinux@lsi.com
version:        06.600.18.00
license:        GPL
srcversion:     D7C72B1660AC8DB45B7443A
alias:          pci:v00001000d0000005Fsv*sd*bc*sc*i*
alias:          pci:v00001000d0000005Dsv*sd*bc*sc*i*
alias:          pci:v00001000d0000005Bsv*sd*bc*sc*i*
alias:          pci:v00001028d00000015sv*sd*bc*sc*i*
alias:          pci:v00001000d00000413sv*sd*bc*sc*i*
alias:          pci:v00001000d00000071sv*sd*bc*sc*i*
alias:          pci:v00001000d00000073sv*sd*bc*sc*i*
alias:          pci:v00001000d00000079sv*sd*bc*sc*i*
alias:          pci:v00001000d00000078sv*sd*bc*sc*i*
alias:          pci:v00001000d0000007Csv*sd*bc*sc*i*
alias:          pci:v00001000d00000060sv*sd*bc*sc*i*
alias:          pci:v00001000d00000411sv*sd*bc*sc*i*
depends:
vermagic:       2.6.32-23-pve SMP mod_unload modversions
parm:           max_sectors:Maximum number of sectors per IO command (int)
parm:           msix_disable:Disable MSI-X interrupt handling. Default: 0 (int)
parm:           msix_vectors:MSI-X max vector count. Default: Set by FW (int)
parm:           throttlequeuedepth:Adapter queue depth when throttled due to I/O timeout. Default: 16 (int)
parm:           resetwaittime:Wait time in seconds after I/O timeout before resetting adapter. Default: 180 (int)

Code:
$ pveversion -v
pve-manager: 3.0-23 (pve-manager/3.0/957f0862)
running kernel: 2.6.32-23-pve
proxmox-ve-2.6.32: 3.0-107
pve-kernel-2.6.32-23-pve: 2.6.32-108
pve-kernel-2.6.32-20-pve: 2.6.32-100
pve-kernel-2.6.32-22-pve: 2.6.32-107
lvm2: 2.02.95-pve3
clvm: 2.02.95-pve3
corosync-pve: 1.4.5-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.0-1
pve-cluster: 3.0-4
qemu-server: 3.0-20
pve-firmware: 1.0-23
libpve-common-perl: 3.0-4
libpve-access-control: 3.0-4
libpve-storage-perl: 3.0-8
vncterm: 1.1-4
vzctl: 4.0-1pve3
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 1.4-13
ksm-control-daemon: 1.1-1

Example of tonight's error:
Code:
Jul 31 01:00:02 skywalker vzdump[866605]: INFO: starting new backup job: vzdump --quiet 1 --mailto root --mode snapshot --compress lzo --storage windu --node skywalker --all 1
Jul 31 01:00:02 skywalker vzdump[866605]: INFO: Starting Backup of VM 101 (openvz)
Jul 31 01:00:17 skywalker vzdump[866605]: INFO: Finished Backup of VM 101 (00:00:15)
Jul 31 01:00:17 skywalker vzdump[866605]: INFO: Starting Backup of VM 107 (openvz)
Jul 31 01:00:18 skywalker kernel: EXT3-fs: barriers disabled
Jul 31 01:00:18 skywalker kernel: kjournald starting.  Commit interval 5 seconds
Jul 31 01:00:18 skywalker kernel: EXT3-fs (dm-2): using internal journal
Jul 31 01:00:18 skywalker kernel: ext3_orphan_cleanup: deleting unreferenced inode 93431985
Jul 31 01:00:18 skywalker kernel: ext3_orphan_cleanup: deleting unreferenced inode 94570207
Jul 31 01:00:18 skywalker kernel: ext3_orphan_cleanup: deleting unreferenced inode 93342024
Jul 31 01:00:18 skywalker kernel: ext3_orphan_cleanup: deleting unreferenced inode 93980817
Jul 31 01:00:18 skywalker kernel: ext3_orphan_cleanup: deleting unreferenced inode 93538456
Jul 31 01:00:18 skywalker kernel: ext3_orphan_cleanup: deleting unreferenced inode 94143035
Jul 31 01:00:18 skywalker kernel: ext3_orphan_cleanup: deleting unreferenced inode 94142895
Jul 31 01:00:18 skywalker kernel: ext3_orphan_cleanup: deleting unreferenced inode 94142917
Jul 31 01:00:18 skywalker kernel: ext3_orphan_cleanup: deleting unreferenced inode 94142924
Jul 31 01:00:18 skywalker kernel: ext3_orphan_cleanup: deleting unreferenced inode 94143009
Jul 31 01:00:18 skywalker kernel: ext3_orphan_cleanup: deleting unreferenced inode 94142861
Jul 31 01:00:18 skywalker kernel: ext3_orphan_cleanup: deleting unreferenced inode 94142871
Jul 31 01:00:18 skywalker kernel: ext3_orphan_cleanup: deleting unreferenced inode 94142920
Jul 31 01:00:18 skywalker kernel: ext3_orphan_cleanup: deleting unreferenced inode 94143016
Jul 31 01:00:18 skywalker kernel: ext3_orphan_cleanup: deleting unreferenced inode 94142991
Jul 31 01:00:18 skywalker kernel: ext3_orphan_cleanup: deleting unreferenced inode 94143050
Jul 31 01:00:18 skywalker kernel: ext3_orphan_cleanup: deleting unreferenced inode 94143015
Jul 31 01:00:18 skywalker kernel: ext3_orphan_cleanup: deleting unreferenced inode 94142997
Jul 31 01:00:18 skywalker kernel: ext3_orphan_cleanup: deleting unreferenced inode 94142863
Jul 31 01:00:18 skywalker kernel: ext3_orphan_cleanup: deleting unreferenced inode 93380896
Jul 31 01:00:18 skywalker kernel: ext3_orphan_cleanup: deleting unreferenced inode 93380894
Jul 31 01:00:18 skywalker kernel: ext3_orphan_cleanup: deleting unreferenced inode 93380890
Jul 31 01:00:18 skywalker kernel: ext3_orphan_cleanup: deleting unreferenced inode 93413380
Jul 31 01:00:18 skywalker kernel: ext3_orphan_cleanup: deleting unreferenced inode 93357893
Jul 31 01:00:18 skywalker kernel: ext3_orphan_cleanup: deleting unreferenced inode 93357890
Jul 31 01:00:18 skywalker kernel: ext3_orphan_cleanup: deleting unreferenced inode 93357882
Jul 31 01:00:18 skywalker kernel: ext3_orphan_cleanup: deleting unreferenced inode 93357618
Jul 31 01:00:18 skywalker kernel: ext3_orphan_cleanup: deleting unreferenced inode 93357522
Jul 31 01:00:18 skywalker kernel: ext3_orphan_cleanup: deleting unreferenced inode 93464627
Jul 31 01:00:18 skywalker kernel: ext3_orphan_cleanup: deleting unreferenced inode 93464621
Jul 31 01:00:18 skywalker kernel: ext3_orphan_cleanup: deleting unreferenced inode 93464600
Jul 31 01:00:18 skywalker kernel: ext3_orphan_cleanup: deleting unreferenced inode 93464599
Jul 31 01:00:18 skywalker kernel: ext3_orphan_cleanup: deleting unreferenced inode 93464597
Jul 31 01:00:18 skywalker kernel: ext3_orphan_cleanup: deleting unreferenced inode 98575223
Jul 31 01:00:18 skywalker kernel: ext3_orphan_cleanup: deleting unreferenced inode 96921674
Jul 31 01:00:18 skywalker kernel: ext3_orphan_cleanup: deleting unreferenced inode 96921673
Jul 31 01:00:18 skywalker kernel: ext3_orphan_cleanup: deleting unreferenced inode 96921672
Jul 31 01:00:18 skywalker kernel: ext3_orphan_cleanup: deleting unreferenced inode 96921671
Jul 31 01:00:18 skywalker kernel: ext3_orphan_cleanup: deleting unreferenced inode 96921654
Jul 31 01:00:18 skywalker kernel: EXT3-fs (dm-2): 39 orphan inodes deleted
Jul 31 01:00:18 skywalker kernel: EXT3-fs (dm-2): recovery complete
Jul 31 01:00:18 skywalker kernel: EXT3-fs (dm-2): mounted filesystem with ordered data mode

Jul 31 01:11:05 skywalker kernel: INFO: task kjournald:1691 blocked for more than 120 seconds.
Jul 31 01:11:05 skywalker kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 31 01:11:05 skywalker kernel: kjournald     D ffff88082ba9b2e0     0  1691      2    0 0x00000000
Jul 31 01:11:05 skywalker kernel: ffff88082d343c70 0000000000000046 0000000000000000 ffff8805a9eaa300
Jul 31 01:11:05 skywalker kernel: ffff88082d343c30 ffffffff8142a42c ffff88082d343c20 ffffffff810a5563
Jul 31 01:11:05 skywalker kernel: ffff88082d343c20 0000000000000282 ffff88082d343fd8 ffff88082d343fd8
Jul 31 01:11:05 skywalker kernel: Call Trace:
Jul 31 01:11:05 skywalker kernel: [<ffffffff8142a42c>] ? dm_table_unplug_all+0x5c/0x100
Jul 31 01:11:05 skywalker kernel: [<ffffffff810a5563>] ? ktime_get_ts+0xb3/0xf0
Jul 31 01:11:05 skywalker kernel: [<ffffffff811d5490>] ? sync_buffer+0x0/0x50
Jul 31 01:11:05 skywalker kernel: [<ffffffff81540023>] io_schedule+0x73/0xc0
Jul 31 01:11:05 skywalker kernel: [<ffffffff811d54d0>] sync_buffer+0x40/0x50
Jul 31 01:11:05 skywalker kernel: [<ffffffff81540f30>] __wait_on_bit+0x60/0x90
Jul 31 01:11:05 skywalker kernel: [<ffffffff811d5490>] ? sync_buffer+0x0/0x50
Jul 31 01:11:05 skywalker kernel: [<ffffffff81540fdc>] out_of_line_wait_on_bit+0x7c/0x90
Jul 31 01:11:05 skywalker kernel: [<ffffffff8109b2c0>] ? wake_bit_function+0x0/0x50
Jul 31 01:11:05 skywalker kernel: [<ffffffff811d5e56>] __wait_on_buffer+0x26/0x30
Jul 31 01:11:05 skywalker kernel: [<ffffffffa0089100>] journal_commit_transaction+0xc00/0x1200 [jbd]
Jul 31 01:11:05 skywalker kernel: [<ffffffff81080ea8>] ? lock_timer_base.isra.52+0x38/0x70
Jul 31 01:11:05 skywalker kernel: [<ffffffff810829a4>] ? try_to_del_timer_sync+0x84/0xe0
Jul 31 01:11:05 skywalker kernel: [<ffffffffa008e895>] kjournald+0xe5/0x230 [jbd]
Jul 31 01:11:05 skywalker kernel: [<ffffffff8109b280>] ? autoremove_wake_function+0x0/0x40
Jul 31 01:11:05 skywalker kernel: [<ffffffffa008e7b0>] ? kjournald+0x0/0x230 [jbd]
Jul 31 01:11:05 skywalker kernel: [<ffffffff8109ace8>] kthread+0x88/0x90
Jul 31 01:11:05 skywalker kernel: [<ffffffff810096d2>] ? __switch_to+0xc2/0x2f0
Jul 31 01:11:05 skywalker kernel: [<ffffffff8100c22a>] child_rip+0xa/0x20
Jul 31 01:11:05 skywalker kernel: [<ffffffff8109ac60>] ? kthread+0x0/0x90
Jul 31 01:11:05 skywalker kernel: [<ffffffff8100c220>] ? child_rip+0x0/0x20
Jul 31 01:11:05 skywalker kernel: INFO: task flush-253:1:2664 blocked for more than 120 seconds.
Jul 31 01:11:05 skywalker kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 31 01:11:05 skywalker kernel: flush-253:1   D ffff880823d14540     0  2664      2    0 0x00000000
Jul 31 01:11:05 skywalker kernel: ffff880823d217b0 0000000000000046 0000000000000000 ffff8805a9eaa300
Jul 31 01:11:05 skywalker kernel: ffff880823d21770 ffffffff8142a42c ffff880823d21740 ffff88082badec00

Thanks
 
Here, with Dell R420 and HP DL360 G6 and G7 servers, the crash only occurs on nodes with large OpenVZ CTs. If a CT contains tens of thousands of files and folders (such as jpg or cache files), the backup kills the box. I have an R420 with 7 regular CTs (small web servers): no crash. I have another R420 with only 2 big CTs (web servers with tens of thousands of small files): it crashes every time. I stopped the backup on this box and there have been no crashes since.

CTs are running Debian Squeeze and some have NFS mounts.
 
Downgrading to 2.6.32-19-pve didn't solve the problem either. It just hung once more.
 
Thanks for the advice, Dietmar. Per your instructions I upgraded to pve-kernel-2.6.32-23-pve (2.6.32-108), rebooted the box, and enabled some backups again last night. It made it through 3-4 OpenVZ containers just fine, then it tried to back up a busy KVM VM, the load skyrocketed (slowly), and eventually I was forced to hard-reboot the server.
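As an aside for anyone experimenting: vzdump's read bandwidth and I/O priority can be throttled per job, which may at least soften the load spike while the root cause is investigated. A sketch with example values (VM 165 and the storage name are taken from later in this thread):

Code:
# Throttle the backup: limit read bandwidth to ~40 MB/s and use the lowest ionice priority
vzdump 165 --storage ProxBackupsNFS --mode snapshot --bwlimit 40000 --ionice 7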

I noticed your comment above about bad drives. Just so you know, on our servers this is happening on RAID5 and RAID10 arrays, and 6 of the servers are brand new with new drives in the arrays.

What's interesting this time around is that when Proxmox tries to back up VM 165, a KVM VM (after doing 3-4 OpenVZ containers without issue), there are tons of timeout errors in the log and you can see the kernel alerts due to hung tasks.

Hopefully all this back and forth will help us all pinpoint and resolve the issue; thanks for your patience and help :)

Joe Jenkins

Version info / Syslogs below:

The "pvestatd[3708]: WARNING: unable to connect to VM 165 socket - timeout after 31 retries" line is repeated dozens of times so i removed many of them from my post below.

Code:
root@xx5:~# modinfo megaraid_sas
filename:       /lib/modules/2.6.32-23-pve/kernel/drivers/scsi/megaraid/megaraid_sas.ko
description:    LSI MegaRAID SAS Driver
author:         megaraidlinux@lsi.com
version:        06.600.18.00
license:        GPL
srcversion:     D7C72B1660AC8DB45B7443A
alias:          pci:v00001000d0000005Fsv*sd*bc*sc*i*
alias:          pci:v00001000d0000005Dsv*sd*bc*sc*i*
alias:          pci:v00001000d0000005Bsv*sd*bc*sc*i*
alias:          pci:v00001028d00000015sv*sd*bc*sc*i*
alias:          pci:v00001000d00000413sv*sd*bc*sc*i*
alias:          pci:v00001000d00000071sv*sd*bc*sc*i*
alias:          pci:v00001000d00000073sv*sd*bc*sc*i*
alias:          pci:v00001000d00000079sv*sd*bc*sc*i*
alias:          pci:v00001000d00000078sv*sd*bc*sc*i*
alias:          pci:v00001000d0000007Csv*sd*bc*sc*i*
alias:          pci:v00001000d00000060sv*sd*bc*sc*i*
alias:          pci:v00001000d00000411sv*sd*bc*sc*i*
depends:
vermagic:       2.6.32-23-pve SMP mod_unload modversions
parm:           max_sectors:Maximum number of sectors per IO command (int)
parm:           msix_disable:Disable MSI-X interrupt handling. Default: 0 (int)
parm:           msix_vectors:MSI-X max vector count. Default: Set by FW (int)
parm:           throttlequeuedepth:Adapter queue depth when throttled due to I/O timeout. Default: 16 (int)
parm:           resetwaittime:Wait time in seconds after I/O timeout before resetting adapter. Default: 180 (int)
root@xx5:~# pveversion --v
pve-manager: 3.0-23 (pve-manager/3.0/957f0862)
running kernel: 2.6.32-23-pve
proxmox-ve-2.6.32: 3.0-107
pve-kernel-2.6.32-23-pve: 2.6.32-108
pve-kernel-2.6.32-22-pve: 2.6.32-107
pve-kernel-2.6.32-20-pve: 2.6.32-100
lvm2: 2.02.95-pve3
clvm: 2.02.95-pve3
corosync-pve: 1.4.5-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.0-1
pve-cluster: 3.0-4
qemu-server: 3.0-20
pve-firmware: 1.0-23
libpve-common-perl: 3.0-4
libpve-access-control: 3.0-4
libpve-storage-perl: 3.0-8
vncterm: 1.1-4
vzctl: 4.0-1pve3
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 1.4-13
ksm-control-daemon: 1.1-1

Code:
Jul 30 23:26:18 xx5 vzdump[38022]: INFO: Finished Backup of VM 1001 (00:01:17)
Jul 30 23:26:18 xx5 vzdump[38022]: INFO: Starting Backup of VM 1002 (openvz)
Jul 30 23:26:20 xx5 kernel: EXT4-fs (dm-4): warning: checktime reached, running e2fsck is recommended
Jul 30 23:26:20 xx5 kernel: VZDQ: Tried to clean orphans on qmblk with 1 state
Jul 30 23:26:20 xx5 kernel: BUG: Quota files for 1003 are broken: no quota engine running
Jul 30 23:26:20 xx5 kernel: EXT4-fs (dm-4): ext4_orphan_cleanup: deleting unreferenced inode 73016718
Jul 30 23:26:20 xx5 kernel: EXT4-fs (dm-4): 5 orphan inodes deleted
Jul 30 23:26:20 xx5 kernel: EXT4-fs (dm-4): recovery complete
Jul 30 23:26:20 xx5 kernel: EXT4-fs (dm-4): mounted filesystem with ordered data mode. Opts:
Jul 30 23:28:27 xx5 vzdump[38022]: INFO: Finished Backup of VM 1002 (00:02:09)
Jul 30 23:28:27 xx5 vzdump[38022]: INFO: Starting Backup of VM 1003 (openvz)
Jul 30 23:28:27 xx5 kernel: EXT4-fs (dm-4): warning: checktime reached, running e2fsck is recommended
Jul 30 23:28:27 xx5 kernel: VZDQ: Tried to clean orphans on qmblk with 1 state
Jul 30 23:28:27 xx5 kernel: BUG: Quota files for 1003 are broken: no quota engine running
Jul 30 23:28:28 xx5 kernel: EXT4-fs (dm-4): 5 orphan inodes deleted
Jul 30 23:28:28 xx5 kernel: EXT4-fs (dm-4): recovery complete
Jul 30 23:28:28 xx5 kernel: EXT4-fs (dm-4): mounted filesystem with ordered data mode. Opts:
Jul 30 23:33:15 xx5 pveproxy[3713]: worker 3716 finished
Jul 30 23:33:15 xx5 pveproxy[3713]: starting 1 worker(s)
Jul 30 23:33:15 xx5 pveproxy[3713]: worker 51891 started
Jul 30 23:34:41 xx5 pvestatd[3708]: WARNING: unable to connect to VM 165 socket - timeout after 31 retries
Jul 30 23:34:51 xx5 pvestatd[3708]: WARNING: unable to connect to VM 165 socket - timeout after 31 retries
Jul 30 23:37:10 xx5 kernel: INFO: task jbd2/dm-3-8:4149 blocked for more than 120 seconds.
Jul 30 23:37:10 xx5 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 30 23:37:10 xx5 kernel: jbd2/dm-3-8   D ffff88087d762340     0  4149      2    0 0x00000000
Jul 30 23:37:10 xx5 kernel: ffff88087e5b1b40 0000000000000046 ffff88087e5b1af0 ffffffff8142a42c
Jul 30 23:37:10 xx5 kernel: ffff88087e5b1ab0 ffffffff81012899 ffff88087e5b1af0 0000000000000282
Jul 30 23:37:10 xx5 kernel: ffff88087e5b1af0 00000000220d7e76 ffff88087e5b1fd8 ffff88087e5b1fd8
Jul 30 23:37:10 xx5 kernel: Call Trace:
Jul 30 23:37:10 xx5 kernel: [<ffffffff8142a42c>] ? dm_table_unplug_all+0x5c/0x100
Jul 30 23:37:10 xx5 kernel: [<ffffffff81012899>] ? read_tsc+0x9/0x20
Jul 30 23:37:10 xx5 kernel: [<ffffffff8112a400>] ? sync_page+0x0/0x50
Jul 30 23:37:10 xx5 kernel: [<ffffffff81540023>] io_schedule+0x73/0xc0
Jul 30 23:37:10 xx5 kernel: [<ffffffff8112a43b>] sync_page+0x3b/0x50
Jul 30 23:37:10 xx5 kernel: [<ffffffff81540f30>] __wait_on_bit+0x60/0x90
Jul 30 23:37:10 xx5 kernel: [<ffffffff8112a5d0>] wait_on_page_bit+0x80/0x90
Jul 30 23:37:10 xx5 kernel: [<ffffffff8109b2c0>] ? wake_bit_function+0x0/0x50
Jul 30 23:37:10 xx5 kernel: [<ffffffff8112a90a>] wait_on_page_writeback_range.part.36+0xea/0x180
Jul 30 23:37:10 xx5 kernel: [<ffffffff8125cd1d>] ? submit_bio+0x8d/0x1a0
Jul 30 23:37:10 xx5 kernel: [<ffffffff8112b2b5>] wait_on_page_writeback_range+0x15/0x20
Jul 30 23:37:10 xx5 kernel: [<ffffffff8112bedf>] filemap_fdatawait+0x2f/0x40
Jul 30 23:37:10 xx5 kernel: [<ffffffffa01eed28>] jbd2_journal_commit_transaction+0x798/0x1410 [jbd2]
Jul 30 23:37:10 xx5 kernel: [<ffffffff810096d2>] ? __switch_to+0xc2/0x2f0
Jul 30 23:37:10 xx5 kernel: [<ffffffff81080ea8>] ? lock_timer_base.isra.52+0x38/0x70
Jul 30 23:37:10 xx5 kernel: [<ffffffffa01f50a8>] kjournald2+0xb8/0x200 [jbd2]
Jul 30 23:37:10 xx5 kernel: [<ffffffff8109b280>] ? autoremove_wake_function+0x0/0x40
Jul 30 23:37:10 xx5 kernel: [<ffffffffa01f4ff0>] ? kjournald2+0x0/0x200 [jbd2]
Jul 30 23:37:10 xx5 kernel: [<ffffffff8109ace8>] kthread+0x88/0x90
Jul 30 23:37:10 xx5 kernel: [<ffffffff810096d2>] ? __switch_to+0xc2/0x2f0
Jul 30 23:37:10 xx5 kernel: [<ffffffff8100c22a>] child_rip+0xa/0x20
Jul 30 23:37:10 xx5 kernel: [<ffffffff8109ac60>] ? kthread+0x0/0x90
Jul 30 23:37:10 xx5 kernel: [<ffffffff8100c220>] ? child_rip+0x0/0x20
Jul 30 23:37:10 xx5 kernel: INFO: task java:7645 blocked for more than 120 seconds.
Jul 30 23:37:10 xx5 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 30 23:37:10 xx5 kernel: java          D ffff88027ac82be0     0  7645   5905  136 0x00020000
Jul 30 23:37:10 xx5 kernel: ffff88027ac35828 0000000000000086 0000000000000000 ffff88067d7fe180
Jul 30 23:37:10 xx5 kernel: ffff88027ac357e8 ffffffff8142a42c ffff881077c46bc0 ffffffff81382330
Jul 30 23:37:10 xx5 kernel: 0000000000000fe0 0000000100142fa1 ffff88027ac35fd8 ffff88027ac35fd8
Jul 30 23:37:10 xx5 kernel: Call Trace:
Jul 30 23:37:10 xx5 kernel: [<ffffffff8142a42c>] ? dm_table_unplug_all+0x5c/0x100
Jul 30 23:37:10 xx5 kernel: [<ffffffff81382330>] ? scsi_sg_alloc+0x0/0x60
Jul 30 23:37:10 xx5 kernel: [<ffffffff811d5490>] ? sync_buffer+0x0/0x50
Jul 30 23:37:10 xx5 kernel: [<ffffffff81540023>] io_schedule+0x73/0xc0
Jul 30 23:37:10 xx5 kernel: [<ffffffff811d54d0>] sync_buffer+0x40/0x50
Jul 30 23:37:10 xx5 kernel: [<ffffffff81540ddb>] __wait_on_bit_lock+0x5b/0xc0
Jul 30 23:37:10 xx5 kernel: [<ffffffff8112acee>] ? find_get_page+0x1e/0xa0
Jul 30 23:37:10 xx5 kernel: [<ffffffff811d5490>] ? sync_buffer+0x0/0x50
Jul 30 23:37:10 xx5 kernel: [<ffffffff81540ebc>] out_of_line_wait_on_bit_lock+0x7c/0x90
Jul 30 23:37:10 xx5 kernel: [<ffffffff8109b2c0>] ? wake_bit_function+0x0/0x50
Jul 30 23:37:10 xx5 kernel: [<ffffffff811d5e96>] __lock_buffer+0x36/0x40
Jul 30 23:37:10 xx5 kernel: [<ffffffffa01ed93c>] do_get_write_access+0x8c/0x480 [jbd2]
Jul 30 23:37:10 xx5 kernel: [<ffffffffa01ede80>] jbd2_journal_get_write_access+0x30/0x50 [jbd2]
Jul 30 23:37:10 xx5 kernel: [<ffffffffa0234706>] __ext4_journal_get_write_access+0x36/0x80 [ext4]
Jul 30 23:37:10 xx5 kernel: [<ffffffffa020d953>] ext4_reserve_inode_write+0x73/0xa0 [ext4]
Jul 30 23:37:10 xx5 kernel: [<ffffffffa020d9c9>] ext4_mark_inode_dirty+0x49/0x1a0 [ext4]
Jul 30 23:37:10 xx5 kernel: [<ffffffffa0212d4f>] ext4_dirty_inode+0x3f/0x60 [ext4]
Jul 30 23:37:10 xx5 kernel: [<ffffffff811cb61b>] __mark_inode_dirty+0x3b/0x170
Jul 30 23:37:10 xx5 kernel: [<ffffffff811bff62>] ? mnt_clone_write+0x12/0x50
Jul 30 23:37:10 xx5 kernel: [<ffffffff811bbb0a>] file_update_time+0x10a/0x1a0
Jul 30 23:37:10 xx5 kernel: [<ffffffffa0212f04>] ext4_page_mkwrite+0x74/0x3b0 [ext4]
Jul 30 23:37:10 xx5 kernel: [<ffffffff8115a008>] do_wp_page+0x7a8/0x9a0
Jul 30 23:37:10 xx5 kernel: [<ffffffff8109f2f7>] ? hrtimer_try_to_cancel+0x47/0xd0
Jul 30 23:37:10 xx5 kernel: [<ffffffff8115e955>] handle_pte_fault+0x545/0x1620
Jul 30 23:37:10 xx5 kernel: [<ffffffff810b9cda>] ? get_futex_key+0x15a/0x280
Jul 30 23:37:10 xx5 kernel: [<ffffffff8115fc69>] handle_mm_fault+0x239/0x310
Jul 30 23:37:10 xx5 kernel: [<ffffffff810447a1>] __do_page_fault+0x181/0x490
Jul 30 23:37:10 xx5 kernel: [<ffffffff8100bcce>] ? apic_timer_interrupt+0xe/0x20
Jul 30 23:37:10 xx5 kernel: [<ffffffff8100bcce>] ? apic_timer_interrupt+0xe/0x20
Jul 30 23:37:10 xx5 kernel: [<ffffffff81012899>] ? read_tsc+0x9/0x20
Jul 30 23:37:10 xx5 kernel: [<ffffffff810a560a>] ? ktime_get+0x6a/0x100
Jul 30 23:37:10 xx5 kernel: [<ffffffff815455ab>] do_page_fault+0x3b/0xa0
Jul 30 23:37:10 xx5 kernel: [<ffffffff815429c5>] page_fault+0x25/0x30
Jul 30 23:37:10 xx5 kernel: INFO: task syslogd:8398 blocked for more than 120 seconds.
Jul 30 23:37:10 xx5 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 30 23:37:10 xx5 kernel: syslogd       D ffff880c73f30800     0  8398   7515  128 0x00020004
Jul 30 23:37:10 xx5 kernel: ffff880c717cfcd8 0000000000000086 0000000000000000 ffffffff8142a42c
Jul 30 23:37:10 xx5 kernel: ffff880c717cfc58 ffff88027db3dc00 ffff88024fbddec0 0000000000000282
Jul 30 23:37:10 xx5 kernel: ffff880c717cfc88 000000010013c8b6 ffff880c717cffd8 ffff880c717cffd8
Jul 30 23:37:10 xx5 kernel: Call Trace:
Jul 30 23:37:10 xx5 kernel: [<ffffffff8142a42c>] ? dm_table_unplug_all+0x5c/0x100
Jul 30 23:37:10 xx5 kernel: [<ffffffff8112a400>] ? sync_page+0x0/0x50
Jul 30 23:37:10 xx5 kernel: [<ffffffff81540023>] io_schedule+0x73/0xc0
Jul 30 23:37:10 xx5 kernel: [<ffffffff8112a43b>] sync_page+0x3b/0x50
Jul 30 23:37:10 xx5 kernel: [<ffffffff81540f30>] __wait_on_bit+0x60/0x90
Jul 30 23:37:10 xx5 kernel: [<ffffffff8112a5d0>] wait_on_page_bit+0x80/0x90
Jul 30 23:37:10 xx5 kernel: [<ffffffff8109b2c0>] ? wake_bit_function+0x0/0x50
Jul 30 23:37:10 xx5 kernel: [<ffffffff8112a90a>] wait_on_page_writeback_range.part.36+0xea/0x180
Jul 30 23:37:10 xx5 kernel: [<ffffffff8114194f>] ? do_writepages+0x1f/0x50
Jul 30 23:37:10 xx5 kernel: [<ffffffff8112b22b>] ? __filemap_fdatawrite_range+0x5b/0x60
Jul 30 23:37:10 xx5 kernel: [<ffffffff8112b2b5>] wait_on_page_writeback_range+0x15/0x20
Jul 30 23:37:10 xx5 kernel: [<ffffffff8112b32f>] filemap_write_and_wait_range+0x6f/0x80
Jul 30 23:37:10 xx5 kernel: [<ffffffff811d1acf>] vfs_fsync_range+0xcf/0x1c0
Jul 30 23:37:10 xx5 kernel: [<ffffffff811d1c2d>] vfs_fsync+0x1d/0x20
Jul 30 23:37:10 xx5 kernel: [<ffffffff811d1e56>] do_fsync+0x66/0xa0
Jul 30 23:37:10 xx5 kernel: [<ffffffff811d25f0>] sys_fsync+0x10/0x20
Jul 30 23:37:10 xx5 kernel: [<ffffffff8104afc0>] cstar_dispatch+0x7/0x2e
Jul 30 23:37:10 xx5 kernel: INFO: task mysqld:9612 blocked for more than 120 seconds.
Jul 30 23:37:10 xx5 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 30 23:37:10 xx5 kernel: mysqld        D ffff88107e0f8800     0  9612   9306  121 0x00020000
Jul 30 23:37:10 xx5 kernel: ffff88107437f878 0000000000000082 ffff88107437f928 ffff88067d7fe180
Jul 30 23:37:10 xx5 kernel: ffff88107437f838 ffffffff8142a42c ffff88107437f948 ffffffffa019e4f0
Jul 30 23:37:10 xx5 kernel: ffff88107437f958 000000010014778e ffff88107437ffd8 ffff88107437ffd8
Jul 30 23:37:10 xx5 kernel: Call Trace:
Jul 30 23:37:10 xx5 kernel: [<ffffffff8142a42c>] ? dm_table_unplug_all+0x5c/0x100
Jul 30 23:37:10 xx5 kernel: [<ffffffffa019e4f0>] ? ipt_do_table+0x370/0x5f0 [ip_tables]
Jul 30 23:37:10 xx5 kernel: [<ffffffff811d5490>] ? sync_buffer+0x0/0x50
Jul 30 23:37:10 xx5 kernel: [<ffffffff81540023>] io_schedule+0x73/0xc0
Jul 30 23:37:10 xx5 kernel: [<ffffffff811d54d0>] sync_buffer+0x40/0x50
Jul 30 23:37:10 xx5 kernel: [<ffffffff81540ddb>] __wait_on_bit_lock+0x5b/0xc0
Jul 30 23:37:10 xx5 kernel: [<ffffffff8112acee>] ? find_get_page+0x1e/0xa0
Jul 30 23:37:10 xx5 kernel: [<ffffffff811d5490>] ? sync_buffer+0x0/0x50
Jul 30 23:37:10 xx5 kernel: [<ffffffff81540ebc>] out_of_line_wait_on_bit_lock+0x7c/0x90
Jul 30 23:37:10 xx5 kernel: [<ffffffff8109b2c0>] ? wake_bit_function+0x0/0x50
Jul 30 23:37:10 xx5 kernel: [<ffffffff811d5e96>] __lock_buffer+0x36/0x40
Jul 30 23:37:10 xx5 kernel: [<ffffffffa01ed93c>] do_get_write_access+0x8c/0x480 [jbd2]
Jul 30 23:37:10 xx5 kernel: [<ffffffff814a6560>] ? ip_finish_output+0x0/0x330
Jul 30 23:37:10 xx5 kernel: [<ffffffffa01ede80>] jbd2_journal_get_write_access+0x30/0x50 [jbd2]
Jul 30 23:37:10 xx5 kernel: [<ffffffffa0234706>] __ext4_journal_get_write_access+0x36/0x80 [ext4]
Jul 30 23:37:10 xx5 kernel: [<ffffffffa020d953>] ext4_reserve_inode_write+0x73/0xa0 [ext4]
Jul 30 23:37:10 xx5 kernel: [<ffffffffa020d9c9>] ext4_mark_inode_dirty+0x49/0x1a0 [ext4]
Jul 30 23:37:10 xx5 kernel: [<ffffffffa0212d4f>] ext4_dirty_inode+0x3f/0x60 [ext4]
Jul 30 23:37:10 xx5 kernel: [<ffffffff811cb61b>] __mark_inode_dirty+0x3b/0x170
Jul 30 23:37:10 xx5 kernel: [<ffffffff811bff62>] ? mnt_clone_write+0x12/0x50
Jul 30 23:37:10 xx5 kernel: [<ffffffff811bbb0a>] file_update_time+0x10a/0x1a0
Jul 30 23:37:10 xx5 kernel: [<ffffffff81186d49>] ? ksize+0x19/0x70
Jul 30 23:37:10 xx5 kernel: [<ffffffff8112b9b4>] __generic_file_write_iter+0x1e4/0x3e0
Jul 30 23:37:10 xx5 kernel: [<ffffffff814b02b3>] ? tcp_sendmsg+0x523/0xca0
Jul 30 23:37:10 xx5 kernel: [<ffffffff8112bc33>] __generic_file_aio_write+0x83/0xa0
Jul 30 23:37:10 xx5 kernel: [<ffffffff8112bcd8>] generic_file_aio_write+0x88/0x100
Jul 30 23:37:10 xx5 kernel: [<ffffffffa02061bf>] ext4_file_write+0xbf/0x1d0 [ext4]
Jul 30 23:37:10 xx5 kernel: [<ffffffff8119e42e>] do_sync_write+0xfe/0x140
Jul 30 23:37:10 xx5 kernel: [<ffffffff8109b280>] ? autoremove_wake_function+0x0/0x40
Jul 30 23:37:10 xx5 kernel: [<ffffffff8119ebc1>] vfs_write+0xa1/0x190
Jul 30 23:37:10 xx5 kernel: [<ffffffff8119ec39>] ? vfs_write+0x119/0x190
Jul 30 23:37:10 xx5 kernel: [<ffffffff8119f07a>] sys_pwrite64+0x7a/0xa0
Jul 30 23:37:10 xx5 kernel: [<ffffffff8104be7c>] sys32_pwrite+0x1c/0x20
Jul 30 23:37:10 xx5 kernel: [<ffffffff8104afc0>] cstar_dispatch+0x7/0x2e
Jul 30 23:37:10 xx5 kernel: INFO: task mysqld:9880 blocked for more than 120 seconds.
Jul 30 23:37:10 xx5 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 30 23:37:10 xx5 kernel: mysqld        D ffff88027ac6c140     0  9880   9306  121 0x00020000
Jul 30 23:37:10 xx5 kernel: ffff880276d67878 0000000000000082 ffff88087964c020 ffff88067d7fe180
Jul 30 23:37:10 xx5 kernel: ffff880276d67838 ffffffff8142a42c ffff880276d67948 ffffffffa019e4f0
Jul 30 23:37:10 xx5 kernel: ffff880276d67958 000000010013db05 ffff880276d67fd8 ffff880276d67fd8
Jul 30 23:37:10 xx5 kernel: Call Trace:
Jul 30 23:37:10 xx5 kernel: [<ffffffff8142a42c>] ? dm_table_unplug_all+0x5c/0x100
Jul 30 23:37:10 xx5 kernel: [<ffffffffa019e4f0>] ? ipt_do_table+0x370/0x5f0 [ip_tables]
Jul 30 23:37:10 xx5 kernel: [<ffffffff811d5490>] ? sync_buffer+0x0/0x50
Jul 30 23:37:10 xx5 kernel: [<ffffffff81540023>] io_schedule+0x73/0xc0
Jul 30 23:37:10 xx5 kernel: [<ffffffff811d54d0>] sync_buffer+0x40/0x50
Jul 30 23:37:10 xx5 kernel: [<ffffffff81540ddb>] __wait_on_bit_lock+0x5b/0xc0
Jul 30 23:37:10 xx5 kernel: [<ffffffff8112acee>] ? find_get_page+0x1e/0xa0
Jul 30 23:37:10 xx5 kernel: [<ffffffff811d5490>] ? sync_buffer+0x0/0x50
Jul 30 23:37:10 xx5 kernel: [<ffffffff81540ebc>] out_of_line_wait_on_bit_lock+0x7c/0x90
Jul 30 23:37:10 xx5 kernel: [<ffffffff8109b2c0>] ? wake_bit_function+0x0/0x50
Jul 30 23:37:10 xx5 kernel: [<ffffffff811d5e96>] __lock_buffer+0x36/0x40
Jul 30 23:37:10 xx5 kernel: [<ffffffffa01ed93c>] do_get_write_access+0x8c/0x480 [jbd2]
Jul 30 23:37:10 xx5 kernel: [<ffffffffa01ede80>] jbd2_journal_get_write_access+0x30/0x50 [jbd2]
Jul 30 23:37:10 xx5 kernel: [<ffffffffa0234706>] __ext4_journal_get_write_access+0x36/0x80 [ext4]
Jul 30 23:37:10 xx5 kernel: [<ffffffffa020d953>] ext4_reserve_inode_write+0x73/0xa0 [ext4]
Jul 30 23:37:10 xx5 kernel: [<ffffffffa020d9c9>] ext4_mark_inode_dirty+0x49/0x1a0 [ext4]
Jul 30 23:37:10 xx5 kernel: [<ffffffffa0212d4f>] ext4_dirty_inode+0x3f/0x60 [ext4]
Jul 30 23:37:10 xx5 kernel: [<ffffffff811cb61b>] __mark_inode_dirty+0x3b/0x170
Jul 30 23:37:10 xx5 kernel: [<ffffffff811bff62>] ? mnt_clone_write+0x12/0x50
Jul 30 23:37:10 xx5 kernel: [<ffffffff811bbb0a>] file_update_time+0x10a/0x1a0
Jul 30 23:37:10 xx5 kernel: [<ffffffff8112b9b4>] __generic_file_write_iter+0x1e4/0x3e0
Jul 30 23:37:10 xx5 kernel: [<ffffffff811bafa1>] ? touch_atime+0x81/0x180
Jul 30 23:37:10 xx5 kernel: [<ffffffff8112bc33>] __generic_file_aio_write+0x83/0xa0
Jul 30 23:37:10 xx5 kernel: [<ffffffff8112bcd8>] generic_file_aio_write+0x88/0x100
Jul 30 23:37:10 xx5 kernel: [<ffffffffa02061bf>] ext4_file_write+0xbf/0x1d0 [ext4]
Jul 30 23:37:10 xx5 kernel: [<ffffffff8119e42e>] do_sync_write+0xfe/0x140
Jul 30 23:37:10 xx5 kernel: [<ffffffff8109b280>] ? autoremove_wake_function+0x0/0x40
Jul 30 23:37:10 xx5 kernel: [<ffffffff8119ebc1>] vfs_write+0xa1/0x190
Jul 30 23:37:10 xx5 kernel: [<ffffffff8119edc6>] ? vfs_read+0x116/0x190
Jul 30 23:37:10 xx5 kernel: [<ffffffff8119f07a>] sys_pwrite64+0x7a/0xa0
Jul 30 23:37:10 xx5 kernel: [<ffffffff8104be7c>] sys32_pwrite+0x1c/0x20
Jul 30 23:37:10 xx5 kernel: [<ffffffff8104afc0>] cstar_dispatch+0x7/0x2e
Jul 30 23:37:10 xx5 kernel: INFO: task httpd:9816 blocked for more than 120 seconds.
Jul 30 23:37:10 xx5 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 30 23:37:10 xx5 kernel: httpd         D ffff88107e7d6ce0     0  9816   9801  128 0x00020000
Jul 30 23:37:10 xx5 kernel: ffff8810741d1c88 0000000000200082 0000000000000000 0000000000000009
Jul 30 23:37:10 xx5 kernel: ffff88026a5b0005 ffff880c73b6b4d8 ffff8810741d1c28 ffff8810741d1de0
Jul 30 23:37:10 xx5 kernel: ffff8810741d1cf8 0000000100146c7e ffff8810741d1fd8 ffff8810741d1fd8
Jul 30 23:37:10 xx5 kernel: Call Trace:
Jul 30 23:37:10 xx5 kernel: [<ffffffff81541622>] __mutex_lock_slowpath+0x122/0x170
Jul 30 23:37:10 xx5 kernel: [<ffffffff8154103a>] mutex_lock+0x2a/0x50
Jul 30 23:37:10 xx5 kernel: [<ffffffff8112bcc1>] generic_file_aio_write+0x71/0x100
Jul 30 23:37:10 xx5 kernel: [<ffffffffa02061bf>] ext4_file_write+0xbf/0x1d0 [ext4]
Jul 30 23:37:10 xx5 kernel: [<ffffffff8119e42e>] do_sync_write+0xfe/0x140
Jul 30 23:37:10 xx5 kernel: [<ffffffff8109b280>] ? autoremove_wake_function+0x0/0x40
Jul 30 23:37:10 xx5 kernel: [<ffffffff811acbf1>] ? path_put+0x31/0x40
Jul 30 23:37:10 xx5 kernel: [<ffffffff811a469a>] ? vfs_fstatat+0x4a/0x60
Jul 30 23:37:10 xx5 kernel: [<ffffffff8119ebc1>] vfs_write+0xa1/0x190
Jul 30 23:37:10 xx5 kernel: [<ffffffff8119ef1a>] sys_write+0x4a/0x90
Jul 30 23:37:10 xx5 kernel: [<ffffffff8104afc0>] cstar_dispatch+0x7/0x2e
Jul 30 23:37:10 xx5 kernel: INFO: task httpd:9817 blocked for more than 120 seconds.
Jul 30 23:37:10 xx5 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 30 23:37:10 xx5 kernel: httpd         D ffff88107d692ee0     0  9817   9801  128 0x00020000
Jul 30 23:37:10 xx5 kernel: ffff8810741cdc88 0000000000200082 0000000000000000 0000000000000009
Jul 30 23:37:10 xx5 kernel: ffff88024fa3e005 ffff880c73b6b4d8 ffff8810741cdc08 ffff8810741cdde0
Jul 30 23:37:10 xx5 kernel: ffff8810741cdcf8 00000001001475f1 ffff8810741cdfd8 ffff8810741cdfd8
Jul 30 23:37:10 xx5 kernel: Call Trace:
Jul 30 23:37:10 xx5 kernel: [<ffffffff81541622>] __mutex_lock_slowpath+0x122/0x170
Jul 30 23:37:10 xx5 kernel: [<ffffffff8154103a>] mutex_lock+0x2a/0x50
Jul 30 23:37:10 xx5 kernel: [<ffffffff8112bcc1>] generic_file_aio_write+0x71/0x100
Jul 30 23:37:10 xx5 kernel: [<ffffffffa02061bf>] ext4_file_write+0xbf/0x1d0 [ext4]
Jul 30 23:37:10 xx5 kernel: [<ffffffff8119e42e>] do_sync_write+0xfe/0x140
Jul 30 23:37:10 xx5 kernel: [<ffffffff8109b280>] ? autoremove_wake_function+0x0/0x40
Jul 30 23:37:10 xx5 kernel: [<ffffffff811acbf1>] ? path_put+0x31/0x40
Jul 30 23:37:10 xx5 kernel: [<ffffffff811a469a>] ? vfs_fstatat+0x4a/0x60
Jul 30 23:37:10 xx5 kernel: [<ffffffff8119ebc1>] vfs_write+0xa1/0x190
Jul 30 23:37:10 xx5 kernel: [<ffffffff8119ef1a>] sys_write+0x4a/0x90
Jul 30 23:37:10 xx5 kernel: [<ffffffff8104afc0>] cstar_dispatch+0x7/0x2e
Jul 30 23:37:10 xx5 kernel: INFO: task httpd:9818 blocked for more than 120 seconds.
Jul 30 23:37:10 xx5 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 30 23:37:10 xx5 kernel: httpd         D ffff881074176500     0  9818   9801  128 0x00020000
Jul 30 23:37:10 xx5 kernel: ffff88107142dc88 0000000000200086 0000000000000000 0000000000000009
Jul 30 23:37:10 xx5 kernel: ffff88024fa3e005 ffff880c73b6b4d8 ffff88107142dc08 ffff88107142dde0
Jul 30 23:37:10 xx5 kernel: ffff88107142dcf8 00000001001489b8 ffff88107142dfd8 ffff88107142dfd8
Jul 30 23:37:10 xx5 kernel: Call Trace:
Jul 30 23:37:10 xx5 kernel: [<ffffffff81541622>] __mutex_lock_slowpath+0x122/0x170
Jul 30 23:37:10 xx5 kernel: [<ffffffff8154103a>] mutex_lock+0x2a/0x50
Jul 30 23:37:10 xx5 kernel: [<ffffffff8112bcc1>] generic_file_aio_write+0x71/0x100
Jul 30 23:37:10 xx5 kernel: [<ffffffffa02061bf>] ext4_file_write+0xbf/0x1d0 [ext4]
Jul 30 23:37:10 xx5 kernel: [<ffffffff8119e42e>] do_sync_write+0xfe/0x140
Jul 30 23:37:10 xx5 kernel: [<ffffffff8109b280>] ? autoremove_wake_function+0x0/0x40
Jul 30 23:37:10 xx5 kernel: [<ffffffff811acbf1>] ? path_put+0x31/0x40
Jul 30 23:37:10 xx5 kernel: [<ffffffff811a469a>] ? vfs_fstatat+0x4a/0x60
Jul 30 23:37:10 xx5 kernel: [<ffffffff8119ebc1>] vfs_write+0xa1/0x190
Jul 30 23:37:10 xx5 kernel: [<ffffffff8119ef1a>] sys_write+0x4a/0x90
Jul 30 23:37:10 xx5 kernel: [<ffffffff8104afc0>] cstar_dispatch+0x7/0x2e
Jul 30 23:37:10 xx5 kernel: INFO: task httpd:9819 blocked for more than 120 seconds.
Jul 30 23:37:10 xx5 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 30 23:37:10 xx5 kernel: httpd         D ffff88107e2ea6c0     0  9819   9801  128 0x00020000
Jul 30 23:37:10 xx5 kernel: ffff88107144f988 0000000000200082 0000000000000000 0000000015880289
Jul 30 23:37:10 xx5 kernel: ffff88107144f928 ffffffff811d6ad6 ffff88107144f948 0000000000000000
Jul 30 23:37:10 xx5 kernel: 000000000001b9c0 000000010013fa3d ffff88107144ffd8 ffff88107144ffd8
Jul 30 23:37:10 xx5 kernel: Call Trace:
Jul 30 23:37:10 xx5 kernel: [<ffffffff811d6ad6>] ? __find_get_block_slow+0xb6/0x140
Jul 30 23:37:10 xx5 kernel: [<ffffffffa01edb1d>] do_get_write_access+0x26d/0x480 [jbd2]
Jul 30 23:37:10 xx5 kernel: [<ffffffff8109b2c0>] ? wake_bit_function+0x0/0x50
Jul 30 23:37:10 xx5 kernel: [<ffffffffa01ede80>] jbd2_journal_get_write_access+0x30/0x50 [jbd2]
Jul 30 23:37:10 xx5 kernel: [<ffffffffa0234706>] __ext4_journal_get_write_access+0x36/0x80 [ext4]
Jul 30 23:37:10 xx5 kernel: [<ffffffffa020d953>] ext4_reserve_inode_write+0x73/0xa0 [ext4]
Jul 30 23:37:10 xx5 kernel: [<ffffffffa020d9c9>] ext4_mark_inode_dirty+0x49/0x1a0 [ext4]
Jul 30 23:37:10 xx5 kernel: [<ffffffffa0212d4f>] ext4_dirty_inode+0x3f/0x60 [ext4]
Jul 30 23:37:10 xx5 kernel: [<ffffffff811cb61b>] __mark_inode_dirty+0x3b/0x170
Jul 30 23:37:10 xx5 kernel: [<ffffffff811bff62>] ? mnt_clone_write+0x12/0x50
Jul 30 23:37:10 xx5 kernel: [<ffffffff811bbb0a>] file_update_time+0x10a/0x1a0
Jul 30 23:37:10 xx5 kernel: [<ffffffff8112b9b4>] __generic_file_write_iter+0x1e4/0x3e0
Jul 30 23:37:10 xx5 kernel: [<ffffffff811b9369>] ? dput_nocache+0xc9/0x250
Jul 30 23:37:10 xx5 kernel: [<ffffffff8112bc33>] __generic_file_aio_write+0x83/0xa0
Jul 30 23:37:10 xx5 kernel: [<ffffffff8112bcd8>] generic_file_aio_write+0x88/0x100
Jul 30 23:37:10 xx5 kernel: [<ffffffffa02061bf>] ext4_file_write+0xbf/0x1d0 [ext4]
Jul 30 23:37:10 xx5 kernel: [<ffffffff8119e42e>] do_sync_write+0xfe/0x140
Jul 30 23:37:10 xx5 kernel: [<ffffffff8109b280>] ? autoremove_wake_function+0x0/0x40
Jul 30 23:37:10 xx5 kernel: [<ffffffff811acbf1>] ? path_put+0x31/0x40
Jul 30 23:37:10 xx5 kernel: [<ffffffff811a469a>] ? vfs_fstatat+0x4a/0x60
Jul 30 23:37:10 xx5 kernel: [<ffffffff8119ebc1>] vfs_write+0xa1/0x190
Jul 30 23:37:10 xx5 kernel: [<ffffffff8119ef1a>] sys_write+0x4a/0x90
Jul 30 23:37:10 xx5 kernel: [<ffffffff8104afc0>] cstar_dispatch+0x7/0x2e
Jul 30 23:37:10 xx5 kernel: INFO: task httpd:9820 blocked for more than 120 seconds.
Jul 30 23:37:10 xx5 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 30 23:37:10 xx5 kernel: httpd         D ffff88107eb3e540     0  9820   9801  128 0x00020000
Jul 30 23:37:10 xx5 kernel: ffff88107146fc88 0000000000200086 0000000000000000 0000000000000009
Jul 30 23:37:10 xx5 kernel: ffff88024fa3e005 ffff880c73b6b4d8 ffff88107146fc08 ffff88107146fde0
Jul 30 23:37:10 xx5 kernel: ffff88107146fcf8 00000001001439b1 ffff88107146ffd8 ffff88107146ffd8
Jul 30 23:37:10 xx5 kernel: Call Trace:
Jul 30 23:37:10 xx5 kernel: [<ffffffff81541622>] __mutex_lock_slowpath+0x122/0x170
Jul 30 23:37:10 xx5 kernel: [<ffffffff8154103a>] mutex_lock+0x2a/0x50
Jul 30 23:37:10 xx5 kernel: [<ffffffff8112bcc1>] generic_file_aio_write+0x71/0x100
Jul 30 23:37:10 xx5 kernel: [<ffffffffa02061bf>] ext4_file_write+0xbf/0x1d0 [ext4]
Jul 30 23:37:10 xx5 kernel: [<ffffffff8119e42e>] do_sync_write+0xfe/0x140
Jul 30 23:37:10 xx5 kernel: [<ffffffff8109b280>] ? autoremove_wake_function+0x0/0x40
Jul 30 23:37:10 xx5 kernel: [<ffffffff811acbf1>] ? path_put+0x31/0x40
Jul 30 23:37:10 xx5 kernel: [<ffffffff811a469a>] ? vfs_fstatat+0x4a/0x60
Jul 30 23:37:10 xx5 kernel: [<ffffffff8119ebc1>] vfs_write+0xa1/0x190
Jul 30 23:37:10 xx5 kernel: [<ffffffff8119ef1a>] sys_write+0x4a/0x90
Jul 30 23:37:10 xx5 kernel: [<ffffffff8104afc0>] cstar_dispatch+0x7/0x2e
Jul 30 23:37:11 xx5 pvestatd[3708]: WARNING: unable to connect to VM 165 socket - timeout after 31 retries
Jul 30 23:38:11 xx5 pvestatd[3708]: WARNING: unable to connect to VM 165 socket - timeout after 31 retries
Jul 30 23:38:14 xx5 vzdump[37987]: <root@pam> end task UPID:xx5:00009486:0001A2D9:51F89FAD:vzdump::root@pam: unexpected status
Jul 30 23:39:21 xx5 pvestatd[3708]: WARNING: unable to connect to VM 165 socket - timeout after 31 retries
Jul 30 23:39:25 xx5 pvedaemon[55213]: shutdown VM 165: UPID:xx5:0000D7AD:0002F43D:51F8A30D:qmshutdown:165:root@pam:
Jul 30 23:39:25 xx5 pvedaemon[3691]: <root@pam> starting task UPID:xx5:0000D7AD:0002F43D:51F8A30D:qmshutdown:165:root@pam:
Jul 30 23:39:28 xx5 pvedaemon[55213]: WARNING: unable to connect to VM 165 socket - timeout after 31 retries
Jul 30 23:39:28 xx5 pvedaemon[55213]: VM 165 qmp command failed - VM 165 qmp command 'system_powerdown' failed - unable to connect to VM 165 socket - timeout after 31 retries
Jul 30 23:39:28 xx5 pvedaemon[55213]: VM quit/powerdown failed
Jul 30 23:39:28 xx5 pvedaemon[3691]: <root@pam> end task UPID:xx5:0000D7AD:0002F43D:51F8A30D:qmshutdown:165:root@pam: VM quit/powerdown failed
Jul 30 23:39:28 xx5 pvedaemon[3686]: WARNING: unable to connect to VM 165 socket - timeout after 31 retries
Jul 30 23:39:33 xx5 pvedaemon[55298]: stop VM 165: UPID:xx5:0000D802:0002F758:51F8A315:qmstop:165:root@pam:
Jul 30 23:39:33 xx5 pvedaemon[3689]: <root@pam> starting task UPID:xx5:0000D802:0002F758:51F8A315:qmstop:165:root@pam:
Jul 30 23:39:35 xx5 pvedaemon[3686]: WARNING: unable to connect to VM 165 socket - timeout after 31 retries
Jul 30 23:39:36 xx5 pvedaemon[55298]: VM 165 qmp command failed - VM 165 qmp command 'quit' failed - unable to connect to VM 165 socket - timeout after 31 retries
Jul 30 23:39:36 xx5 pvedaemon[55298]: WARNING: VM quit/powerdown failed - terminating now with SIGTERM
Jul 30 23:39:46 xx5 pvedaemon[55298]: WARNING: VM still running - terminating now with SIGKILL
Jul 30 23:39:47 xx5 pvedaemon[3689]: <root@pam> end task UPID:xx5:0000D802:0002F758:51F8A315:qmstop:165:root@pam: OK
Jul 30 23:39:47 xx5 pvedaemon[3686]: WARNING: unable to connect to VM 165 socket - No such file or directory
 
The new -108 kernel did not help in my case. Just an update: I attempted a single command-line backup with vzdump; results below (mirrored from another thread on the same subject).

This particular server is a Dell R815 with a PERC H700 and SAS drives; see below for storage specs. Proxmox 1.9 worked fine, and we did a clean upgrade from 1.9 to 3.0.x (complete system wipe and reinstall).

I attempted a manual backup of one OpenVZ container last night, which failed, locking the entire system up. I have tried OpenVZ containers and KVM virtual machines; there is no rhyme or reason to this problem. Log output and version info are below. It looks like it completes one backup, the files are there (on a brand new NFS backup system I turned up yesterday), and then it just craps out. Kernel logs are visible here: http://pastebin.com/UuiJ9Yz0

We have the same issue on a few brand new Dell R620s with the PERC H310 controller: brand new clean installs of Proxmox 3.0.x with only OpenVZ containers and local storage for backups. I have tried ext3, ext4, and NFS mounts for the vzdump backups, with the same results. I have also tried both the deadline and CFQ schedulers as recommended; a quick way to switch them is shown below.
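For anyone who wants to reproduce that scheduler test, the elevator can be switched at runtime per block device (sda is just an example; the change resets on reboot unless set via the elevator= kernel parameter):

Code:
# Show the available and currently active scheduler for a device
cat /sys/block/sda/queue/scheduler
# Switch to deadline (or cfq) on the running system
echo deadline > /sys/block/sda/queue/scheduler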

I am happy to keep testing stuff out :) Hope this helps.

Joe Jenkins

root@xx5:/var/log# vzdump 107 -storage ProxBackupsNFS
INFO: starting new backup job: vzdump 107 --storage ProxBackupsNFS
INFO: Starting Backup of VM 107 (openvz)
INFO: CTID 107 exist mounted running
INFO: status = running
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: trying to remove stale snapshot '/dev/xx5/vzsnap-xx5-0'
INFO: umount: /mnt/vzsnap0: not mounted
ERROR: command 'umount /mnt/vzsnap0' failed: exit code 1
INFO: Logical volume "vzsnap-xx5-0" successfully removed
INFO: creating lvm snapshot of /dev/mapper/xx5-vz ('/dev/xx5/vzsnap-xx5-0')
INFO: Logical volume "vzsnap-xx5-0" created
INFO: creating archive '/mnt/pve/ProxBackupsNFS/dump/vzdump-openvz-107-2013_07_31-21_34_17.tar'
INFO: Total bytes written: 3482634240 (3.3GiB, 5.9MiB/s)
INFO: archive file size: 3.24GB


^^ at this point the backup hangs, the load skyrockets, and the server requires a reboot


top - 21:59:15 up 22:11, 3 users, load average: 195.39, 147.68, 79.54
Tasks: 1732 total, 1 running, 1731 sleeping, 0 stopped, 0 zombie
%Cpu(s): 2.0 us, 0.3 sy, 0.0 ni, 75.0 id, 22.7 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem: 65954312 total, 42607184 used, 23347128 free, 784576 buffers
KiB Swap: 35651576 total, 0 used, 35651576 free, 27281784 cached
Write failed: Broken pipe


root@xx5:/mnt/pve/ProxBackupsNFS/dump# ls -la
total 12
drwxr-xr-x 3 root root 4096 Jul 31 21:45 .
drwxr-xr-x 4 root root 4096 Jul 31 15:26 ..
-rw-r--r-- 1 root root 3482634240 Jul 31 21:45 vzdump-openvz-107-2013_07_31-21_34_17.tar
drwxr-xr-x 2 root root 4096 Jul 31 21:34 vzdump-openvz-107-2013_07_31-21_34_17.tmp
root@xx5:/mnt/pve/ProxBackupsNFS/dump#
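The "trying to remove stale snapshot" lines in the vzdump output above are leftovers from the previous crashed run. After a forced reboot they can also be cleaned up by hand before the next attempt; a sketch using the names from the log above:

Code:
# Unmount the temporary snapshot mountpoint if it is still mounted
umount /mnt/vzsnap0 2>/dev/null
# Remove the leftover snapshot logical volume
lvremove -f /dev/xx5/vzsnap-xx5-0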




Part number, qty, description:

YY6G3  1  Assembly, Cable, PERC7, 2.5, R810, V2
0P77F  0  DISK PROGRAM..., FLASH..., FW6, S-FF
NU209  1  Battery, Primary, 7WHR, 1C Lithium Ion, PERCI
TCVRR  2  Power Supply, 1100W, Redundant, LITEON
XXFVX  1  Assembly, Card, PERCH700-INT, 512M, Serial Attached Scsi
G176J  1  Assembly, Carrier, Hard Drive SAS-SATAU, 2.5, V2
7260R  0  PREPARATION MATERIAL..., DEVIATION..., SERVICE CHARGE..., INCRS #2
X160K  1  HARD DRIVE..., 146G, SAS6, 10K, 2.5, SGT-FF
 
