Node crashed during backup

BeNe

Hello everyone,

I installed Proxmox 2.0 on a Hetzner server (EX4S). The system had been online for four days and everything was fine until tonight, when the Proxmox node stopped working. It still responded to ping, but the web interface, SSH, and all virtual machines were unreachable. After a reset the system is online again.

About the System:

Code:
root@miraculix / # pveversion -v
pve-manager: 2.0-59 (pve-manager/2.0/18400f07)
running kernel: 2.6.32-11-pve
proxmox-ve-2.6.32: 2.0-66
pve-kernel-2.6.32-11-pve: 2.6.32-66
lvm2: 2.02.88-2pve2
clvm: 2.02.88-2pve2
corosync-pve: 1.4.3-1
openais-pve: 1.1.4-2
libqb: 0.10.1-2
redhat-cluster-pve: 3.1.8-3
resource-agents-pve: 3.9.2-3
fence-agents-pve: 3.1.7-2
pve-cluster: 1.0-26
qemu-server: 2.0-38
pve-firmware: 1.0-15
libpve-common-perl: 1.0-26
libpve-access-control: 1.0-18
libpve-storage-perl: 2.0-17
vncterm: 1.0-2
vzctl: 3.0.30-2pve2
vzprocps: 2.0.11-2
vzquota: 3.0.12-3
pve-qemu-kvm: 1.0-9
ksm-control-daemon: 1.1-1

Code:
root@miraculix / # uname -a
Linux miraculix 2.6.32-11-pve #1 SMP Wed Apr 11 07:17:05 CEST 2012 x86_64 GNU/Linux

I use EXT4 on this node.
Code:
root@miraculix / # df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/vg0-root  5.0G  384M  4.4G   8% /
tmpfs                  16G     0   16G   0% /lib/init/rw
udev                   16G  272K   16G   1% /dev
tmpfs                  16G   19M   16G   1% /dev/shm
/dev/md1              496M   71M  401M  15% /boot
/dev/mapper/vg0-usr   8.0G  583M  7.0G   8% /usr
/dev/mapper/vg0-home  8.0G  146M  7.4G   2% /home
/dev/mapper/vg0-tmp   4.0G  136M  3.7G   4% /tmp
/dev/mapper/vg0-var   1.5T   75G  1.4T   6% /var
/dev/mapper/vg0-backup
                      745G  237G  471G  34% /backup
/dev/fuse              30M   16K   30M   1% /etc/pve


Logs:

Code:
Apr 18 05:45:01 miraculix /USR/SBIN/CRON[965456]: (root) CMD (if [ -x /etc/munin/plugins/apt_all ]; then /etc/munin/plugins/apt_all update 7200 12 >/dev/null; elif [ -x /etc/munin/plugins/apt ]; then /etc/munin/plugins/apt update 7200 12 >/dev/null; fi)
Apr 18 05:48:44 miraculix vzdump[954131]: INFO: Finished Backup of VM 100 (00:48:42)
Apr 18 05:48:44 miraculix vzdump[954131]: INFO: Starting Backup of VM 102 (openvz)
Apr 18 05:48:49 miraculix kernel: EXT4-fs (dm-6): warning: maximal mount count reached, running e2fsck is recommended
Apr 18 05:48:49 miraculix kernel: VZDQ: Tried to clean orphans on qmblk with 1 state
Apr 18 05:48:49 miraculix kernel: BUG: Quota files for 104 are broken: no quota engine running
Apr 18 05:48:49 miraculix kernel: EXT4-fs (dm-6): ext4_orphan_cleanup: deleting unreferenced inode 13774587
Apr 18 05:48:49 miraculix kernel: VZDQ: Tried to clean orphans on qmblk with 1 state
Apr 18 05:48:49 miraculix kernel: BUG: Quota files for 104 are broken: no quota engine running
Apr 18 05:48:49 miraculix kernel: EXT4-fs (dm-6): ext4_orphan_cleanup: deleting unreferenced inode 13774586
Apr 18 05:48:49 miraculix kernel: VZDQ: Tried to clean orphans on qmblk with 1 state
Apr 18 05:48:49 miraculix kernel: BUG: Quota files for 104 are broken: no quota engine running
Apr 18 05:48:49 miraculix kernel: EXT4-fs (dm-6): ext4_orphan_cleanup: deleting unreferenced inode 13774585
Apr 18 05:48:49 miraculix kernel: VZDQ: Tried to clean orphans on qmblk with 1 state
Apr 18 05:48:49 miraculix kernel: BUG: Quota files for 104 are broken: no quota engine running
Apr 18 05:48:49 miraculix kernel: EXT4-fs (dm-6): ext4_orphan_cleanup: deleting unreferenced inode 13767858
Apr 18 05:48:49 miraculix kernel: VZDQ: Tried to clean orphans on qmblk with 1 state
Apr 18 05:48:49 miraculix kernel: BUG: Quota files for 104 are broken: no quota engine running
Apr 18 05:48:49 miraculix kernel: EXT4-fs (dm-6): ext4_orphan_cleanup: deleting unreferenced inode 13767857
Apr 18 05:48:49 miraculix kernel: EXT4-fs (dm-6): ext4_orphan_cleanup: deleting unreferenced inode 14155803
Apr 18 05:48:49 miraculix kernel: VZDQ: Tried to clean orphans on qmblk with 1 state
Apr 18 05:48:49 miraculix kernel: BUG: Quota files for 100 are broken: no quota engine running
Apr 18 05:48:49 miraculix kernel: EXT4-fs (dm-6): ext4_orphan_cleanup: deleting unreferenced inode 18243961
Apr 18 05:48:49 miraculix kernel: VZDQ: Tried to clean orphans on qmblk with 1 state
Apr 18 05:48:49 miraculix kernel: BUG: Quota files for 100 are broken: no quota engine running
Apr 18 05:48:49 miraculix kernel: EXT4-fs (dm-6): ext4_orphan_cleanup: deleting unreferenced inode 25168130
Apr 18 05:48:49 miraculix kernel: VZDQ: Tried to clean orphans on qmblk with 1 state
Apr 18 05:48:49 miraculix kernel: BUG: Quota files for 100 are broken: no quota engine running
Apr 18 05:48:49 miraculix kernel: EXT4-fs (dm-6): ext4_orphan_cleanup: deleting unreferenced inode 25168129
Apr 18 05:48:49 miraculix kernel: VZDQ: Tried to clean orphans on qmblk with 1 state
Apr 18 05:48:49 miraculix kernel: BUG: Quota files for 100 are broken: no quota engine running
Apr 18 05:48:49 miraculix kernel: EXT4-fs (dm-6): ext4_orphan_cleanup: deleting unreferenced inode 25168128
Apr 18 05:48:49 miraculix kernel: VZDQ: Tried to clean orphans on qmblk with 1 state
Apr 18 05:48:49 miraculix kernel: BUG: Quota files for 100 are broken: no quota engine running
Apr 18 05:48:49 miraculix kernel: EXT4-fs (dm-6): ext4_orphan_cleanup: deleting unreferenced inode 25168127
Apr 18 05:48:49 miraculix kernel: VZDQ: Tried to clean orphans on qmblk with 1 state
Apr 18 05:48:49 miraculix kernel: BUG: Quota files for 100 are broken: no quota engine running
Apr 18 05:48:49 miraculix kernel: EXT4-fs (dm-6): ext4_orphan_cleanup: deleting unreferenced inode 25168126
Apr 18 05:48:49 miraculix kernel: VZDQ: Tried to clean orphans on qmblk with 1 state
Apr 18 05:48:49 miraculix kernel: BUG: Quota files for 102 are broken: no quota engine running
Apr 18 05:48:49 miraculix kernel: EXT4-fs (dm-6): ext4_orphan_cleanup: deleting unreferenced inode 12059227
Apr 18 05:48:49 miraculix kernel: VZDQ: Tried to clean orphans on qmblk with 1 state
Apr 18 05:48:49 miraculix kernel: BUG: Quota files for 102 are broken: no quota engine running
Apr 18 05:48:49 miraculix kernel: EXT4-fs (dm-6): ext4_orphan_cleanup: deleting unreferenced inode 12058719
Apr 18 05:48:49 miraculix kernel: VZDQ: Tried to clean orphans on qmblk with 1 state
Apr 18 05:48:49 miraculix kernel: BUG: Quota files for 102 are broken: no quota engine running
Apr 18 05:48:49 miraculix kernel: EXT4-fs (dm-6): ext4_orphan_cleanup: deleting unreferenced inode 12058713
Apr 18 05:48:49 miraculix kernel: VZDQ: Tried to clean orphans on qmblk with 1 state
Apr 18 05:48:49 miraculix kernel: BUG: Quota files for 102 are broken: no quota engine running
Apr 18 05:48:49 miraculix kernel: EXT4-fs (dm-6): ext4_orphan_cleanup: deleting unreferenced inode 12058694
Apr 18 05:48:49 miraculix kernel: VZDQ: Tried to clean orphans on qmblk with 1 state
Apr 18 05:48:49 miraculix kernel: BUG: Quota files for 102 are broken: no quota engine running
Apr 18 05:48:49 miraculix kernel: EXT4-fs (dm-6): ext4_orphan_cleanup: deleting unreferenced inode 12192010
Apr 18 05:48:49 miraculix kernel: EXT4-fs (dm-6): 18 orphan inodes deleted
Apr 18 05:48:51 miraculix kernel: EXT4-fs (dm-6): recovery complete
Apr 18 05:48:51 miraculix kernel: EXT4-fs (dm-6): mounted filesystem with ordered data mode. Opts:
Apr 18 05:50:01 miraculix /USR/SBIN/CRON[966792]: (munin) CMD (if [ -x /usr/bin/munin-cron ]; then /usr/bin/munin-cron; fi)
Apr 18 05:50:01 miraculix /USR/SBIN/CRON[966793]: (root) CMD (if [ -x /etc/munin/plugins/apt_all ]; then /etc/munin/plugins/apt_all update 7200 12 >/dev/null; elif [ -x /etc/munin/plugins/apt ]; then /etc/munin/plugins/apt update 7200 12 >/dev/null; fi)

REBOOT 

Apr 18 07:57:31 miraculix kernel: imklog 4.6.4, log source = /proc/kmsg started.
Apr 18 07:57:31 miraculix rsyslogd: [origin software="rsyslogd" swVersion="4.6.4" x-pid="1472" x-info="http://www.rsyslog.com"] (re)start
Apr 18 07:57:31 miraculix kernel: Initializing cgroup subsys cpuset
Apr 18 07:57:31 miraculix kernel: Initializing cgroup subsys cpu
Apr 18 07:57:31 miraculix kernel: Linux version 2.6.32-11-pve (root@maui) (gcc version 4.4.5 (Debian 4.4.5-8) ) #1 SMP Wed Apr 11 07:17:05 CEST 2012
Apr 18 07:57:31 miraculix kernel: Command line: BOOT_IMAGE=/vmlinuz-2.6.32-11-pve root=/dev/mapper/vg0-root ro
Apr 18 07:57:31 miraculix kernel: KERNEL supported cpus:
Apr 18 07:57:31 miraculix kernel:  Intel GenuineIntel
Apr 18 07:57:31 miraculix kernel:  AMD AuthenticAMD
Apr 18 07:57:31 miraculix kernel:  Centaur CentaurHauls
Apr 18 07:57:31 miraculix kernel: BIOS-provided physical RAM map:
Apr 18 07:57:31 miraculix kernel: BIOS-e820: 0000000000000000 - 000000000009d800 (usable)
Apr 18 07:57:31 miraculix kernel: BIOS-e820: 000000000009d800 - 00000000000a0000 (reserved)
Apr 18 07:57:31 miraculix kernel: BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
Apr 18 07:57:31 miraculix kernel: BIOS-e820: 0000000000100000 - 0000000020000000 (usable)
Apr 18 07:57:31 miraculix kernel: BIOS-e820: 0000000020000000 - 0000000020200000 (reserved)
Apr 18 07:57:31 miraculix kernel: BIOS-e820: 0000000020200000 - 0000000040000000 (usable)
Apr 18 07:57:31 miraculix kernel: BIOS-e820: 0000000040000000 - 0000000040200000 (reserved)
Apr 18 07:57:31 miraculix kernel: BIOS-e820: 0000000040200000 - 00000000bac14000 (usable)
....
....

The server crashed during the backup of VM 102; the backup of VM 100 completed successfully. I use the snapshot method.

I found more or less the same problem reported in this forum: http://forum.proxmox.com/threads/5573-OpenVZ-creation-fails-CRASH?p=31592#post31592
It seems to be related to ext4 and a kernel problem?

Thanks for any help.
 
Thanks for your answer.
Well, you are right, but I am still searching for the error that caused the whole node to crash.
These errors are not normal:
Code:
Apr 18 05:48:49 miraculix kernel: EXT4-fs (dm-6): warning: maximal mount count reached, running e2fsck is recommended 
Apr 18 05:48:49 miraculix kernel: VZDQ: Tried to clean orphans on qmblk with 1 state 
Apr 18 05:48:49 miraculix kernel: BUG: Quota files for 104 are broken: no quota engine running 
Apr 18 05:48:49 miraculix kernel: EXT4-fs (dm-6): ext4_orphan_cleanup: deleting unreferenced inode 13774587 
....
...
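
As an aside, the first of those lines is only the routine ext4 mount-count warning, not an error as such. A rough sketch of how one could inspect and disable that check with tune2fs, demonstrated here on a throwaway file-backed image (on the node you would point it at the real device; the /dev/mapper path in the comment is only a guess):

```shell
# Create a small scratch image and format it as ext4
# (no root needed for a plain file)
dd if=/dev/zero of=/tmp/scratch.img bs=1M count=8 status=none
mkfs.ext4 -q -F /tmp/scratch.img

# Show the settings behind the "maximal mount count reached" warning
tune2fs -l /tmp/scratch.img | grep -i 'mount count'

# Disable the count-based fsck check; on the node this would be e.g.
#   tune2fs -c -1 /dev/mapper/vg0-var
tune2fs -c -1 /tmp/scratch.img
```

The VZDQ/quota lines and the orphan-inode cleanup are a separate matter; they appear to show the temporary snapshot volume being journal-recovered when it is mounted.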

Thanks!
 
I get the same thing on a default install from the PVE ISO plus updates, while updating a Moodle OpenVZ server. The server becomes unresponsive and the console displays the attached screenshot (Prox hang.jpg).
 


Looks like your IO has stalled.
I see lvremove is one of the hung tasks, and it appears to have been hung the longest, which is more evidence that there is an issue with LVM.
There are other threads dealing with similar issues including this one I started: http://forum.proxmox.com/threads/9240-Snapshot-removal-fails-after-backup
 
Well, this server does seem to have very poor IO (for writes). It's an IBM System x3650 with one Xeon E5520 CPU and 8 GB RAM. It has six HDDs in RAID 1+0 on an IBM/LSI RAID card with BBU. I built a Windows Server 2003 VM in KVM on the host, leaving everything at the default options (but with VirtIO net/disk), and ran the ATTO benchmark. Read speeds in the VM were between 200 and 250 MB/s, but write speeds never topped 21-22 MB/s. The server build is only two days old. Any idea how I could improve the write speeds?
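
To rule out the VM layer, the raw write throughput on the host can be measured with a plain dd that forces a flush at the end; the target path below is just an example:

```shell
# Sequential write test with a final fdatasync, so the reported speed
# reflects data actually flushed to disk, not just the page cache
dd if=/dev/zero of=/tmp/ddtest.bin bs=1M count=64 conv=fdatasync

# Confirm what was written (64 MiB)
stat -c %s /tmp/ddtest.bin
```

If the host shows the same ~20 MB/s, the bottleneck is below the VM, e.g. the RAID controller possibly running in write-through mode despite the BBU; if the host is fast, the VM disk settings are the place to look.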
 
Since we use LVM on our server, does it make sense to change only the backup volume from ext4 to ext3?
Then vzdump would write to an ext3 volume instead of an ext4 one. Or did I miss something?

Thanks!
 
Changing the backup storage from ext4 to ext3 doesn't have any effect.
During the snapshot backup a temporary LVM snapshot of the data volume is created and mounted, and the files are read from there, so it is still ext4.
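
For reference, the knobs vzdump exposes for that snapshot step live in /etc/vzdump.conf. A sketch of the options that look relevant here; the values are examples and the exact key names should be checked against `man vzdump` on 2.0:

```
# /etc/vzdump.conf (excerpt, example values)
# LVM snapshot size in MB; a snapshot that is too small can fill up
# and be invalidated mid-backup
size: 2048
# temporary files (OpenVZ suspend data etc.)
tmpdir: /var/tmp
# bandwidth limit in KB/s
bwlimit: 40960
```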

I need to know whether this is purely an ext4 filesystem problem that caused the node to crash.
If so, we would reinstall the whole server with ext3 instead of ext4.
The server needs to be stable!

Thanks for any help.
 
I need to know whether this is purely an ext4 filesystem problem that caused the node to crash.
If so, we would reinstall the whole server with ext3 instead of ext4.

I suggest you test it (the default PVE setup uses ext3).
 