Proxmox 2: issue during backup

RRJ
Apr 14, 2010
Hi,

I got a call today: my Proxmox server was up and running, but all virtual machines were down. The web UI showed this:
Code:
INFO: trying to get global lock - waiting...
ERROR: can't aquire lock '/var/run/vzdump.lock' - got timeout
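For what it's worth, this lock timeout usually means an earlier vzdump never finished and still holds the global lock. A minimal sketch for checking that before touching the lock file (path taken from the error above; run as root on the host):

```shell
# Is an old vzdump (or one of its tar/lzop children) still running?
pgrep -lf vzdump

# Which processes, if any, still hold the lock file open?
# (fuser comes from the psmisc package)
fuser -v /var/run/vzdump.lock

# Only if both come back empty is the lock stale and safe to remove:
# rm /var/run/vzdump.lock
```

Note that if the old vzdump is stuck in uninterruptible sleep (D state), it cannot be killed and only a reboot clears it.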

Before this, the logs showed:

Code:
INFO: starting new backup job: vzdump 101 103 104 106 109 --quiet 1 --mode snapshot --compress lzo --storage Backup
INFO: Starting Backup of VM 101 (openvz)
INFO: CTID 101 exist mounted running
INFO: status = running
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating lvm snapshot of /dev/mapper/pve-data ('/dev/pve/vzsnap-sisemon-0')
INFO:   Logical volume "vzsnap-sisemon-0" created
INFO: creating archive '/usr/backup/dump/vzdump-openvz-101-2012_04_14-00_00_03.tar.lzo'
INFO: Total bytes written: 832348160 (794MiB, 15MiB/s)
INFO: archive file size: 459MB
INFO: delete old backup '/usr/backup/dump/vzdump-openvz-101-2012_04_07-00_00_02.tar.lzo'
INFO: Finished Backup of VM 101 (00:01:23)
INFO: Starting Backup of VM 103 (openvz)
INFO: CTID 103 exist mounted running
INFO: status = running
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating lvm snapshot of /dev/mapper/pve-data ('/dev/pve/vzsnap-sisemon-0')
INFO:   Logical volume "vzsnap-sisemon-0" created
INFO: creating archive '/usr/backup/dump/vzdump-openvz-103-2012_04_14-00_01_26.tar.lzo'
INFO: Total bytes written: 898713600 (858MiB, 12MiB/s)
INFO: archive file size: 434MB
INFO: delete old backup '/usr/backup/dump/vzdump-openvz-103-2012_04_07-00_01_25.tar.lzo'

Both events were marked as "unexpected status".
Do you need more information on this issue, or can you suggest what could cause such behavior?
 
> got a call today. my proxmox server was up and running, but all virtual machines were down. from the web menu got this:
> INFO: trying to get global lock - waiting...
> ERROR: can't aquire lock '/var/run/vzdump.lock' - got timeout

When do you get that message exactly?

> do You guys need some more information on this issue, or can you suggest, what could be a reason for such behavior?

The backup log you posted looks quite normal.
 
Hello,

I am having the same problem: the backup job does not finish, and the virtual machines are not accessible afterwards (at least not over the network; I did not try to log in to the console via the Proxmox GUI).

I tried to reboot the machine with "shutdown -r now", but the host did not react to the command. After a cold reset I see the "Error: Unexpected Status" message in the "Tasks" window of the GUI.
The log in the GUI for the backup job looks normal at the beginning, but then just stops:
--
INFO: starting new backup job: vzdump --quiet 1 --mailto proxmox@XXXXXXXXXXXXX.de --mode snapshot --compress lzo --storage daten1 --all 1
INFO: Starting Backup of VM 101 (openvz)
INFO: CTID 101 exist mounted running
INFO: status = running
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating lvm snapshot of /dev/mapper/pve-data ('/dev/pve/vzsnap-proxmox-0')
INFO: Logical volume "vzsnap-proxmox-0" created
INFO: creating archive '/mnt/daten1/dump/vzdump-openvz-101-2012_04_16-02_30_01.tar.lzo'
INFO: Total bytes written: 1789450240 (1.7GiB, 36MiB/s)
INFO: archive file size: 1.09GB
INFO: delete old backup '/mnt/daten1/dump/vzdump-openvz-101-2012_04_15-02_30_01.tar.lzo'
--

(the backups of the other virtual machines should follow here)


I had this yesterday and a couple of days before.

These are the error messages in kernel.log at the scheduled backup time last night (there are several blocks like this):

----
Apr 16 02:33:50 proxmox kernel: INFO: task bash:1035489 blocked for more than 120 seconds.
Apr 16 02:33:50 proxmox kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 16 02:33:50 proxmox kernel: bash D ffff88043abed0c0 0 1035489 1035488 101 0x00020000
Apr 16 02:33:50 proxmox kernel: ffff8802159f76f8 0000000000000086 0000000000000000 ffffffff811cca57
Apr 16 02:33:50 proxmox kernel: ffff8802159f7748 ffffffff811cd34d ffffffffa0097850 0000000000000286
Apr 16 02:33:50 proxmox kernel: ffff8802159f76a8 ffff88043abed660 ffff8802159f7fd8 ffff8802159f7fd8
Apr 16 02:33:50 proxmox kernel: Call Trace:
Apr 16 02:33:50 proxmox kernel: [<ffffffff811cca57>] ? mpage_bio_submit+0x27/0x30
Apr 16 02:33:50 proxmox kernel: [<ffffffff811cd34d>] ? mpage_readpages+0x11d/0x140
Apr 16 02:33:50 proxmox kernel: [<ffffffffa0097850>] ? ext3_get_block+0x0/0x120 [ext3]
Apr 16 02:33:50 proxmox kernel: [<ffffffff81120bf0>] ? sync_page+0x0/0x50
Apr 16 02:33:50 proxmox kernel: [<ffffffff81512663>] io_schedule+0x73/0xc0
Apr 16 02:33:50 proxmox kernel: [<ffffffff81120c2d>] sync_page+0x3d/0x50
Apr 16 02:33:50 proxmox kernel: [<ffffffff8151302f>] __wait_on_bit+0x5f/0x90
Apr 16 02:33:50 proxmox kernel: [<ffffffff81120de3>] wait_on_page_bit+0x73/0x80
Apr 16 02:33:50 proxmox kernel: [<ffffffff810944f0>] ? wake_bit_function+0x0/0x40
Apr 16 02:33:50 proxmox kernel: [<ffffffff81120e8a>] __lock_page_or_retry+0x3a/0x60
Apr 16 02:33:50 proxmox kernel: [<ffffffff8112241f>] filemap_fault+0x20f/0x560
Apr 16 02:33:50 proxmox kernel: [<ffffffff8115302e>] __do_fault+0x7e/0x620
Apr 16 02:33:50 proxmox kernel: [<ffffffff811536c9>] handle_pte_fault+0xf9/0xf60
Apr 16 02:33:50 proxmox kernel: [<ffffffff8117333a>] ? alloc_pages_current+0xaa/0x120
Apr 16 02:33:50 proxmox kernel: [<ffffffff811500e2>] ? __pte_alloc+0xd2/0x350
Apr 16 02:33:50 proxmox kernel: [<ffffffff81154714>] handle_mm_fault+0x1e4/0x2b0
Apr 16 02:33:50 proxmox kernel: [<ffffffff81042a79>] __do_page_fault+0x139/0x490
Apr 16 02:33:50 proxmox kernel: [<ffffffff81157d7c>] ? __vma_link_file+0x4c/0x80
Apr 16 02:33:50 proxmox kernel: [<ffffffff81157e8b>] ? vma_link+0x9b/0xf0
Apr 16 02:33:50 proxmox kernel: [<ffffffff8115ab97>] ? mmap_region+0x347/0x770
Apr 16 02:33:50 proxmox kernel: [<ffffffff815179ae>] do_page_fault+0x3e/0xa0
Apr 16 02:33:50 proxmox kernel: [<ffffffff81514d15>] page_fault+0x25/0x30
Apr 16 02:33:50 proxmox kernel: [<ffffffff81272c2f>] ? __clear_user+0x3f/0x70
Apr 16 02:33:50 proxmox kernel: [<ffffffff81272c11>] ? __clear_user+0x21/0x70
Apr 16 02:33:50 proxmox kernel: [<ffffffff81272c98>] clear_user+0x38/0x40
Apr 16 02:33:50 proxmox kernel: [<ffffffff811e955d>] padzero+0x2d/0x40
Apr 16 02:33:50 proxmox kernel: [<ffffffff811eb52e>] load_elf_binary+0x8de/0x1b30
Apr 16 02:33:50 proxmox kernel: [<ffffffff8114e217>] ? follow_page+0x337/0x480
Apr 16 02:33:50 proxmox kernel: [<ffffffff815179ae>] ? do_page_fault+0x3e/0xa0
Apr 16 02:33:50 proxmox kernel: [<ffffffff811e49ae>] ? load_misc_binary+0xbe/0x400
Apr 16 02:33:50 proxmox kernel: [<ffffffff81514d15>] ? page_fault+0x25/0x30
Apr 16 02:33:50 proxmox kernel: [<ffffffff81272ad6>] ? strnlen_user+0x36/0x90
Apr 16 02:33:50 proxmox kernel: [<ffffffff811968f5>] search_binary_handler+0xf5/0x330
Apr 16 02:33:50 proxmox kernel: [<ffffffff81197e71>] do_execve+0x241/0x340
Apr 16 02:33:50 proxmox kernel: [<ffffffff8100966a>] sys_execve+0x4a/0x80
Apr 16 02:33:50 proxmox kernel: [<ffffffff8100b5da>] stub_execve+0x6a/0xc0
Apr 16 02:35:50 proxmox kernel: INFO: task lvremove:1035467 blocked for more than 120 seconds.
Apr 16 02:35:50 proxmox kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 16 02:35:50 proxmox kernel: lvremove D ffff880042722600 0 1035467 1035187 0 0x00000000
Apr 16 02:35:50 proxmox kernel: ffff88024c8a7b18 0000000000000082 ffff88024c8a7ad8 ffffffff8140c1bc
Apr 16 02:35:50 proxmox kernel: 0000000000000008 0000000000001000 0000000000000000 000000000000000c
Apr 16 02:35:50 proxmox kernel: ffff88024c8a7b08 ffff880042722ba0 ffff88024c8a7fd8 ffff88024c8a7fd8
Apr 16 02:35:50 proxmox kernel: Call Trace:
-----
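These "blocked for more than 120 seconds" messages come from the kernel's hung-task watchdog: a task sat in uninterruptible sleep (D state) longer than the timeout. A small sketch for inspecting that state on any Linux host (the threshold is the one named in the log line itself):

```shell
# The watchdog threshold the log message refers to (120 s here):
cat /proc/sys/kernel/hung_task_timeout_secs

# List tasks currently in uninterruptible sleep; a bash, tar or
# lvremove stuck here matches the traces above:
ps -eo pid,stat,wchan:32,args | awk 'NR == 1 || $2 ~ /^D/'
```

D-state tasks ignore signals and clear only when the I/O they wait on completes, which is also why "shutdown -r now" hung.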

The host is a simple machine for home use (Q67 board with an i5, SATA disks, no RAID).

Virtual disks and backups are on an ext4-formatted disk (storage type "Directory").

Backup mode was "Snapshot" (I have now set it to "Suspend" to see if this prevents the failure).
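For reference, suspend mode can also be tried by hand; a sketch of the equivalent CLI call, mirroring the flags from the log at the top of this post (storage name "daten1" is specific to this setup):

```shell
# Same job as the scheduled one, but with --mode suspend instead of
# snapshot; if this completes, the LVM snapshot path is the suspect.
vzdump --all 1 --mode suspend --compress lzo --storage daten1 --quiet 1
```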

pveversion -v is

--
pve-manager: 2.0-59 (pve-manager/2.0/18400f07)
running kernel: 2.6.32-11-pve
proxmox-ve-2.6.32: 2.0-66
pve-kernel-2.6.32-10-pve: 2.6.32-63
pve-kernel-2.6.32-11-pve: 2.6.32-66
lvm2: 2.02.88-2pve2
clvm: 2.02.88-2pve2
corosync-pve: 1.4.3-1
openais-pve: 1.1.4-2
libqb: 0.10.1-2
redhat-cluster-pve: 3.1.8-3
resource-agents-pve: 3.9.2-3
fence-agents-pve: 3.1.7-2
pve-cluster: 1.0-26
qemu-server: 2.0-38
pve-firmware: 1.0-15
libpve-common-perl: 1.0-26
libpve-access-control: 1.0-18
libpve-storage-perl: 2.0-17
vncterm: 1.0-2
vzctl: 3.0.30-2pve2
vzprocps: 2.0.11-2
vzquota: 3.0.12-3
pve-qemu-kvm: 1.0-9
ksm-control-daemon: 1.1-1
--
 
It looks like the problem that has been floating around in the forum.

We had the same problem from time to time, but this week we have had none since the latest kernel.

What RAID chipset and driver do you use? We use this one:
04:00.0 SCSI storage controller [0100]: LSI Logic / Symbios Logic SAS1064ET PCI-Express Fusion-MPT SAS [1000:0056] (rev 08)
 
No RAID, just a standard Intel board (Sandy Bridge) with standard SATA drives.

It's a small, simple machine for home use (unfortunately, at work I have to use VMware ;-) ). With my post I just wanted to point out that this problem also occurs without a RAID controller, so it may not be related only to RAID controller drivers.
I have now switched to backups in "Suspend" mode on an ext3-formatted (instead of ext4) drive to see if that works better.
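One more thing worth checking after such a hang: the lvremove in the kernel trace suggests snapshot cleanup got stuck, and a leftover vzsnap-* volume will make the next snapshot backup fail. A sketch of the manual cleanup, using the volume names that appear in the logs above (the /mnt/vzsnap0 mount point is an assumption about the default, adjust if yours differs):

```shell
# Any snapshot LV left over from the aborted run?
lvs pve | grep vzsnap

# Inspect it first (name taken from the backup log above):
lvdisplay /dev/pve/vzsnap-proxmox-0

# Unmount the snapshot if it is still mounted, then drop the LV:
# umount /mnt/vzsnap0 2>/dev/null
# lvremove -f /dev/pve/vzsnap-proxmox-0
```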
 
