Updates for Proxmox VE pvetest repo - including new KVM live backup

martin

Proxmox Staff Member
We just moved a bunch of packages to our pvetest repository (on the road to Proxmox VE 2.3), including some quite cool new features such as KVM live backups (no more LVM needed for KVM backups), Auto-Ballooning, Disk Resize in the GUI, ...

Two new Wiki pages
http://pve.proxmox.com/wiki/Backup_and_Restore
http://pve.proxmox.com/wiki/Dynamic_Memory_Management

Release Notes
- pve-manager 2.3-6

  • support new qemu vma backup files
  • extend memory GUI to support ballooning
  • implement auto-ballooning
  • add HD resize feature to expand disks.
  • Display 'Resume' button when VM is suspended
- qemu-server 2.3-8

  • support new qemu vma backup files
  • do not delete unused files on restore.
  • fix Bug #293: CDROM size not reset when set to use no media
  • use memory info from balloon driver (if enabled)
  • QMPClient: support fdsets
- pve-qemu-kvm 1.3-17

  • update vma backup patches
  • fix DSA-2608-1
- pve-kernel-2.6.32 2.6.32-88

  • include latest Broadcom bnx2/bnx2x drivers
  • include latest Adaptec aacraid driver
  • update e1000e to 2.2.14
  • update igb to 4.1.2
  • update ixgbe to 3.12.6
  • enable CONFIG_RT_GROUP_SCHED (also update corosync if you install this kernel)
- corosync-pve 1.4.4-4

  • run at high priority using setpriority (-20)
  • disable SCHED_RR (RT_GROUP_SCHED was disabled in our kernel anyways). In newer kernels this is enabled, but causes corosync to crash.
- libiscsi 1.8.0-1

  • update to 1.8.0
We also uploaded new ceph packages for testing.

Everybody is encouraged to test and give feedback!
__________________
Best regards,

Martin Maurer
Proxmox VE project leader
 
Thank you for the latest Broadcom drivers. That will help out a lot, since I was just about to embark on a wholesale replacement of Broadcoms with Intel.

I will drop it on a test server and do some live migrations between the currently running 2.2 with Broadcom and the latest 2.2(3) release.

Which stable OpenVZ release does this translate into?
 
This is just for information; I don't think what I have is enough to locate the problem. But since the KVM live backup is quite a nice feature and this is a call for testing, here are my notes.

I installed Proxmox 2.3 beta yesterday and tried the new backup system. It seems to work fine, but sometimes (in my case, 2 out of about 20 backups) the whole system stops responding. I assume there's a kernel panic, but this system is a hosted server without KVM access, so I can't really check (e.g. look at the console) what's wrong.

from syslog:
Jan 30 00:00:01 pobo vzdump[9248]: <root@pam> starting task UPID:pobo:00002421:0012BB01:51085471:vzdump::root@pam:
Jan 30 00:00:01 pobo vzdump[9249]: INFO: starting new backup job: vzdump --quiet 1 --mailto xxx@example.com --mode snapshot --compress lzo --storage backup --all 1
Jan 30 00:00:01 pobo vzdump[9249]: INFO: Starting Backup of VM 100 (qemu)
Jan 30 00:00:02 pobo qm[9254]: <root@pam> update VM 100: -lock backup
Jan 30 09:26:37 pobo kernel: imklog 4.6.4, log source = /proc/kmsg started.
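
For anyone reading the task line above: the UPID string carries the task details in hex. A minimal sketch of decoding it, assuming the colon-separated layout is node, PID (hex), process start (hex), start time (hex epoch seconds), task type, task ID, user. `decodeUpid` is a hypothetical helper name, not part of any Proxmox tool:

```javascript
// Hypothetical helper: split a UPID string into its fields.
// Assumed layout: UPID:node:pid:pstart:starttime:type:id:user:
function decodeUpid(upid) {
  var parts = upid.split(":");
  // Expect "UPID", seven fields, and an empty string after the trailing colon.
  if (parts.length !== 9 || parts[0] !== "UPID") {
    return null;
  }
  return {
    node: parts[1],
    pid: parseInt(parts[2], 16),       // worker process ID
    pstart: parseInt(parts[3], 16),    // process start marker
    starttime: parseInt(parts[4], 16), // seconds since the Unix epoch
    type: parts[5],
    id: parts[6],                      // empty for this vzdump job
    user: parts[7]
  };
}

var task = decodeUpid("UPID:pobo:00002421:0012BB01:51085471:vzdump::root@pam:");
console.log(task.pid);                                     // 9249
console.log(new Date(task.starttime * 1000).toISOString()); // 2013-01-29T23:00:01.000Z
```

The decoded PID matches the `vzdump[9249]` worker in the next syslog line, and the start time matches the logged 00:00:01 if the host clock is one hour ahead of UTC (an assumption about the server's time zone).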

from /var/log/vzdump/qemu-100.log:
Jan 30 00:00:01 INFO: Starting Backup of VM 100 (qemu)
Jan 30 00:00:01 INFO: status = running
Jan 30 00:00:02 INFO: backup mode: snapshot
Jan 30 00:00:02 INFO: ionice priority: 7
Jan 30 00:00:02 INFO: creating archive '/mnt/backup/dump/vzdump-qemu-100-2013_01_30-00_00_01.vma.lzo'
Jan 30 00:00:02 INFO: started backup task '0f071ac3-6bef-4ee6-8f96-b6463104a51f'
Jan 30 00:00:05 INFO: status: 0% (243138560/34359738368), sparse 0% (149667840), duration 3, 81/31 MB/s
Jan 30 00:00:08 INFO: status: 1% (473038848/34359738368), sparse 0% (161615872), duration 6, 76/72 MB/s
Jan 30 00:00:11 INFO: status: 2% (760217600/34359738368), sparse 0% (223019008), duration 9, 95/75 MB/s
...
Jan 30 00:01:53 INFO: status: 36% (12564955136/34359738368), sparse 33% (11386163200), duration 111, 129/0 MB/s
Jan 30 00:01:56 INFO: status: 37% (12913737728/34359738368), sparse 34% (11731509248), duration 114, 116/1 MB/s
Jan 30 00:01:59 INFO: status: 38% (13295419392/34359738368), sparse 34% (12005773312), duration 117, 127/35
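
To compare runs, the status lines above can be turned into numbers. This is only a sketch: `parseStatusLine` is a hypothetical helper, and it assumes the trailing pair (e.g. `129/0 MB/s`) is read/write throughput and that the percentage is the byte ratio rounded down.

```javascript
// Hypothetical helper: parse one vzdump progress line into numbers.
// Assumption: the trailing "A/B MB/s" pair is read/write throughput.
function parseStatusLine(line) {
  var re = /status: (\d+)% \((\d+)\/(\d+)\), sparse (\d+)% \((\d+)\), duration (\d+), (\d+)\/(\d+) MB\/s/;
  var m = re.exec(line);
  if (!m) {
    return null; // not a complete status line (e.g. cut off by the crash)
  }
  var transferred = Number(m[2]);
  var total = Number(m[3]);
  return {
    percent: Number(m[1]),
    transferred: transferred,
    total: total,
    sparseBytes: Number(m[5]),
    durationSec: Number(m[6]),
    readMBps: Number(m[7]),
    writeMBps: Number(m[8]),
    // Recomputed from the byte counts, rounded down.
    computedPercent: Math.floor(100 * transferred / total)
  };
}

var sample = "Jan 30 00:01:53 INFO: status: 36% (12564955136/34359738368), " +
  "sparse 33% (11386163200), duration 111, 129/0 MB/s";
var parsed = parseStatusLine(sample);
console.log(parsed.computedPercent);         // 36
console.log(parsed.total / Math.pow(2, 30)); // 32 (disk size in GiB)
```

The last line of the log has no `MB/s` suffix, so this helper returns null for it, which is a simple way to spot where the output was cut off.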

pveperf
CPU BOGOMIPS: 24743.24
REGEX/SECOND: 1304641
HD SIZE: 7.34 GB (/dev/md1)
BUFFERED READS: 114.42 MB/sec
AVERAGE SEEK TIME: 8.23 ms
FSYNCS/SECOND: 636.58
DNS EXT: 28.93 ms
DNS INT: 11.55 ms (example.com)

pveversion --verbose
pve-manager: 2.3-6 (pve-manager/2.3/bcbdbcc8)
running kernel: 2.6.32-18-pve
pve-kernel-2.6.32-18-pve: 2.6.32-88
lvm2: 2.02.95-1pve2
clvm: 2.02.95-1pve2
corosync-pve: 1.4.4-4
openais-pve: 1.1.4-2
libqb: 0.10.1-2
redhat-cluster-pve: 3.1.93-2
resource-agents-pve: 3.9.2-3
fence-agents-pve: 3.1.9-1
pve-cluster: 1.0-36
qemu-server: 2.3-8
pve-firmware: not correctly installed
libpve-common-perl: 1.0-44
libpve-access-control: 1.0-25
libpve-storage-perl: 2.3-2
vncterm: 1.0-3
vzctl: 4.0-1pve2
vzprocps: 2.0.11-2
vzquota: 3.1-1

The system is a very cheap setup with one Intel(R) Core(TM) i5-3450 CPU @ 3.10GHz, an MSI MS-7756 motherboard, 32GB DDR3 RAM and 2x Hitachi HDS5C3020ALA632. The directory /var/lib/vz is an ext3 filesystem on LVM on software RAID 1. The backup destination is an ext4 filesystem on LVM (without RAID).
 
I have been testing the new system for two hours now. The new features are very interesting. I've tested the backup and restore feature on a Linux KVM guest. The backup runs and shows its progress in percent, which is really convenient. The restore speed is amazing: the 1.49 GB KVM .tgz backup was restored from an NFS share within about 20 seconds, I think. It looks very tricky :) but it works well.
The new bnx2x driver seems to be stable as well. The HP Blade BL460c G6 server runs without crashes or kernel panics.
Disk resize from the GUI is also very convenient.
I'm going to do some tests with a Windows KVM guest.
 
Thank you for your efforts and for providing this much appreciated update. I believe I have found a potential issue with the JavaScript code: it works perfectly in some browsers but fails in others, because the keyword "delete" is used as a property name.

Example:

File: Pvemanagerlib.js

Line: 9955

res.delete = "balloon,shares";

However, delete is a reserved word in JavaScript.

Although the PVE client works great with the latest versions of Google Chrome, this causes problems with other browsers (e.g. Chromium, Epiphany, etc.).

The error displayed in these browsers is: Uncaught SyntaxError: Unexpected token delete

I would suggest renaming "delete" to an alternative, such as "del" or "deletex".

Once again, thank you very much for the update, I hope my feedback will help make Proxmox a better product.
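
A minimal sketch of two possible workarounds, not the actual fix in pve-manager. ECMAScript 3 parsers reject a reserved word after a dot, while ECMAScript 5 and later accept it, which would explain why only some browsers fail. The object `res` here is a stand-in, and `toRequestParams` and the name `del` are hypothetical:

```javascript
// Stand-in for the object built around line 9955.
var res = {};

// Option 1: quoted (bracket) access. This sets the same "delete" property
// but parses in old and new engines alike, unlike res.delete.
res["delete"] = "balloon,shares";

// Option 2: use a non-reserved name internally ("del") and map it back to
// the real parameter name just before the request is sent.
function toRequestParams(values) {
  var params = {};
  for (var key in values) {
    if (Object.prototype.hasOwnProperty.call(values, key)) {
      params[key === "del" ? "delete" : key] = values[key];
    }
  }
  return params;
}

var params = toRequestParams({ del: "balloon,shares", memory: 512 });
console.log(res["delete"]);    // balloon,shares
console.log(params["delete"]); // balloon,shares
```

Option 1 is the smaller change because the parameter name sent to the server stays exactly the same.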
 
The server just crashed again but this time I found the following inside syslog after reboot:

Jan 30 13:05:16 pobo qm[8754]: <root@pam> update VM 101: -lock backup
Jan 30 13:05:17 pobo kernel: device tap101i0 entered promiscuous mode
Jan 30 13:05:17 pobo kernel: vmbr0: port 5(tap101i0) entering learning state
Jan 30 13:05:28 pobo kernel: tap101i0: no IPv6 routers present
Jan 30 13:05:30 pobo kernel: vmbr0: port 5(tap101i0) entering disabled state
Jan 30 13:05:30 pobo kernel: vmbr0: port 5(tap101i0) entering disabled state
Jan 30 13:05:31 pobo kernel: swap_free: Unused swap offset entry 00000008
Jan 30 13:05:31 pobo kernel: BUG: Bad page map in process qm pte:00001000 pmd:2aee2c067
Jan 30 13:05:31 pobo kernel: addr:00007f2cd80aa000 vm_flags:080000fb anon_vma:(null) mapping:ffff88081aa91378 index:c5
Jan 30 13:05:31 pobo kernel: vma->vm_ops->fault: shmem_fault+0x0/0x80
Jan 30 13:05:31 pobo kernel: vma->vm_file->f_op->mmap: shmem_mmap+0x0/0x40
Jan 30 13:05:31 pobo kernel: Pid: 8789, comm: qm veid: 0 Not tainted 2.6.32-18-pve #1
Jan 30 13:05:31 pobo kernel: Call Trace:
Jan 30 13:05:31 pobo kernel: [<ffffffff81154e08>] ? print_bad_pte+0x1d8/0x290
Jan 30 13:05:31 pobo kernel: [<ffffffff81156706>] ? unmap_vmas+0x5f6/0xce0
Jan 30 13:05:31 pobo kernel: [<ffffffff811604e7>] ? exit_mmap+0x87/0x190
Jan 30 13:05:31 pobo kernel: [<ffffffff8106a0dc>] ? mmput+0x5c/0x1f0
Jan 30 13:05:31 pobo kernel: [<ffffffff810708b9>] ? exit_mm+0x109/0x150
Jan 30 13:05:31 pobo kernel: [<ffffffff81072687>] ? do_exit+0x187/0x930
Jan 30 13:05:31 pobo kernel: [<ffffffff81072e88>] ? do_group_exit+0x58/0xd0
Jan 30 13:05:31 pobo kernel: [<ffffffff81072f17>] ? sys_exit_group+0x17/0x20
Jan 30 13:05:31 pobo kernel: [<ffffffff8100b102>] ? system_call_fastpath+0x16/0x1b
Jan 30 13:05:31 pobo kernel: Disabling lock debugging due to kernel taint
Jan 30 13:05:31 pobo vzdump[8528]: INFO: Finished Backup of VM 101 (00:00:15)
Jan 30 13:05:31 pobo vzdump[8528]: INFO: Starting Backup of VM 102 (qemu)
 
Good evening.

Regarding the post about the backup failure:

There are a couple of items that I can see:

pve-firmware is showing as not correctly installed. It should normally show as installed and give a version number.

You mentioned 2 of 20 keep crashing. Is it the same two that keep crashing?

What OS are you running in the KVMs? Are they all the same OS, and do any have different applications running in them that may alter or increase disk IO unexpectedly during backup?

Have you tried, if possible, suspending or stopping the machine and then backing it up?

You mention software RAID. Software RAID is the subject of many postings and appears to be implicated in a number of problems that people are having. Do I use software RAID? Yes, but only on a machine running a dedicated OS, never on a box that can run multiple OS instances, KVM or CT. I have just seen too many things go wrong with software RAID when IO/CPU-intensive tasks all happen at the same time. Plus, recovery can be painful, and you might as well have standalone drives. I really go for hardware RAID from a good quality vendor, but when that's not possible due to budget considerations, IMHO the alternative is to opt for the fastest WD enterprise drives that can be afforded, stack multiple drives in the box, and then distribute the KVMs/CTs across the various drives. That way the HN root/core OS drive is not getting beaten up and you are distributing the load across multiple drives.

One of the problems with software RAID is that it is highly dependent on the quality of the vendor's software implementation, the quality of the firmware, and the quality of the motherboard components in general. It is very CPU/IO intensive on the bus, whereas with a RAID controller the work is done on the card itself, the controller can use its own memory to buffer the load, and the diagnostics are so much better. A recent experience (2 weeks ago and 170 miles away) involved an HP ML110 G7 with software RAID where a drive failed under Windows 2008 R2, and it came down to a reload being the only hope for recovery because it was quicker that way. The software RAID had so corrupted the OS that it would take 20 minutes to boot. We had good backups, called HP and got replacement hardware.

In the case of hardware arrays, when that has happened we have been back up very quickly and have yet to see the OS get shelled like that. To add insult to injury with the ML110 G7, HP had just put out updates to deal with the fact that the ACU was not reporting drive failures: there was a BIOS update and a RAID controller update to resolve poor recovery capabilities. There were no messages on the OS console, in the log files or in Insight Manager. I ran Windows Update thinking all was well, and on reboot I got the kind message indicating that drive 0:1 in cage 0 had failed. This was Friday at 4:30, and I was 170 miles from home, on a day trip to install a firewall and update the ESX server, all KVMs and that server.

Do you have access to the BIOS to disable as many unneeded hardware components as possible, such as serial ports, sound, etc.? This can help reduce CPU load and interrupt calls. I would check with the motherboard vendor to see if there are any new BIOS updates. Have a working, recoverable backup before making changes.

You mention hosted, but based on what you said about the hardware specs, did you purchase the hardware? If so, they should be willing to allow you to SSH to the HN. Even if they want to restrict access due to support considerations, ask them to use sudo and restrict you to commands like top and access to the log files. With SSH access to the HN you can monitor HN activity with top and other tools to see whether disk IO, swap or something else is getting hit really hard just as the failure happens, from an HN perspective.

Is the additional drive on a separate controller, on the same controller, or USB-connected?
 
On the Post with the Backup Failure,
First of all, thanks for your suggestions. Most of them are things I already do where possible.

Just for the record: after testing over and over again, the cause turned out to be a simple hardware failure. The sda drive sometimes produces bad host memory under heavy write (yes, only write) load. That's the reason for all kinds of weird errors. It took a few days to figure that out. So far the new live backup works really great with the replaced hardware.
 
