Hi all. Long thread, but hopefully lots of info on a problem that has puzzled me for some time. I've reported the problem previously and it went unsolved, but figured I'd try again.
Essentially, I have two KVM Windows 2008 VMs on one PVE host. This box has been in production since the PVE 0.9 beta. One of them is SBS 2008 with a 300GB qcow2 drive, the other is a regular 2008 server with an 80GB qcow2 drive. The PVE host has 8GB of physical RAM, with 6GB assigned to the VMs:
Code:
[root@volt:/etc/qemu-server]$ ls *
101.conf 102.conf
[root@volt:/etc/qemu-server]$ cat 101.conf
name: SBS
sockets: 1
bootdisk: ide0
ide0: vm-101-disk.qcow2
ostype: w2k8
memory: 4096
onboot: 1
vlan0: e1000=26:0A:5B:7F:00:F6
description: <edit>
hostusb: 067b:2303
cores: 2
boot: c
freeze: 0
cpuunits: 1000
acpi: 1
kvm: 1
[root@volt:/etc/qemu-server]$ cat 102.conf
name: SQL
ide2: none,media=cdrom
sockets: 1
bootdisk: ide0
ostype: w2k8
memory: 2048
onboot: 1
boot: dc
freeze: 0
cpuunits: 1000
acpi: 1
kvm: 1
ide0: vm-102-disk.qcow2
vlan0: e1000=82:A0:A5:67:0F:2F
description: <edit>
cores: 2
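For reference, here's roughly how I check what the qcow2 files actually take up on disk versus their virtual size (nothing exotic, just standard coreutils and qemu-img; treat it as a sketch rather than output from this box):
Code:
# apparent file size vs blocks actually allocated under /var/lib/vz
ls -lh /var/lib/vz/images/101/vm-101-disk.qcow2
du -h /var/lib/vz/images/101/vm-101-disk.qcow2
# virtual size vs on-disk size as qemu sees it
qemu-img info /var/lib/vz/images/101/vm-101-disk.qcow2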
VZdump snapshots of the SBS frequently lock up VM 101 itself, and nothing else, sometimes for 8-12 hours if left alone on a weekend. Sometimes the web interface will not stop the VM, and I have to use qm to stop it. vmtar sometimes will just keep writing and fill up all the space on my backup device if left unchecked. To clarify, I believe vmtar (or some combination of hardware and software settings) is the root problem, not the vzdump perl script. Normally when this happens, vmtar is pegged at 100% CPU usage.
I've posted these things previously on the forum: logfiles of vzdump failing because it ran out of disk space on a 1TB disk during the backup of a <300GB VM. Initially we were backing up to an external USB drive. About a year ago we swapped to an internal disk on the same controller, and the problem is consistent across both.
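When the web interface won't stop it, it comes down to something like this: confirm vmtar is the process spinning, then force the guest down from the CLI (just a sketch of the obvious commands):
Code:
# see whether vmtar is the one pegging a core
top -b -n 1 | head -20
# the web UI stop hangs, so stop the guest with qm instead
qm stop 101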
Yes, I know I need to reboot; I updated this morning.
All the good stuff:
Code:
[root@volt:~]$ pveversion -v
pve-manager: 1.7-10 (pve-manager/1.7/5323)
running kernel: 2.6.24-9-pve
pve-kernel-2.6.24-7-pve: 2.6.24-11
pve-kernel-2.6.24-1-pve: 2.6.24-4
pve-kernel-2.6.24-9-pve: 2.6.24-18
pve-kernel-2.6.24-5-pve: 2.6.24-6
pve-kernel-2.6.24-2-pve: 2.6.24-5
qemu-server: 1.1-28
pve-firmware: 1.0-10
libpve-storage-perl: 1.0-16
vncterm: 0.9-2
vzctl: 3.0.24-1pve4
vzdump: 1.2-10
vzprocps: 2.0.11-1dso2
vzquota: 3.0.11-1
[root@volt:~]$ pveperf
CPU BOGOMIPS: 37240.26
REGEX/SECOND: 627087
HD SIZE: 94.49 GB (/dev/pve/root)
BUFFERED READS: 203.67 MB/sec
AVERAGE SEEK TIME: 9.58 ms
FSYNCS/SECOND: 1345.03
DNS EXT: 156.51 ms
DNS INT: 92.39 ms
[root@volt:~]$ pvs
PV VG Fmt Attr PSize PFree
/dev/sda2 pve lvm2 a- 930.00G 4.00G
[root@volt:~]$ lvs
LV VG Attr LSize Origin Snap% Move Log Copy% Convert
data pve -wi-ao 823.00G
root pve -wi-ao 96.00G
swap pve -wi-ao 7.00G
[root@volt:~]$ vgs
VG #PV #LV #SN Attr VSize VFree
pve 1 3 0 wz--n- 930.00G 4.00G
[root@volt:~]$ lspci | grep RAID
01:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS 1078 (rev 04)
[root@volt:~]$ mount
/dev/pve/root on / type ext3 (rw,errors=remount-ro)
tmpfs on /lib/init/rw type tmpfs (rw,nosuid,mode=0755)
proc on /proc type proc (rw,noexec,nosuid,nodev)
sysfs on /sys type sysfs (rw,noexec,nosuid,nodev)
procbususb on /proc/bus/usb type usbfs (rw)
udev on /dev type tmpfs (rw,mode=0755)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
devpts on /dev/pts type devpts (rw,noexec,nosuid,gid=5,mode=620)
/dev/mapper/pve-data on /var/lib/vz type ext3 (rw)
/dev/sda1 on /boot type ext3 (rw)
/dev/sdb5 on /backup type ext3 (rw,errors=remount-ro)
[root@volt:~]$ ls /var/lib/vz/images/*
/var/lib/vz/images/101:
vm-101-disk.qcow2
/var/lib/vz/images/102:
vm-102-disk.qcow2
[root@volt:~]$ qm list
VMID NAME STATUS MEM(MB) BOOTDISK(GB) PID
101 SBS running 4096 300.00 24284
102 SQL running 2048 80.00 8663
[root@volt:~]$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/pve/root 95G 2.0G 88G 3% /
tmpfs 3.9G 0 3.9G 0% /lib/init/rw
udev 10M 2.7M 7.4M 27% /dev
tmpfs 3.9G 0 3.9G 0% /dev/shm
/dev/mapper/pve-data 811G 500G 311G 62% /var/lib/vz
/dev/sda1 496M 100M 371M 22% /boot
/dev/sdb5 917G 499G 372G 58% /backup
Changing vzdump options has never really seemed to help consistently:
-size: 2048 was suggested by Dietmar. Tom suggested higher, and I've been using 4096 since, with the same results. Should I go higher?
-bwlimit: changing this doesn't seem to matter.
/etc/vzdump.conf:
Code:
[root@volt:~]$ cat /etc/vzdump.conf
############################################
# vzdump static options configuration file #
# ALL settings commented out purposely #
# July 1st 2010, new internal raid0 1TB #
# drive installed. VZdump testing #
############################################
# script: /root/script/usb-rm.pl
size: 4096
dumpdir: /backup
# bwlimit: 10000
maxfiles: 2
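For anyone trying to reproduce this, the backup boils down to roughly the following manual invocation. This is a sketch assuming the stock vzdump 1.x flags; --size and --dumpdir just mirror the config above:
Code:
# manual equivalent of the backup of VM 101 (flags mirror /etc/vzdump.conf)
vzdump --snapshot --size 4096 --dumpdir /backup --maxfiles 2 101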
Here's where I kill it at 8:30 because it has been running for 10+ hours. This particular backup takes between 4 and 6 hours when it is successful. Lowering bwlimit doesn't seem to affect the locking up, just how long the backup takes to complete.
Code:
[root@volt:~]$ cat vzdump-qemu-101-2011_01_20-20_00_02.log
Jan 20 20:00:02 INFO: Starting Backup of VM 101 (qemu)
Jan 20 20:00:03 INFO: running
Jan 20 20:00:03 INFO: status = running
Jan 20 20:00:03 INFO: backup mode: snapshot
Jan 20 20:00:03 INFO: ionice priority: 7
Jan 20 20:00:04 INFO: Logical volume "vzsnap-volt-0" created
Jan 20 20:00:04 INFO: creating archive '/backup/vzdump-qemu-101-2011_01_20-20_00_02.tar'
Jan 20 20:00:04 INFO: adding '/backup/vzdump-qemu-101-2011_01_20-20_00_02.tmp/qemu-server.conf' to archive ('qemu-server.conf')
Jan 20 20:00:04 INFO: adding '/mnt/vzsnap0/images/101/vm-101-disk.qcow2' to archive ('vm-disk-ide0.qcow2')
Jan 21 08:30:12 INFO: received signal - terminate process
Jan 21 08:30:13 INFO: Logical volume "vzsnap-volt-0" successfully removed
Jan 21 08:32:32 ERROR: Backup of VM 101 failed - command '/usr/lib/qemu-server/vmtar '/backup/vzdump-qemu-101-2011_01_20-20_00_02.tmp/qemu-server.conf' 'qemu-server.conf' '/mnt/vzsnap0/images/101/vm-101-disk.qcow2' 'vm-disk-ide0.qcow2' >/backup/vzdump-qemu-101-2011_01_20-20_00_02.dat' failed with exit code 255
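Next time it hangs I can watch the LVM snapshot and the backup target while it runs; something along these lines should show whether the snapshot or /backup is what fills up (the LV name and the .dat path are taken from the log above, so treat the exact glob as approximate):
Code:
# every minute: snapshot usage (Snap% on vzsnap-volt-0), free space on /backup,
# and the size of the growing .dat file vmtar is writing
watch -n 60 "lvs pve; df -h /backup; ls -lh /backup/vzdump-qemu-101-*.dat"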
We have plans to double the physical RAM in the PVE host for good measure. I do not believe this is a PVE-version-specific problem, because it has happened on every version of PVE we've installed. It only happens to one VM, not both, so the size of the disk image seems to be the issue somehow, at least from the perspective of vzdump and vmtar.
Any help or suggestions would be appreciated. Previous thread(s):
http://forum.proxmox.com/threads/2990-Another-vzdump-problem.-VERY-STRANGE!