scheduled w2k3 backup fails because "unable to create fairsched node"

Nov 24, 2014
Hello,

I have a 2-node Proxmox setup running several Debian KVM guests and one w2k3 server (set up following the w2k3 guest best practices). I've now discovered that the scheduled backup of the w2k3 guest has been failing for about a week. I wasn't on site during that time, so I'm fairly sure no changes were made to the setup.

Unfortunately I couldn't find much when searching for "fairsched" and related problems, so I'm hoping someone here can help me.

The backup itself runs at night with:
Compression: GZIP
Mode: Stop
Include: only this vm

The stop step itself works fine (the VM does shut down), but then the backup immediately fails:

INFO: starting new backup job: vzdump 105 --quiet 1 --mode stop --compress gzip --storage store1
INFO: Starting Backup of VM 105 (qemu)
INFO: status = running
INFO: update VM 105: -lock backup
INFO: backup mode: stop
INFO: ionice priority: 7
INFO: stopping vm
INFO: creating archive '/data/store1/dump/vzdump-qemu-105-2014_11_23-01_00_01.vma.gz'
INFO: starting kvm to execute backup task
unable to create fairsched node
INFO: restarting vm
INFO: vm is online again after 9 seconds
ERROR: Backup of VM 105 failed - start failed: command '/usr/bin/kvm -id 105 -chardev 'socket,id=qmp,path=/var/run/qemu-server/105.qmp,server,nowait' -mon 'chardev=qmp,mode=control' -vnc unix:/var/run/qemu-server/105.vnc,x509,password -pidfile /var/run/qemu-server/105.pid -daemonize -name ponza -smp 'sockets=2,cores=2' -nodefaults -boot 'menu=on' -vga cirrus -cpu kvm64,+lahf_lm,+x2apic,+sep -k de -m 4096 -cpuunits 1000 -device 'piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2' -device 'usb-tablet,id=tablet,bus=uhci.0,port=1' -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3' -iscsi 'initiator-name=iqn.1993-08.org.debian:01:49ec7ca2d15' -drive 'file=/data/store0/images/105/vm-105-disk-1.qcow2,if=none,id=drive-ide0,format=qcow2,cache=writethrough,aio=native,detect-zeroes=on' -device 'ide-hd,bus=ide.0,unit=0,drive=drive-ide0,id=ide0,bootindex=100' -drive 'if=none,id=drive-ide2,media=cdrom,aio=native' -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200' -netdev 'type=tap,id=net0,ifname=tap105i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown' -device 'e1000,mac=7E:09:D2:57:C3:22,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300' -rtc 'driftfix=slew,base=localtime' -machine 'type=pc-i440fx-1.7' -S' failed: exit code 1
INFO: Backup job finished with errors
TASK ERROR: job errors
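
In case it helps anyone sift through a longer task log the same way, here is a minimal Python sketch that lists every line that is not plain INFO output. The sample text is copied from the log above, with the long kvm command line shortened by me; the function name `suspicious_lines` is just something I made up for illustration and is not part of any Proxmox tool.

```python
# Sample lines copied from the task log above.
# Assumption: the kvm command line in the ERROR entry is shortened here for readability.
SAMPLE_LOG = """\
INFO: starting new backup job: vzdump 105 --quiet 1 --mode stop --compress gzip --storage store1
INFO: stopping vm
INFO: starting kvm to execute backup task
unable to create fairsched node
INFO: restarting vm
ERROR: Backup of VM 105 failed - start failed: command '/usr/bin/kvm -id 105 ...' failed: exit code 1
TASK ERROR: job errors
"""


def suspicious_lines(log_text):
    """Return (line_number, text) for every non-empty line that does not start with 'INFO:'."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        if line.strip() and not line.startswith("INFO:"):
            hits.append((number, line))
    return hits


if __name__ == "__main__":
    for number, line in suspicious_lines(SAMPLE_LOG):
        print(f"{number}: {line}")
```

On the sample this points straight at the bare "unable to create fairsched node" line, which is printed right after kvm is started for the backup task and before the ERROR entry.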

I've already rebooted everything (host and VM) multiple times and migrated the VM so the backup could run on a different host - still no luck.

The weird thing is that when I use "backup now" (store0/stop/gzip) the backup runs fine, so apparently the problem only occurs when the job runs on schedule.

Here's my version info:
proxmox-ve-2.6.32: 3.3-139 (running kernel: 2.6.32-34-pve)
pve-manager: 3.3-5 (running version: 3.3-5/bfebec03)
pve-kernel-2.6.32-32-pve: 2.6.32-136
pve-kernel-2.6.32-30-pve: 2.6.32-130
pve-kernel-2.6.32-34-pve: 2.6.32-139
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.7-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.10-1
pve-cluster: 3.0-15
qemu-server: 3.3-3
pve-firmware: 1.1-3
libpve-common-perl: 3.0-19
libpve-access-control: 3.0-15
libpve-storage-perl: 3.0-25
pve-libspice-server1: 0.12.4-3
vncterm: 1.1-8
vzctl: 4.0-1pve6
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 2.1-10
ksm-control-daemon: 1.1-1
glusterfs-client: 3.5.2-1
 
Okay, I've tried one more thing: I removed the backup job and recreated it. That seemed to help, as the first run afterwards was successful, but last night it failed again.

Are there any more logs I could analyse to get to the root of the problem?