Hi,
I am having problems backing up my guests. My setup consists of two servers; on one of them runs a virtual guest with a pass-through controller, running Debian installed on a local drive and serving NFS. (Just a side note: in my benchmarks, the NFS performance difference between ZFS on OpenIndiana and ZFS on Debian was at most about 5% in my case.)
When I back up just a few machines (KVM + OpenVZ), there is no problem at all. But if I back up more than roughly 5 guests, or all 25 of them, the NFS Debian guest suddenly stops working - every time at a different ID, percentage, or technology (it happens with both KVM guests and OpenVZ containers). Nothing shows up in the logs on either the host or the guest. The guest simply hangs and it is no longer possible to connect to it via the console.
I applied the latest patch I found in a different thread here, but unfortunately nothing has changed.
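As a possible workaround I am considering throttling vzdump so the NFS guest is not overwhelmed; a minimal sketch of /etc/vzdump.conf (the bwlimit and ionice values are just guesses for my setup, not recommendations):

# /etc/vzdump.conf - global vzdump defaults
# limit backup bandwidth to ~50 MB/s (value is in KB/s)
bwlimit: 51200
# run backups with low best-effort I/O priority
ionice: 7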
The biggest problem is that whenever any shared storage becomes unavailable (I tested with NFS, Ceph, and Gluster), my cluster gets disconnected and the GUI becomes barely usable, since no guest names or states are shown. That is really annoying, because I have not found any easy way to get the cluster connected again. It does not help to unmount and remove the faulty storages, delete them from storage.cfg, or restart pvedaemon or pvestatd. I have to reboot one of my two servers, or bring the storage back online, to get the cluster to rejoin. It also happens with external file servers, so I believe it is a bug in PVE.
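For reference, the NFS storage entry I end up deleting from /etc/pve/storage.cfg looks roughly like this (server address and export path are placeholders, the storage name matches the mount point seen in the command below):

nfs: skladka_virtualy
        path /mnt/pve/skladka_virtualy
        server 192.168.1.10
        export /tank/virtualy
        content images,iso,backup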
After such an outage it is not possible to start the NFS-backed guests again; I get:
start failed: command '/usr/bin/kvm -id 100 -chardev 'socket,id=qmp,path=/var/run/qemu-server/100.qmp,server,nowait' -mon 'chardev=qmp,mode=control' -vnc unix:/var/run/qemu-server/100.vnc,x509,password -pidfile /var/run/qemu-server/100.pid -daemonize -name indian -smp 'sockets=2,cores=2' -nodefaults -boot 'menu=on' -vga cirrus -cpu host,+x2apic -k en-us -m 16384 -device 'piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2' -device 'usb-tablet,id=tablet,bus=uhci.0,port=1' -device 'pci-assign,host=03:00.0,id=hostpci0,bus=pci.0,addr=0x10' -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3' -drive 'file=/mnt/pve/skladka_virtualy/template/iso/debian-7.2.0-amd64-CD-1.iso,if=none,id=drive-ide2,media=cdrom,aio=native' -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200' -drive 'file=/var/lib/vz/images/100/vm-100-disk-1.qcow2,if=none,id=drive-virtio0,format=qcow2,aio=native,cache=none' -device 'virtio-blk-pci,drive=drive-virtio0,id=virtio0,bus=pci.0,addr=0xa,bootindex=100' -netdev 'type=tap,id=net0,ifname=tap100i0,script=/var/lib/qemu-server/pve-bridge' -device 'e1000,mac=02:BF:56:9B:10:7C,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300' -netdev 'type=tap,id=net1,ifname=tap100i1,script=/var/lib/qemu-server/pve-bridge' -device 'vmxnet3,mac=62:93:03:92:18:48,netdev=net1,bus=pci.0,addr=0x13,id=net1,bootindex=301'' failed: got timeout
I have to bring the tap interfaces down manually (a small helper for this is sketched below the commands):
ip link set tap100i0 down
ip link set tap100i1 down
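Since this happens repeatedly, I put together a small helper to bring down all leftover tap interfaces of a VM in one go (this assumes the stale interfaces always follow PVE's tap<VMID>i<N> naming; adjust if yours differ):

#!/bin/sh
# cleanup-taps.sh - bring down leftover tap interfaces of a stuck VM
# Usage: sh cleanup-taps.sh 100
VMID="$1"
if [ -z "$VMID" ]; then
    echo "usage: $0 <vmid>" >&2
    exit 1
fi
# list all interfaces named tap<VMID>i* and bring each one down
for tap in $(ip -o link show | awk -F': ' '{print $2}' | grep "^tap${VMID}i"); do
    echo "bringing down ${tap}"
    ip link set "${tap}" down
done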
After this it is possible to start that guest again, and after some time the cluster gets connected again and I can continue my work. I am in the middle of migrating from ESXi to Proxmox and would be very glad to find a solution to this problem, because Proxmox is much better than ESXi in my experience. Thanks a lot for your work, and for your help as well.
pveversion -v
proxmox-ve-2.6.32: 3.2-124 (running kernel: 2.6.32-28-pve)
pve-manager: 3.2-2 (running version: 3.2-2/82599a65)
pve-kernel-2.6.32-28-pve: 2.6.32-124
pve-kernel-2.6.32-26-pve: 2.6.32-114
pve-kernel-2.6.32-23-pve: 2.6.32-109
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.5-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.5-1
pve-cluster: 3.0-12
qemu-server: 3.1-15
pve-firmware: 1.1-2
libpve-common-perl: 3.0-14
libpve-access-control: 3.0-11
libpve-storage-perl: 3.0-19
pve-libspice-server1: 0.12.4-3
vncterm: 1.1-6
vzctl: 4.0-1pve5
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 1.7-6
ksm-control-daemon: 1.1-1
glusterfs-client: 3.4.2-1