VMs stop randomly, probably during backup

Vyborny Vladimir

Feb 13, 2018
We have been running two Proxmox installations for many years, currently on version 4.4. Last year, VMs on both installations began stopping at random during the night (2-3 times a week). The stopped VMs are shown greyed out and we have to start them again manually. We suspect it happens during the Proxmox backup.

It is a very unpleasant situation. Can you help us, please? Our only idea so far is to upgrade to the latest version of Proxmox.
 
Here is a typical backup log from a night when the problem occurred:

200: Feb 17 04:57:38 INFO: status: 33% (212659470336/644245094400), sparse 0% (3748810752), duration 2555, 82/80 MB/s
200: Feb 17 04:59:10 INFO: status: 34% (219094712320/644245094400), sparse 0% (3858046976), duration 2647, 69/68 MB/s
200: Feb 17 05:00:34 INFO: status: 35% (225612201984/644245094400), sparse 0% (3963133952), duration 2731, 77/76 MB/s
200: Feb 17 05:02:44 INFO: status: 36% (232082309120/644245094400), sparse 0% (4066938880), duration 2861, 49/48 MB/s
200: Feb 17 05:03:45 INFO: status: 37% (238472658944/644245094400), sparse 0% (4169191424), duration 2922, 104/103 MB/s
200: Feb 17 05:04:47 ERROR: VM 200 not running
200: Feb 17 05:04:47 INFO: aborting backup job
200: Feb 17 05:04:47 ERROR: VM 200 not running
200: Feb 17 05:05:18 ERROR: Backup of VM 200 failed - VM 200 not running
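
For anyone who wants to compare the stop time against syslog, here is a minimal Python sketch that pulls the last progress line and the first "not running" error out of a log like the one above. The regular expressions are only inferred from the lines shown in this post, and the function name and the `year` parameter are my own (the log lines carry no year, so 2018 is assumed from the thread date).

```python
import re
from datetime import datetime

# Tail of the log excerpt above, copied verbatim.
LOG = """\
200: Feb 17 05:02:44 INFO: status: 36% (232082309120/644245094400), sparse 0% (4066938880), duration 2861, 49/48 MB/s
200: Feb 17 05:03:45 INFO: status: 37% (238472658944/644245094400), sparse 0% (4169191424), duration 2922, 104/103 MB/s
200: Feb 17 05:04:47 ERROR: VM 200 not running
200: Feb 17 05:04:47 INFO: aborting backup job
"""

# Line layout inferred from the excerpt: "<vmid>: <Mon DD HH:MM:SS> <LEVEL>: <message>"
LINE = re.compile(
    r"^(?P<vmid>\d+): (?P<ts>\w{3} +\d+ \d\d:\d\d:\d\d) "
    r"(?P<level>INFO|ERROR): (?P<msg>.*)$"
)
STATUS = re.compile(r"status: (\d+)% \((\d+)/(\d+)\)")


def find_stop_window(log_text, year=2018):
    """Return (last_progress_time, percent, first_error_time), or None."""
    last = None
    for raw in log_text.splitlines():
        m = LINE.match(raw.strip())
        if m is None:
            continue
        # The log has no year; it is supplied by the caller (assumption).
        ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
        status = STATUS.search(m["msg"])
        if m["level"] == "INFO" and status:
            last = (ts, int(status.group(1)))
        elif m["level"] == "ERROR" and "not running" in m["msg"] and last:
            return last[0], last[1], ts
    return None


window = find_stop_window(LOG)
if window:
    started, percent, failed = window
    print(f"VM stopped between {started:%H:%M:%S} and {failed:%H:%M:%S}, "
          f"at about {percent}% of the backup")
```

The resulting window is the range to search for in the host's syslog.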
 
It also happens with our setup, but only with a Debian guest. It crashes at a random point during the automatic backup (when gzip is used). The backup goes to an NFS share.

Proxmox versions:
proxmox-ve: 5.1-38 (running kernel: 4.13.13-5-pve)
pve-manager: 5.1-43 (running version: 5.1-43/bdb08029)
pve-kernel-4.13.13-2-pve: 4.13.13-33
pve-kernel-4.13.13-5-pve: 4.13.13-38
pve-kernel-4.13.13-3-pve: 4.13.13-34
libpve-http-server-perl: 2.0-8
lvm2: 2.02.168-pve6
corosync: 2.4.2-pve3
libqb0: 1.0.1-1
pve-cluster: 5.0-19
qemu-server: 5.0-20
pve-firmware: 2.0-3
libpve-common-perl: 5.0-25
libpve-guest-common-perl: 2.0-14
libpve-access-control: 5.0-7
libpve-storage-perl: 5.0-17
pve-libspice-server1: 0.12.8-3
vncterm: 1.5-3
pve-docs: 5.1-16
pve-qemu-kvm: 2.9.1-6
pve-container: 2.0-18
pve-firewall: 3.0-5
pve-ha-manager: 2.0-4
ksm-control-daemon: 1.2-2
glusterfs-client: 3.8.8-1
lxc-pve: 2.1.1-2
lxcfs: 2.0.8-1
criu: 2.11.1-1~bpo90
novnc-pve: 0.6-4
smartmontools: 6.5+svn4324-1
zfsutils-linux: 0.7.4-pve2~bpo9

VM config:
balloon: 512
bootdisk: sata0
cores: 2
ide2: none,media=cdrom
memory: 8192
name: Bashe
net0: rtl8139=4E:6E:02:EC:50:91,bridge=vmbr0
net1: rtl8139=22:E9:6F:CA:5D:F5,bridge=vmbr5,tag=220
numa: 0
onboot: 1
ostype: l26
sata0: vs0.dsp.lv:220/vm-220-disk-1.qcow2,size=300G
scsihw: virtio-scsi-pci
smbios1: uuid=b25a18ab-99b1-4f9a-931a-000536f3ecdf
sockets: 2

For these VM crashes, the following errors were logged in syslog:
Feb 11 07:00:19 vs2 kernel: [507351.298781] kvm[11436]: segfault at 1040 ip 000055a58f6a53d6 sp 00007ffca6d29e70 error 4 in qemu-system-x86_64[55a58f0cc000+7bd000]
......
Feb 11 14:50:05 vs2 kernel: [535537.036211] kvm[19659]: segfault at 2040 ip 00005650eaa403d6 sp 00007ffd422811c0 error 4 in qemu-system-x86_64[5650ea467000+7bd000]
......
Feb 17 14:20:41 vs2 kernel: [1052174.796788] kvm[658]: segfault at 2040 ip 0000557fb501b3d6 sp 00007ffca96000c0 error 4 in qemu-system-x86_64[557fb4a42000+7bd000]

Guest is Debian 8 with all updates applied.
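
One thing that can be read out of those three lines: subtracting the mapping base (the value in square brackets) from the instruction pointer gives the crash offset inside the qemu-system-x86_64 binary. A small Python sketch of that arithmetic follows; the regex is inferred from the lines above and the function name is my own. On x86, "error 4" is a user-mode read of a page that is not mapped, and fault addresses as low as 1040 or 2040 would be consistent with a near-NULL pointer, though that is only a guess.

```python
import re

# The three segfault lines from the syslog excerpt above, copied verbatim.
SYSLOG = """\
Feb 11 07:00:19 vs2 kernel: [507351.298781] kvm[11436]: segfault at 1040 ip 000055a58f6a53d6 sp 00007ffca6d29e70 error 4 in qemu-system-x86_64[55a58f0cc000+7bd000]
Feb 11 14:50:05 vs2 kernel: [535537.036211] kvm[19659]: segfault at 2040 ip 00005650eaa403d6 sp 00007ffd422811c0 error 4 in qemu-system-x86_64[5650ea467000+7bd000]
Feb 17 14:20:41 vs2 kernel: [1052174.796788] kvm[658]: segfault at 2040 ip 0000557fb501b3d6 sp 00007ffca96000c0 error 4 in qemu-system-x86_64[557fb4a42000+7bd000]
"""

SEGFAULT = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) sp [0-9a-f]+ "
    r"error (?P<err>\d+) in (?P<obj>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def segfault_offsets(text):
    """Return (object name, ip - mapping base) for every segfault line found."""
    return [
        (m["obj"], int(m["ip"], 16) - int(m["base"], 16))
        for m in SEGFAULT.finditer(text)
    ]


offsets = segfault_offsets(SYSLOG)
for obj, off in offsets:
    print(f"{obj} crashed at offset {off:#x}")
```

All three crashes land on the same offset, 0x5d93d6, even though the load address differs each time. That suggests the same code path in QEMU is failing every time rather than random memory corruption. Turning the offset into a function name would need debug symbols for the exact pve-qemu-kvm build, which is not covered here.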
 
Upgrading didn't help:
proxmox-ve: 5.1-40 (running kernel: 4.13.13-6-pve)
pve-manager: 5.1-46 (running version: 5.1-46/ae8241d4)
pve-kernel-4.13.13-6-pve: 4.13.13-40
pve-kernel-4.13.13-5-pve: 4.13.13-38
pve-kernel-4.13.13-4-pve: 4.13.13-35
pve-kernel-4.4.98-3-pve: 4.4.98-103
pve-kernel-4.4.35-1-pve: 4.4.35-77
corosync: 2.4.2-pve3
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.0-8
libpve-common-perl: 5.0-27
libpve-guest-common-perl: 2.0-14
libpve-http-server-perl: 2.0-8
libpve-storage-perl: 5.0-17
libqb0: 1.0.1-1
lvm2: 2.02.168-pve6
lxc-pve: 2.1.1-2
lxcfs: 2.0.8-2
novnc-pve: 0.6-4
proxmox-widget-toolkit: 1.0-11
pve-cluster: 5.0-20
pve-container: 2.0-19
pve-docs: 5.1-16
pve-firewall: 3.0-5
pve-firmware: 2.0-3
pve-ha-manager: 2.0-5
pve-i18n: 1.0-4
pve-libspice-server1: 0.12.8-3
pve-qemu-kvm: 2.9.1-8
pve-xtermjs: 1.0-2
qemu-server: 5.0-21
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.4-pve2~bpo9
 
Dear Juris. Thank you for your info. Have you found any solution so far?

I am adding some info about our setup:

- all hosts are CentOS
- we use both GZIP and LZO compression, with the same problem
- the backup goes to an NFS share
 
I think I have the same problem on an "ancient" Proxmox VE 3.4 server. The strange thing is that it only started happening recently, most likely after I moved the VM between storages (from one ZFS pool to another). I may also have changed the virtual storage adapter at that point. It is now "sata" for the boot disk and "virtio" for two additional disks.

The VM stops often (but not always) during backup. It also stops in "regular" use with no obvious "high demand" situation.

I do use pve-zsync. The guest OS is Ubuntu 14.04.5.

The next thing I will try is switching all disks to SATA.
 
