freeze during and after backup

riri1310

Hi, I use the built-in backup to back up all the VMs in snapshot mode because they are all running.
I notice some downtime related to SSH connections (I run monit on the VMs and it reports SSH connection problems). So during the backup the VMs are sometimes unavailable, and the strange thing is that after the backup completes Proxmox freezes all the VMs and I need a hard reboot to get everything running again.

Thanks for your help. If you need any command-line output, feel free to ask.
 
I'm not an expert, but I have similar problems (not involving NFS though). What you could try as a workaround is a) enable the CFQ I/O scheduler on the host and then b) use vzdump with the ionice parameter in the shell. See man vzdump.
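For example, a one-off run from the shell could look like this (VMID 100 is just a placeholder; see man vzdump for all available options):

Code:
# snapshot-mode backup of guest 100 with lowered I/O priority
# (the ionice value is only honoured if the CFQ scheduler is active on the host)
vzdump 100 --mode snapshot --ionice 7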
 
I'm not an expert, but I have similar problems (not involving NFS though). What you could try as a workaround is a) enable the CFQ I/O scheduler on the host and then b) use vzdump with the ionice parameter in the shell. See man vzdump.
Hello, my vzdump is already using "ionice priority: 7". Should I also use the CFQ I/O scheduler?

Thanks for your kind help with this; my server always ends up stuck after those backups...
 
I just read that ionice generally does not work on NFS:
https://forum.proxmox.com/threads/about-of-ionice-in-vzdump.16485/#post-84955

Maybe you can use another location instead?
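As a sketch of that idea, vzdump can be pointed at a local directory instead of the NFS storage (the path below is only an example):

Code:
# write the backup to a local directory instead of the NFS share
vzdump 100 --mode snapshot --dumpdir /var/lib/vz/dump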

Hello, my vzdump is already using "ionice priority: 7". Should I also use the CFQ I/O scheduler?

As far as I know, all other available I/O schedulers do not honour ionice parameters, so you would have to use CFQ with it. I think it's still worth a try.
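To check which scheduler is active and switch to CFQ on the fly, something like this should work (sda is an example device; the change does not survive a reboot):

Code:
# the scheduler shown in brackets is the active one
cat /sys/block/sda/queue/scheduler
# switch this disk to CFQ at runtime (not persistent across reboots)
echo cfq > /sys/block/sda/queue/scheduler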

Maybe you can tell us a bit more about:
- what version of PVE your host uses
- how many guests you have
- what OS / version your guests use
- how much free RAM your host has
- whether you use Kernel Samepage Merging (KSM)
(most of this can be collected with the commands sketched below)
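A minimal set of commands to collect that information (the KSM counter path assumes the standard sysfs location):

Code:
pveversion -v                          # PVE and package versions
qm list; pct list                      # VMs and containers on this host
free -m                                # RAM usage in MiB
cat /sys/kernel/mm/ksm/pages_sharing   # greater than 0 means KSM is actively merging pages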
 
Thanks

Here are some details:

# pveversion -v
Code:
proxmox-ve: 5.2-2 (running kernel: 4.15.18-5-pve)
pve-manager: 5.2-10 (running version: 5.2-10/6f892b40)
pve-kernel-4.15: 5.2-8
pve-kernel-4.15.18-5-pve: 4.15.18-24
corosync: 2.4.2-pve5
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.0-8
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-40
libpve-guest-common-perl: 2.0-18
libpve-http-server-perl: 2.0-11
libpve-storage-perl: 5.0-30
libqb0: 1.0.1-1
lvm2: 2.02.168-pve6
lxc-pve: 3.0.2+pve1-3
lxcfs: 3.0.2-2
novnc-pve: 1.0.0-2
proxmox-widget-toolkit: 1.0-20
pve-cluster: 5.0-30
pve-container: 2.0-29
pve-docs: 5.2-8
pve-firewall: 3.0-14
pve-firmware: 2.0-5
pve-ha-manager: 2.0-5
pve-i18n: 1.0-6
pve-libspice-server1: 0.12.8-3
pve-qemu-kvm: 2.11.2-1
pve-xtermjs: 1.0-5
qemu-server: 5.0-38
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.11-pve1~bpo1

# cat /sys/block/sda/queue/scheduler
Code:
noop [deadline] cfq

The host has 4 containers, all running Debian 9.

# free -m
Code:
              total        used        free      shared  buff/cache   available
Mem:          64110       24917        8502         306       30690       38174
Swap:          4095         402        3693


I think I will switch from [noop] to deadline (# cat /sys/block/sda/queue/scheduler) to see what changes and effects it has...

Thanks
 
# cat /sys/block/sda/queue/scheduler
Code:
noop [deadline] cfq

I think I will switch from [noop] to deadline (# cat /sys/block/sda/queue/scheduler) to see what changes and effects it has...

Looks like you're currently running deadline (the one in the brackets is active). Yes, give it a try.
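If one of the schedulers turns out to behave better, it can be made the default at boot via the kernel command line (a sketch; the legacy elevator= parameter applies to the 4.15 kernel you are running):

Code:
# /etc/default/grub -- example entry, pick deadline or cfq
GRUB_CMDLINE_LINUX_DEFAULT="quiet elevator=deadline"
# regenerate the grub config and reboot afterwards
update-grub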
 
You're right, I made the copy after the change ;)

But the bad news is I just ran a test and the system still freezes, even with the deadline option...

I have this in the syslog:

Code:
Nov 11 13:02:51 ns3091370 pvestatd[4659]: storage 'ftpbackup_ns3091370' is not online
Nov 11 13:02:52 ns3091370 pvestatd[4659]: status update time (6.146 seconds)

It looks like the NFS backup storage has trouble connecting to the host (the storage is the backup space from OVH).
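A few commands that might help to confirm whether the NFS storage is reachable from the host (the storage name comes from your log; the server hostname is a placeholder for the OVH backup host):

Code:
# storage status as seen by PVE
pvesm status
# exports offered by the NFS server (replace the hostname with your OVH backup server)
showmount -e your-ovh-backup-server
# the definition and NFS options for 'ftpbackup_ns3091370' live here
cat /etc/pve/storage.cfg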