freeze during and after backup

riri1310

Hi, I use the built-in backup to back up all the VMs in snapshot mode because they are all running.
I notice some downtime related to SSH connections (I run monit on the VMs and it reports SSH connection problems). So during the backup the VMs are sometimes unavailable, and the strange thing is that after the backup completes Proxmox freezes all the VMs and I need a hard reboot to get everything running again.

Thanks for your help. If you need any command-line output, feel free to ask.
 
I'm not an expert, but I have similar problems (not involving NFS though). What you could try as a workaround is a) enable the CFQ I/O scheduler on the host and then b) use vzdump with the ionice parameter in the shell. See man vzdump.
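For example, a one-off run from the shell could look like this (VMID 100 is just a placeholder; see man vzdump for all available options):

Code:
# snapshot-mode backup of guest 100 with lowered I/O priority
# (the ionice value is only honoured if the CFQ scheduler is active on the host)
vzdump 100 --mode snapshot --ionice 7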
 
I'm not an expert, but I have similar problems (not involving NFS though). What you could try as a workaround is a) enable the CFQ I/O scheduler on the host and then b) use vzdump with the ionice parameter in the shell. See man vzdump.
Hello, my vzdump is already using "ionice priority: 7". Should I also use the CFQ I/O scheduler?

Thanks for your kind help with this; my server always ends up stuck after those backups...
 
I just read that ionice generally does not work on NFS:
https://forum.proxmox.com/threads/about-of-ionice-in-vzdump.16485/#post-84955

Maybe you can use another location instead?
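As a sketch of that idea, vzdump can be pointed at a local directory instead of the NFS storage (the path below is only an example):

Code:
# write the backup to a local directory instead of the NFS share
vzdump 100 --mode snapshot --dumpdir /var/lib/vz/dump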

Hello, my vzdump is already using "ionice priority: 7". Should I also use the CFQ I/O scheduler?

As far as I know, all other available I/O schedulers do not honour ionice parameters, so you would have to use CFQ with it. I think it's still worth a try.
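To check which scheduler is active and switch to CFQ on the fly, something like this should work (sda is an example device; the change does not survive a reboot):

Code:
# the scheduler shown in brackets is the active one
cat /sys/block/sda/queue/scheduler
# switch this disk to CFQ at runtime (not persistent across reboots)
echo cfq > /sys/block/sda/queue/scheduler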

Maybe you can tell us a bit more about:
- what version of PVE your host uses
- how many guests you have
- what OS / version your guests use
- how much free RAM your host has
- whether you use Kernel Samepage Merging (KSM)
(most of this can be collected with the commands sketched below)
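A minimal set of commands to collect that information (the KSM counter path assumes the standard sysfs location):

Code:
pveversion -v                          # PVE and package versions
qm list; pct list                      # VMs and containers on this host
free -m                                # RAM usage in MiB
cat /sys/kernel/mm/ksm/pages_sharing   # greater than 0 means KSM is actively merging pages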
 
Thanks

Here are some details:

# pveversion -v
Code:
proxmox-ve: 5.2-2 (running kernel: 4.15.18-5-pve)
pve-manager: 5.2-10 (running version: 5.2-10/6f892b40)
pve-kernel-4.15: 5.2-8
pve-kernel-4.15.18-5-pve: 4.15.18-24
corosync: 2.4.2-pve5
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.0-8
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-40
libpve-guest-common-perl: 2.0-18
libpve-http-server-perl: 2.0-11
libpve-storage-perl: 5.0-30
libqb0: 1.0.1-1
lvm2: 2.02.168-pve6
lxc-pve: 3.0.2+pve1-3
lxcfs: 3.0.2-2
novnc-pve: 1.0.0-2
proxmox-widget-toolkit: 1.0-20
pve-cluster: 5.0-30
pve-container: 2.0-29
pve-docs: 5.2-8
pve-firewall: 3.0-14
pve-firmware: 2.0-5
pve-ha-manager: 2.0-5
pve-i18n: 1.0-6
pve-libspice-server1: 0.12.8-3
pve-qemu-kvm: 2.11.2-1
pve-xtermjs: 1.0-5
qemu-server: 5.0-38
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.11-pve1~bpo1

# cat /sys/block/sda/queue/scheduler
Code:
noop [deadline] cfq

The host has 4 containers, all running Debian 9.

# free -m
Code:
              total        used        free      shared  buff/cache   available
Mem:          64110       24917        8502         306       30690       38174
Swap:          4095         402        3693


I think I will switch from [noop] to deadline (# cat /sys/block/sda/queue/scheduler) to see what changes and effects it has...

Thanks
 
# cat /sys/block/sda/queue/scheduler
Code:
noop [deadline] cfq

I think I will switch from [noop] to deadline (# cat /sys/block/sda/queue/scheduler) to see what changes and effects it has...

Looks like you're currently running deadline (the one in the brackets is active). Yes, give it a try.
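If one of the schedulers turns out to behave better, it can be made the default at boot via the kernel command line (a sketch; the legacy elevator= parameter applies to the 4.15 kernel you are running):

Code:
# /etc/default/grub -- example entry, pick deadline or cfq
GRUB_CMDLINE_LINUX_DEFAULT="quiet elevator=deadline"
# regenerate the grub config and reboot afterwards
update-grub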
 
You're right, I made the copy after the change ;)

But the bad news is I just ran a test and the system still freezes, even with the deadline option...

I have this in the syslog:

Code:
Nov 11 13:02:51 ns3091370 pvestatd[4659]: storage 'ftpbackup_ns3091370' is not online
Nov 11 13:02:52 ns3091370 pvestatd[4659]: status update time (6.146 seconds)

It looks like the NFS backup storage has trouble connecting to the host (the storage is the backup space from OVH).
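A few commands that might help to confirm whether the NFS storage is reachable from the host (the storage name comes from your log; the server hostname is a placeholder for the OVH backup host):

Code:
# storage status as seen by PVE
pvesm status
# exports offered by the NFS server (replace the hostname with your OVH backup server)
showmount -e your-ovh-backup-server
# the definition and NFS options for 'ftpbackup_ns3091370' live here
cat /etc/pve/storage.cfg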