NFS Backup Stalls

zhoid

Member
Sep 4, 2019
Hi Guys,

We have a Proxmox environment with a FreeNAS NFS share mounted for backups, and we have tried a number of troubleshooting steps.


root@pve-204:/mnt/pve/BackupServer-02/dump# df -h
Filesystem Size Used Avail Use% Mounted on
udev 63G 0 63G 0% /dev
tmpfs 13G 1.3G 12G 11% /run
/dev/mapper/pve-root 94G 4.8G 85G 6% /
tmpfs 63G 63M 63G 1% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
tmpfs 63G 0 63G 0% /sys/fs/cgroup
192.168.0.51:/mnt/ZFS104 21T 4.9T 16T 25% /mnt/pve/BackupServer-02
/dev/fuse 30M 196K 30M 1% /etc/pve

192.168.0.51:/mnt/ZFS104 on /mnt/pve/BackupServer-02 type nfs (rw,relatime,vers=3,rsize=131072,wsize=131072,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.0.51,mountvers=3,mountport=926,mountproto=udp,local_lock=none,addr=192.168.0.51)

When backing up a rather large VM, about 550 GB of disk space (a cPanel web server, in fact), the backup causes the guest OS to lock up regardless of the time of day, after hours or during business hours. I then have to stop the backup, unlock the VM, and reset it.

It stalls at this point in the backup process:

INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating archive '/mnt/pve/BackupServer-02/dump/vzdump-qemu-140-2020_05_08-00_38_13.vma.lzo'
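For reference, a quick check while the backup is stalled (a sketch, not from the thread; `HUNG` is just an illustrative variable name). The mount above uses `hard`, so writers blocked on an unresponsive share sit in uninterruptible sleep instead of erroring out:

```shell
# With a 'hard' NFS mount, tasks blocked on the share show state D
# (uninterruptible sleep). During a stalled backup you would expect
# vzdump or kvm processes to appear here.
HUNG=$(ps -eo state,pid,comm | awk '$1 == "D" {print}')
# Empty output means nothing is blocked on I/O right now.
printf '%s\n' "$HUNG"
```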

I have set up a dedicated 10GbE NIC; the host and the backup server are connected to the same switch (a Cisco Nexus 3K).

Tried different backup servers; the same problem occurs.

Tried backing up with and without compression; still a problem.

Tried backing up different big cPanel servers; same issue.
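One more knob that may be worth trying (an assumption on my side, not something the thread confirms helps): capping vzdump's read rate so the archive writer cannot saturate the path to the share while the guest is busy, via /etc/vzdump.conf:

```
# /etc/vzdump.conf -- example values, tune for your environment
bwlimit: 200000    # backup bandwidth cap in KiB/s (~200 MB/s)
ionice: 7          # lowest best-effort I/O priority (matches the log above)
```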


Just to check the throughput to the backup server, I mounted the NFS share on another Linux server and dumped a file to it with no issues at all, so I am pretty sure the backup server is not the problem.
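The dump test described above can be sketched like this. `TARGET` is an assumption: point it at the actual NFS mount (e.g. /mnt/pve/BackupServer-02); it falls back to /tmp so the snippet runs anywhere:

```shell
# Write a file to the share and report the sustained rate.
TARGET="${TARGET:-/tmp}"
# conv=fsync forces a flush before dd reports the rate, so the figure
# reflects the storage path, not the local page cache. Raise count for
# a longer, more realistic run.
dd if=/dev/zero of="$TARGET/nfs-throughput.img" bs=1M count=64 conv=fsync
rm -f "$TARGET/nfs-throughput.img"
```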

Is there something I am missing?

Thanks

Zaid
 
Hi,

There's a new QEMU version available on pvetest; it addressed quite a few issues with backup stalls. NFS is always a bit hard to work with due to its technical nature, though, so I cannot make a 100% promise here.

It could be worth trying out that package already now; it has proven quite stable here and will soon move to the non-test repositories.
See: https://pve.proxmox.com/wiki/Package_Repositories#sysadmin_test_repo

I mean, it's naturally still the test repository, and it's already Friday here, so maybe it's better to just wait until you get it through the enterprise repos next week anyway, instead of risking a before-weekend breakage :)
 
Hi Thomas,

I see that Proxmox VE 6.2 introduces ZSTD compression for backups, does this mitigate the issue I am experiencing?

On another note.

I have a 5 node 5.4.x cluster.

I have reviewed the upgrade --> 6.2 procedure and documentation.

Would you recommend we upgrade to the latest build of 5.4.x and then follow the distro upgrade process to 6.2?

This is very well documented, and I am planning to set up a lab to test this.

One of our consultants mentioned the following

Upgrade to Proxmox 6 will include a distribution upgrade.

While Debian allows distribution upgrade, we highly discourage distribution upgrade in production servers due to the very high probability of unexpected issues.

What is your view on this?

It's going to be very challenging to install a new 6.2 host in the existing cluster; it most likely uses a different version of corosync, etc.
Then migrate the VMs to the new 6.2 host and re-install the rest of the nodes...

Thanks

Look forward to your reply.

Regards

Zaid
 
I see that Proxmox VE 6.2 introduces ZSTD compression for backups, does this mitigate the issue I am experiencing?

The compression happens only after the data has already been read from the VM/NFS, so it may not have that big of an impact.
I mean, it behaves differently and is quicker to churn through data, and we do not have a good understanding of the cause of your exact problem, so one cannot tell for sure.
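For reference, on 6.2 the new compressor can be selected per job in the GUI or globally in /etc/vzdump.conf (a sketch, assuming the 6.2 option names):

```
# /etc/vzdump.conf on Proxmox VE 6.2+
compress: zstd
zstd: 1            # zstd thread count; 0 means half the CPU cores
```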

I have a 5 node 5.4.x cluster.

I have reviewed the upgrade --> 6.2 procedure and documentation.

Would you recommend we upgrade to the latest build of 5.4.x and then follow the distro upgrade process to 6.2?

Yes, I'd highly recommend that upgrade. Remember that Proxmox VE 5.4 goes end of life at the end of July this year, so IMO the upgrade should be planned before then.

While Debian allows distribution upgrade, we highly discourage distribution upgrade in production servers due to the very high probability of unexpected issues.

What is your view on this?

Reinstallation can have its merits, and as one should have backups anyway, it can be a simpler, albeit brute-force, way to do the upgrade. Its main drawback is VM/CT downtime and the need for reconfiguration, which, depending on setup complexity and the tools available (Ansible, Puppet, ...), can be almost no work but also quite a bit of effort to get everything right again.

That said, we tested the in-place upgrade a lot and got quite some positive feedback from users who used the pve5to6 upgrade helper/checking tool and followed the upgrade documentation closely.
https://pve.proxmox.com/wiki/Upgrade_from_5.x_to_6.0

I would not discourage in-place upgrades. I'd recommend testing the in-place upgrade in a lab setup first; then you'll already get a feeling for it and can much better decide which way to go for your specific setup.
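The checker from that wiki page can be run ahead of time on each node; a sketch (the guard only exists so the snippet exits cleanly on a non-PVE machine):

```shell
# pve5to6 ships with current PVE 5.4 packages and prints a PASS/WARN/FAIL
# checklist of upgrade blockers; run it on every node before dist-upgrading.
if command -v pve5to6 >/dev/null 2>&1; then
    pve5to6
    PVE_CHECK=present
else
    echo "pve5to6 not found - run this on a PVE 5.4 node"
    PVE_CHECK=absent
fi
```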

Hope that helps.
 
The compression happens only after the data has already been read from the VM/NFS, so it may not have that big of an impact.
I mean, it behaves differently and is quicker to churn through data, and we do not have a good understanding of the cause of your exact problem, so one cannot tell for sure.

This is where it does become complicated: there are no issues backing up smaller/medium VMs that are not busy at all.

Any large web/mail server, like a cPanel server or a Windows web server, where random reads/writes inside the VM are constantly high even after hours, is problematic to back up. At an initial look it appears that the backup struggles to write the data to the NFS share; the VM eventually becomes unresponsive and we have to stop the backup, unlock the VM from the host CLI, and reboot it.

At first we thought it was either a networking issue or a problem with the FreeNAS NFS Share.

We then set up a dedicated 10GbE backup network instead of using the management 10GbE network, but the same issue occurs...

We then mounted this share on a standard Linux VM/server outside of the Proxmox cluster and performed disk I/O tests (sequential read/write, random read/write), and we experienced no issues or latency.

Between myself and the consultants we use, we are unable to pinpoint this issue, hence us logging this with Proxmox support.

On a separate note,

Unfortunately we do not have additional hardware to rebuild a 6.2 cluster and migrate the VMs.
We are unable to back up the VMs, but they are hosted on a redundant SAN, so there is at least some redundancy.
A rollback plan is not clear should we move forward with the upgrade.
Should we upgrade our support from Community to Basic, would we be able to get further assistance on this matter? i.e. assistance with the upgrade, and hands-on investigation of the problem explained above.

Yes, I'd highly recommend that upgrade. Remember that Proxmox VE 5.4 goes end of life at the end of July this year, so IMO the upgrade should be planned before then.

Noted.

Reinstallation can have its merits, and as one should have backups anyway, it can be a simpler, albeit brute-force, way to do the upgrade. Its main drawback is VM/CT downtime and the need for reconfiguration, which, depending on setup complexity and the tools available (Ansible, Puppet, ...), can be almost no work but also quite a bit of effort to get everything right again.

That said, we tested the in-place upgrade a lot and got quite some positive feedback from users who used the pve5to6 upgrade helper/checking tool and followed the upgrade documentation closely.
https://pve.proxmox.com/wiki/Upgrade_from_5.x_to_6.0

I would not discourage in-place upgrades. I'd recommend testing the in-place upgrade in a lab setup first; then you'll already get a feeling for it and can much better decide which way to go for your specific setup.

Hope that helps.

Noted.
 