NFS for backups and PVE 3.2 in a PVE cluster

cesarpk:
I just tried to start a backup of a container (suspend mode only, as I do not have enough space in this testing virtual cluster for snapshot mode) and turned off my NFS server after a few seconds. The process hangs until that NFS storage is available again.

Many thanks for the report, and... can you also report this error in Bugzilla? (it is the first step toward glory)


I have just installed the latest updates. Unfortunately, this error is still not solved, and there is no answer in Bugzilla either.
Although I am very happy about all the new features, fixing old major bugs should come first.
I have used Proxmox for a few years, but this seems like a REALLY huge problem, and it keeps me from migrating completely from ESXi.
Please, can someone from the PVE team give us an update on this?

@liska:
+1 vote: "fixing old major bugs should come first!!!"

@PVE Developers:
Please, can anybody fix these bugs? ... This is frustrating while a PVE cluster is running, and also for a standalone PVE node if the NFS server goes down while a backup is in progress.
... PVE needs to be free of this type of problem!!!

Best regards
Cesar
 
Are there any similarities between the NFS servers used by all those having this issue?

NFS server:
OS: Vendor and version
Hardware: Vendor and version
CPU: Vendor and version
RAM: Size
Disks: Vendor and version, type (SATA, SCSI, SAS, SSD)
RAID: Soft or Hard. If Hard, vendor and version. Type (RAID0, RAID1 ...)

PVE Host:
OS: PVE version
Hardware: Vendor and version
CPU: Vendor and version
RAM: Size
Disks: Vendor and version, type (SATA, SCSI, SAS, SSD)
RAID: Soft or Hard. If Hard, vendor and version. Type (RAID0, RAID1 ...)

Network:
Vendor and version
Type: Ethernet, InfiniBand, fibre.
Bandwidth: Gigabit, 4 Gigabit, 8 Gigabit, 10 Gigabit.
Bonding: 2, 3 ....
Jumbo frames:

Switch:
Vendor and version
Type: Ethernet, InfiniBand, fibre.
Bandwidth: Gigabit, 4 Gigabit, 8 Gigabit, 10 Gigabit.
Bonding: 2, 3 ....
Jumbo frames:
 
NFS server
OS: FreeNAS 9.2.1
Hardware: Asus B85M-E/CSM
CPU: Intel i3-3220 3.30 GHz
RAM: 8 GB
Disks: 4 × 2 TB SATA
RAID: No hardware RAID; ZFS RAIDZ only.

PVE Host
OS: Proxmox-ve-2.6.32
Hardware: Asus B85M-E/CSM
CPU: Intel i7-4770K
RAM: 32GB DDR3
Disks: Kingston KC300 240GB x 1
RAID: None

Network
Intel Pro Gigabit
Type: Ethernet
Bandwidth: Gigabit
Bonding: None
Jumbo frames: None

Switch
Netgear GS724T Smart 24 Port
Type: Ethernet
Bandwidth: Gigabit
Bonding: None
Jumbo frames: None

Loss of the NFS server breaks the web GUI. But usually running
service pvedaemon restart
service pvestatd restart
service pveproxy restart
brings the Proxmox web GUI back. If I still have an issue, I remove the old NFS connection through the GUI, and from the command line "umount -f /<mount>" brings everything back without issue.
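For anyone hitting the same hang, a minimal recovery sketch based on the steps above; /mnt/pve/backup_nfs is a placeholder for your own storage mount point:

# restart the PVE services that are stuck polling the dead storage
service pvedaemon restart
service pvestatd restart
service pveproxy restart

# force-unmount the stale NFS mount; fall back to a lazy detach if the forced unmount fails
umount -f /mnt/pve/backup_nfs || umount -l /mnt/pve/backup_nfs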
 
Have you tried your FreeNAS with 16 GB of RAM?
It seems everyone agrees that FreeNAS should be configured with a minimum of 16 GB of RAM to provide decent NFS performance when using ZFS.
 
@dietmar: buying a subscription was on my to-do list, but now I am waiting for this issue to be solved. I can just imagine myself after some big blackout, unable to start the cluster just because of some missing, unimportant backup server...

I tried installing GlusterFS on the cluster nodes, and a failure of those services does not cause problems in the cluster. But I noticed that even after removing that storage from the GUI, it remains mounted on all nodes. In my opinion this should not happen (a manual cleanup sketch follows the log below).
This is my syslog when I turn off the Gluster storage:
May 15 09:40:27 cl1 pvestatd[2624]: WARNING: mount error: exit code 1
May 15 09:40:31 cl1 pvedaemon[2603]: WARNING: mount error: exit code 1
May 15 09:40:37 cl1 pvestatd[2624]: WARNING: mount error: exit code 1
May 15 09:40:39 cl1 pvedaemon[4771]: WARNING: mount error: exit code 1
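In case it helps, a cleanup sketch for that leftover mount; the storage name gluster_data is hypothetical, substitute the one you removed:

# see what is still mounted under /mnt/pve on this node
mount | grep /mnt/pve

# lazily detach the leftover GlusterFS mount on every node where it remains
umount -l /mnt/pve/gluster_data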

@symcomm
When I restart all the services at the same time, it helps for a while, but after a few moments the cluster disconnects again. When I restart the services on the nodes one after another, it does not solve the problem even temporarily.

Anyway, this issue does not depend on any specific NFS server. I first saw it when using OpenIndiana, and the same thing happens with an NFS server running on Linux.

These are my logs after turning off the NFS server:
May 15 09:46:24 cl1 pvedaemon[2603]: WARNING: command 'df -P -B 1 /mnt/pve/cupid_data' failed: got timeout
May 15 09:46:29 cl1 pvestatd[2624]: WARNING: command 'df -P -B 1 /mnt/pve/cupid_data' failed: got timeout
May 15 09:46:54 cl1 pveproxy[27998]: WARNING: proxy detected vanished client connection
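The 'df -P -B 1' in those warnings is the periodic storage check that pvestatd and pvedaemon run, and it blocks when the NFS server is gone. A small sketch for checking the mount yourself without hanging your shell, using the path from the log above:

# give df five seconds before giving up on the unreachable mount
timeout 5 df -P -B 1 /mnt/pve/cupid_data || echo "storage unreachable"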
 
Are there any similarities between the NFS servers used by all those having this issue?

NFS server:
OS: CentOS 6.5 x64 / Proxmox-ve-2.6.32 (where I run my VMs)
Hardware: an Asus workstation / six Dell servers in total (2900/R710/R320)
CPU: Intel Core i3 / Xeon
RAM: 16/32/24/48 GB
Disks: SATA and SAS
RAID: software for my workstation and hardware for my PVE nodes
Note: only one workstation has CentOS 6.2; all the other nodes are Dell servers running PVE with NFS shares

PVE Host:
OS: Proxmox-ve-2.6.32
Hardware: Dell
CPU: Xeon
RAM: 16/32/24/48 GB.
Disks: SAS only
RAID: Hard RAID5 with several HDDs

Network:
Broadcom for the Dell servers and Atheros for my Asus workstation NFS share (with the latest driver version)
Type: Ethernet
Bandwidth: Gigabit
Bonding: Active Backup
Jumbo frames: not configured (standard 1500 MTU)

Switch:
Planet
Type: Ethernet
Bandwidth: Gigabit
Bonding: None
Jumbo frames: not configured (standard 1500 MTU)
 
Here is some news about this problem, as we discussed it on Bugzilla.
This is an old bug of the web interface only. Cluster communication remains unaffected, at least in my case. It is possible to move machines between nodes and so on, but only via the CLI.
It can be temporarily fixed by deleting the failed storage from storage.cfg (an example entry is shown below).
Dietmar wrote to me that they are working on a fix.
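For reference, that workaround means editing /etc/pve/storage.cfg (it lives on the clustered /etc/pve filesystem, so one edit applies to all nodes) and removing the section for the failed storage. A rough sketch of what an NFS entry looks like, using the cupid_data storage from the logs above; the server address and export path here are made up:

nfs: cupid_data
        export /export/cupid_data
        path /mnt/pve/cupid_data
        server 192.168.1.50
        content backup
        maxfiles 3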
 

Thanks liska, good news :p

If you find out when the new patch will be ready for download, please share the good news again.

Another question: do you know anything about what happens if the shared NFS server goes down while a backup is in progress, and whether the lock on the VM will be released by the same backup system that set it?

Best regards
Cesar
 