NFS for backups and PVE 3.2 in a PVE cluster

cesarpk:
I just tried to start a backup of a container (suspend mode only, as I do not have enough space in this testing virtual cluster for snapshot mode) and turned off my NFS server after a few seconds. The process hangs until that NFS storage is available again.

Many thanks for the report, and... can you also report this error in Bugzilla? (it is the first step toward glory)


I have just installed the latest updates. Unfortunately, this error is still not solved, and there is no answer in Bugzilla either.
Although I am very happy about all the new features, fixing old major bugs should come first.
I have used Proxmox for a few years, but this seems like a REALLY huge problem, and it keeps me from migrating completely from ESXi.
Please, can someone from the PVE team give us an update on this?

@liska:
+1 vote: "fixing old major bugs should come first!!!"

@PVE Developers:
Please, can anybody fix these bugs? ... This is frustrating while a PVE cluster is running, and also for a standalone PVE node if the NFS server goes down while a backup is in progress.
... PVE needs to be free of this type of problem!!!

Best regards
Cesar
 
Are there any similarities between the NFS servers used by all those having this issue?

NFS server:
OS: Vendor and version
Hardware: Vendor and version
CPU: Vendor and version
RAM: Size
Disks: Vendor and version, type (SATA, SCSI, SAS, SSD)
RAID: Soft or Hard. If Hard, vendor and version. Type (RAID0, RAID1 ...)

PVE Host:
OS: PVE version
Hardware: Vendor and version
CPU: Vendor and version
RAM: Size
Disks: Vendor and version, type (SATA, SCSI, SAS, SSD)
RAID: Soft or Hard. If Hard, vendor and version. Type (RAID0, RAID1 ...)

Network:
Vendor and version
Type: Ethernet, InfiniBand, fibre.
Bandwidth: Gigabit, 4 Gigabit, 8 Gigabit, 10 Gigabit.
Bonding: 2, 3 ....
Jumbo frames:

Switch:
Vendor and version
Type: Ethernet, InfiniBand, fibre.
Bandwidth: Gigabit, 4 Gigabit, 8 Gigabit, 10 Gigabit.
Bonding: 2, 3 ....
Jumbo frames:
 
NFS server
OS: FreeNAS 9.2.1
Hardware: Asus B85M-E/CSM
CPU: Intel i3-3220 3.30 GHz
RAM: 8 GB
Disks: 4 × 2 TB SATA
RAID: No hardware RAID; ZFS RAIDZ only.

PVE Host
OS: Proxmox-ve-2.6.32
Hardware: Asus B85M-E/CSM
CPU: Intel i7-4770K
RAM: 32GB DDR3
Disks: Kingston KC300 240GB x 1
RAID: None

Network
Intel Pro Gigabit
Type: Ethernet
Bandwidth: Gigabit
Bonding: None
Jumbo frames: None

Switch
Netgear GS724T Smart 24 Port
Type: Ethernet
Bandwidth: Gigabit
Bonding: None
Jumbo frames: None

Loss of the NFS server breaks the web GUI. But usually running
service pvedaemon restart
service pvestatd restart
service pveproxy restart
brings the Proxmox web GUI back. If I still have an issue, I remove the old NFS connection through the GUI, and from the command line "umount -f /<mount>" brings everything back without issue.
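For anyone hitting the same hang, a minimal recovery sketch based on the steps above; /mnt/pve/backup_nfs is a placeholder for your own storage mount point:

# restart the PVE services that are stuck polling the dead storage
service pvedaemon restart
service pvestatd restart
service pveproxy restart

# force-unmount the stale NFS mount; fall back to a lazy detach if the forced unmount fails
umount -f /mnt/pve/backup_nfs || umount -l /mnt/pve/backup_nfs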
 
Have you tried your FreeNAS with 16 GB of RAM?
It seems everyone agrees that FreeNAS should be configured with a minimum of 16 GB of RAM to provide decent NFS performance when using ZFS.
 
@dietmar: buying a subscription was on my to-do list, but now I am waiting for this issue to be solved. I can just imagine myself after some big blackout, unable to start the cluster just because of some missing, unimportant backup server...

I tried installing GlusterFS on the cluster nodes, and a failure of those services does not cause problems in the cluster. But I noticed that even after removing that storage from the GUI, it remains mounted on all nodes. In my opinion this should not happen (a manual cleanup sketch follows the log below).
This is my syslog when I turn off the Gluster storage:
May 15 09:40:27 cl1 pvestatd[2624]: WARNING: mount error: exit code 1
May 15 09:40:31 cl1 pvedaemon[2603]: WARNING: mount error: exit code 1
May 15 09:40:37 cl1 pvestatd[2624]: WARNING: mount error: exit code 1
May 15 09:40:39 cl1 pvedaemon[4771]: WARNING: mount error: exit code 1
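In case it helps, a cleanup sketch for that leftover mount; the storage name gluster_data is hypothetical, substitute the one you removed:

# see what is still mounted under /mnt/pve on this node
mount | grep /mnt/pve

# lazily detach the leftover GlusterFS mount on every node where it remains
umount -l /mnt/pve/gluster_data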

@symcomm
When I restart all the services at the same time, it helps for a while, but after a few moments the cluster disconnects again. When I restart the services on the nodes one after another, it does not solve the problem even temporarily.

Anyway, this issue does not depend on any specific NFS server. I first saw it when using OpenIndiana, and the same thing happens with an NFS server running on Linux.

These are my logs after turning off the NFS server:
May 15 09:46:24 cl1 pvedaemon[2603]: WARNING: command 'df -P -B 1 /mnt/pve/cupid_data' failed: got timeout
May 15 09:46:29 cl1 pvestatd[2624]: WARNING: command 'df -P -B 1 /mnt/pve/cupid_data' failed: got timeout
May 15 09:46:54 cl1 pveproxy[27998]: WARNING: proxy detected vanished client connection
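The 'df -P -B 1' in those warnings is the periodic storage check that pvestatd and pvedaemon run, and it blocks when the NFS server is gone. A small sketch for checking the mount yourself without hanging your shell, using the path from the log above:

# give df five seconds before giving up on the unreachable mount
timeout 5 df -P -B 1 /mnt/pve/cupid_data || echo "storage unreachable"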
 
Are there any similarities between the NFS servers used by all those having this issue?

NFS server:
OS: CentOS 6.5 x64 / Proxmox-ve-2.6.32 (where I run my VMs)
Hardware: an Asus workstation / six Dell servers in total (2900/R710/R320)
CPU: Intel Core i3 / Xeon
RAM: 16/32/24/48 GB
Disks: SATA and SAS
RAID: software for my workstation and hardware for my PVE nodes
Note: only one workstation has CentOS 6.2; all the other nodes are Dell servers running PVE with NFS shares

PVE Host:
OS: Proxmox-ve-2.6.32
Hardware: Dell
CPU: Xeon
RAM: 16/32/24/48 GB.
Disks: SAS only
RAID: Hard RAID5 with several HDDs

Network:
Broadcom for the Dell servers and Atheros for my Asus workstation NFS share (with the latest driver version)
Type: Ethernet
Bandwidth: Gigabit
Bonding: Active Backup
Jumbo frames: not configured (standard 1500 MTU)

Switch:
Planet
Type: Ethernet
Bandwidth: Gigabit
Bonding: None
Jumbo frames: not configured (standard 1500 MTU)
 
Here is some news about this problem, as we discussed it on Bugzilla.
This is an old bug of the web interface only. Cluster communication remains unaffected, at least in my case. It is possible to move machines between nodes and so on, but only via the CLI.
It can be temporarily fixed by deleting the failed storage from storage.cfg (an example entry is shown below).
Dietmar wrote to me that they are working on a fix.
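For reference, that workaround means editing /etc/pve/storage.cfg (it lives on the clustered /etc/pve filesystem, so one edit applies to all nodes) and removing the section for the failed storage. A rough sketch of what an NFS entry looks like, using the cupid_data storage from the logs above; the server address and export path here are made up:

nfs: cupid_data
        export /export/cupid_data
        path /mnt/pve/cupid_data
        server 192.168.1.50
        content backup
        maxfiles 3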
 

Thanks liska, good news :p

If you find out when the new patch will be ready for download, please share the good news again.

Another question: do you know anything about what happens if the shared NFS server goes down while a backup is in progress, and whether the lock on the VM will be released by the same backup system that set it?

Best regards
Cesar
 