NetApp & ProxMox VE

Bradomski · Sep 13, 2024

To answer the ask re: storage.cfg - IPs have been removed
there is a variance on the version purely for troubleshooting at this point
I see no glaring issues - Going to test dd asap

root@PVE-VX01:~# cat /etc/pve/storage.cfg
dir: local
path /var/lib/vz
content iso,backup,vztmpl

lvmthin: local-lvm
thinpool data
vgname pve
content images,rootdir

rbd: Containers
content rootdir,images
krbd 0
pool performance_pool

rbd: Container2
content images
krbd 0
pool bulk_pool

cephfs: LogsISOsTemplatesEtc
path /mnt/pve/LogsISOsTemplatesEtc
content vztmpl,backup,iso
fs-name LogsISOsTemplatesEtc

nfs: Priority1-001
export /NFS_P1_001
path /mnt/pve/Priority1-001
server x.x.x.x
content images,rootdir
options vers=4.2
prune-backups keep-all=1

nfs: Priority1-002
export /NFS_P1_002
path /mnt/pve/Priority1-002
server x.x.x.x
content images
options vers=4.2
prune-backups keep-all=1

nfs: Priority1-003
export /NFS_P1_003
path /mnt/pve/Priority1-003
server x.x.x.x
content images
prune-backups keep-all=1

nfs: Priority2-001
export /NFS_P2_001
path /mnt/pve/Priority2-001
server x.x.x.x
content images
options vers=4.2
prune-backups keep-all=1

nfs: Priority2-002
export /NFS_P2_002
path /mnt/pve/Priority2-002
server x.x.x.x
content images
options vers=4.2
prune-backups keep-all=1

nfs: Priority2-003
export /NFS_P2_003
path /mnt/pve/Priority2-003
server x.x.x.x
content images
options vers=4.2
prune-backups keep-all=1

nfs: Priority2-004
export /NFS_P2_004
path /mnt/pve/Priority2-004
server x.x.x.x
content images
prune-backups keep-all=1

nfs: Priority2-005
export /NFS_P2_005
path /mnt/pve/Priority2-005
server x.x.x.x
content images
options vers=4.2
prune-backups keep-all=1

nfs: Priority2-006
export /NFS_P2_006
path /mnt/pve/Priority2-006
server x.x.x.x
content images
options vers=4.2
prune-backups keep-all=1

nfs: Priority2-007
export /NFS_P2_007
path /mnt/pve/Priority2-007
server x.x.x.x
content images
prune-backups keep-all=1

nfs: Priority3-001
export /NFS_P3_001
path /mnt/pve/Priority3-001
server x.x.x.x
content images
options vers=4.1
preallocation off
prune-backups keep-all=1

nfs: Priority3-002
export /NFS_P3_002
path /mnt/pve/Priority3-002
server x.x.x.x
content images
prune-backups keep-all=1

nfs: Priority3-003
export /NFS_P3_003
path /mnt/pve/Priority3-003
server x.x.x.x
content images
options vers=4.2
prune-backups keep-all=1

nfs: Priority3-004
export /NFS_P3_004
path /mnt/pve/Priority3-004
server x.x.x.x
content images
options vers=4.2
prune-backups keep-all=1
showmount shows
Export list for
/NFS_P1_001 (everyone)
/NFS_P1_002 (everyone)
/NFS_P1_003 (everyone)
/NFS_P2_001 (everyone)
/NFS_P2_002 (everyone)
/NFS_P2_003 (everyone)
/NFS_P2_004 (everyone)
/NFS_P2_005 (everyone)
/NFS_P2_006 (everyone)
/NFS_P2_007 (everyone)
/NFS_P3_001 (everyone)
/NFS_P3_002 (everyone)
/NFS_P3_003 (everyone)
/NFS_P3_004 (everyone)

Falk R. · Sep 13, 2024

vizerei said:
Chiming in, I work with Brad. MTUs are 9216 everywhere on our "storage" VLAN. All shares were added via the GUI. We've tried disabling kerberos as suggested in another thread (and only disabling certain versions). We've also set individual share permissions to 777 but no change.

Initial add was weird as it would sit for a long time before showing the NFS share list, probably around 20-30 seconds, sometimes restarting the query via re-engaging the drop down. Once the list popped up it added immediately.

If you use MTU 9216 everywhere, only on the switches or also on the interfaces? Many systems only allow an MTU 9000 for data packets.
Have you ever done ping tests with a full size packet? e.g. with MTU 9000 a ping with packet size 8972?

bbgeek17 · Sep 13, 2024

Falk R. said:
If you use MTU 9216 everywhere, only on the switches or also on the interfaces? Many systems only allow an MTU 9000 for data packets.
Have you ever done ping tests with a full size packet? e.g. with MTU 9000 a ping with packet size 8972?

If their nodes are set to 9216, I would even go up to 9188 in ping.

Are you using LACP? Other network HA technology? Drop down to single cable/path.
Can you install vanilla Ubuntu or Debian on the same type of server with the same exact network config, does it work?

We've seen some very bizarre network issues, some were caused by bad NIC chip, others by bad mlag cable between core switches that affected only one particular flow of one rack.

Best of luck in your troubleshooting.

Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox

vizerei · Sep 13, 2024

Falk R. said:
If you use MTU 9216 everywhere, only on the switches or also on the interfaces? Many systems only allow an MTU 9000 for data packets.
Have you ever done ping tests with a full size packet? e.g. with MTU 9000 a ping with packet size 8972?

We have solved this, Falk was right. My first thought was that can't be an issue as we mount NFS shares from the NetApp on Linux VMs regularly in our environment, but then it dawned on me that the VMs likely weren't configured for jumbo frames. @Bradomski ran with this first thing this morning with our NetEng and discovered immediate issues on your recommended test.

Our general Setup:
-We are running bonded 802.3ad 25gbps DACs at 9216 with no issues for Cluster Traffic and Ceph VLAN (tested extensively during setup, dozens of iperf streams, etc)
-Bonded 10gbps Fiber for corporate network access, MTU 1500
-Storage Network VLAN is identical to the Cluster Traffic VLAN and on the SWITCH showed everything at 9216

What Happened:
The switchports for the storage VLAN all *showed* 9216, but that was a failure to standardize/communicate on our part. Our Primary NetEng didn't reduce those to 9000, even though the NetApp and our Production ESXi cluster had the interfaces on that VLAN all set to 9000. You would think fragmentation wouldn't just break connections but on the receiving interface there's no feedback mechanism to fragment properly so yeah it breaks it.

Changed Bond1 Storage Interface MTU to 9000 and it works beautifully. We are migrating our low impact VMs as we speak. Good catch Falk!

We decided not to change the NetApp MTU to match as it's a live production system, and the difference shouldn't have a noticeable impact on performance.

vizerei · Sep 13, 2024

bbgeek17 said:
If their nodes are set to 9216, I would even go up to 9188 in ping.

Are you using LACP? Other network HA technology? Drop down to single cable/path.
Can you install vanilla Ubuntu or Debian on the same type of server with the same exact network config, does it work?

We've seen some very bizarre network issues, some were caused by bad NIC chip, others by bad mlag cable between core switches that affected only one particular flow of one rack.

Best of luck in your troubleshooting.

Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox

Yeah we were very certain that wasn't an issue, this is an under warranty VXRAIL Cluster that was previously running ESXi and VSAN using all ports in our production environment with no issues. Our 640s that now run production might have been suspect as they're a gen old, but unlikely that we had card issues from the vxrail servers sitting in a rack running doing nothing for months.

bbgeek17 · Sep 13, 2024

vizerei said:
You would think fragmentation wouldn't just break connections but on the receiving interface there's no feedback mechanism to fragment properly so yeah it breaks it.

I would not discount any sort of bizarre symptoms when there is an MTU mismatch.

Glad the issue in the network was easily fixable and did not require a re-architecture of the infrastructure.

Enjoy Proxmox

Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox

vizerei · Sep 13, 2024

bbgeek17 said:
I would not discount any sort of bizarre symptoms when there is an MTU mismatch.

Glad the issue in the network was easily fixable and did not require a re-architecture of the infrastructure.

Enjoy Proxmox

Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox

Yeah it can get weird. Our company's product involves live moving networks and MTU is something we deal with regularly. Really was just a bad assumption on my part from switchport configurations that I didn't doublecheck on on the prod side stacks.

tmservo433 · Sep 23, 2024

This to me is something I find potentially very exciting. I have quite a few customers who do use a combination of ProxMox/TrueNas/etc. as we have been, frankly, forced to migrate away from VMWare.

I've looked at 10Gbe available devices that we can target for their SMB. Now, I'm definitely interested in the best way that NetApp would perceive to target the device. Our key goal, frankly, would be to view the NetApp in paired/tandem as a set for both a backup level target (Proxmox) and/or NFS share for data/vms.

I've long liked the style and layout of Netapp, and I'm interested in facilitating a smart connection to these units; setting up a direct 10Gb network connection to the unit to isolate it but provide it as a target for Proxmox is VERY enticing. I'm interested also in experience with expandability - ie, starting with a unit that houses 24 or more drives, starting with 8/12 and then expanding as we go in different pools.

Thank you for participating and bringing your knowledge here!

nemutai · Sep 27, 2024

i would like too see something similar to snapcenter on proxmox! that thing is so nice on vsphere env

jeffrey404 · Oct 2, 2024

I would love to see direct integration with Solidfire (netapp now) with proxmox.

DatacenterDude · Oct 14, 2024

Hey gang! Now that we're past our annual event, INSIGHT, I'm happy to report that we officially have support on the NetApp IMT for NFS, iSCSI, and FC. We've also published an initial round of best practices documentation on our official docs site.

If you have feedback or notice something missing, you can submit that through the links at the top, or tag me here or in our Discord community and I'll set you up with one of our engineers to get it updated.

https://docs.netapp.com/us-en/netapp-solutions/proxmox/proxmox-ontap.html

@nemutai Please know that those conversations are happening internally about how we can bring the power of ONTAP snapshot-based backup methodologies to PVE. Will keep everyone posted as things develop!

kellogs · Oct 15, 2024

ceph is the way to go

kluvi · Oct 15, 2024

We have Lenovo DM5100F (which is some rebranded NetApp, ONTAP 9.15) and using NVMe/TCP from it in PVE. I must say, that it is not as "smooth" at it should be.
1) `nvme connect-all` was not enough - we also have to create timer, that calls `nvmf-autoconnect.service` - because when we tested it and there was some network trouble, the nvme service wont start automatically (dont know if it should or not, the timer solves it)
2) joining new node is pain - we have to add "pause" in our Ansible playbook, which prints hostnqn and storage admin then has to go to ontap UI and allow this new hostnqn
3) storage has 2 controllers (and 2 tiers), so we have to create 2 NVMe namespaces for each tier. And then have 2 shared LVM storages in PVE. So when I am creating new VM, I have to decide which of the two storages use

Solution for (1) - update the manual

Solution for (2) - some plugin for Proxmox, that reads all hostnqn from all nodes and allows them in ONTAP

Solution for (3) - I dont have idea, if there is any solution, but something like "NVMe namespace backed by FlexGroup" would be great

spirit · Oct 15, 2024

DatacenterDude said:
Hey gang! Now that we're past our annual event, INSIGHT, I'm happy to report that we officially have support on the NetApp IMT for NFS, iSCSI, and FC. We've also published an initial round of best practices documentation on our official docs site.

If you have feedback or notice something missing, you can submit that through the links at the top, or tag me here or in our Discord community and I'll set you up with one of our engineers to get it updated.

https://docs.netapp.com/us-en/netapp-solutions/proxmox/proxmox-ontap.html

@nemutai Please know that those conversations are happening internally about how we can bring the power of ONTAP snapshot-based backup methodologies to PVE. Will keep everyone posted as things develop!

Nice !

you can have a look of my 13year old unmaintened netapp module and reuse code if you want ^_^

https://github.com/odiso/proxmox-pve-storage-netapp

spirit · Oct 15, 2024

My plugin was using the old sdk, I hope that they are a more moden rest api now ^_^.

My company is a big French netapp partner, https://www.netapp.com/partners/partner-connect/groupe-cyllene/,
feel free to pm me if you need help/advise for a clean integration.

flatpzero · Oct 16, 2024

Great to see NetApp getting on board. One of the key problems I've had recently with Proxmox v8 is the insistence on creating new directories for all of the types of data that is stored on the NFS share.

For example, in a non-clustered Proxmox environment, I created a NFS share called "proxmox" and mounted it on proxmox-server-1. This created 3 directories, "dump", "snippets", and "templates". When I try to add that NFS share on proxmox-server-2, it complains that those directories it wants to create are already there!

Note that this was in a non-cluster deployment of Proxmox because it is only 2 nodes.

spirit · Oct 16, 2024

flatpzero said:
Great to see NetApp getting on board. One of the key problems I've had recently with Proxmox v8 is the insistence on creating new directories for all of the types of data that is stored on the NFS share.

For example, in a non-clustered Proxmox environment, I created a NFS share called "proxmox" and mounted it on proxmox-server-1. This created 3 directories, "dump", "snippets", and "templates". When I try to add that NFS share on proxmox-server-2, it complains that those directories it wants to create are already there!

Note that this was in a non-cluster deployment of Proxmox because it is only 2 nodes.

Well, you can't share the same nfs share between 2 non-cluster nodes !

imagine that you have 2 vms with same id 100, you'll mix drives with same id, like vm-100-disk-0.raw for example.
same for vzdump backup.

only for iso, lxc templates it shouldn't be a problem

flatpzero · Oct 16, 2024

spirit said:
Well, you can't share the same nfs share between 2 non-cluster nodes !

imagine that you have 2 vms with same id 100, you'll mix drives with same id, like vm-100-disk-0.raw for example.
same for vzdump backup.

only for iso, lxc templates it shouldn't be a problem

The volume was selected for dump, templates, and ISO images. Images weren't included (that penny dropped.) The goal was to provide a single location for putting backups (templates also.)

spirit · Oct 16, 2024

flatpzero said:
The volume was selected for dump, templates, and ISO images. Images weren't included (that penny dropped.) The goal was to provide a single location for putting backups (templates also.)

ok.

>>When I try to add that NFS share on proxmox-server-2, it complains that those directories it wants to create are already there!

does it complain in the gui at the moment where you are trying to add the storage ?

you could try to add it manually in /etc/pve/storage.cfg directly.

flatpzero · Oct 16, 2024

spirit said:
>>When I try to add that NFS share on proxmox-server-2, it complains that those directories it wants to create are already there!

does it complain in the gui at the moment where you are trying to add the storage ?

I was able to solve my problem. The point of my original comment wasn't that I couldn't make it work, rather that there are parts of Proxmox NFS storage management (that NetApp want to get involved in) that need work/TLC to make it easier for people to use.

NetApp & ProxMox VE

New Member

Distinguished Member

Distinguished Member

Renowned Member

Renowned Member

Distinguished Member

Renowned Member

New Member

New Member

Active Member

New Member

Active Member

Member

Distinguished Member

Distinguished Member

Member

Distinguished Member

Member

Distinguished Member

Member

We value your privacy