HA NFS service for KVM VMs on a Proxmox Cluster with Ceph

Doesn't keepalived move the IP?
Yes, it does. But once the cron.d job had killed the ganesha.nfsd processes on all five CTs, there was nowhere left to move the IP to.

This is a part of my keepalived config:
Code:
rstumbaum@controlnode01.dc1:~$ cat keepalived/conf.d/check_proc_ganesha.conf
vrrp_script check_proc_ganesha {
       script "/usr/bin/pkill -0 ganesha.nfsd" # cheaper than pidof
       interval 1                       # check every second
}
rstumbaum@controlnode01.dc1:~$ cat keepalived/conf.d/vlan3000.conf
vrrp_instance vlan3000 {
  state BACKUP
  nopreempt
  #smtp_alert
  interface eth4
  virtual_router_id 30 # unique ID!
  priority 100
  advert_int 1
  authentication {
    # don't use PASS unless on a 100% secure net, it is sent in cleartext https://louwrentius.com/configuring-attacking-and-securing-vrrp-on-linux.html
    # auth_type PASS
    # much more secure:
    auth_type AH
    auth_pass 123-3000
  }
  track_script {
      check_proc_ganesha
  }
  virtual_ipaddress {
      10.30.0.2/32
  }
}
rstumbaum@controlnode01.dc1:~$

So if no ganesha.nfsd is running, the CT is not a candidate for the VIP.
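Roughly the same logic as a shell sketch, in case it helps to see it outside of keepalived. The function names proc_running and vip_eligible are made up for illustration, and pgrep -x stands in for the pkill -0 call from the config. As far as I know, with no weight set on the vrrp_script a non-zero exit puts the instance into FAULT state, so it can neither keep nor take over the VIP.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if a process with exactly this name exists.
proc_running() {
    # pgrep -x matches the whole process name; the PIDs are not needed here.
    pgrep -x "$1" >/dev/null 2>&1
}

# Hypothetical helper: report how keepalived would treat this node,
# assuming the track_script has no weight configured.
vip_eligible() {
    if proc_running "$1"; then
        echo "eligible"
    else
        echo "fault"
    fi
}

# Prints "eligible" or "fault" depending on whether the daemon is up.
vip_eligible ganesha.nfsd
```

If all nodes report "fault" at the same time, there is no node left that could take the VIP, which is exactly what happened with the cron.d job.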
 
No idea why people are using NFS-Ganesha???

Created a fresh CT, then copied, adjusted and reloaded an AppArmor profile for it:
Code:
root@proxmox07:~# cat /etc/apparmor.d/lxc/lxc-default-with-nfs2ceph
# Do not load this file.  Rather, load /etc/apparmor.d/lxc-containers, which
# will source all profiles under /etc/apparmor.d/lxc

profile lxc-container-default-nfs2ceph flags=(attach_disconnected,mediate_deleted) {
  #include <abstractions/lxc/container-base>

  # the container may never be allowed to mount devpts.  If it does, it
  # will remount the host's devpts.  We could allow it to do it with
  # the newinstance option (but, right now, we don't).
  deny mount fstype=devpts,
  mount fstype=cgroup -> /sys/fs/cgroup/**,
  mount fstype=cgroup2 -> /sys/fs/cgroup/**,
  mount fstype=ceph,
  mount fstype=nfsd,
  mount fstype=rpc_pipefs,
}
root@proxmox07:~# service apparmor reload

Assigned the apparmor profile to the CT:
Code:
root@proxmox07:~# cat /etc/pve/lxc/601.conf
arch: amd64
cores: 4
hostname: nfsshares-a
memory: 4096
nameserver: 10.20.52.1
net0: name=eth0,bridge=vmbr0,gw=10.20.52.1,hwaddr=02:10:20:52:02:40,ip=10.20.52.240/24,tag=52,type=veth
net1: name=eth1,bridge=vmbr0,hwaddr=02:10:34:08:02:40,ip=10.34.8.240/24,tag=3408,type=veth,mtu=9000
net2: name=eth2,bridge=vmbr0,hwaddr=02:10:20:56:02:40,ip=10.20.56.240/24,tag=56,type=veth,mtu=9000
net3: name=eth3,bridge=vmbr0,hwaddr=02:10:20:57:02:40,ip=10.20.57.240/24,tag=57,type=veth,mtu=9000
net4: name=eth4,bridge=vmbr0,hwaddr=02:10:30:00:02:40,ip=10.30.0.240/23,tag=3000,type=veth,mtu=9000
net5: name=eth5,bridge=vmbr0,hwaddr=02:10:31:00:02:40,ip=10.31.0.240/23,tag=3100,type=veth,mtu=9000
net6: name=eth6,bridge=vmbr0,hwaddr=02:10:32:00:02:40,ip=10.32.0.240/23,tag=3200,type=veth,mtu=9000
ostype: debian
rootfs: ceph-proxmox-VMs:vm-601-disk-0,mountoptions=noatime,size=8G
swap: 1024
unrestricted: 1
lxc.seccomp.profile:
lxc.apparmor.profile: lxc-container-default-nfs2ceph
root@proxmox07:~#

and started the CT.

Installed Ceph from the Proxmox repository, keepalived and nfs-kernel-server.
Generated a minimal /etc/ceph/ceph.conf with ceph config generate-minimal-conf on the PVE host, copied it into the CT, and then configured /etc/fstab to mount the CephFS using a dedicated user.
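A sketch of what that step could look like. Everything specific here is an assumption and not my actual values: the monitor addresses, the mount point /srv/cephfs, the filesystem name cephfs, the user client.nfsshares and the secret file path. The helper cephfs_fstab_line is made up and only prints the fstab line so it can be reviewed before it goes into /etc/fstab.

```shell
#!/bin/sh
# On the PVE host (not run here; "cephfs" and "client.nfsshares" are made-up names):
#   ceph config generate-minimal-conf > ceph.conf.minimal
#   ceph fs authorize cephfs client.nfsshares / rw
#   ceph auth get-key client.nfsshares > nfsshares.secret
# Copy both files into /etc/ceph/ inside the CT afterwards.

# Hypothetical helper: print a CephFS kernel-client fstab line.
# Arguments: comma-separated monitor list, mount point, CephX user (without "client.").
cephfs_fstab_line() {
    mons=$1
    mountpoint=$2
    user=$3
    printf '%s:/ %s ceph name=%s,secretfile=/etc/ceph/%s.secret,noatime,_netdev 0 0\n' \
        "$mons" "$mountpoint" "$user" "$user"
}

cephfs_fstab_line 10.0.0.1,10.0.0.2,10.0.0.3 /srv/cephfs nfsshares
```

The _netdev option is there so the mount waits for the network during boot.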

Configured /etc/exports, then enabled and started nfs-kernel-server.
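A sketch of an export entry for such a mount. The path /srv/cephfs, the client network and the fsid value are assumptions, and nfs_export_line is a made-up helper that only prints the line. As far as I know, a filesystem that is not backed by a block device needs an explicit fsid= in /etc/exports, and using the same value on every CT should keep the NFS file handles valid after a failover.

```shell
#!/bin/sh
# Hypothetical helper: print one /etc/exports line.
# Arguments: exported path, client spec (host or network), numeric fsid.
nfs_export_line() {
    path=$1
    clients=$2
    fsid=$3
    printf '%s %s(rw,sync,no_subtree_check,fsid=%s)\n' "$path" "$clients" "$fsid"
}

# Use the same fsid on every CT that exports this path.
nfs_export_line /srv/cephfs 10.30.0.0/23 100

# After editing /etc/exports on the CT (not run here):
#   exportfs -ra
```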

Used the configuration as above for keepalived.

Cloned the config and copied the disk. Failover seems to work, even with a .snap CephFS snapshot directory mounted (everything being NFSv3).
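To see which CT currently holds the VIP during a failover test, something like this sketch can be run on each node. The helper holds_vip is made up; it only looks for the address in the output of ip -o -4 addr show, and 10.30.0.2 is the VIP from the keepalived config above.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given IPv4 address appears in
# `ip -o -4 addr show` output read from stdin.
holds_vip() {
    # -F treats the dots literally; the trailing "/" avoids prefix matches
    # such as 10.30.0.2 against 10.30.0.240.
    grep -qF " inet $1/"
}

if ip -o -4 addr show 2>/dev/null | holds_vip 10.30.0.2; then
    echo "VIP 10.30.0.2 is on this node"
else
    echo "VIP 10.30.0.2 is not on this node"
fi
```

Stopping the NFS server on the node that reports the VIP and running the check again on the others shows whether the address actually moved.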
 
No idea why people are using NFS-Ganesha???
Because it runs in user space and doesn't take down the kernel if it gets stuck or fails. That makes it the best option for containers. It can also connect directly to various backends, without the (rather disconnected) extra layers (e.g. fstab).
 
But it does not properly support exporting .snap directories...
At least a bug report has been filed for this missing NFS-Ganesha feature: https://tracker.ceph.com/issues/48991
 
I cannot recommend using nfs-kernel-server with a CephFS kernel client when using CephFS snapshots.
Changed the setup to now use:
  • nfs-kernel-server
  • ZFS for filesystems where we use snapshots to boot our read-only server images
  • CephFS kernel client mounts for shared filesystems, but here we turned off CephFS snapshots completely because of https://tracker.ceph.com/issues/50511
CephFS snapshot usage is currently still too experimental...
 
Still running on the setup described in post #27.
The bugs I found back then should be fixed by now - but I never tried again...

If you want to try:
- Use an LXC container so you do not need to run keepalived for HA; restarting those containers is fast enough
- Use NixOS, as it is easy to keep up to date