nfs server service refuses to remain running

luciandf

Hello all,

I have been having an issue with nfs-server.service for quite a while now. It refuses to start at reboot, so I have to start it manually. Then yesterday I noticed that it also refuses to remain running.

I have installed all the Proxmox updates, but the problem is now unbearable.

There isn't anything of note in dmesg. In journalctl -xe I only see `rpc.mountd[15211]: v4.2 client detached`, `The unit nfs-mountd.service has successfully entered the 'dead' state` and `rpc.idmapd: exiting on signal 15`, and then the service stops.

This happens anywhere between 1 and 10 minutes after I start the service.

Has anyone seen this?

When I regain access to the server I will try to post more details about the journalctl -xe messages.

Regards,

Lucian
 
Code:
shares:~$ systemctl status nfs-kernel-server
● nfs-server.service - NFS server and services
     Loaded: loaded (/lib/systemd/system/nfs-server.service; enabled; preset: enabled)
    Drop-In: /run/systemd/generator/nfs-server.service.d
             └─order-with-mounts.conf
     Active: active (exited) since Sat 2024-09-07 10:33:28 EDT; 2 days ago
    Process: 641 ExecStartPre=/usr/sbin/exportfs -r (code=exited, status=0/SUCCESS)
    Process: 645 ExecStart=/usr/sbin/rpc.nfsd (code=exited, status=0/SUCCESS)
   Main PID: 645 (code=exited, status=0/SUCCESS)
        CPU: 5ms

Note "active (exited)". This is normal because nfs-kernel-server starts kernel threads, and rpc.nfsd itself exits once they are running:

Code:
shares:~$ ps ax|grep nfs
    554 ?        Ss     0:00 /usr/sbin/nfsdcld
    674 ?        I      0:00 [nfsd]
    675 ?        I      0:00 [nfsd]
    679 ?        I      0:00 [nfsd]
    681 ?        I      0:00 [nfsd]
    682 ?        I      0:00 [nfsd]
    683 ?        I      0:00 [nfsd]
    684 ?        I      0:00 [nfsd]
    685 ?        I      0:00 [nfsd]
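
If you want a quick number instead of eyeballing the ps listing, something like the sketch below should do. The function name count_nfsd_threads is just something I made up for illustration, and the /proc path in the comment only exists while the nfsd filesystem is mounted, so treat it as a rough check rather than anything official.

```shell
# Count nfsd kernel threads in `ps ax` output read from stdin.
# (count_nfsd_threads is a made-up helper name, not part of nfs-utils.)
count_nfsd_threads() {
    # grep -c prints 0 and returns non-zero when nothing matches,
    # so "|| true" keeps the function from failing in that case.
    grep -c '\[nfsd\]' || true
}

# Usage on a live system (not run here):
#   ps ax | count_nfsd_threads
#   cat /proc/fs/nfsd/threads   # thread count, if the nfsd filesystem is mounted
```

A count of 0 while the unit still says "active (exited)" would mean the kernel threads are gone even though systemd considers the service up.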
 
Here is what I get:

Code:
# systemctl status nfs-server.service
○ nfs-server.service - NFS server and services
     Loaded: loaded (/lib/systemd/system/nfs-server.service; enabled; preset: enabled)
    Drop-In: /run/systemd/generator/nfs-server.service.d
             └─order-with-mounts.conf
     Active: inactive (dead) since Mon 2024-09-09 20:25:44 EEST; 8s ago
   Duration: 2min 7.153s
    Process: 60879 ExecStartPre=/usr/sbin/exportfs -r (code=exited, status=0/SUCCESS)
    Process: 60880 ExecStart=/usr/sbin/rpc.nfsd (code=exited, status=0/SUCCESS)
    Process: 61649 ExecStop=/usr/sbin/rpc.nfsd 0 (code=exited, status=0/SUCCESS)
    Process: 61666 ExecStopPost=/usr/sbin/exportfs -au (code=exited, status=0/SUCCESS)
    Process: 61667 ExecStopPost=/usr/sbin/exportfs -f (code=exited, status=0/SUCCESS)
   Main PID: 60880 (code=exited, status=0/SUCCESS)
        CPU: 12ms

Sep 09 20:23:36 odin systemd[1]: Starting nfs-server.service - NFS server and services...
Sep 09 20:23:36 odin systemd[1]: Finished nfs-server.service - NFS server and services.
Sep 09 20:25:43 odin systemd[1]: Stopping nfs-server.service - NFS server and services...
Sep 09 20:25:44 odin systemd[1]: nfs-server.service: Deactivated successfully.
Sep 09 20:25:44 odin systemd[1]: Stopped nfs-server.service - NFS server and services.

Note the stopping and stopped nfs-server.service messages.

Here is the journalctl -xe output:

Code:
Sep 09 20:25:44 odin rpc.idmapd[60876]: exiting on signal 15
Sep 09 20:25:44 odin systemd[1]: Stopping nfs-idmapd.service - NFSv4 ID-name mapping service...
░░ Subject: A stop job for unit nfs-idmapd.service has begun execution
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░
░░ A stop job for unit nfs-idmapd.service has begun execution.
░░
░░ The job identifier is 2528.
Sep 09 20:25:44 odin systemd[1]: Stopping nfs-mountd.service - NFS Mount Daemon...
░░ Subject: A stop job for unit nfs-mountd.service has begun execution
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░
░░ A stop job for unit nfs-mountd.service has begun execution.
░░
░░ The job identifier is 2527.
Sep 09 20:25:44 odin rpc.mountd[60878]: Caught signal 15, un-registering and exiting.
Sep 09 20:25:44 odin systemd[1]: nfs-idmapd.service: Deactivated successfully.
░░ Subject: Unit succeeded
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░
░░ The unit nfs-idmapd.service has successfully entered the 'dead' state.
Sep 09 20:25:44 odin systemd[1]: Stopped nfs-idmapd.service - NFSv4 ID-name mapping service.
░░ Subject: A stop job for unit nfs-idmapd.service has finished
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░
░░ A stop job for unit nfs-idmapd.service has finished.
░░
░░ The job identifier is 2528 and the job result is done.
Sep 09 20:25:44 odin systemd[1]: nfs-mountd.service: Deactivated successfully.
░░ Subject: Unit succeeded
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░
░░ The unit nfs-mountd.service has successfully entered the 'dead' state.
Sep 09 20:25:44 odin systemd[1]: Stopped nfs-mountd.service - NFS Mount Daemon.
░░ Subject: A stop job for unit nfs-mountd.service has finished
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░
░░ A stop job for unit nfs-mountd.service has finished.
░░
░░ The job identifier is 2527 and the job result is done.
 
Please post your /etc/exports.
Here you go, sorry I forgot about it:

Code:
# cat /etc/exports
# /etc/exports: the access control list for filesystems which may be exported
#               to NFS clients.  See exports(5).
#
# Example for NFSv2 and NFSv3:
# /srv/homes       hostname1(rw,sync,no_subtree_check) hostname2(ro,sync,no_subtree_check)
#
# Example for NFSv4:
# /srv/nfs4        gss/krb5i(rw,sync,fsid=0,crossmnt,no_subtree_check)
# /srv/nfs4/homes  gss/krb5i(rw,sync,no_subtree_check)
#
/data1                      192.168.1.0/24(rw,sync,no_root_squash,no_subtree_check)
/data2                      192.168.1.0/24(rw,sync,no_root_squash,no_subtree_check)
/data2/media         192.168.1.25(rw,sync,no_root_squash,no_subtree_check)
/data2/media         192.168.1.26(rw,sync,root_squash,no_subtree_check)
/data2/media         192.168.1.120(rw,sync,root_squash,no_subtree_check)
/data2/media         192.168.1.71(rw,sync,root_squash,no_subtree_check)
/data2/media/movies  192.168.1.5(ro,sync,no_root_squash,no_subtree_check)
/mnt/pve/backup/recording/           192.168.1.250(rw,sync,no_root_squash,no_subtree_check)

It worked fine until yesterday.
 
Almost all NFS-related services are inactive (dead):

Code:
  nfs-idmapd.service                               loaded    inactive dead NFSv4 ID-name mapping service
  nfs-mountd.service                               loaded    inactive dead NFS Mount Daemon
  nfs-server.service                               loaded    inactive dead NFS server and services
  nfs-utils.service                                loaded    inactive dead NFS server and client services

And they die again shortly after I start them.
 
"It worked fine until yesterday." What changed? Update? Configuration change? One share, /mnt/pve/backup/recording, appears to be a mount point. Is that actually present? Maybe it went missing.

Another thing is this: Sep 09 20:25:44 odin rpc.idmapd[60876]: exiting on signal 15

Signal 15 is SIGTERM, which is a signal that has to be explicitly sent as opposed to being caused by an error. In other words, something is killing rpc.idmapd on purpose. In fact, the systemd "stop" task seems to be running for all of them. Do you have any "system checker" type of scripts running? Any cron jobs?
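
To make that concrete, here is a small sketch. signal_name and systemd_unit_events are helper names I invented for this post, and the grep pattern just assumes the journal line format seen in the output above. It does not tell you why systemd queued the stop job, only that PID 1 is the one doing the stopping.

```shell
# Translate a signal number to its name using the shell's kill builtin.
# (signal_name is a made-up helper name.)
signal_name() {
    kill -l "$1"
}

# Keep only journal lines where systemd (PID 1) reports a unit
# starting or stopping. Reads journal text from stdin.
# (systemd_unit_events is a made-up helper name.)
systemd_unit_events() {
    grep -E 'systemd\[1\]: (Starting|Started|Stopping|Stopped) '
}

signal_name 15    # prints TERM

# On a live system (not run here):
#   journalctl -b --no-pager | systemd_unit_events | grep nfs
```

If the "Stopping ..." line from systemd[1] shows up at the same second as the daemon's "exiting on signal 15", the SIGTERM is most likely just systemd carrying out a stop job, not a stray script calling kill.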
 
The nfs-mountd.service log shows v4.2 clients attaching, and then after 1 to 5 minutes all NFS services are dead.
 
None of your exports look like NFSv4.
I agree! But how do you explain this then?

Code:
systemctl status nfs-mountd.service                                                                                                               odin: Mon Sep  9 23:45:25 2024

● nfs-mountd.service - NFS Mount Daemon
     Loaded: loaded (/lib/systemd/system/nfs-mountd.service; static)
     Active: active (running) since Mon 2024-09-09 23:44:27 EEST; 57s ago
    Process: 129872 ExecStart=/usr/sbin/rpc.mountd (code=exited, status=0/SUCCESS)
   Main PID: 129873 (rpc.mountd)
      Tasks: 1 (limit: 18695)
     Memory: 1.2M
        CPU: 16ms
     CGroup: /system.slice/nfs-mountd.service
             └─129873 /usr/sbin/rpc.mountd

Sep 09 23:44:27 odin systemd[1]: Starting nfs-mountd.service - NFS Mount Daemon...
Sep 09 23:44:27 odin rpc.mountd[129873]: Version 2.6.2 starting
Sep 09 23:44:27 odin systemd[1]: Started nfs-mountd.service - NFS Mount Daemon.
Sep 09 23:44:30 odin rpc.mountd[129873]: v4.2 client attached: 0x39b2d3fd66df5e2b from "192.168.1.251:843"

Also, how can I figure out what is sending the SIGTERM (signal 15)? I don't have any checker scripts or cron jobs like that.

What happened yesterday was a power cut. When I regained access the nfs-server service started doing this. I thought an update might fix it (I don't know what I was thinking) but it didn't.
 
Maybe check that mount point again. Also check /etc/default/nfs-common and /etc/default/nfs-kernel-server for corruption or missing files.
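
A rough way to check every export in one go, in case it helps. check_exports is a name I made up, and the sketch assumes export paths contain no spaces or quoting. It only tests that each path is an existing directory. It does not prove that the right filesystem is mounted there, so follow up with findmnt or mountpoint for anything that is supposed to be a mount.

```shell
# Print each exported path from an exports(5)-style file with a status:
# "ok" if it is an existing directory, "missing" otherwise.
# (check_exports is a made-up helper name; paths with spaces are not handled.)
check_exports() {
    exports_file="${1:-/etc/exports}"
    # Skip blank lines and comments; the first field is the exported path.
    awk 'NF && $1 !~ /^#/ { print $1 }' "$exports_file" | sort -u |
    while IFS= read -r path; do
        if [ -d "$path" ]; then
            echo "ok $path"
        else
            echo "missing $path"
        fi
    done
}

# On a live system (not run here):
#   check_exports
#   findmnt /mnt/pve/backup/recording        # shows what is mounted there, if anything
#   mountpoint -q /mnt/pve/backup/recording && echo mounted
```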
Well shoot... I had an automount with systemd for that mount point. I changed it to fstab and bam, the problem is gone. nfs-server now starts automatically at reboot and it doesn't get killed off by whatever it was.

I don't get it... I thought systemd mounts and fstab were basically the same.
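
They mostly are: fstab entries get turned into .mount units by systemd-fstab-generator. A possible explanation for the difference, which I can't confirm from this thread: the order-with-mounts.conf drop-in in the status output is, as far as I know, generated from /etc/exports by nfs-utils and ties nfs-server.service to the mount units of the exported paths. If an automount unmounts the share again (for example on an idle timeout), stopping that mount unit could take nfs-server down with it, whereas a plain fstab mount stays mounted. The sketch below lists the paths the drop-in depends on. required_mounts is a made-up helper name, and the RequiresMountsFor= line format is my assumption about what the drop-in contains.

```shell
# Print the paths listed on RequiresMountsFor= lines of unit file text
# read from stdin, one path per line.
# (required_mounts is a made-up helper name.)
required_mounts() {
    sed -n 's/^RequiresMountsFor=//p' | tr ' ' '\n' | sed '/^$/d'
}

# On a live system (not run here):
#   systemctl cat nfs-server.service | required_mounts
#   systemd-escape -p --suffix=mount /mnt/pve/backup/recording   # unit name for that path
#   journalctl -b -u mnt-pve-backup-recording.mount              # was it stopped before nfs-server?
```

If the mount unit's "Unmounting"/"Unmounted" lines appear just before nfs-server's "Stopping" line, that would support this theory.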
 
