Power hit on server, and now the system is unable to mount NFS shares. Services failing, including sssd.

idic

OK, so I'm an idiot. Let's just get that out of the way: it's a new server, and I didn't have it plugged into a battery backup.

We had some bad weather yesterday and there was a power blip. The server's power supply tripped from a spike, and the system was powered down when I got home. I started it back up and it booted fine; everything seemed okay, and all VMs and containers started as well. However, it very soon became apparent that things were not good: the NFS mounts to the server were not working, either externally or internally. Externally, I'm getting a "clnt_create: RPC: Program not registered" error.

I figured it was disk-related because of the power hit, so I ran zpool scrub on both of my pools. Both scrubs completed, and neither repaired anything.

Access to the pools seems fine, since the VMs live there. I think it's just the NFS sharing that is the problem.
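The "Program not registered" message typically means the client reached rpcbind on the server, but the RPC program it asked for (for showmount, that is mountd) is not registered. A quick way to narrow this down is to look at what rpcinfo -p reports on the server. The helper below is a minimal sketch: the function name nfs_rpc_registered is hypothetical, and it only assumes the standard RPC program numbers 100003 (nfs) and 100005 (mountd) appearing in the first column of rpcinfo -p output.

```shell
#!/bin/sh
# Hypothetical helper: reads `rpcinfo -p` output on stdin and checks that
# both NFS-related RPC programs are registered with rpcbind.
#   100003 = nfs, 100005 = mountd (standard RPC program numbers)
nfs_rpc_registered() {
  awk '
    $1 == 100003 { nfs = 1 }
    $1 == 100005 { mountd = 1 }
    END {
      if (!nfs)    print "nfs (100003) not registered"
      if (!mountd) print "mountd (100005) not registered"
      # exit status 0 only when both programs were seen
      exit !(nfs && mountd)
    }'
}

# Usage on the server itself:
#   rpcinfo -p localhost | nfs_rpc_registered && echo "NFS RPC programs registered"
```

If either program is missing, nfs-server.service (which starts mountd) is the thing to get running before any NFS mount, including a loopback one like /mnt/tank, can succeed.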

Proxmox version:
Code:
# pveversion
pve-manager/6.4-14/15e2bf61 (running kernel: 5.4.174-2-pve)

When I check the services, I see the following in a failed state:
Code:
 mnt-tank.mount                           loaded failed     failed    /mnt/tank
 nfs-server.service                       loaded failed     failed    NFS server and services
 rpc-svcgssd.service                      loaded failed     failed    RPC security service for NFS server
 sssd.service                             loaded failed     failed    System Security Services Daemon

mnt-tank.mount restart error
Code:
Jun 15 19:12:15 storage1 mount[1048222]: mount.nfs: requested NFS version or transport protocol is not supported
Jun 15 19:12:15 storage1 systemd[1]: mnt-tank.mount: Mount process exited, code=exited, status=32/n/a

nfs-server.service restart successful


rpc-svcgssd.service restart error
Code:
# systemctl status rpc-svcgssd.service
● rpc-svcgssd.service - RPC security service for NFS server
   Loaded: loaded (/lib/systemd/system/rpc-svcgssd.service; static; vendor preset: enabled)
   Active: failed (Result: exit-code) since Wed 2022-06-15 19:14:35 MDT; 1min 10s ago
  Process: 1051656 ExecStart=/usr/sbin/rpc.svcgssd $SVCGSSDARGS (code=exited, status=1/FAILURE)

Jun 15 19:14:35 storage1 systemd[1]: Starting RPC security service for NFS server...
Jun 15 19:14:35 storage1 rpc.svcgssd[1051657]: ERROR: GSS-API: error in gss_acquire_cred(): GSS_S_FAILURE (Unspecified GSS failure.  Minor code may provide more information) - No key table entry found matching nfs/@
Jun 15 19:14:35 storage1 rpc.svcgssd[1051657]: unable to obtain root (machine) credentials
Jun 15 19:14:35 storage1 rpc.svcgssd[1051657]: do you have a keytab entry for nfs/<your.host>@<YOUR.REALM> in /etc/krb5.keytab?
Jun 15 19:14:35 storage1 systemd[1]: rpc-svcgssd.service: Control process exited, code=exited, status=1/FAILURE
Jun 15 19:14:35 storage1 systemd[1]: rpc-svcgssd.service: Failed with result 'exit-code'.
Jun 15 19:14:35 storage1 systemd[1]: Failed to start RPC security service for NFS server.
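The rpc.svcgssd lines above say it could not find an nfs/ principal in /etc/krb5.keytab. That daemon is likely only needed when exports use Kerberos security (sec=krb5, krb5i or krb5p), so it may be unrelated to the plain NFS failure. To confirm what the keytab contains, here is a minimal sketch assuming MIT Kerberos klist, whose klist -k output lists a KVNO column followed by the principal; has_nfs_principal is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: reads `klist -k` output on stdin and exits 0 if any
# keytab entry's principal starts with "nfs/" (second whitespace-separated field).
has_nfs_principal() {
  awk '$2 ~ /^nfs\// { found = 1 } END { exit !found }'
}

# Usage (as root, MIT krb5 klist assumed):
#   klist -k /etc/krb5.keytab | has_nfs_principal || echo "no nfs/ principal in keytab"
```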

sssd.service
Code:
# systemctl status sssd.service
● sssd.service - System Security Services Daemon
   Loaded: loaded (/lib/systemd/system/sssd.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Wed 2022-06-15 19:17:17 MDT; 7s ago
  Process: 1055680 ExecStart=/usr/sbin/sssd -i ${DEBUG_LOGGER} (code=exited, status=4)
 Main PID: 1055680 (code=exited, status=4)

Jun 15 19:17:17 storage1 systemd[1]: Starting System Security Services Daemon...
Jun 15 19:17:17 storage1 sssd[1055680]: SSSD couldn't load the configuration database [5]: Input/output error.
Jun 15 19:17:17 storage1 systemd[1]: sssd.service: Main process exited, code=exited, status=4/NOPERMISSION
Jun 15 19:17:17 storage1 systemd[1]: sssd.service: Failed with result 'exit-code'.
Jun 15 19:17:17 storage1 systemd[1]: Failed to start System Security Services Daemon.
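SSSD's "couldn't load the configuration database [5]: Input/output error" refers to its internal LDB files, which on Debian-based systems such as Proxmox VE normally live in /var/lib/sss/db (config.ldb plus cache_*.ldb files); an unclean power-off can leave them damaged. sssd rebuilds config.ldb from /etc/sssd/sssd.conf on start, so one option is to move the LDB files aside. This is a sketch under assumptions: the default path above, sssd stopped first, and move_sssd_db_aside is a hypothetical helper. Any cached (offline) credentials are lost.

```shell
#!/bin/sh
# Hypothetical helper: moves SSSD's *.ldb files into a timestamped backup
# directory next to the database directory and prints the backup path.
# Assumes the default location /var/lib/sss/db unless a path is given.
move_sssd_db_aside() {
  db_dir=${1:-/var/lib/sss/db}
  backup="${db_dir}.bak.$(date +%Y%m%d%H%M%S)"
  mkdir -p "$backup" || return 1
  for f in "$db_dir"/*.ldb; do
    [ -e "$f" ] || continue          # glob matched nothing
    mv "$f" "$backup"/ || return 1
  done
  echo "$backup"
}

# Usage (as root):
#   systemctl stop sssd.service
#   move_sssd_db_aside
#   systemctl start sssd.service
```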

Also in the log file:
Code:
-- The job identifier is 4954 and the job result is failed.
Jun 15 13:18:02 storage1 systemd[1]: /lib/systemd/system/sssd.service:11: PIDFile= references path below legacy directory /var/run/, updating /var/run/sssd.pid → /run/sssd.pid; please update the unit file accordingly.


I've tried a few things at this point, including mounting things manually, rebooting, the disk scrubs, and removing the sssd folder and reinstalling sssd, but none of it changed anything.

I'm just not sure which way to go at this point. Any suggestions?
 
Hi,
have you tried connecting to the NFS share from another machine?

What happens when you call pvesm scan nfs <hostname/ip>?
 
OK, at this point I don't know exactly what I did, and I still can't get rpc-svcgssd.service running. I think that's a configuration issue... but I've fixed the main problem somehow. I got the mount services going, and things are running normally enough.
 