Proxmox VE loses connection with NFS and everything stops working! Please help!

GrGamerAmigo

New Member
Nov 4, 2024
Hi everyone! First time posting here because of a major problem over the last few days with IO delay and NFS-mounted storage.
I will try to take it from the top, but I'm certain I will forget something, so be kind and please ask me for any info I forgot and I will try to post it ASAP.

I am very new to Proxmox and Linux. A friend helped me build this setup but is now unavailable, so I'm left alone to troubleshoot problems with the help of Google, Reddit and ChatGPT. Not the best, I know...

Setup:
I have a Dell SFF machine where I host Proxmox.
CPU: 4 x Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz (1 Socket)
Ram: 16GB
SSD: 250 GB
Kernel Version: Linux 6.8.12-3-pve (2024-10-23T11:41Z)
Manager Version: pve-manager/8.2.7/3e0176e6bb2ade3b

In Proxmox I'm running 1 VM and 2 unprivileged LXC containers.
VM: I don't think it's important. A simple MikroTik router VM so I can use WireGuard. It has 1GB of RAM, 1 socket, 2 cores and 512MB of storage.
1st LXC: This is where I host Plex Server and a few other services in Docker. 4GB of memory, 4GB of swap, 4 cores and 55GB of local storage.
2nd LXC: Nextcloud AIO. 8GB of RAM, 2GB of swap, 2 cores and 41GB of NFS "remote" storage.

Storages:
Local: 71GB, of which I'm using 38GB
Local-LVM: 147.5GB, of which I'm using 60GB
Mounted storage: Proxmox-Backup, 2TB, of which I'm using 645GB

The mounted storage is mounted through /etc/pve/storage.cfg
Code:
dir: local
        path /var/lib/vz
        content vztmpl,backup,iso

lvmthin: local-lvm
        thinpool data
        vgname pve
        content images,rootdir

nfs: Proxmox-Backup
        export /export/proxmox-backups
        path /mnt/pve/Proxmox-Backup
        server xxx.xxx.xxx.xxx (Separate machine that runs OMV)
        content images,iso,rootdir,backup,snippets,vztmpl
        prune-backups keep-all=1

That is the main Proxmox machine. Because I need space for my media files, Nextcloud files and backups, I have a different (pretty old) machine that only runs OMV.

Specs of OMV
CPU: Intel(R) Core(TM)2 Quad CPU Q8300 @ 2.50GHz
Ram: 4GB
SSD: 120GB
Kernel: Linux 6.1.0-0.deb11.21-amd64
2x WDC WD20EFAX-68FB5N0 2TB
These are in RAID 1. They are the disks I use for "NAS" storage: they back the Proxmox-Backup storage mentioned above and hold the backups of the containers. Nextcloud also runs from them. They are shared to Proxmox over NFS.
2x WDC WD20EZBX-00AYRA0 2TB
These are pooled with mergerfs to store my media for Plex.

The Proxmox machine and the OMV machine are connected to the home LAN by a TP-LINK TL-SG105 v8 Unmanaged L2 Switch.

After that intro, now to the problem.
Over the last few days I was playing around with setting up a Home Assistant LXC. The reason I'm bringing this up is that I wrongly used the Proxmox-Backup storage to set it up. Since the problem started I have deleted and purged it, but the problem remains.

THE PROBLEM!!!!

Yesterday we had a power failure in the neighborhood. After the power was restored (yeah, I don't have a UPS... I'm planning on buying one...), for some reason Proxmox IO delay hits 90 - 98% and stays there for random amounts of time: sometimes 10 minutes, sometimes 20 minutes. It also keeps happening at random times.
I think I have isolated the problem to the Proxmox-Backup NFS storage. Every time IO delay shoots up, the storage goes offline. When IO delay drops, the storage comes back online. While the storage is unavailable, Nextcloud is unavailable and so are a few services in the media container (Plex stops playing videos). But I'm guessing the media LXC stops working because of the IO delay.
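To timestamp exactly when the mount stops answering, a small probe can be run from the Proxmox host. This is only a minimal sketch: the function name `mount_responds` and the 5-second timeout are my own choices, and it assumes that a `stat` on the mount point has to reach the server (cached attributes can hide a short outage). The stat runs in a child process so that a hung NFS mount blocks the child and not the script itself.

```python
import subprocess
import sys
import time


def mount_responds(path, timeout_s=5.0):
    """Return True if stat() on `path` finishes within `timeout_s` seconds.

    The stat runs in a child process. After a timeout the child is not
    waited on, because a process stuck on a hard NFS mount may not exit
    until the server answers again.
    """
    child = subprocess.Popen(
        [sys.executable, "-c", "import os, sys; os.stat(sys.argv[1])", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        code = child.poll()
        if code is not None:
            return code == 0  # non-zero also covers "path does not exist"
        time.sleep(0.05)
    child.kill()  # best effort; may have no effect while the mount is hung
    return False


if __name__ == "__main__":
    # Mount path taken from the storage.cfg shown in this post.
    path = "/mnt/pve/Proxmox-Backup"
    state = "responding" if mount_responds(path) else "NOT responding"
    print(time.strftime("%H:%M:%S"), path, state)
```

Running this once a minute (for example from cron) and comparing the timestamps with the IO delay graph would show whether the two really line up.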

Now, during the times I lose connection to the Proxmox-Backup NFS storage, I can ping the OMV server from the Proxmox host and I get a response. But if I try to run iperf3 between them, the process hangs and nothing shows. If I do the same from my personal PC to OMV, it works fine.
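Since ping (ICMP) gets through while iperf3 (TCP) hangs, it may help to test TCP reachability of the NFS service directly. The sketch below assumes the server listens on the standard NFS port 2049; the helper name `tcp_port_open` and the 3-second timeout are illustrative choices, not anything from the Proxmox or OMV tooling.

```python
import socket


def tcp_port_open(host, port=2049, timeout_s=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout_s."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers timeouts, refused connections and unreachable networks.
        return False


if __name__ == "__main__":
    # 192.168.1.6 is the NFS server address that appears in the journal output.
    print("NFS port reachable:", tcp_port_open("192.168.1.6"))
```

If this returns False during an outage while ping still answers, the problem sits in the TCP path or on the server side rather than in basic IP connectivity.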

If I run journalctl | grep -i nfs I get the following results.

Code:
Nov 04 20:39:56 proxmox kernel: nfs: server 192.168.1.6 not responding, still trying
Nov 04 20:39:56 proxmox kernel: nfs: server 192.168.1.6 not responding, still trying
Nov 04 20:39:56 proxmox kernel: nfs: server 192.168.1.6 not responding, still trying
Nov 04 20:39:56 proxmox kernel: nfs: server 192.168.1.6 not responding, still trying
Nov 04 20:39:56 proxmox kernel: nfs: server 192.168.1.6 OK
Nov 04 20:39:56 proxmox kernel: nfs: server 192.168.1.6 OK
Nov 04 20:39:56 proxmox kernel: nfs: server 192.168.1.6 OK
Nov 04 20:39:56 proxmox kernel: nfs: server 192.168.1.6 OK

Somewhere in between these lines I also get

Code:
Nov 04 17:00:15 proxmox kernel: RPC: Registered tcp NFSv4.1 backchannel transport module.
Nov 04 17:00:15 proxmox systemd[1]: rpc-gssd.service - RPC security service for NFS client and server was skipped because of an unmet condition check (ConditionPathExists=/etc/krb5.keytab).
Nov 04 17:00:15 proxmox systemd[1]: Reached target nfs-client.target - NFS client services.
Nov 04 17:00:16 proxmox systemd[1]: Starting rpc-statd-notify.service - Notify NFS peers of a restart...
Nov 04 17:00:16 proxmox systemd[1]: Started rpc-statd-notify.service - Notify NFS peers of a restart.
Nov 04 17:00:34 proxmox kernel: NFS: Registering the id_resolver key type
Nov 04 17:00:34 proxmox systemd[1]: Starting rpc-statd.service - NFS status monitor for NFSv2/3 locking....
Nov 04 17:00:34 proxmox systemd[1]: Started rpc-statd.service - NFS status monitor for NFSv2/3 locking..
Nov 04 17:00:34 proxmox nfsrahead[4494]: setting /mnt/pve/Proxmox-Backup readahead to 128
Nov 04 17:24:21 proxmox kernel: nfs: server 192.168.1.6 not responding, still trying
Nov 04 17:24:21 proxmox kernel: nfs: server 192.168.1.6 not responding, still trying
Nov 04 17:24:21 proxmox kernel: nfs: server 192.168.1.6 not responding, still trying
Nov 04 17:24:21 proxmox kernel: nfs: server 192.168.1.6 not responding, still trying
Nov 04 17:24:21 proxmox kernel: nfs: server 192.168.1.6 not responding, still trying
Nov 04 17:24:21 proxmox kernel: nfs: server 192.168.1.6 not responding, still trying
Nov 04 17:24:21 proxmox kernel: nfs: server 192.168.1.6 not responding, still trying
Nov 04 17:24:45 proxmox kernel:  nfs_start_io_write+0x19/0x60 [nfs]
Nov 04 17:24:45 proxmox kernel:  nfs_file_write+0xb5/0x2a0 [nfs]
Nov 04 17:24:45 proxmox kernel:  nfs_start_io_write+0x19/0x60 [nfs]
Nov 04 17:24:45 proxmox kernel:  nfs_file_write+0xb5/0x2a0 [nfs]
Nov 04 17:24:45 proxmox kernel:  nfs_start_io_write+0x19/0x60 [nfs]
Nov 04 17:24:45 proxmox kernel:  nfs_file_write+0xb5/0x2a0 [nfs]
Nov 04 17:24:45 proxmox kernel:  nfs_write_begin+0x52/0x1e0 [nfs]
Nov 04 17:24:45 proxmox kernel:  nfs_file_write+0x19b/0x2a0 [nfs]
Nov 04 17:24:45 proxmox kernel:  ? nfs_pageio_complete+0xee/0x140 [nfs]
Nov 04 17:24:45 proxmox kernel:  nfs_file_fsync+0x99/0x1d0 [nfs]
Nov 04 17:29:59 proxmox kernel: nfs: server 192.168.1.6 not responding, still trying
Nov 04 17:29:59 proxmox kernel: nfs: server 192.168.1.6 not respo
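To turn these kernel messages into outage windows, one could pair each first "not responding" line with the next "OK" line for the same server. This is a rough sketch under a few assumptions: the journal uses the default short format shown above, month names are in English, and the year (not present in the log) is passed in by hand. The names `outage_windows` and `SAMPLE` are made up for illustration.

```python
import re
from datetime import datetime

LINE_RE = re.compile(
    r"^(\w{3} \d{2} \d{2}:\d{2}:\d{2}) \S+ kernel: nfs: server (\S+) "
    r"(not responding|OK)"
)

# Lines copied from the excerpts in this post. The excerpts are partial,
# so the resulting window is only an upper bound, not a measured outage.
SAMPLE = [
    "Nov 04 17:24:21 proxmox kernel: nfs: server 192.168.1.6 not responding, still trying",
    "Nov 04 17:29:59 proxmox kernel: nfs: server 192.168.1.6 not responding, still trying",
    "Nov 04 20:39:56 proxmox kernel: nfs: server 192.168.1.6 not responding, still trying",
    "Nov 04 20:39:56 proxmox kernel: nfs: server 192.168.1.6 OK",
]


def outage_windows(lines, year=2024):
    """Pair the first 'not responding' with the next 'OK' for each server.

    Returns (server, start, end) tuples; end is None if the log ends
    while the server is still reported as not responding.
    """
    open_since = {}  # server -> time of the first unanswered report
    windows = []
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue
        stamp = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
        server, state = m.group(2), m.group(3)
        if state == "not responding":
            open_since.setdefault(server, stamp)
        elif server in open_since:
            windows.append((server, open_since.pop(server), stamp))
    windows.extend((srv, start, None) for srv, start in open_since.items())
    return windows


if __name__ == "__main__":
    for server, start, end in outage_windows(SAMPLE):
        print(server, start, "->", end)
```

On the real system the lines would come from the full kernel journal instead of `SAMPLE`, which should give much tighter windows than the truncated excerpt here.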

I am at my wits' end. My lack of knowledge of Linux commands isn't helping...

So please, can someone help identify why the storage loses connection and why IO delay shoots up?

I'm sure I have left out critical information because I might not know the command that would provide it, so please ask me to run anything that will help solve this problem!

Sorry for the very long post and thanks everyone for your time!

Edit 1: I keep trying things. I have now shut down the Nextcloud LXC, since it is the one that mainly uses Proxmox-Backup, and the only difference is that I no longer get IO delay. But the Proxmox-Backup NFS storage still goes offline for long periods of time and creates problems for the media container...
 
Hey,

your problem is:
"When you use your NFS storage, it goes offline without much further explanation."
But while you're experiencing the problem, the PVE host and the NAS can still ping each other.
You mentioned that your OMV box is very old hardware. All of your tests seem to have been run from the PVE host, a CT/VM or your personal PC, but none from the OMV host. Did you check the NFS and system logs on OMV?
 
This is a fairly common phenomenon. You're experiencing two issues, one as a consequence of the other.

If/when you saturate your NFS target, its latency will necessarily increase. As it increases, it can and does lock up the Proxmox metrics daemon (pvestatd), which causes the UI to turn into question marks. With PVE, the most common cause for this is vzdump, which is why the vzdump config contains a throttling mechanism.

Your NFS target can become saturated either due to demand overload (e.g. more requests than it can handle) or subsystem issues (e.g. RAID rebuild, defective disk, etc.). You need to shift your troubleshooting attention to the NFS provider.
 
Update

Thanks for the replies. I had an update a day ago but was too busy to post it.
So...
After a lot of checking and trying various things, like disconnecting the storages and reconnecting them, I stumbled across something weird. I tried pinging the OMV machine from Proxmox. Everything looked fine: I got a reply to every ping. The only small problem was that at some points the response took around 5 - 10ms. Not great, but I didn't think it was a major problem. 75-80% of the responses came back in 0.200ms. Then I pinged Proxmox from the OMV machine. There was a major problem.

Whenever Proxmox was getting a response in 5 - 10ms, the OMV machine's pings were losing packets. So OMV was losing around 20% of packets. The remaining 80% were great at 0.200ms.

Then I tried the same thing from my personal Windows machine. I had absolutely no problem with either the Proxmox machine or the OMV machine. Every ping and response was great.

Then I ran them all together. 80% of the time everything was fine. For the other 20%, Proxmox was seeing 5 - 10ms, but only towards OMV, while towards the Windows machine it was great. OMV was losing 20% of packets to Proxmox but was great with Windows, and the Windows machine was great with everything!
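For anyone who wants to put numbers on a comparison like this, the per-reply lines of Linux `ping` can be summarized with a short script. This is a sketch only: `summarize_ping` and the 5 ms "slow" threshold are my own choices, and the sample replies at the bottom are invented to mirror the roughly 20% loss described above, not captured from the actual machines.

```python
import re

REPLY_RE = re.compile(r"icmp_seq=(\d+) .*time=([\d.]+) ms")


def summarize_ping(lines, slow_ms=5.0):
    """Summarize per-reply lines from Linux `ping` output.

    Loss is estimated from gaps in icmp_seq between the first and last
    reply seen, so packets lost before the first or after the last reply
    are not counted.
    """
    rtts = {}
    for line in lines:
        m = REPLY_RE.search(line)
        if m:
            rtts[int(m.group(1))] = float(m.group(2))
    if not rtts:
        return None
    expected = max(rtts) - min(rtts) + 1
    lost = expected - len(rtts)
    slow = sum(1 for t in rtts.values() if t >= slow_ms)
    return {
        "expected": expected,
        "lost": lost,
        "loss_pct": 100.0 * lost / expected,
        "slow": slow,
        "slow_pct": 100.0 * slow / len(rtts),
    }


if __name__ == "__main__":
    # Hypothetical replies: icmp_seq=4 is missing and one reply is slow.
    sample = [
        "64 bytes from 192.168.1.6: icmp_seq=1 ttl=64 time=0.200 ms",
        "64 bytes from 192.168.1.6: icmp_seq=2 ttl=64 time=0.200 ms",
        "64 bytes from 192.168.1.6: icmp_seq=3 ttl=64 time=7.50 ms",
        "64 bytes from 192.168.1.6: icmp_seq=5 ttl=64 time=0.200 ms",
    ]
    print(summarize_ping(sample))
```

Saving the ping output from each direction (Proxmox to OMV, OMV to Proxmox) to a file and feeding both through the same function makes the asymmetry easy to compare side by side.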

I'm so confused... I thought maybe the power failure did something to the OMV Ethernet adapter, and I bought a new one to install. The weird thing is that it only created a problem between OMV and Proxmox but not between OMV and Windows? Either way, before I installed the new adapter I found an option in the OMV network adapter settings: Wake-on-LAN. I know this option is typically used to wake up or power on the computer, but I thought maybe OMV was somehow going into a low-power mode (I don't even know if this exists?) and, because Proxmox was the only machine actively using OMV, maybe the option would help it never lose connection?

The result is that after I enabled the option, everything seems to run fine. At least I think so, because I haven't had much time to run tests. Plex is playing. Things are downloading. Pings look fine. So I guess it's fixed? No idea. I will keep checking though.
 
