Hi everyone! First time posting here because of a major problem during the last few days with IO delay and NFS mounted storage.
I'll try to take it from the top, but I'm certain I'll forget something, so please be kind and ask for whatever info I left out and I'll post it ASAP.
I am very new to Proxmox and Linux. A friend helped me build this setup but is now unavailable, so I'm left alone to troubleshoot with the help of Google, Reddit and ChatGPT. Not the best, I know...
Setup:
I have a Dell SFF machine that hosts Proxmox.
CPU: 4 x Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz (1 Socket)
RAM: 16GB
SSD: 250 GB
Kernel Version: Linux 6.8.12-3-pve (2024-10-23T11:41Z)
Manager Version: pve-manager/8.2.7/3e0176e6bb2ade3b
In Proxmox I'm running 1 VM and 2 unprivileged LXC containers.
VM: I don't think it's important. A simple MikroTik router VM so I can use WireGuard. It has 1GB of RAM, 1 socket, 2 cores and 512MB of storage.
1st LXC: This is where I host Plex Server and a few other services in Docker. 4GB of memory, 4GB of swap, 4 cores and 55GB of local storage.
2nd LXC: Nextcloud AIO. 8GB of RAM, 2GB of swap, 2 cores and 41GB of NFS "remote" storage.
Storages:
Local: 71GB, and I'm using 38GB
Local LVM: 147.5GB, and I'm using 60GB
Mounted storage: Proxmox-Backup, 2TB, and I'm using 645GB.
The mounted storage is defined in /etc/pve/storage.cfg:
Code:
dir: local
    path /var/lib/vz
    content vztmpl,backup,iso

lvmthin: local-lvm
    thinpool data
    vgname pve
    content images,rootdir

nfs: Proxmox-Backup
    export /export/proxmox-backups
    path /mnt/pve/Proxmox-Backup
    server xxx.xxx.xxx.xxx (separate machine that runs OMV)
    content images,iso,rootdir,backup,snippets,vztmpl
    prune-backups keep-all=1
That is the main Proxmox machine. Because I need space for my media files, Nextcloud files and backups, I have a different (pretty old) machine that only runs OMV.
Specs of the OMV machine:
CPU: Intel(R) Core(TM)2 Quad CPU Q8300 @ 2.50GHz
RAM: 4GB
SSD: 120GB
Kernel: Linux 6.1.0-0.deb11.21-amd64
2x WDC WD20EFAX-68FB5N0 2TB
These are in RAID 1 and are the disks I use for "NAS" storage. They back the Proxmox-Backup storage mentioned above and hold the backups of the containers. Nextcloud also runs from them. They are shared to Proxmox over NFS.
2x WDC WD20EZBX-00AYRA0 2TB
These are pooled with mergerfs to store my media for Plex.
The Proxmox machine and the OMV machine are connected to the home LAN through a TP-Link TL-SG105 v8 unmanaged L2 switch.
After that intro, now to the problem.
Over the last few days I was playing around with setting up a Home Assistant LXC. I'm bringing this up because I mistakenly put it on the Proxmox-Backup storage. Since the problem started I have deleted and purged it, but the problem remains.
THE PROBLEM!!!!
Yesterday we had a power failure in the neighborhood. After the power was restored (yeah, I don't have a UPS... I'm planning on buying one), Proxmox IO delay for some reason hits 90-98% and stays there for random amounts of time, sometimes 10 minutes, sometimes 20. It also keeps happening at random times.
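To record when the spikes happen without watching the GUI, the kernel's iowait counter could be sampled directly. This is only a minimal sketch, assuming the IO delay figure in the Proxmox summary tracks the aggregate iowait time in /proc/stat. The function names are made up for illustration.

```python
import time


def parse_cpu_line(stat_text):
    """Return the numeric fields of the aggregate 'cpu' line from /proc/stat text."""
    for line in stat_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "cpu":
            return [int(value) for value in parts[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def iowait_percent(before, after):
    """Percentage of CPU time spent in iowait between two /proc/stat samples."""
    first = parse_cpu_line(before)
    second = parse_cpu_line(after)
    deltas = [b - a for a, b in zip(first, second)]
    # Fields: user nice system idle iowait irq softirq steal guest guest_nice.
    # Guest time is already counted inside user/nice, so only sum the first 8.
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    return 100.0 * deltas[4] / total


def sample_host(interval=5.0, path="/proc/stat"):
    """Read /proc/stat twice, `interval` seconds apart, and return iowait %."""
    with open(path) as handle:
        before = handle.read()
    time.sleep(interval)
    with open(path) as handle:
        after = handle.read()
    return iowait_percent(before, after)
```

Calling sample_host(5) in a loop on the Proxmox host and writing each result out with a timestamp would give a record that can be lined up against the NFS messages in the journal.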
I think I have isolated the problem to the Proxmox-Backup NFS storage. Every time IO delay shoots up, the storage goes offline. When IO delay drops, the storage comes back online. While the storage is unavailable, Nextcloud is unavailable, and so are a few services in the media container (Plex stops playing videos). I'm guessing the media LXC stops working because of the IO delay.
While the connection to the Proxmox-Backup NFS storage is lost, I can ping the OMV server from the Proxmox host and get a response. But if I run iperf3 between them, the process hangs and nothing shows. If I do the same from my personal PC to OMV, it works fine.
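A successful ping only shows that ICMP gets through, while both NFS and iperf3 run over TCP. As a rough sketch, assuming NFS is served on its standard TCP port 2049 and iperf3 on its default port 5201, a plain TCP connect test with a timeout could show whether TCP itself is stalling from the Proxmox host. The helper name tcp_port_open is hypothetical.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port completes within `timeout` seconds."""
    try:
        # create_connection performs the full TCP handshake, unlike ping (ICMP).
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and unreachable hosts.
        return False
```

For example, tcp_port_open("192.168.1.6", 2049) run on the Proxmox host during an outage, compared with the same call from another PC, would show whether TCP to the OMV box fails only from the host. 192.168.1.6 is the server address that appears in the journal output.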
If I run journalctl | grep -i nfs I get the following results.
Code:
Nov 04 20:39:56 proxmox kernel: nfs: server 192.168.1.6 not responding, still trying
Nov 04 20:39:56 proxmox kernel: nfs: server 192.168.1.6 not responding, still trying
Nov 04 20:39:56 proxmox kernel: nfs: server 192.168.1.6 not responding, still trying
Nov 04 20:39:56 proxmox kernel: nfs: server 192.168.1.6 not responding, still trying
Nov 04 20:39:56 proxmox kernel: nfs: server 192.168.1.6 OK
Nov 04 20:39:56 proxmox kernel: nfs: server 192.168.1.6 OK
Nov 04 20:39:56 proxmox kernel: nfs: server 192.168.1.6 OK
Nov 04 20:39:56 proxmox kernel: nfs: server 192.168.1.6 OK
Somewhere in between these lines I also get
Code:
Nov 04 17:00:15 proxmox kernel: RPC: Registered tcp NFSv4.1 backchannel transport module.
Nov 04 17:00:15 proxmox systemd[1]: rpc-gssd.service - RPC security service for NFS client and server was skipped because of an unmet condition check (ConditionPathExists=/etc/krb5.keytab).
Nov 04 17:00:15 proxmox systemd[1]: Reached target nfs-client.target - NFS client services.
Nov 04 17:00:16 proxmox systemd[1]: Starting rpc-statd-notify.service - Notify NFS peers of a restart...
Nov 04 17:00:16 proxmox systemd[1]: Started rpc-statd-notify.service - Notify NFS peers of a restart.
Nov 04 17:00:34 proxmox kernel: NFS: Registering the id_resolver key type
Nov 04 17:00:34 proxmox systemd[1]: Starting rpc-statd.service - NFS status monitor for NFSv2/3 locking....
Nov 04 17:00:34 proxmox systemd[1]: Started rpc-statd.service - NFS status monitor for NFSv2/3 locking..
Nov 04 17:00:34 proxmox nfsrahead[4494]: setting /mnt/pve/Proxmox-Backup readahead to 128
Nov 04 17:24:21 proxmox kernel: nfs: server 192.168.1.6 not responding, still trying
Nov 04 17:24:21 proxmox kernel: nfs: server 192.168.1.6 not responding, still trying
Nov 04 17:24:21 proxmox kernel: nfs: server 192.168.1.6 not responding, still trying
Nov 04 17:24:21 proxmox kernel: nfs: server 192.168.1.6 not responding, still trying
Nov 04 17:24:21 proxmox kernel: nfs: server 192.168.1.6 not responding, still trying
Nov 04 17:24:21 proxmox kernel: nfs: server 192.168.1.6 not responding, still trying
Nov 04 17:24:21 proxmox kernel: nfs: server 192.168.1.6 not responding, still trying
Nov 04 17:24:45 proxmox kernel: nfs_start_io_write+0x19/0x60 [nfs]
Nov 04 17:24:45 proxmox kernel: nfs_file_write+0xb5/0x2a0 [nfs]
Nov 04 17:24:45 proxmox kernel: nfs_start_io_write+0x19/0x60 [nfs]
Nov 04 17:24:45 proxmox kernel: nfs_file_write+0xb5/0x2a0 [nfs]
Nov 04 17:24:45 proxmox kernel: nfs_start_io_write+0x19/0x60 [nfs]
Nov 04 17:24:45 proxmox kernel: nfs_file_write+0xb5/0x2a0 [nfs]
Nov 04 17:24:45 proxmox kernel: nfs_write_begin+0x52/0x1e0 [nfs]
Nov 04 17:24:45 proxmox kernel: nfs_file_write+0x19b/0x2a0 [nfs]
Nov 04 17:24:45 proxmox kernel: ? nfs_pageio_complete+0xee/0x140 [nfs]
Nov 04 17:24:45 proxmox kernel: nfs_file_fsync+0x99/0x1d0 [nfs]
Nov 04 17:29:59 proxmox kernel: nfs: server 192.168.1.6 not responding, still trying
Nov 04 17:29:59 proxmox kernel: nfs: server 192.168.1.6 not respo
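To turn output like this into a list of outage windows, the "not responding" and "OK" messages could be paired up. This is a minimal sketch, assuming the journal lines keep the short timestamp format shown above. The journal lines carry no year, so it has to be passed in, and the function names are made up for illustration.

```python
from datetime import datetime


def parse_events(lines, year):
    """Yield (timestamp, state) for NFS 'not responding' / 'OK' kernel messages."""
    for line in lines:
        if "nfs: server" not in line:
            continue
        if "not responding" in line:
            state = "down"
        elif line.rstrip().endswith(" OK"):
            state = "up"
        else:
            continue  # truncated or unrelated line
        # The first 15 characters look like "Nov 04 17:24:21".
        stamp = datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
        yield stamp, state


def outage_windows(lines, year):
    """Pair the first 'not responding' with the next 'OK' and return the windows."""
    windows = []
    down_since = None
    for stamp, state in parse_events(lines, year):
        if state == "down" and down_since is None:
            down_since = stamp
        elif state == "up" and down_since is not None:
            windows.append((down_since, stamp, stamp - down_since))
            down_since = None
    return windows
```

Feeding it the saved output of journalctl | grep -i nfs, for example outage_windows(open("nfs.log"), 2024) where nfs.log is a hypothetical file name, would list each start time, end time and duration, which makes it easier to see whether the outages follow a pattern.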
I am at my wits' end. My lack of knowledge of Linux commands isn't helping...
So please, can someone help identify why the storage loses connection and why IO delay shoots up?
I'm sure I've left out critical information because I may not know the command that would provide it, so please ask me to run anything that will help solve this problem!
Sorry for the very long post, and thanks to everyone for their time!
Edit 1: I keep trying things. I have now shut down the Nextcloud LXC, since it is the one that mainly uses Proxmox-Backup, and the only difference is that I no longer get IO delay. But the Proxmox-Backup NFS storage still goes offline for long periods of time and causes problems for the media container...