NFS soft option causes I/O errors

Stefano Giunchi

I have my backup storage mounted over NFS from a NAS.
Until last week I had always used the default "hard" mount.
If the NAS dies during a backup, the VM being backed up freezes until I force-unmount the NFS share.

Then I found the "soft" NFS option in this thread and tried it.
After that, the largest (1 TB) VM consistently failed its backup:

Code:
[...]
INFO:  97% (1.0 TiB of 1.0 TiB) in 2h 10m 24s, read: 104.2 MiB/s, write: 102.7 MiB/s
INFO:  98% (1.0 TiB of 1.0 TiB) in 2h 12m 11s, read: 101.2 MiB/s, write: 99.5 MiB/s
INFO:  99% (1.0 TiB of 1.0 TiB) in 2h 13m 51s, read: 109.3 MiB/s, write: 107.9 MiB/s
zstd: /*stdout*\: Input/output error
INFO: 100% (1.0 TiB of 1.0 TiB) in 2h 14m 17s, read: 416.4 MiB/s, write: 114.2 MiB/s
INFO: backup is sparse: 293.93 GiB (27%) total zero data
INFO: transferred 1.04 TiB in 8057 seconds (135.2 MiB/s)
Warning: unable to close filehandle GEN5914 properly: Input/output error at /usr/share/perl5/PVE/VZDump/QemuServer.pm line 811.
ERROR: Backup of VM 303 failed - zstd --rsyncable --threads=1 failed - wrong exit status 1
INFO: Failed at 2023-03-07 01:09:10
INFO: Backup job finished with errors
TASK ERROR: job errors

I had to revert to the standard NFS hard mount.
I'm also unsure whether NFS soft can be a reliable choice for backups, or whether it could hide undetected data corruption.

proxmox-ve: 7.3-1 (running kernel: 5.15.83-1-pve)
pve-manager: 7.3-4 (running version: 7.3-4/d69b70d4)
pve-kernel-helper: 7.3-3
pve-kernel-5.15: 7.3-1
pve-kernel-5.4: 6.4-18
pve-kernel-5.15.83-1-pve: 5.15.83-1
pve-kernel-5.15.74-1-pve: 5.15.74-1
pve-kernel-5.15.39-1-pve: 5.15.39-1
pve-kernel-5.4.189-2-pve: 5.4.189-2
ceph: 15.2.17-pve1
ceph-fuse: 15.2.17-pve1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown: 0.8.36+pve2
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.3
libproxmox-backup-qemu0: 1.3.1-1
libpve-access-control: 7.3-1
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.3-2
libpve-guest-common-perl: 4.2-3
libpve-http-server-perl: 4.1-5
libpve-storage-perl: 7.3-2
libqb0: 1.0.5-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.0-3
lxcfs: 4.0.12-pve1
novnc-pve: 1.3.0-3
proxmox-backup-client: 2.3.2-1
proxmox-backup-file-restore: 2.3.2-1
proxmox-mini-journalreader: 1.3-1
proxmox-offline-mirror-helper: 0.5.0-1
proxmox-widget-toolkit: 3.5.3
pve-cluster: 7.3-2
pve-container: 4.4-2
pve-docs: 7.3-1
pve-edk2-firmware: 3.20220526-1
pve-firewall: 4.2-7
pve-firmware: 3.6-3
pve-ha-manager: 3.5.1
pve-i18n: 2.8-2
pve-qemu-kvm: 7.1.0-4
pve-xtermjs: 4.16.0-1
qemu-server: 7.3-3
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+2
vncterm: 1.7-1
zfsutils-linux: 2.1.9-pve1
 
Hi,
yes, the man page says that it can cause (even silent!) corruption in some cases:
Code:
NB: A so-called "soft" timeout can cause silent data corruption in certain cases. As such, use the soft option only when client responsiveness is more important than data integrity.
Using NFS over TCP or increasing the value of the retrans option may mitigate some of the risks of using the soft option.

And yes, unfortunately, disappearing NFS servers are not really handled gracefully, but in a production system that should also not be a common scenario.
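For illustration only, a manual test mount with both of those mitigations spelled out could look something like this (the mount point /mnt/test-nfs is hypothetical; server, export and retrans=6 are the values from the storage config further down in the thread, and timeo=600 is the usual TCP default):
Code:
# sketch: NFS soft mount over TCP with a higher retransmission count
mount -t nfs -o soft,proto=tcp,timeo=600,retrans=6 \
    10.10.3.105:/volume1/backup01/proxmox_daily /mnt/test-nfs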
 
And yes, unfortunately, disappearing NFS servers are not really handled gracefully, but in a production system that should also not be a common scenario.

Unfortunately, bad things happen.

I think it's better to keep the VM running and get a "failed backup" error than to have the VM hang because the backup NAS has crashed.
If the corruption is really "silent" and vzdump reports no error, then I agree with you that "soft" must not be used.
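One extra check I might add, as a rough guard against a silently corrupted archive, is to read the finished backup back from the NFS share and let zstd verify it; zstd -t checks the frame structure and, if present, the stored content checksum. A sketch only, the path depends on the storage's dump directory and the actual archive name:
Code:
# sketch: hypothetical path, adjust to the real dump directory and file name
zstd -t /mnt/pve/NAS-DAILY/dump/vzdump-qemu-303-*.vma.zst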

I've read some documentation suggesting to increase the retrans option when using soft, and I would like to try that.

This is my storage:
Code:
nfs: NAS-DAILY
        export /volume1/backup01/proxmox_daily
        path /mnt/pve/NAS-DAILY
        server 10.10.3.105
        content vztmpl,backup
        options soft,retrans=6
        prune-backups keep-last=3

But the share is always mounted with retrans=2:
Code:
# mount
[...]
10.10.3.105:/volume1/backup01/proxmox_daily on /mnt/pve/NAS-DAILY type nfs (rw,relatime,vers=3,rsize=131072,wsize=131072,namlen=255,soft,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.10.3.105,mountvers=3,mountport=892,mountproto=udp,local_lock=none,addr=10.10.3.105)

How can I get this option applied?

Thanks
 
Did you unmount the storage after adding the option? Proxmox VE will re-mount the storage automatically, but not automatically unmount it after you change an option.
 
Did you unmount the storage after adding the option? Proxmox VE will re-mount the storage automatically, but not automatically unmount it after you change an option.
I disabled and then re-enabled the storage. I thought it would unmount and remount, but it doesn't.
After manually unmounting the storage, it was remounted with the retrans=5 option.
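For reference, the whole cycle as a minimal sketch (storage ID and mount point are the ones from the config above; the disable/enable toggle alone does not unmount, so the umount has to be done by hand):
Code:
pvesm set NAS-DAILY --disable 1   # optional: keep PVE from re-mounting mid-change
umount /mnt/pve/NAS-DAILY         # disabling alone does not unmount; do it manually
pvesm set NAS-DAILY --disable 0   # re-enable; PVE re-mounts with the current options
mount | grep NAS-DAILY            # verify that soft/retrans now match the storage config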
 

About

The Proxmox community has been around for many years and offers help and support for Proxmox VE, Proxmox Backup Server, and Proxmox Mail Gateway.
We think our community is one of the best thanks to people like you!

Get your subscription!

The Proxmox team works very hard to make sure you are running the best software and getting stable updates and security enhancements, as well as quick enterprise support. Tens of thousands of happy customers have a Proxmox subscription. Get yours easily in our online shop.

Buy now!