NFS soft option causes I/O errors

Stefano Giunchi

I have my backup storage mounted over NFS from a NAS.
Until last week I had always used the default "hard" mount.
If the NAS dies during a backup, the VM being backed up freezes until I force-unmount the NFS share.

Then I found the "soft" NFS option in this thread and tried it.
After that, the largest (1 TB) VM consistently failed its backup:

Code:
[...]
INFO:  97% (1.0 TiB of 1.0 TiB) in 2h 10m 24s, read: 104.2 MiB/s, write: 102.7 MiB/s
INFO:  98% (1.0 TiB of 1.0 TiB) in 2h 12m 11s, read: 101.2 MiB/s, write: 99.5 MiB/s
INFO:  99% (1.0 TiB of 1.0 TiB) in 2h 13m 51s, read: 109.3 MiB/s, write: 107.9 MiB/s
zstd: /*stdout*\: Input/output error
INFO: 100% (1.0 TiB of 1.0 TiB) in 2h 14m 17s, read: 416.4 MiB/s, write: 114.2 MiB/s
INFO: backup is sparse: 293.93 GiB (27%) total zero data
INFO: transferred 1.04 TiB in 8057 seconds (135.2 MiB/s)
Warning: unable to close filehandle GEN5914 properly: Input/output error at /usr/share/perl5/PVE/VZDump/QemuServer.pm line 811.
ERROR: Backup of VM 303 failed - zstd --rsyncable --threads=1 failed - wrong exit status 1
INFO: Failed at 2023-03-07 01:09:10
INFO: Backup job finished with errors
TASK ERROR: job errors

I had to revert to the standard NFS hard mount.
I'm also unsure whether NFS soft can be a reliable choice for backups, or whether it could hide undetected data corruption.

proxmox-ve: 7.3-1 (running kernel: 5.15.83-1-pve)
pve-manager: 7.3-4 (running version: 7.3-4/d69b70d4)
pve-kernel-helper: 7.3-3
pve-kernel-5.15: 7.3-1
pve-kernel-5.4: 6.4-18
pve-kernel-5.15.83-1-pve: 5.15.83-1
pve-kernel-5.15.74-1-pve: 5.15.74-1
pve-kernel-5.15.39-1-pve: 5.15.39-1
pve-kernel-5.4.189-2-pve: 5.4.189-2
ceph: 15.2.17-pve1
ceph-fuse: 15.2.17-pve1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown: 0.8.36+pve2
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.3
libproxmox-backup-qemu0: 1.3.1-1
libpve-access-control: 7.3-1
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.3-2
libpve-guest-common-perl: 4.2-3
libpve-http-server-perl: 4.1-5
libpve-storage-perl: 7.3-2
libqb0: 1.0.5-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.0-3
lxcfs: 4.0.12-pve1
novnc-pve: 1.3.0-3
proxmox-backup-client: 2.3.2-1
proxmox-backup-file-restore: 2.3.2-1
proxmox-mini-journalreader: 1.3-1
proxmox-offline-mirror-helper: 0.5.0-1
proxmox-widget-toolkit: 3.5.3
pve-cluster: 7.3-2
pve-container: 4.4-2
pve-docs: 7.3-1
pve-edk2-firmware: 3.20220526-1
pve-firewall: 4.2-7
pve-firmware: 3.6-3
pve-ha-manager: 3.5.1
pve-i18n: 2.8-2
pve-qemu-kvm: 7.1.0-4
pve-xtermjs: 4.16.0-1
qemu-server: 7.3-3
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+2
vncterm: 1.7-1
zfsutils-linux: 2.1.9-pve1
 
Hi,
yes, the man page says that it can cause (even silent!) corruption in some cases:
Code:
NB: A so-called "soft" timeout can cause silent data corruption in certain cases. As such, use the soft option only when client responsiveness is more important than data integrity.
Using NFS over TCP or increasing the value of the retrans option may mitigate some of the risks of using the soft option.

And yes, unfortunately, disappearing NFS servers are not really handled gracefully, but in a production system that should also not be a common scenario.
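For illustration only, a manual test mount with both of those mitigations spelled out could look something like this (the mount point /mnt/test-nfs is hypothetical; server, export and retrans=6 are the values from the storage config further down in the thread, and timeo=600 is the usual TCP default):
Code:
# sketch: NFS soft mount over TCP with a higher retransmission count
mount -t nfs -o soft,proto=tcp,timeo=600,retrans=6 \
    10.10.3.105:/volume1/backup01/proxmox_daily /mnt/test-nfs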
 
And yes, unfortunately, disappearing NFS servers are not really handled gracefully, but in a production system that should also not be a common scenario.

Unfortunately, bad things happen.

I think it's better to keep the VM running and get a "failed backup" error than to have the VM hang because the backup NAS has crashed.
If the corruption is really "silent" and vzdump reports no error, then I agree with you that "soft" must not be used.
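One extra check I might add, as a rough guard against a silently corrupted archive, is to read the finished backup back from the NFS share and let zstd verify it; zstd -t checks the frame structure and, if present, the stored content checksum. A sketch only, the path depends on the storage's dump directory and the actual archive name:
Code:
# sketch: hypothetical path, adjust to the real dump directory and file name
zstd -t /mnt/pve/NAS-DAILY/dump/vzdump-qemu-303-*.vma.zst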

I've read some documentation suggesting to increase the retrans option when using soft, and I would like to try that.

This is my storage:
Code:
nfs: NAS-DAILY
        export /volume1/backup01/proxmox_daily
        path /mnt/pve/NAS-DAILY
        server 10.10.3.105
        content vztmpl,backup
        options soft,retrans=6
        prune-backups keep-last=3

But the share is always mounted with retrans=2:
Code:
# mount
[...]
10.10.3.105:/volume1/backup01/proxmox_daily on /mnt/pve/NAS-DAILY type nfs (rw,relatime,vers=3,rsize=131072,wsize=131072,namlen=255,soft,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.10.3.105,mountvers=3,mountport=892,mountproto=udp,local_lock=none,addr=10.10.3.105)

How can I get this option applied?

Thanks
 
Did you unmount the storage after adding the option? Proxmox VE will re-mount the storage automatically, but not automatically unmount it after you change an option.
 
Did you unmount the storage after adding the option? Proxmox VE will re-mount the storage automatically, but not automatically unmount it after you change an option.
I disabled and then re-enabled the storage. I thought it would unmount and remount, but it doesn't.
After manually unmounting the storage, it was remounted with the retrans=5 option.
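For reference, the whole cycle as a minimal sketch (storage ID and mount point are the ones from the config above; the disable/enable toggle alone does not unmount, so the umount has to be done by hand):
Code:
pvesm set NAS-DAILY --disable 1   # optional: keep PVE from re-mounting mid-change
umount /mnt/pve/NAS-DAILY         # disabling alone does not unmount; do it manually
pvesm set NAS-DAILY --disable 0   # re-enable; PVE re-mounts with the current options
mount | grep NAS-DAILY            # verify that soft/retrans now match the storage config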
 

About

The Proxmox community has been around for many years and offers help and support for Proxmox VE, Proxmox Backup Server, and Proxmox Mail Gateway.
We think our community is one of the best thanks to people like you!

Get your subscription!

The Proxmox team works very hard to make sure you are running the best software and getting stable updates and security enhancements, as well as quick enterprise support. Tens of thousands of happy customers have a Proxmox subscription. Get yours easily in our online shop.

Buy now!