Web interface login not working after failed backup

Aug 30, 2021
Hello

We have a backup plan that runs every Saturday on our servers, writing to local storage.

On one of them the backup failed:
Code:
VMID    NAME    STATUS    TIME    SIZE    FILENAME
103    APP2    FAILED    00:24:33    vma_queue_write: write error - Broken pipe
104    VM 104    FAILED    00:00:00    unable to open file '/etc/pve/nodes/XXX/qemu-server/104.conf.tmp.9793' - Input/output error
106    VM 106    FAILED    00:00:00    unable to open file '/etc/pve/nodes/XXX/qemu-server/106.conf.tmp.9793' - Input/output error
107    VM 107    FAILED    00:00:00    unable to open file '/etc/pve/nodes/XXX/qemu-server/107.conf.tmp.9793' - Input/output error
108    VM 108    FAILED    00:00:00    unable to open file '/etc/pve/nodes/XXX/qemu-server/108.conf.tmp.9793' - Input/output error
TOTAL    00:24:33    0KB

Looking at the detailed log, the backup of VM 103 fails at 41%:
Code:
103: 2021-08-28 05:24:33 INFO:  41% (211.7 GiB of 505.0 GiB) in 24m 27s, read: 107.0 MiB/s, write: 107.0 MiB/s
103: 2021-08-28 05:24:33 ERROR: vma_queue_write: write error - Broken pipe
103: 2021-08-28 05:24:33 INFO: aborting backup job
103: 2021-08-28 05:24:33 INFO: resuming VM again
103: 2021-08-28 05:24:35 ERROR: Backup of VM 103 failed - vma_queue_write: write error - Broken pipe

After that I can only log in to the server over SSH. When I try to log in to the web interface, with either realm (Linux PAM or Active Directory), it says the login failed.

When I check the server with journalctl, I see many lines like the following:
Code:
 unable to write lrm status file - unable to open file '/etc/pve/nodes/nsXXX/lrm_status.tmp.4177' - Input/output error
 authkey rotation error: cfs-lock 'authkey' error: got lock request timeout

The only way to deal with it is to restart the service. It's not very convenient as it is a PROD/LIVE server.

I have already tried several commands:
Bash:
service pvedaemon restart
service pveproxy restart
service pvestatd restart

Here is my available disk space:

Bash:
root@ns3181572:/mnt# df -h
Filesystem        Size  Used Avail Use% Mounted on
udev               63G     0   63G   0% /dev
tmpfs              13G  498M   13G   4% /run
rpool/ROOT/pve-1  1.1T  820G  211G  80% /
tmpfs              63G   43M   63G   1% /dev/shm
tmpfs             5.0M     0  5.0M   0% /run/lock
tmpfs              63G     0   63G   0% /sys/fs/cgroup
rpool             211G  128K  211G   1% /rpool
rpool/ROOT        211G  128K  211G   1% /rpool/ROOT
rpool/data        211G  128K  211G   1% /rpool/data
/dev/fuse          30M   32K   30M   1% /etc/pve
tmpfs              13G     0   13G   0% /run/user/0
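
Since every error above is an Input/output error on a path under /etc/pve (the /dev/fuse mount in the df output), a quick write probe could confirm whether that mount is the part that stopped accepting writes. The following is only a minimal sketch: the function names check_writable and report_writable are made up for illustration, and it assumes that creating and removing a small hidden probe file in the target directory is acceptable.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of Proxmox): test whether a directory accepts writes.

# Return 0 if a file can be created in the given directory, non-zero otherwise.
check_writable() {
    local dir="$1" probe
    probe="$dir/.write-probe.$$"      # probe name is an arbitrary choice
    if touch "$probe" 2>/dev/null; then
        rm -f "$probe"                # clean up the probe file again
        return 0
    fi
    return 1
}

# Print a one-line verdict for the given directory.
report_writable() {
    if check_writable "$1"; then
        echo "$1: writable"
    else
        echo "$1: NOT writable"
    fi
}
```

On the affected node this could be run as `report_writable /etc/pve`, followed by `systemctl status pve-cluster` and `journalctl -u pve-cluster -b`, because /etc/pve is provided by pmxcfs, which runs under the pve-cluster service. That service is not among the three I restarted above.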

Is there any way to resolve the issue without restarting the server?

Thanks in advance