Ran out of disk space getting input/output error

Phil Freeman

Active Member
Aug 18, 2016
Hi all, a few days ago I ran out of disk space and half of my VMs showed an exclamation mark and were unresponsive.

I cleared some space by deleting some ISO files, then stopped and started the affected VMs, and they seemed to boot back up fine.
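
For reference, this is roughly what that looked like from the shell. The ISO path and the VMID 100 below are just placeholders, not my actual names:

# see which filesystem actually filled up
df -h
# find what is eating space under the local storage (adjust the path to your storage)
du -xh /var/lib/vz --max-depth=2 | sort -h | tail -n 20
# remove an unneeded ISO (example path only)
rm /var/lib/vz/template/iso/example.iso
# stop and start an affected VM (100 is an example VMID)
qm stop 100
qm start 100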

Now, a few days later, I'm unable to log in to Proxmox via the web UI. I can SSH in, and looking at the log files it seems that even after I cleared the space and restarted the VMs, I've been getting input/output errors such as:

Jul 28 18:43:13 legolas pve-ha-lrm[6940]: unable to write lrm status file - unable to delete old temp file: Input/output error

Jul 28 18:48:08 legolas pvesr[25183]: trying to acquire cfs lock 'file-replication_cfg' ...
Jul 28 18:48:08 legolas pve-ha-lrm[6940]: unable to write lrm status file - unable to delete old temp file: Input/output error
Jul 28 18:48:09 legolas pvesr[25183]: trying to acquire cfs lock 'file-replication_cfg' ...
Jul 28 18:48:10 legolas pvesr[25183]: cfs-lock 'file-replication_cfg' error: got lock request timeout
Jul 28 18:48:10 legolas systemd[1]: pvesr.service: Main process exited, code=exited, status=5/NOTINSTALLED
Jul 28 18:48:10 legolas systemd[1]: pvesr.service: Failed with result 'exit-code'.
Jul 28 18:48:10 legolas systemd[1]: Failed to start Proxmox VE replication runner.

Aug 1 17:03:31 legolas pvestatd[6897]: authkey rotation error: cfs-lock 'authkey' error: got lock request timeout
Aug 1 17:03:31 legolas pvestatd[6897]: status update time (9.110 seconds)
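
The cfs-lock timeouts seem to point at the cluster filesystem (pmxcfs, which backs /etc/pve), so this is roughly what I've been checking over SSH. Nothing here changes anything, it only reads state:

# is the cluster filesystem service still healthy?
systemctl status pve-cluster
# can /etc/pve still be read? (it is backed by pmxcfs)
ls -l /etc/pve
# recent messages from the services complaining above
journalctl -r -u pve-cluster -u pve-ha-lrm -u pvesr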


Is there something I can do to clear these errors without rebooting the Proxmox server? I suspect they're residual from when I ran out of disk space.
 
Had same issue, did a systemctl restart pve-cluster and it cleared up.
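
For anyone finding this later, roughly what that recovery looks like. The service names below are the stock PVE ones; restarting the other pve* daemons may or may not be needed in your case:

# make sure some space has actually been freed first
df -h
# restart the cluster filesystem service that backs /etc/pve
systemctl restart pve-cluster
# if the web UI is still unhappy, these may also be worth a restart
systemctl restart pvedaemon pveproxy pvestatd
# confirm the I/O and lock errors have stopped
journalctl -r | head -n 50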
This one was the winner for me. The symptom that led me here on PVE 8.4.1 was being unable to log in via the root account. I was able to get a shell via SSH (PuTTY or whatever you use) and could see the /snap/certbot and /snap/core mounts were at 100%. The full mounts may be a red herring, since a few of them are still full, but I could also see the errors above by running "journalctl -r". Issuing "systemctl restart pve-cluster" was the only thing that worked for me. Thanks @mathx !
 