Unable to login - I/O Error

shadowq

Hi there,

This is the second time I've run into this issue over the past few weeks. The server is working fine when suddenly I'm unable to log in via the web UI. Syslog shows "Mar 24 21:01:19 s5 pvedaemon[14546]: authentication failure; rhost=xxx user=root@pam msg=error with cfs lock 'authkey': got lock request timeout", plus a bunch of other errors around it:
Mar 24 20:53:05 s5 pvesr[30638]: trying to acquire cfs lock 'file-replication_cfg' ...
Mar 24 20:53:06 s5 pvesr[30638]: trying to acquire cfs lock 'file-replication_cfg' ...
Mar 24 20:53:07 s5 pvesr[30638]: trying to acquire cfs lock 'file-replication_cfg' ...
Mar 24 20:53:08 s5 pvesr[30638]: trying to acquire cfs lock 'file-replication_cfg' ...
Mar 24 20:53:09 s5 pvesr[30638]: error with cfs lock 'file-replication_cfg': got lock request timeout
Mar 24 20:53:09 s5 systemd[1]: pvesr.service: Main process exited, code=exited, status=5/NOTINSTALLED
Mar 24 20:53:09 s5 systemd[1]: pvesr.service: Failed with result 'exit-code'.
Mar 24 20:53:09 s5 systemd[1]: Failed to start Proxmox VE replication runner.
Mar 24 20:53:14 s5 pve-ha-lrm[2061]: unable to write lrm status file - unable to delete old temp file: Input/output error
Mar 24 20:53:19 s5 pve-ha-lrm[2061]: unable to write lrm status file - unable to delete old temp file: Input/output error
Mar 24 20:53:20 s5 pvestatd[1881]: authkey rotation error: error with cfs lock 'authkey': got lock request timeout

Can't figure out what I/O error it's hitting. The partitions are not full. A reboot fixes the issue, but that's not sustainable. Any ideas are appreciated!

proxmox-ve: 6.1-2 (running kernel: 5.3.18-2-pve)
pve-manager: 6.1-8 (running version: 6.1-8/806edfe1)
pve-kernel-5.3: 6.1-5
 
Seems the cluster filesystem is not running correctly - is this a cluster?
* check the journal for messages from 'pve-cluster' (cluster filesystem a.k.a. pmxcfs) and 'corosync'
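For instance, something like this would show both units for the current boot (assuming the standard pve-cluster and corosync service names):

Code:
journalctl -b -u pve-cluster -u corosync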

I hope this helps!
 
Thanks for the quick reply, Stoiko. Not a cluster unfortunately. Single node.

The first line in the log that kicked off the issue both times is
Mar 24 05:40:24 s5 pmxcfs[1780]: [database] crit: commit transaction failed: database or disk is full#010

Which database or disk? No disks are full.
 
the pmxcfs is a FUSE filesystem backed by an SQLite DB (stored in /var/lib/pve-cluster), which is limited to 30MB in size - check the reference documentation:
https://pve.proxmox.com/pve-docs/chapter-pmxcfs.html
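To see how much of that limit is in use, you can check the backing file directly (on a default install it is config.db; adjust the path if yours differs):

Code:
ls -lh /var/lib/pve-cluster/config.db
df -h /etc/pve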

have you stored any large files in /etc/pve?

EDIT: fixed wrong link
 
Nope!

/dev/fuse 30M 24K 30M 1% /etc/pve

root@s5:/etc/pve# du
2.5K ./priv
5.0K ./nodes
1.0K ./.le
13K .
 
Turns out following this advice helped:

  • On every node do
    systemctl stop pve-cluster
    This may take a while
  • On every node do
    sudo rm -f /var/lib/pve-cluster/.pmxcfs.lockfile
  • On each node – one by one do
    systemctl start pve-cluster

Which is odd, considering I don't use a cluster. Any ideas as to why this may have happened?
 
Thanks! This also worked for me, also running a single node with no HA.

In my case I ran out of disk space and had trouble connecting via the UI. I fixed the disk space, but Proxmox was still stuck, and there was a message in syslog every few seconds from pve-ha-lrm[1201]: unable to write lrm status file - unable to open file.

Everything came back to normal after following the steps above.
 
This advice also worked for me, thanks. I suspect something like the following is happening:
  • backup "locks" /etc/pve
  • backup fills a disc
  • backup fails and deletes some interim files (the once-full disc is not quite 100% full now)
  • backup dies but leaves /etc/pve "locked"
  • errors from several pve processes, result: cannot login using web UI
    • ... pve-ha-lrm[...]: unable to write lrm status file - unable to open file '/etc/pve/nodes/.../lrm_status.tmp.352706' - Input/output error
It seems intermittent because the backup sometimes works, or rarely runs (in my case, weekly).
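A rough way to spot that sequence in the journal afterwards (the units and patterns below are just what I'd grep for; adjust as needed):

Code:
journalctl --since yesterday -u pve-cluster -u pve-ha-lrm | grep -Ei 'disk is full|input/output error|lock request timeout'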
 
I just had a similar issue for the second time within a few weeks. I restarted the node before I found this thread. The reboot fixed the issue.

The logs are in line with gedge's theory: in my case the disks were clearly not full, but the backup failed due to network issues, which postfix reveals indirectly:

Code:
Dec 21 18:40:31 mypvehost pvescheduler[13319]: INFO: Finished Backup of VM 107 (00:01:50)
Dec 21 18:40:31 mypvehost pvescheduler[13319]: INFO: Starting Backup of VM 108 (qemu)
Dec 21 18:40:32 mypvehost qmeventd[48642]: Starting cleanup for 107
Dec 21 18:40:32 mypvehost qmeventd[48642]: Finished cleanup for 107
Dec 21 18:43:09 mypvehost postfix/qmgr[2506]: 89A361F3D7: from=<user@domain.my>, size=13448, nrcpt=1 (queue active)
Dec 21 18:43:09 mypvehost postfix/qmgr[2506]: 411EB3F802: from=<user@domain.my>, size=8821, nrcpt=1 (queue active)
Dec 21 18:43:09 mypvehost postfix/qmgr[2506]: B9E8B1E395: from=<user@domain.my>, size=11371, nrcpt=1 (queue active)
Dec 21 18:43:09 mypvehost postfix/qmgr[2506]: B6C9E24275: from=<user@domain.my>, size=925, nrcpt=1 (queue active)
Dec 21 18:43:19 mypvehost postfix/smtp[49886]: 89A361F3D7: to=<user@domain.my>, relay=none, delay=302113, delays=302103/0.04/10/0, dsn=4.4.3, status=deferred (Host or domain name not found. Name service err>
Dec 21 18:43:19 mypvehost postfix/smtp[49887]: 411EB3F802: to=<user@domain.my>, relay=none, delay=78300, delays=78290/0.04/10/0, dsn=4.4.3, status=deferred (Host or domain name not found. Name service error>
Dec 21 18:43:19 mypvehost postfix/smtp[49888]: B9E8B1E395: to=<user@domain.my>, relay=none, delay=306913, delays=306903/0.06/10/0, dsn=4.4.3, status=deferred (Host or domain name not found. Name service err>
Dec 21 18:43:19 mypvehost postfix/smtp[49889]: B6C9E24275: to=<user@domain.my>, orig_to=<root>, relay=none, delay=130698, delays=130688/0.08/10/0, dsn=4.4.3, status=deferred (Host or domain name not found. Name ser>
[...]
Dec 21 19:03:10 mypvehost postfix/qmgr[2506]: 20D4E16ABF: from=<user@domain.my>, size=13591, nrcpt=1 (queue active)
Dec 21 19:03:10 mypvehost postfix/qmgr[2506]: 697831D819: from=<user@domain.my>, size=5081, nrcpt=1 (queue active)
Dec 21 19:03:10 mypvehost postfix/qmgr[2506]: 1CF8416745: from=<user@domain.my>, size=9378, nrcpt=1 (queue active)
Dec 21 19:03:10 mypvehost postfix/qmgr[2506]: A7986221C4: from=<user@domain.my>, size=3009, nrcpt=1 (queue active)
Dec 21 19:03:20 mypvehost postfix/smtp[58426]: 20D4E16ABF: to=<user@domain.my>, relay=none, delay=46511, delays=46501/0.03/10/0, dsn=4.4.3, status=deferred (Host or domain name not found. Name service error>
Dec 21 19:03:20 mypvehost postfix/smtp[58427]: 697831D819: to=<user@domain.my>, relay=none, delay=312604, delays=312594/0.04/10/0, dsn=4.4.3, status=deferred (Host or domain name not found. Name service err>
Dec 21 19:03:20 mypvehost postfix/smtp[58428]: 1CF8416745: to=<user@domain.my>, relay=none, delay=50711, delays=50701/0.06/10/0, dsn=4.4.3, status=deferred (Host or domain name not found. Name service error>
Dec 21 19:03:20 mypvehost postfix/smtp[58429]: A7986221C4: to=<user@domain.my>, relay=none, delay=143114, delays=143103/0.08/10/0, dsn=4.4.3, status=deferred (Host or domain name not found. Name service err>
Dec 21 19:08:10 mypvehost postfix/qmgr[2506]: 5F9184162E: from=<user@domain.my>, size=855, nrcpt=1 (queue active)
Dec 21 19:08:10 mypvehost postfix/error[60548]: 5F9184162E: to=<user@domain.my>, orig_to=<root>, relay=none, delay=403668, delays=403668/0.02/0/0.01, dsn=4.4.3, status=deferred (delivery temporarily suspended: Host>
[...]
Dec 21 19:33:10 mypvehost postfix/qmgr[2506]: 1FBCC143A5: from=<user@domain.my>, size=5012, nrcpt=1 (queue active)
Dec 21 19:33:10 mypvehost postfix/qmgr[2506]: E6DE317490: from=<user@domain.my>, size=15594, nrcpt=1 (queue active)
Dec 21 19:33:20 mypvehost postfix/smtp[71206]: 1FBCC143A5: to=<user@domain.my>, relay=none, delay=55801, delays=55791/0.03/10/0, dsn=4.4.3, status=deferred (Host or domain name not found. Name service error>
Dec 21 19:33:20 mypvehost postfix/smtp[71207]: E6DE317490: to=<user@domain.my>, relay=none, delay=48601, delays=48591/0.03/10/0, dsn=4.4.3, status=deferred (Host or domain name not found. Name service error>
Dec 21 19:41:46 mypvehost pmxcfs[2431]: [database] crit: commit transaction failed: database or disk is full#010
Dec 21 19:41:46 mypvehost pmxcfs[2431]: [database] crit: rollback transaction failed: cannot rollback - no transaction is active#010
Dec 21 19:41:46 mypvehost pvescheduler[13319]: unable to delete old temp file: Input/output error
Dec 21 19:41:46 mypvehost pvescheduler[13319]: ERROR: Backup of VM 108 failed - vma_queue_write: write error - Broken pipe
Dec 21 19:41:48 mypvehost pve-ha-lrm[2613]: unable to write lrm status file - unable to open file '/etc/pve/nodes/mypvehost/lrm_status.tmp.2613' - Input/output error
Dec 21 19:41:53 mypvehost pve-ha-lrm[2613]: unable to write lrm status file - unable to open file '/etc/pve/nodes/mypvehost/lrm_status.tmp.2613' - Input/output error
Dec 21 19:41:58 mypvehost pve-ha-lrm[2613]: unable to write lrm status file - unable to open file '/etc/pve/nodes/mypvehost/lrm_status.tmp.2613' - Input/output error
Dec 21 19:42:03 mypvehost pve-ha-lrm[2613]: unable to write lrm status file - unable to open file '/etc/pve/nodes/mypvehost/lrm_status.tmp.2613' - Input/output error

Next time I will try the instructions provided by guiambros. Thanks for that.
 
I had the problem again. The root cause was again a failed backup due to networking.
The following one-liner fixed the issue:

Code:
date ; systemctl stop pve-cluster &&  rm -f /var/lib/pve-cluster/.pmxcfs.lockfile && systemctl start pve-cluster ; date

It took 11 seconds.
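To double-check that pmxcfs is healthy again afterwards, a quick sketch (the test filename is arbitrary; any create/delete under /etc/pve will do):

Code:
systemctl is-active pve-cluster
# a successful create and delete in /etc/pve means the backing database commits again
touch /etc/pve/write-test.tmp && rm /etc/pve/write-test.tmp && echo "/etc/pve is writable again"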