[SOLVED] vm got stuck?

killmasta93

Hi,
I was wondering if this has happened to anyone before? I'm currently running Proxmox 5.3-8. A vzdump backup was running last night and it stopped on the last VM, 106.
Any ideas?

Thank you

This was the error:
Code:
106: 2019-07-17 02:28:52 ERROR: vma_queue_write: write error - Broken pipe
106: 2019-07-17 02:28:52 INFO: aborting backup job
106: 2019-07-17 02:28:54 INFO: unable to open file '/etc/pve/nodes/prometheus/qemu-server/106.conf.tmp.14455' - Input/output error
106: 2019-07-17 02:29:17 ERROR: Backup of VM 106 failed - vma_queue_write: write error - Broken pipe

Code:
Jul 17 06:25:07 prometheus pvesr[26885]: trying to acquire cfs lock 'file-replication_cfg' ...
Jul 17 06:25:08 prometheus pvesr[26885]: trying to acquire cfs lock 'file-replication_cfg' ...
Jul 17 06:25:09 prometheus pvesr[26885]: error with cfs lock 'file-replication_cfg': got lock request timeout
Jul 17 06:25:09 prometheus systemd[1]: pvesr.service: Main process exited, code=exited, status=5/NOTINSTALLED
Jul 17 06:25:09 prometheus systemd[1]: Failed to start Proxmox VE replication runner.
Jul 17 06:25:09 prometheus systemd[1]: pvesr.service: Unit entered failed state.
Jul 17 06:25:09 prometheus systemd[1]: pvesr.service: Failed with result 'exit-code'.
Jul 17 06:25:10 prometheus spiceproxy[8577]: worker exit
Jul 17 06:25:10 prometheus spiceproxy[3119]: worker 8577 finished
Jul 17 06:25:10 prometheus pveproxy[24370]: worker exit
Jul 17 06:25:10 prometheus pveproxy[8812]: worker exit
Jul 17 06:25:10 prometheus pveproxy[8813]: worker exit
Jul 17 06:25:10 prometheus pveproxy[3088]: worker 24370 finished
Jul 17 06:25:10 prometheus pveproxy[3088]: worker 8812 finished
Jul 17 06:25:10 prometheus pveproxy[3088]: worker 8813 finished
Jul 17 06:25:12 prometheus pve-ha-lrm[3110]: unable to write lrm status file - unable to delete old temp file: Input/output error
Jul 17 06:25:17 prometheus pve-ha-lrm[3110]: unable to write lrm status file - unable to delete old temp file: Input/output error
Jul 17 06:25:22 prometheus pve-ha-lrm[3110]: unable to write lrm status file - unable to delete old temp file: Input/output error
Jul 17 06:25:27 prometheus pve-ha-lrm[3110]: unable to write lrm status file - unable to delete old temp file: Input/output error
Jul 17 06:25:32 prometheus pve-ha-lrm[3110]: unable to write lrm status file - unable to delete old temp file: Input/output error
Jul 17 06:25:37 prometheus pve-ha-lrm[3110]: unable to write lrm status file - unable to delete old temp file: Input/output error
Jul 17 06:25:42 prometheus pve-ha-lrm[3110]: unable to write lrm status file - unable to delete old temp file: Input/output error
Jul 17 06:25:47 prometheus pve-ha-lrm[3110]: unable to write lrm status file - unable to delete old temp file: Input/output error
Jul 17 06:25:52 prometheus pve-ha-lrm[3110]: unable to write lrm status file - unable to delete old temp file: Input/output error
Jul 17 06:25:57 prometheus pve-ha-lrm[3110]: unable to write lrm status file - unable to delete old temp file: Input/output error
Jul 17 06:26:00 prometheus systemd[1]: Starting Proxmox VE replication runner...
Jul 17 06:26:00 prometheus pvesr[28995]: trying to acquire cfs lock 'file-replication_cfg' ...
Jul 17 06:26:01 prometheus pvesr[28995]: trying to acquire cfs lock 'file-replication_cfg' ...
Jul 17 06:26:02 prometheus pve-ha-lrm[3110]: unable to write lrm status file - unable to delete old temp file: Input/output error
Jul 17 06:26:02 prometheus pvesr[28995]: trying to acquire cfs lock 'file-replication_cfg' ...
Jul 17 06:26:03 prometheus pvesr[28995]: trying to acquire cfs lock 'file-replication_cfg' ...
Jul 17 06:26:04 prometheus pvesr[28995]: trying to acquire cfs lock 'file-replication_cfg' ...
Jul 17 06:26:05 prometheus pvesr[28995]: trying to acquire cfs lock 'file-replication_cfg' ...
Jul 17 06:26:06 prometheus pvesr[28995]: trying to acquire cfs lock 'file-replication_cfg' ...
Jul 17 06:26:07 prometheus pve-ha-lrm[3110]: unable to write lrm status file - unable to delete old temp file: Input/output error
 
The cluster file system (/etc/pve/) was not available while the backup was running. If it's a cluster, check your network.
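A quick way to see whether a directory currently accepts writes is to try creating and removing a small temp file in it, which is roughly the operation that failed in the log above ('106.conf.tmp.14455' - Input/output error). The sketch below is only an illustration: the helper name `probe_write` is made up here, it has not been run against /etc/pve, and it would need root there.

```python
import errno
import os
import tempfile


def probe_write(directory):
    """Try to create and remove a small temp file in `directory`.

    Returns None on success, or the OSError that was raised.
    """
    try:
        fd, path = tempfile.mkstemp(prefix="probe.", dir=directory)
    except OSError as exc:
        return exc
    try:
        os.close(fd)
        os.unlink(path)
    except OSError as exc:
        return exc
    return None


if __name__ == "__main__":
    # /etc/pve is the path from the log; using it here is an assumption,
    # any directory can be passed instead.
    err = probe_write("/etc/pve")
    if err is None:
        print("write probe succeeded")
    elif err.errno == errno.EIO:
        print("Input/output error, same errno as in the log")
    else:
        print(f"probe failed: {err}")
```

The function returns the exception instead of raising it, so a caller can tell an I/O error apart from a missing directory or a permission problem.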
 
Does the server have enough resources during the backup? Are those messages showing up in the journal/syslog more frequently too?
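To answer the "more frequently" part, one option is to count how often a message shows up per hour and per service. This is a minimal sketch that assumes the classic syslog line format seen in the log above; the function name `count_matches` and the sample lines (copied from the thread) are only for illustration.

```python
import re
from collections import Counter

# Matches lines such as:
# "Jul 17 06:25:12 prometheus pve-ha-lrm[3110]: message text"
LINE_RE = re.compile(
    r"^(?P<month>\w{3})\s+(?P<day>\d+)\s+(?P<hour>\d{2}):\d{2}:\d{2}\s+"
    r"\S+\s+(?P<unit>[^\[:\s]+)(?:\[\d+\])?:\s+(?P<msg>.*)$"
)


def count_matches(lines, needle="Input/output error"):
    """Count lines whose message contains `needle`.

    Keys are (month, day, hour, unit) tuples; lines that do not look
    like syslog lines are skipped.
    """
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m and needle in m.group("msg"):
            key = (m.group("month"), int(m.group("day")),
                   int(m.group("hour")), m.group("unit"))
            counts[key] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Jul 17 06:25:10 prometheus pveproxy[8812]: worker exit",
        "Jul 17 06:25:12 prometheus pve-ha-lrm[3110]: unable to write lrm "
        "status file - unable to delete old temp file: Input/output error",
        "Jul 17 06:25:17 prometheus pve-ha-lrm[3110]: unable to write lrm "
        "status file - unable to delete old temp file: Input/output error",
    ]
    for key, n in sorted(count_matches(sample).items()):
        print(key, n)
```

On a real system the lines could come from an open log file instead of the sample list, for example `count_matches(open("/var/log/syslog"))`, assuming that is where the host writes its syslog.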
 
Thanks for the reply. Correct, there are enough resources, around 30 GB of RAM. It only happened when the backup got stuck. I haven't done a backup since that error. Should I try it one last time?
 
Thanks for the reply. Correct, there are enough resources, around 30 GB of RAM. It only happened when the backup got stuck.
I don't understand. How much RAM is available when the backups are running?
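One way to get a number for this is to read `MemAvailable` from /proc/meminfo while the backup is running. The sketch below is an illustration only; the helper names are made up, and it reports what the kernel estimates as available at the moment it is run, so it would need to be run (or looped) during the backup window.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer value}.

    Sizes in /proc/meminfo are reported in kB.
    """
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def available_gib(text):
    """Return MemAvailable converted to GiB, or None if it is missing."""
    kib = parse_meminfo(text).get("MemAvailable")
    return None if kib is None else kib / (1024 * 1024)


if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as fh:
            gib = available_gib(fh.read())
    except OSError as exc:
        print(f"could not read /proc/meminfo: {exc}")
    else:
        print("MemAvailable:", "unknown" if gib is None else f"{gib:.1f} GiB")
```

`MemAvailable` is used rather than `MemFree` because it also accounts for memory the kernel can reclaim from caches, which is closer to what the question is asking.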

I haven't done a backup since that error. Should I try it one last time?
Up to you. I don't believe you want to keep it in that state. ;)
 
I will post back. I'm going to run it this week, timed so that if it gets stuck I won't get hammered the next day.
 
So I reran it and it did not back up. I changed the USB disk and it worked. I think it had something to do with the USB disk. Very odd, but thank you.