[SOLVED] vm got stuck?

killmasta93

Hi,
I was wondering if this has happened to anyone before? I'm currently running Proxmox 5.3-8. A vzdump backup was running last night and it stopped on the last VM, 106.
Any ideas?

Thank you

This was the error:
Code:
106: 2019-07-17 02:28:52 ERROR: vma_queue_write: write error - Broken pipe
106: 2019-07-17 02:28:52 INFO: aborting backup job
106: 2019-07-17 02:28:54 INFO: unable to open file '/etc/pve/nodes/prometheus/qemu-server/106.conf.tmp.14455' - Input/output error
106: 2019-07-17 02:29:17 ERROR: Backup of VM 106 failed - vma_queue_write: write error - Broken pipe

Code:
Jul 17 06:25:07 prometheus pvesr[26885]: trying to acquire cfs lock 'file-replication_cfg' ...
Jul 17 06:25:08 prometheus pvesr[26885]: trying to acquire cfs lock 'file-replication_cfg' ...
Jul 17 06:25:09 prometheus pvesr[26885]: error with cfs lock 'file-replication_cfg': got lock request timeout
Jul 17 06:25:09 prometheus systemd[1]: pvesr.service: Main process exited, code=exited, status=5/NOTINSTALLED
Jul 17 06:25:09 prometheus systemd[1]: Failed to start Proxmox VE replication runner.
Jul 17 06:25:09 prometheus systemd[1]: pvesr.service: Unit entered failed state.
Jul 17 06:25:09 prometheus systemd[1]: pvesr.service: Failed with result 'exit-code'.
Jul 17 06:25:10 prometheus spiceproxy[8577]: worker exit
Jul 17 06:25:10 prometheus spiceproxy[3119]: worker 8577 finished
Jul 17 06:25:10 prometheus pveproxy[24370]: worker exit
Jul 17 06:25:10 prometheus pveproxy[8812]: worker exit
Jul 17 06:25:10 prometheus pveproxy[8813]: worker exit
Jul 17 06:25:10 prometheus pveproxy[3088]: worker 24370 finished
Jul 17 06:25:10 prometheus pveproxy[3088]: worker 8812 finished
Jul 17 06:25:10 prometheus pveproxy[3088]: worker 8813 finished
Jul 17 06:25:12 prometheus pve-ha-lrm[3110]: unable to write lrm status file - unable to delete old temp file: Input/output error
Jul 17 06:25:17 prometheus pve-ha-lrm[3110]: unable to write lrm status file - unable to delete old temp file: Input/output error
Jul 17 06:25:22 prometheus pve-ha-lrm[3110]: unable to write lrm status file - unable to delete old temp file: Input/output error
Jul 17 06:25:27 prometheus pve-ha-lrm[3110]: unable to write lrm status file - unable to delete old temp file: Input/output error
Jul 17 06:25:32 prometheus pve-ha-lrm[3110]: unable to write lrm status file - unable to delete old temp file: Input/output error
Jul 17 06:25:37 prometheus pve-ha-lrm[3110]: unable to write lrm status file - unable to delete old temp file: Input/output error
Jul 17 06:25:42 prometheus pve-ha-lrm[3110]: unable to write lrm status file - unable to delete old temp file: Input/output error
Jul 17 06:25:47 prometheus pve-ha-lrm[3110]: unable to write lrm status file - unable to delete old temp file: Input/output error
Jul 17 06:25:52 prometheus pve-ha-lrm[3110]: unable to write lrm status file - unable to delete old temp file: Input/output error
Jul 17 06:25:57 prometheus pve-ha-lrm[3110]: unable to write lrm status file - unable to delete old temp file: Input/output error
Jul 17 06:26:00 prometheus systemd[1]: Starting Proxmox VE replication runner...
Jul 17 06:26:00 prometheus pvesr[28995]: trying to acquire cfs lock 'file-replication_cfg' ...
Jul 17 06:26:01 prometheus pvesr[28995]: trying to acquire cfs lock 'file-replication_cfg' ...
Jul 17 06:26:02 prometheus pve-ha-lrm[3110]: unable to write lrm status file - unable to delete old temp file: Input/output error
Jul 17 06:26:02 prometheus pvesr[28995]: trying to acquire cfs lock 'file-replication_cfg' ...
Jul 17 06:26:03 prometheus pvesr[28995]: trying to acquire cfs lock 'file-replication_cfg' ...
Jul 17 06:26:04 prometheus pvesr[28995]: trying to acquire cfs lock 'file-replication_cfg' ...
Jul 17 06:26:05 prometheus pvesr[28995]: trying to acquire cfs lock 'file-replication_cfg' ...
Jul 17 06:26:06 prometheus pvesr[28995]: trying to acquire cfs lock 'file-replication_cfg' ...
Jul 17 06:26:07 prometheus pve-ha-lrm[3110]: unable to write lrm status file - unable to delete old temp file: Input/output error
 
The cluster file system (/etc/pve/) was not available while the backup was running. If it's a cluster, check your network.
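To check whether a directory such as /etc/pve (or the backup target) currently accepts writes, a small probe along these lines might help. This is only a sketch: the function name `probe_writable` is made up for illustration, and it assumes that creating and immediately deleting a throwaway file in the directory is acceptable on your system.

```python
import errno
import os
import tempfile


def probe_writable(directory):
    """Try to create, write and delete a small temp file in `directory`.

    Returns (True, None) on success, or (False, reason) where reason is
    the symbolic errno name (e.g. "EIO", "EROFS", "ENOENT") if known.
    Hypothetical helper, not part of any Proxmox tool.
    """
    try:
        fd, path = tempfile.mkstemp(prefix=".probe.", dir=directory)
        try:
            os.write(fd, b"ok")
        finally:
            # Always clean up the throwaway file.
            os.close(fd)
            os.unlink(path)
    except OSError as exc:
        return False, errno.errorcode.get(exc.errno, str(exc))
    return True, None


if __name__ == "__main__":
    # The path is an assumption; pass whichever directory you want to test.
    ok, reason = probe_writable("/etc/pve")
    print("writable" if ok else f"not writable: {reason}")
```

If the probe reports something like EIO while the logs show the same "Input/output error" messages, that would be consistent with the file system being unavailable at that moment, though it does not say why.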
 
Does the server have enough resources during the backup? Are those messages showing up in the journal/syslog more frequently too?
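To get a rough idea of how often those messages show up, something like the following could be run over saved syslog lines. It is a minimal sketch that assumes the classic syslog line layout seen in the log above; `count_matches` and the regular expression are illustrative, not part of any Proxmox tool.

```python
import re
from collections import Counter

# Matches lines like:
# "Jul 17 06:25:12 prometheus pve-ha-lrm[3110]: message text"
SYSLOG_RE = re.compile(
    r"^(?P<minute>\w{3}\s+\d+\s+\d{2}:\d{2}):\d{2}\s+(?P<host>\S+)\s+"
    r"(?P<service>[^\[\s:]+)(?:\[\d+\])?:\s+(?P<msg>.*)$"
)


def count_matches(lines, needle="Input/output error"):
    """Count log lines containing `needle`, grouped by service and by minute."""
    per_service = Counter()
    per_minute = Counter()
    for line in lines:
        match = SYSLOG_RE.match(line.strip())
        if not match or needle not in match.group("msg"):
            continue
        per_service[match.group("service")] += 1
        per_minute[match.group("minute")] += 1
    return per_service, per_minute


if __name__ == "__main__":
    # The file path is an assumption; adjust it to wherever your log lives.
    with open("/var/log/syslog", errors="replace") as handle:
        services, minutes = count_matches(handle)
    for service, count in services.most_common():
        print(f"{service}: {count}")
```

Comparing the per-minute counts against the backup window should show whether the errors only appear while vzdump is running or at other times as well.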
 
Thanks for the reply. Correct, there are enough resources, around 30 GB of RAM. It only happened when the backup got stuck. I haven't run a backup since that error; should I try it one more time?
 
Thanks for the reply. Correct, there are enough resources, around 30 GB of RAM. It only happened when the backup got stuck.
I don't understand. How much RAM is available when the backups are running?

I haven't run a backup since that error; should I try it one more time?
Up to you. I don't believe you want to keep it in that state. ;)
 
I'll post back. I'm going to run it this week, just in case it gets stuck, so I won't get hammered the next day.
 
So I reran it and it did not back up. I changed the USB disk and it worked. I think it had something to do with the USB disk. Very odd, but thank you.
 
