How to stop a replication job

drjaymz@

Member
Jan 19, 2022
I have a replication job that has been running for 20 hours and is blocking all the other replication jobs.
Due to a change I need to abandon the running job. It should be using the migration network but wasn't configured correctly when started.

Which is the best way to do this?
 
I realised that the node it was replicating to had no active VMs, so I just rebooted it. That broke the replication for me, and now it's restarted using the correct network and is working beautifully. But I'd still like to know the graceful way to do that.
 
Hi,
if the replication is really hanging and doesn't hit a timeout, I think the only way is to send an interrupt/terminate/kill signal to the zfs send command or whichever command the job was hanging on. Did you look up the command it was hanging on by chance? There can be very long-running replications, so we can't really use a fixed timeout for the send operation. Instead, there would need to be some kind of progress monitoring that aborts if there's no progress at all for X minutes.
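As a rough illustration of the interrupt/terminate/kill order mentioned above, here is a minimal sketch of a shell helper. The function name `stop_gracefully` and the default wait time are made up for this example and are not part of Proxmox VE. It assumes you already know the PID of the hanging command.

```shell
#!/bin/sh
# Hypothetical helper (not a Proxmox tool): ask a process to stop,
# escalating from INT to TERM to KILL only if it is still alive.
# Usage: stop_gracefully <pid> [seconds-to-wait-after-each-signal]
stop_gracefully() {
    pid=$1
    wait_s=${2:-10}   # assumed default, adjust to taste
    for sig in INT TERM KILL; do
        # If the signal cannot be delivered, the process is already gone.
        kill -s "$sig" "$pid" 2>/dev/null || return 0
        i=0
        while [ "$i" -lt "$wait_s" ]; do
            kill -0 "$pid" 2>/dev/null || return 0   # exited, done
            sleep 1
            i=$((i + 1))
        done
    done
    # Succeed only if the process is really gone after KILL.
    ! kill -0 "$pid" 2>/dev/null
}
```

For example, `stop_gracefully 12345 30` would give the process up to 30 seconds after each signal. KILL is a last resort, since the process gets no chance to clean up after itself.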
 
I couldn't find out what the command was that I was looking for. I suppose you're suggesting grepping for the command line "zfs send". When it comes to logs and finding out what is happening, generally I find what I'm looking for no problem, replication seems to be a bit harder to monitor. I'm probably not looking in the right place, I checked documentation and that didn't really help.
 
I suppose you're suggesting grepping for the command line "zfs send".
Yes, or rather for pvescheduler and what it spawned, e.g. using ps faxl.
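To make that lookup a bit more concrete, here is a small sketch that filters a process listing for zfs send commands. The function name `find_zfs_send` is made up for this example, and it assumes the listing comes from `ps -eo pid=,args=` (PID first, then the command line) rather than the tree view of `ps faxl`.

```shell
#!/bin/sh
# Hypothetical filter (not a Proxmox tool): read "PID COMMAND ARGS..."
# lines on stdin and print those whose command is zfs (with or without
# a path) and whose first argument is "send".
find_zfs_send() {
    awk '$2 ~ /(^|\/)zfs$/ && $3 == "send"'
}

# Example (run on the source node):
#   ps -eo pid=,args= | find_zfs_send
```

The first column of each matching line is the PID you would send a signal to. If nothing matches, the job may be hanging on a different command, in which case the tree view from `ps faxl` around pvescheduler is the better place to look.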

When it comes to logs and finding out what is happening, generally I find what I'm looking for no problem, replication seems to be a bit harder to monitor. I'm probably not looking in the right place, I checked documentation and that didn't really help.
It unfortunately is. Replication doesn't run as a task like other operations, and only the latest log is saved at the moment.
 
