pve-zsync bug: spawns endless "cron" processes

QLP24

Member
Oct 7, 2016
I configured pve-zsync yesterday to sync a few datasets to a remote destination. This morning there were just under 300 cron processes running.

It seems pve-zsync doesn't exit but waits for other instances to complete. This is probably unwanted behavior. Screenshots attached.

Two possible (easy) fixes I can think of:

1) move pve-zsync from cron to systemd timers
2) have pve-zsync exit instead of waiting

Probably best to implement options 1 and 2.
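Until either fix lands, a stopgap in the spirit of option 2 is to wrap each cron line in `flock -n`, so a run is skipped rather than queued while the previous one is still going. A minimal sketch of the pattern (the lock-file path is just an example; in the real cron line, the `echo` would be the `pve-zsync sync ...` invocation):

```shell
# flock -n acquires the lock without waiting: if another process holds
# it, flock exits immediately with status 1, so overlapping cron runs
# never pile up. One lock file per sync job (example name below).
LOCK=/tmp/pve-zsync-demo.lock
flock -n "$LOCK" echo "sync runs: lock was free"
```

In `/etc/cron.d/pve-zsync` this would look like `*/15 * * * * root flock -n /run/lock/pve-zsync-NAME.lock pve-zsync sync ...` for each job.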
 

Attachments

  • pve-zsync-1.jpg (554.8 KB)
  • pve-zsync-2.jpg (72.6 KB)
Hi,
I don't think it's supposed to wait; the hangs may come from a bug. What is the output of 'pve-zsync list' and 'pve-zsync status'?
 
While it's syncing the last dataset (16 TB), I commented out all the cron jobs so it won't spawn a million processes. Here's the output from the commands:

Code:
root@hostname:~# cat /etc/cron.d/pve-zsync
SHELL=/bin/sh
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin

#*/15 * * * * root pve-zsync sync --source datapool/user12     --dest backup.hostname.fqdn:backup-tank --name datapool-user12     --maxsnap 2 --method ssh --source-user root --dest-user root
#*/15 * * * * root pve-zsync sync --source datapool/abcd       --dest backup.hostname.fqdn:backup-tank --name datapool-abcd       --maxsnap 2 --method ssh --source-user root --dest-user root
#*/15 * * * * root pve-zsync sync --source datapool/data       --dest backup.hostname.fqdn:backup-tank --name datapool-data       --maxsnap 2 --method ssh --source-user root --dest-user root
#*/15 * * * * root pve-zsync sync --source datapool/pvestorage --dest backup.hostname.fqdn:backup-tank --name datapool-pvestorage --maxsnap 2 --method ssh --source-user root --dest-user root


root@hostname:~# pve-zsync list
SOURCE                   NAME                     STATE     LAST SYNC           TYPE  CON
datapool/abcd            datapool-abcd            ok        2019-09-16_22:57:47 undef ssh
datapool/data            datapool-data            syncing   0                   undef ssh
datapool/user12          datapool-user12          ok        2019-09-17_13:30:04 undef ssh
datapool/pvestorage      datapool-pvestorage      ok        2019-09-16_22:58:46 undef ssh


root@hostname:~# pve-zsync status
SOURCE                   NAME                     STATUS
datapool/abcd            datapool-abcd            ok
datapool/data            datapool-data            syncing
datapool/user12          datapool-user12          ok
datapool/pvestorage      datapool-pvestorage      ok
 
So the problem is that 'pve-zsync' currently has no job-specific locking mechanism: each instance waits for all other instances of itself to finish.
It's probably too late for this time, but as a workaround one can do an initial sync first and add the job only after the initial sync finished.
 
This is easily fixed by moving the sync job to a systemd timer. It will not start a second instance if one is already running.

Quick example:

/etc/systemd/system/pve-zsync.timer
Code:
[Unit]
Description=Run pve-zsync every 15 minutes

[Timer]
OnCalendar=*:0/15

[Install]
WantedBy=timers.target

/etc/systemd/system/pve-zsync.service
Code:
[Unit]
Description=pve-zsync

[Service]
Type=oneshot
ExecStart=/usr/sbin/pve-zsync sync --source datapool/user12 --dest backup.hostname.fqdn:backup-tank --name datapool-user12 --maxsnap 2 --method ssh --source-user root --dest-user root
ExecStart=/usr/sbin/pve-zsync sync --source datapool/abcd --dest backup.hostname.fqdn:backup-tank --name datapool-abcd --maxsnap 2 --method ssh --source-user root --dest-user root
ExecStart=/usr/sbin/pve-zsync sync --source datapool/data --dest backup.hostname.fqdn:backup-tank --name datapool-data --maxsnap 2 --method ssh --source-user root --dest-user root
ExecStart=/usr/sbin/pve-zsync sync --source datapool/pvestorage --dest backup.hostname.fqdn:backup-tank --name datapool-pvestorage --maxsnap 2 --method ssh --source-user root --dest-user root

This may not be the best method, though; there are lots of options to play with. A few that come to mind: Before=, After=, Conflicts=, using a target with multiple services, or instanced units (instanced units would be really cool, but would need a config file per job).
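For the instanced-unit idea, a rough sketch of what a template could look like, assuming each job's source and destination are read from a per-job environment file (the /etc/pve-zsync/ path and the SOURCE/DEST variable names are made up for illustration):

/etc/systemd/system/pve-zsync@.service
```
[Unit]
Description=pve-zsync job %i

[Service]
Type=oneshot
# %i is the instance name, e.g. pve-zsync@datapool-user12.service
EnvironmentFile=/etc/pve-zsync/%i.conf
ExecStart=/usr/sbin/pve-zsync sync --source ${SOURCE} --dest ${DEST} --name %i --maxsnap 2 --method ssh --source-user root --dest-user root
```

/etc/pve-zsync/datapool-user12.conf
```
SOURCE=datapool/user12
DEST=backup.hostname.fqdn:backup-tank
```

A matching pve-zsync@.timer template would then let each job be scheduled independently, e.g. `systemctl daemon-reload && systemctl enable --now pve-zsync@datapool-user12.timer`.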
 
Thanks for the suggestion. I'll look into that and see what we can do.
 