Strange issue with backups this morning.

-RYknow

Hi All,

I have a cluster with two nodes currently. Yesterday I ran dist-upgrade on both and rebooted, as it had been a while. They both came back up, the VMs started, and everything looked good. I have various backups scheduled to kick off in the early hours. This morning I got up and was doing an rsync from my NAS to a secondary backup server. It wasn't until then that I noticed the backup for one of my VMs wasn't there.

Node1: a bunch of VMs
Node2: a single pfsense VM

The VM on node2 is the one that hadn't copied over. When I log into the web GUI, there aren't any errors or anything. When I choose to run the backup manually, I get an error that reads:

"Some errors have been encountered:
---
proxmox: Node is offline"

The only option at this error is to click OK, and when I do, the backup starts and completes without issue. I just would like the backup to run automatically like it has been since I set things up over a year ago.

Thanks,
-RYknow
 
Take a look in the task-log or in the "Backup" tab of your pfsense VM - did the Backup for *both* nodes really complete?

The error sounds like your pfSense node is not reachable or otherwise down. What happens if you backup the VM directly? (e.g. in the Web GUI select the VM, then Backup, instead of Datacenter->Backup)
 
Take a look in the task-log or in the "Backup" tab of your pfsense VM - did the Backup for *both* nodes really complete?

In the task-log at the bottom of the page, it acts as if there is no job scheduled to run. I tried it again last night. I set the backup to run at a specified time and I was there to watch it. It gives zero errors. It doesn't even try to run the backup. From the web UI I can log into either host. Both hosts show both nodes as up and running, with all VMs up and running. I can ping the hosts and all VMs.

The error sounds like your pfSense node is not reachable or otherwise down. What happens if you backup the VM directly? (e.g. in the Web GUI select the VM, then Backup, instead of Datacenter->Backup)

I can try this backup method when I get home and report back. If I use the Datacenter->Backup process, I get the error, but if I just click "OK" out of the error message, the backup starts and appears to complete. I will try doing it via selecting the VM and then backup and see if I get the same error message.

I will also try to post screenshots and some log info tonight as well. As I said before, each host appears to see the other. Running the "pvecm status" command gives a "Yes" under Quorate and lists both nodes with the appropriate IPs for each.

-RYknow
 
I ran the backup as you suggested from the VM tab specifically, and it did the backup with no error message. Not sure why doing it from the DataCenter portion wouldn't work? I have three other backups scheduled to run, all on host1. Those backups work fine. The backup for the single VM on host2 however, is the issue.

Here are a couple of outputs showing things are quorate at least.

Host1
Code:
Quorum information
------------------
Date:             Tue Nov 19 20:09:49 2019
Quorum provider:  corosync_votequorum
Nodes:            2
Node ID:          0x00000001
Ring ID:          1.8ee48
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   2
Highest expected: 2
Total votes:      2
Quorum:           1
Flags:            2Node Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 10.10.0.4 (local)
0x00000002          1 10.10.0.40

Host2
Code:
Quorum information
------------------
Date:             Tue Nov 19 14:55:52 2019
Quorum provider:  corosync_votequorum
Nodes:            2
Node ID:          0x00000002
Ring ID:          1.8ee48
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   2
Highest expected: 2
Total votes:      2
Quorum:           1
Flags:            2Node Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 10.10.0.4
0x00000002          1 10.10.0.40 (local)

If there is any other useful info I can get for you, let me know.
-RYknow

EDIT: Now that I'm posting this, I see the time is way off on my second host. I'm thinking that could be the issue?

EDIT 2: I've changed the timezone on host two to match host one (which was the correct timezone), but I'm still getting the same error. I didn't reboot either server, and maybe I would need to after changing the time?
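
For anyone comparing clocks the same way: the two "Date:" lines from the pvecm output above can be turned into a difference in seconds. This is only a minimal sketch, assuming GNU date (as found on Debian-based hosts); clock_skew_seconds is a hypothetical helper name, not a Proxmox tool.

```shell
# Hypothetical helper: absolute difference, in seconds, between two timestamps.
# Both strings are parsed as UTC so the local timezone of the machine running
# this does not affect the result. Assumes GNU date (the -d option).
clock_skew_seconds() {
    a=$(date -u -d "$1" +%s) || return 1
    b=$(date -u -d "$2" +%s) || return 1
    if [ "$a" -ge "$b" ]; then
        echo $((a - b))
    else
        echo $((b - a))
    fi
}

# The two "Date:" values from the pvecm status output in this thread.
clock_skew_seconds "Tue Nov 19 20:09:49 2019" "Tue Nov 19 14:55:52 2019"
# prints 18837 (5 hours, 13 minutes, 57 seconds)
```

A difference that is not a whole number of hours can't be explained by the timezone setting alone, although here the two commands may simply have been run some minutes apart. Running date -u +%s on both hosts at the same moment and comparing the numbers avoids that ambiguity.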
 
Also, this is a picture of the message I get when I try to run the backup from the DataCenter drop down.

Proxmox error.jpg
 
If you log in to your host (the one where backup fails) and issue cat /etc/pve/vzdump.cron you should see a 'crontab' entry for your scheduled backup. What happens if you run the 'vzdump' command for that yourself?

Should look something like this, as an example:
Code:
vzdump <vmids> --compress lzo --mode snapshot --mailnotification failure --node test --quiet 1 --storage local
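
If the cron file holds several jobs, it can help to pull out just the ones pinned to a given node before running one by hand. A rough sketch, assuming each job is a single line containing "vzdump" and a "--node NAME" option like the example above; vzdump_jobs_for_node is a hypothetical helper name, and the default path is the one mentioned in this post.

```shell
# Hypothetical helper: print the vzdump job lines pinned to one node.
#   $1 = node name to look for after --node
#   $2 = cron file to read (defaults to the path mentioned in this thread)
vzdump_jobs_for_node() {
    node=$1
    cron_file=${2:-/etc/pve/vzdump.cron}
    # Skip comment lines, keep lines that call vzdump, then match the node
    # name as a whole word so "host1" does not also match "host10".
    grep -v '^[[:space:]]*#' "$cron_file" \
        | grep -E '(^|[[:space:]])vzdump[[:space:]]' \
        | grep -E -e "--node[[:space:]]+$node([[:space:]]|\$)"
}

# Example (run on the host where the backup fails):
#   vzdump_jobs_for_node "$(hostname)"
```

No output for the node that hosts the VM would suggest the job is pinned to a different node name, or not pinned to a node at all, which is worth knowing before testing the command manually.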
 
Sounds good, I'll give this a shot tonight and report back.

Thanks,
-RYknow
 
What happens if you run the 'vzdump' command for that yourself?

If I run the command directly on the host, it completes the backup without issue. So the command that is in the cron file works when run manually, however, it doesn't work when cron tries to run it.
 
Are there any log entries created when run via cron? Maybe something in /var/mail/* ?

Also, is cron even enabled on your host? systemctl status cron.service - since it works fine when run directly, and you said the task log doesn't show anything at all, it seems the job isn't even executed on your pfSense node...
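
To see whether cron ever launched the job, one option is to search the system log for cron entries that mention vzdump. A small sketch, assuming cron logs to /var/log/syslog in the usual Debian "CRON[pid]: (user) CMD (...)" format; cron_vzdump_runs is a hypothetical helper name, and the log path may differ on your system.

```shell
# Hypothetical helper: list cron log entries that started a vzdump command.
#   $1 = log file to search (assumed default: /var/log/syslog)
cron_vzdump_runs() {
    log_file=${1:-/var/log/syslog}
    grep -E 'CRON\[[0-9]+\]:.*CMD \(.*vzdump' "$log_file"
}

# Example (run on the host where the scheduled backup is missing):
#   systemctl is-enabled cron.service
#   systemctl is-active cron.service
#   cron_vzdump_runs
```

If the service is active but nothing is listed around the scheduled time, cron never started the job on that host, which would match the empty task log.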
 
Are there any log entries created when run via cron? Maybe something in /var/mail/* ?

Also, is cron even enabled on your host? systemctl status cron.service - since it works fine when run directly, and you said the task log doesn't show anything at all, it seems the job isn't even executed on your pfSense node...

That's a good question on the cron service. I will check. I know I built this cluster about 8 months ago and backups have been running fine since day one. Something just seems to have changed when I updated this past weekend. I'd love for it to be as simple as the cron service having been disabled for one reason or another.

-RYknow
 
In a strange turn of events.... I had a backup scheduled yesterday at 1pm (just as a test). I was out and about with the family yesterday and forgot to check. But looking at it this morning, the backup ran as it was supposed to. Doesn't make any sense why all of a sudden it's working. I've not changed anything. I'm not convinced this issue is resolved yet, but I know it worked fine yesterday.

Thanks for all your help. I will report back here in a week if the regular backups don't work as intended.

-RYKnow
 
