[TUTORIAL] [Backup] Cluster config (pmxcfs) - /etc/pve

esi_y

Nov 29, 2023
Backup


A no-nonsense way to safely back up your /etc/pve files (pmxcfs [1]) is actually very simple:

Bash:
sqlite3 /var/lib/pve-cluster/config.db .dump > ~/config.dump.$(date --utc +%Z%Y%m%d%H%M%S).sql

This is safe to run on a live node, and it only needs to be done on one node of the cluster (any one will do): at a given point in time, the result would be exactly the same on every node.

Obviously, it makes more sense to save this somewhere other than the home directory ~, especially if you have dependable shared storage off the cluster. Ideally, you want this launched by a systemd timer, a cron job, or a hook in your other favourite backup method.
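As a minimal sketch of what such a timer or cron job could call, here is a Bash function wrapping the dump command above. The function name pmxcfs_backup, its argument order, the default of 30 kept dumps and the pruning scheme are all assumptions for illustration, not anything Proxmox ships:

```shell
#!/bin/bash
# Hypothetical helper: dump the pmxcfs database into DEST_DIR and keep only
# the newest KEEP dumps. All names and defaults here are illustrative.
pmxcfs_backup() {
    [ $# -ge 1 ] || { echo "usage: pmxcfs_backup DEST_DIR [KEEP] [DB]" >&2; return 2; }
    local dest_dir=$1
    local keep=${2:-30}
    local db=${3:-/var/lib/pve-cluster/config.db}
    local out tmp

    # Refuse to run against a missing file rather than let sqlite3 open an
    # empty database and "back up" nothing.
    [ -r "$db" ] || { echo "cannot read $db" >&2; return 1; }
    mkdir -p -- "$dest_dir" || return 1

    out="$dest_dir/config.dump.$(date --utc +%Z%Y%m%d%H%M%S).sql"
    tmp="$out.partial"

    # Write under a temporary name first, so an interrupted run never leaves
    # a truncated file that looks like a finished dump.
    sqlite3 "$db" .dump > "$tmp" || { rm -f -- "$tmp"; return 1; }
    # A clean dump ends with COMMIT; one that hit errors ends with a ROLLBACK line.
    if [ "$(tail -n 1 -- "$tmp")" != "COMMIT;" ]; then
        rm -f -- "$tmp"
        return 1
    fi
    mv -- "$tmp" "$out" || return 1

    # Prune: the timestamps sort lexically, so list newest first and delete
    # everything after the first KEEP entries.
    find "$dest_dir" -maxdepth 1 -type f -name 'config.dump.*.sql' -printf '%f\n' \
        | sort -r \
        | tail -n +"$((keep + 1))" \
        | while IFS= read -r name; do rm -f -- "$dest_dir/$name"; done

    printf '%s\n' "$out"
}
```

Called e.g. as pmxcfs_backup /mnt/backup/pmxcfs 48 from an hourly timer (the path is a placeholder for your off-cluster storage), this would keep about two days of dumps and print the name of the new file.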



Recovery


You will ideally never need to recover from this backup. If only a single node's config database is corrupt, you are best off copying /var/lib/pve-cluster/config.db over from a healthy node (while the service is inactive) and letting the recipient node catch up with the cluster.

However, if everything else fails, you will want to stop the cluster service, set the (possibly) corrupt database aside and get the last good state back:

Bash:
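systemctl stop pve-cluster
killall pmxcfs    # in case any pmxcfs process is still running
mv /var/lib/pve-cluster/config.db{,.corrupt}    # set the suspect database aside
sqlite3 /var/lib/pve-cluster/config.db < ~/config.dump.<timestamp>.sql    # rebuild it from the dump
systemctl start pve-cluster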

NOTE: Any leftover write-ahead log (config.db-wal) will be ignored.
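Before moving the live database aside, it may be worth checking that the chosen dump actually restores. A rough sketch, with a made-up function name check_dump, that replays the dump into a scratch file and asks SQLite's PRAGMA integrity_check about the result:

```shell
#!/bin/bash
# Hypothetical helper: replay an SQL dump into a throwaway database and
# check the result. Nothing under /var/lib/pve-cluster is touched.
check_dump() {
    [ $# -eq 1 ] || { echo "usage: check_dump DUMP.sql" >&2; return 2; }
    local dump=$1 scratch result

    # A dump that finished cleanly ends with COMMIT;
    if [ "$(tail -n 1 -- "$dump")" != "COMMIT;" ]; then
        echo "incomplete or failed dump: $dump" >&2
        return 1
    fi

    scratch=$(mktemp -d) || return 1
    # -bail makes sqlite3 stop with a non-zero exit status on the first error.
    if ! sqlite3 -bail "$scratch/test.db" < "$dump"; then
        rm -rf -- "$scratch"
        return 1
    fi

    result=$(sqlite3 "$scratch/test.db" 'PRAGMA integrity_check;')
    rm -rf -- "$scratch"
    [ "$result" = "ok" ]
}
```

Something like check_dump ~/config.dump.<timestamp>.sql && echo usable could then precede the mv step.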



[1] https://pve.proxmox.com/wiki/Proxmox_Cluster_File_System_(pmxcfs)
Additional notes on SQLite CLI


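The .dump command [1] reads the database as if with a SELECT statement within a single read transaction. In rollback-journal mode this blocks concurrent writes; in WAL mode writers can carry on, but the dump only sees the state as of the start of its transaction. Either way, once it finishes, you have a consistent "snapshot". The result is a perfectly valid set of SQL commands to recreate your database.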
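One way to convince yourself that the dump is a complete recipe is to replay it into a fresh file and dump that again; for an ordinary database the two texts should come out identical. A small sketch, where the function name dump_roundtrip is hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: dump DB, rebuild it from that dump in a scratch
# directory, dump the rebuilt copy and compare the two texts byte for byte.
dump_roundtrip() {
    [ $# -eq 1 ] || { echo "usage: dump_roundtrip DB" >&2; return 2; }
    local db=$1 work rc
    [ -r "$db" ] || return 1
    work=$(mktemp -d) || return 1

    sqlite3 "$db" .dump > "$work/first.sql" &&
        sqlite3 -bail "$work/rebuilt.db" < "$work/first.sql" &&
        sqlite3 "$work/rebuilt.db" .dump > "$work/second.sql" &&
        cmp -s "$work/first.sql" "$work/second.sql"
    rc=$?

    rm -rf -- "$work"
    return "$rc"
}
```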

There's an alternative .save command (equivalent to .backup). It produces a valid copy of the actual .db file and is non-blocking, copying the database page by page, but if pages get dirty in the process, the copy needs to start over. You could also receive an Error: database is locked failure on the attempt. If you insist on this method, you may need to set .timeout <milliseconds> beforehand to get more luck with it.
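If you do go the .backup route, the timeout has to be in place before the copy starts. A sketch using the sqlite3 CLI's -cmd option to run .timeout first; the function name backup_copy and the 5000 ms default are arbitrary choices for illustration, and the timeout only improves the odds rather than guaranteeing success:

```shell
#!/bin/bash
# Hypothetical helper: online copy of a live database with the CLI's
# .backup command. The busy timeout is set first (via -cmd), so lock
# contention can make sqlite3 wait up to TIMEOUT_MS instead of failing
# straight away with "database is locked".
backup_copy() {
    [ $# -ge 2 ] || { echo "usage: backup_copy DB DEST [TIMEOUT_MS]" >&2; return 2; }
    local db=$1 dest=$2 timeout_ms=${3:-5000}

    [ -r "$db" ] || return 1
    # DEST is passed single-quoted to the dot-command, so it must not
    # itself contain a single quote.
    case $dest in
        *\'*) echo "DEST must not contain single quotes" >&2; return 1 ;;
    esac
    sqlite3 -cmd ".timeout $timeout_ms" "$db" ".backup '$dest'"
}
```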

Yet another option would be the VACUUM command with an INTO clause [2], but it does not fsync the result on its own!
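A sketch of the VACUUM INTO variant that does the flushing itself, assuming SQLite 3.27.0 or later (where VACUUM INTO was introduced) and GNU coreutils sync, which accepts file operands; the function name vacuum_copy is hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: consistent snapshot via VACUUM INTO, followed by an
# explicit flush, since SQLite does not fsync the VACUUM INTO output itself.
vacuum_copy() {
    [ $# -eq 2 ] || { echo "usage: vacuum_copy DB DEST" >&2; return 2; }
    local db=$1 dest=$2 lit

    [ -r "$db" ] || return 1
    # VACUUM INTO fails on an existing non-empty file; check up front for a
    # clearer message (this sketch refuses any existing file).
    if [ -e "$dest" ]; then
        echo "refusing to overwrite $dest" >&2
        return 1
    fi

    # Double any single quotes so the path is a valid SQL string literal.
    lit=${dest//\'/\'\'}
    sqlite3 "$db" "VACUUM INTO '$lit';" || return 1

    # GNU sync with file operands fsyncs just those files; the parent
    # directory is included so the new directory entry is durable too.
    sync -- "$dest" "$(dirname -- "$dest")"
}
```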

If you already have a corrupt .db file at hand (and nothing better), you may try your luck with .recover [3].
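A sketch of the usual .recover workflow (SQLite 3.29.0 or later): write the salvaged SQL to a file, then load it into a brand-new database rather than the damaged one. The function name recover_db is hypothetical, and whatever .recover cannot read is simply missing from the result, so treat the output as a last resort, not as a verified backup:

```shell
#!/bin/bash
# Hypothetical helper: let .recover salvage whatever it can read from a
# damaged database as SQL, then load that SQL into a brand-new file.
recover_db() {
    [ $# -eq 2 ] || { echo "usage: recover_db BROKEN.db NEW.db" >&2; return 2; }
    local broken=$1 fresh=$2

    [ -r "$broken" ] || return 1
    if [ -e "$fresh" ]; then
        echo "refusing to overwrite $fresh" >&2
        return 1
    fi

    # Keep the recovered SQL next to the new database for inspection.
    sqlite3 "$broken" .recover > "$fresh.sql" || return 1
    sqlite3 "$fresh" < "$fresh.sql"
}
```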

[1] https://www.sqlite.org/cli.html#converting_an_entire_database_to_a_text_file
[2] https://www.sqlite.org/lang_vacuum.html
[3] https://www.sqlite.org/cli.html#recover_data_from_a_corrupted_database
NOTE: I will likely expand this stub in the future; I just wanted to have it floating here for easier reference.

NB: Not sure why this has not been suggested in the official docs for a while.

ALSO! Any feedback welcome, especially negative (as it makes the resulting piece of advice better)!
NB: Not sure why this has not been suggested in the official docs for a while.
I agree. Host/node backup and restore as a whole isn't adequately covered, either in tooling or in documentation.

Anyway, kudos for your tutorial.