I’ve been running a fairly active but small-budget Proxmox cluster for about a year now. Depending on the current project workload it consists of 3-4 nodes and uses ZFS because of its replication capabilities, which allow nearly HA-like failover times (5 to 10 minutes in my case) at a budget far lower than a setup with real HA storage would require. And tbh this is great!
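For context, the replication behind those failover times is just Proxmox’s built-in ZFS storage replication, one job per VM to another node. A minimal sketch, assuming VM 100 and a target node named pve2 (both placeholders):

Code:
# replicate VM 100 to node pve2 every 15 minutes (job id 100-0)
pvesr create-local-job 100-0 pve2 --schedule "*/15"
# check the current replication state
pvesr status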
Obviously I’d like to make as much use of ZFS’s other advantages as possible; next to replication, that mainly means snapshots. To do so I have used various scripts that call the Proxmox API or ZFS directly, as well as eve4pve-autosnap, and came across some Proxmox-induced problems which I’d like to share here, both for other people in the same situation and for the Proxmox team in case they are interested.
Problem
In a setup as described above there is basically only one ‘real’ problem regarding snapshots: Proxmox snapshots are not reliably stable. They can and will fail for various reasons, and when they do, they always leave behind a locked VM with a messed-up snapshot state.
This only happens with a chance of 0.05% to 0.1% per snapshot creation (or deletion), which sounds like a small number. However, it is not: my cluster has about 30 VMs, and since this is ZFS and snapshots are extremely cheap in terms of every resource, I take automated snapshots of every VM every hour. That adds up to over 5,000 snapshots a week. Done via the Proxmox API, this means there is hardly a week in which at least one or two VMs don’t hit a “bad snapshot”. This is bad because no further snapshots will be taken for that VM and, even worse, backups will be skipped as well due to the locked VM.
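For reference, “taking a snapshot via the Proxmox API” here means the regular snapshot endpoint; roughly like this, where the node name, VM id and snapshot name are placeholders:

Code:
# create a snapshot of VM 100 through the Proxmox API...
pvesh create /nodes/pve1/qemu/100/snapshot --snapname auto-1
# ...and delete it again
pvesh delete /nodes/pve1/qemu/100/snapshot/auto-1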
As for the reasons why this happens, I have to speculate a bit. The easiest to explain, but also least frequent, one I’ve seen is a “dataset busy” error. Since replication is enabled and the cluster is fairly active, I can imagine this happening when a snapshot falls at a bad time relative to a replication run. Most of the time, however, I saw errors I couldn’t derive a good reason from, and errors during `delete` were more frequent than during `prepare`. I had the feeling that the error chance was higher during the cluster backup, but that’s also just a feeling.
What could Proxmox do better
- In case a snapshot creation or deletion fails, the API should perform an automatic rollback. The manual process right now is quite tedious: check whether the snapshot still has an actually corresponding ZFS snapshot and, if so, destroy it; delete the snapshot from the VM’s .conf file; unlock the VM (see the sketch after this list).
- Reduce the actual problems which lead to bad snapshots.
- [Nice-to-have] Add an ‘exclude from snapshot’ feature analogous to ‘exclude from backup’ and ‘exclude from replication’, since if a VM has a disk on another storage, for example in qcow2, you most definitely don’t want automated snapshots of it every hour or so.
- [Nice-to-have] Add a built-in and maintained snapshot schedule.
- [Nice-to-have] Add an easy way (preferably via the API) to boot up a clone of a VM from a snapshot, so that things like backups from within a VM could be automated more easily.
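Regarding the first point, this is roughly the manual cleanup I end up doing today; a sketch, assuming VM 100 on rpool/data with a failed snapshot named auto-1:

Code:
# remove the 'snapshot' lock from the VM
qm unlock 100
# check whether a corresponding ZFS snapshot was actually created...
zfs list -t snapshot | grep 'vm-100-disk-1@auto-1'
# ...and if so, destroy it
zfs destroy rpool/data/vm-100-disk-1@auto-1
# drop the broken snapshot from the VM config (equivalent to editing
# /etc/pve/qemu-server/100.conf by hand); --force removes it from the
# config even if removing the disk snapshots fails
qm delsnapshot 100 auto-1 --force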
What others/we can do until Proxmox maybe improves stuff
- Do not use the Proxmox API for automated snapshots (and consequently also don’t use programs like eve4pve-autosnap, which use the Proxmox API under the hood)!
Instead, take automated snapshots directly via ZFS. In case of an error like “dataset busy” the only thing that happens is that one cycle is skipped, and no further harm is done. By their nature ZFS snapshots are very consistent, and even VMs with quite active databases should be able to recover automatically via the DBMS’s crash recovery, even if the VM is totally agnostic of the snapshot creation. I’m no expert here, so I can’t say this for sure, but I have used a few MongoDB and MySQL snapshots created in this manner and they always recovered without me doing anything special.
- My solution, which I now use for all VMs (and have used for a selected few for some months to compare it against the Proxmox API approaches), is a cronjob in a container which is able to `ssh root@proxmox-nodes` and runs the following scripts:
cron-task.sh
Code:
#!/bin/bash
ssh root@10.0.1.1 'bash -s' < /home/user/snap-vms.sh
ssh root@10.0.1.2 'bash -s' < /home/user/snap-vms.sh
ssh root@10.0.1.3 'bash -s' < /home/user/snap-vms.sh
# or/and whatever IPs your cluster nodes have
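In the container this is then triggered by a plain hourly crontab entry; the path is of course just my setup:

Code:
# run the snapshot task at the top of every hour (crontab -e)
0 * * * * /home/user/cron-task.sh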
snap-vms.sh
Code:
#!/bin/bash
declare -a normal=(
  "vm-100-disk-1"
  "vm-101-disk-1"
  "subvol-102-disk-1"
  # and whatever further disks you want to include
)
declare -a short=(
  "vm-103-disk-1"
  # and whatever further disks you want to include
)

# for "normal" disks 6 snapshots are kept
for i in "${normal[@]}"
do
  if zfs list -t all | grep -q 'rpool/data/'"$i"; then
    if zfs list -t all | grep -q 'rpool/data/'"$i"'@auto-6'; then
      zfs destroy rpool/data/"$i"@auto-6
    fi
    # renames of not-yet-existing snapshots just print an error and do no harm
    zfs rename rpool/data/"$i"@auto-5 rpool/data/"$i"@auto-6
    zfs rename rpool/data/"$i"@auto-4 rpool/data/"$i"@auto-5
    zfs rename rpool/data/"$i"@auto-3 rpool/data/"$i"@auto-4
    zfs rename rpool/data/"$i"@auto-2 rpool/data/"$i"@auto-3
    zfs rename rpool/data/"$i"@auto-1 rpool/data/"$i"@auto-2
    zfs snap rpool/data/"$i"@auto-1
  fi
done

# for "short" disks only 3 snapshots are kept
for i in "${short[@]}"
do
  if zfs list -t all | grep -q 'rpool/data/'"$i"; then
    if zfs list -t all | grep -q 'rpool/data/'"$i"'@auto-3'; then
      zfs destroy rpool/data/"$i"@auto-3
    fi
    zfs rename rpool/data/"$i"@auto-2 rpool/data/"$i"@auto-3
    zfs rename rpool/data/"$i"@auto-1 rpool/data/"$i"@auto-2
    zfs snap rpool/data/"$i"@auto-1
  fi
done
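For completeness: restoring from one of these snapshots also happens directly on ZFS. A sketch, assuming VM 100 should go back to auto-3; note that `zfs rollback` can only go to the newest snapshot unless you pass -r, which destroys the snapshots in between:

Code:
# stop the VM before touching its disk
qm stop 100
# roll back; -r destroys the newer auto-1 and auto-2 snapshots
zfs rollback -r rpool/data/vm-100-disk-1@auto-3
qm start 100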