Fleecing storage when taking backups. My nightmare with backup tasks and downtime.

David123

New Member
Jul 6, 2024
The past few weeks have been horrible for me. Every time the scheduled backup ran, the VM would freeze. After forcibly stopping and starting the VM, the /etc/shadow and /etc/passwd files were sometimes empty, resulting in chaos. It happened repeatedly on multiple Proxmox servers and VMs, which meant many sleepless nights recreating, for example, the MySQL, SSH and all other users. It doesn't help that my setup has 300-400 users (cPanel).

I decided to investigate this further. First I disabled the qemu-guest-agent freeze. When I ran a backup while sitting next to it, I noticed iowait inside the VM went up to 40. SSH behaved strangely and only showed bash 5.1 (not the usual hostname), and most services were unresponsive. Running something like dnf install xyz was impossible. Running curl localhost also got no response.

As soon as I stopped the backup task from the proxmox side the issue went away.

I experimented with all the options, including iowait and the iothread limit.

Finally I found the fleecing option under the "Advanced" tab.

It solved the issue instantly. Now iowait is back at 0.0 wa, and services are responsive. It also seems to fix the issue I was having with /etc/passwd, /etc/shadow files emptying (OMG WHAT A NIGHTMARE).

I'm very happy to have found this option, but how much storage do I need for it?
On each server I only have 1x VM with a 3.5 TB partition. The total disk size of the Proxmox server is 3.6 TB (100 GB left over).



Is a separate 1x 1 TB SSD enough, or do I need more? Would it make a difference if it were NVMe rather than SSD? My local storage is NVMe.
The backup is writing at about 100 MiB/s.
Code:
INFO:  17% (588.9 GiB of 3.4 TiB) in 43m 20s, read: 240.2 MiB/s, write: 106.4 MiB/s
INFO:  18% (623.5 GiB of 3.4 TiB) in 48m 42s, read: 110.2 MiB/s, write: 94.7 MiB/s
INFO:  19% (658.7 GiB of 3.4 TiB) in 50m 29s, read: 337.0 MiB/s, write: 111.6 MiB/s
INFO:  20% (692.8 GiB of 3.4 TiB) in 55m 36s, read: 113.6 MiB/s, write: 98.7 MiB/s
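
For sizing purposes it helps to know roughly how long the whole backup will run. Here is a rough sketch that extrapolates from a progress line like the ones above. It assumes the progress lines keep exactly this format (with an optional hours field such as "1h 2m 3s") and that the average rate so far holds for the rest of the run, which may not be true. The names parse_progress and estimate_total_hours are made up for this example.

```python
import re

# Two of the progress lines from the log above.
LOG = """\
INFO:  17% (588.9 GiB of 3.4 TiB) in 43m 20s, read: 240.2 MiB/s, write: 106.4 MiB/s
INFO:  20% (692.8 GiB of 3.4 TiB) in 55m 36s, read: 113.6 MiB/s, write: 98.7 MiB/s
"""

# Conversion factors to GiB for the units seen in the log.
UNIT_GIB = {"MiB": 1 / 1024, "GiB": 1.0, "TiB": 1024.0}

# Assumed line format: "(<done> <unit> of <total> <unit>) in [Xh ][Ym ]Zs"
LINE_RE = re.compile(
    r"\((?P<done>[\d.]+) (?P<done_unit>[MGT]iB) of "
    r"(?P<total>[\d.]+) (?P<total_unit>[MGT]iB)\)"
    r" in (?:(?P<h>\d+)h )?(?:(?P<m>\d+)m )?(?P<s>\d+)s"
)


def parse_progress(line):
    """Return (done_gib, total_gib, elapsed_seconds), or None if no match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    done = float(m["done"]) * UNIT_GIB[m["done_unit"]]
    total = float(m["total"]) * UNIT_GIB[m["total_unit"]]
    elapsed = int(m["h"] or 0) * 3600 + int(m["m"] or 0) * 60 + int(m["s"])
    return done, total, elapsed


def estimate_total_hours(log):
    """Extrapolate the total runtime from the last progress line in the log."""
    parsed = [p for p in map(parse_progress, log.splitlines()) if p]
    done, total, elapsed = parsed[-1]
    average_rate = done / elapsed  # GiB per second so far
    return total / average_rate / 3600


if __name__ == "__main__":
    print(f"estimated total runtime: {estimate_total_hours(LOG):.1f} h")
```

With the last line above this comes out at somewhere between 4.5 and 5 hours for the full 3.4 TiB, which is the window during which the fleecing volume can fill up.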

Does it make more sense to take a backup every 4 hours rather than daily with fleecing on?
 
Fleecing is like a cache between your VM and the backup storage. If the backup storage is fast, the fleecing volume can stay at 0 used. If the backup storage is really slow, in the worst case the fleecing volume can grow to the same size as the VM disk being backed up (for each VM disk, a thin-provisioned fleecing volume of the same size is created when the backup begins). So a small SSD should be enough.
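
To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes the fleecing volume only has to hold blocks that the guest overwrites while the backup is running, so its usage cannot exceed either the disk size or the amount of data the guest writes in that window. The guest write rate (20 MiB/s) and the backup duration (4.7 h) are placeholder assumptions, not measurements from this thread, and both function names are made up for the example.

```python
def fleecing_upper_bound_gib(disk_gib, guest_write_mib_s, backup_hours):
    """Pessimistic bound on fleecing usage in GiB.

    Assumes every guest write during the backup hits a block that has not
    been backed up yet, and never more than the whole disk.
    """
    written_gib = guest_write_mib_s * backup_hours * 3600 / 1024
    return min(disk_gib, written_gib)


def max_sustained_write_mib_s(fleecing_gib, backup_hours):
    """Guest write rate (MiB/s) that would fill the fleecing disk in the window."""
    return fleecing_gib * 1024 / (backup_hours * 3600)


if __name__ == "__main__":
    disk_gib = 3.4 * 1024          # 3.4 TiB VM disk, as in the backup log
    backup_hours = 4.7             # assumed duration of one full backup run
    fleecing_gib = 1e12 / 2**30    # a "1 TB" SSD expressed in GiB

    bound = fleecing_upper_bound_gib(disk_gib, 20, backup_hours)
    limit = max_sustained_write_mib_s(fleecing_gib, backup_hours)
    print(f"bound at 20 MiB/s guest writes: {bound:.0f} GiB")
    print(f"1 TB fleecing disk fills at about {limit:.0f} MiB/s sustained")
```

To use it, measure the real write rate inside the VM during the backup window (for example with iostat) and plug that in instead of the placeholder. The bound is deliberately pessimistic, since repeated writes to the same blocks and blocks that were already backed up do not need extra fleecing space.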
 