PBS corrupts MS Exchange database

nejcsuhadolc

I believe a bug in the PBS / Proxmox system can corrupt the Exchange database.

Here are the steps to reproduce it:
Environment:

pve-manager/8.0.4/d258a813cfa6b390
The VM disks are located on a raidz1 ZFS pool; the disks are Samsung Datacenter 860 DCT series. The pool has stayed healthy the whole time.
Virtio driver 0.1.240.
The server is a dual-processor Supermicro; I can dig up the exact details if somebody finds this relevant.
The PBS is connected over IPsec, a slower connection of about 100 Mbps. The reason for this is that we ran out of space at the location. I do not think the issue itself is related to the slower link, since this has happened before with a local PBS server as well.

To reproduce the bug, you have to start the backup, wait until it starts sending the data, and then stop the backup.

Here is the backup log:
INFO: starting new backup job: vzdump 106 107 108 109 110 111 112 --quiet 1 --mode snapshot --mailto xxxxx --mailnotification always --notes-template '{{guestname}}' --storage sgn02
INFO: Starting Backup of VM 106 (qemu)
INFO: Backup started at 2024-02-01 02:30:04
INFO: status = running
INFO: VM Name: ex19-02
INFO: include disk 'virtio0' 'dctpool:vm-106-disk-2' 900G
INFO: include disk 'virtio1' 'local-zfs:vm-106-disk-0' 500G
INFO: include disk 'virtio2' 'local-zfs:vm-106-disk-1' 500G
INFO: include disk 'virtio3' 'local-zfs:vm-106-disk-2' 500G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating Proxmox Backup Server archive 'vm/106/2024-02-01T01:30:04Z'
INFO: issuing guest-agent 'fs-freeze' command
INFO: issuing guest-agent 'fs-thaw' command
ERROR: VM 106 qmp command 'guest-fsfreeze-thaw' failed - got timeout
INFO: started backup task '70085ea0-5066-47ab-88ce-bc25b6c56429'
INFO: resuming VM again
INFO: virtio0: dirty-bitmap status: existing bitmap was invalid and has been cleared
INFO: virtio1: dirty-bitmap status: existing bitmap was invalid and has been cleared
INFO: virtio2: dirty-bitmap status: existing bitmap was invalid and has been cleared
INFO: virtio3: dirty-bitmap status: existing bitmap was invalid and has been cleared
INFO: 0% (916.0 MiB of 2.3 TiB) in 3s, read: 305.3 MiB/s, write: 268.0 MiB/s
INFO: 1% (24.0 GiB of 2.3 TiB) in 2h 38m 12s, read: 2.5 MiB/s, write: 2.5 MiB/s
INFO: 2% (48.0 GiB of 2.3 TiB) in 5h 40m 25s, read: 2.2 MiB/s, write: 2.2 MiB/s
ERROR: interrupted by signal
INFO: aborting backup job
INFO: resuming VM again
ERROR: Backup of VM 106 failed - interrupted by signal
INFO: Failed at 2024-02-01 10:12:54
ERROR: Backup job failed - interrupted by signal
TASK ERROR: interrupted by signal

The VM continues to run, but Windows also detects some disk corruption. This server has 3 Exchange databases, and 2 of them were corrupted instantly.

We are still struggling to bring them back.

Any idea how to prevent this in the future? Otherwise, we'll have to switch to another virtualization platform.

Jenrej
 
I would recommend switching the disks to SCSI with discard.

More importantly, do you have iothread enabled? If so, disable it, stop the VM and start it again (a restart or reboot is not enough), and then trigger the backup. There is currently a bug with this flag, and it could be your problem.
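To see which disks currently have the flag set, one option is to look through the output of `qm config 106`. A minimal sketch, assuming disk lines have the form `virtio0: <volume>,iothread=1,size=900G`; the sample text and the function name are hypothetical and not taken from the affected VM:

```python
import re

# Disk keys as they appear in a VM config (virtio0, scsi1, sata2, ide0, ...).
DISK_KEY = re.compile(r"^(virtio|scsi|sata|ide)\d+$")


def disks_with_iothread(config_text):
    """Return the disk keys whose option list contains iothread=1."""
    found = []
    for line in config_text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        if not sep or not DISK_KEY.match(key):
            continue  # not a disk line (e.g. "agent: 1" or "scsihw: ...")
        options = [opt.strip() for opt in value.split(",")]
        if "iothread=1" in options:
            found.append(key)
    return found


# Hypothetical sample; paste the real `qm config <vmid>` output here instead.
SAMPLE_CONFIG = """\
agent: 1
scsihw: virtio-scsi-single
virtio0: dctpool:vm-106-disk-2,iothread=1,size=900G
virtio1: local-zfs:vm-106-disk-0,size=500G
"""

print(disks_with_iothread(SAMPLE_CONFIG))
```

Any disk listed here would need the flag removed, followed by a full stop and start of the VM as described above.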

It could also be due to the long duration of the backup. While a backup is running, a guest write to a block that has not been backed up yet has to wait until the old data has been sent to the PBS, so that the backup stays consistent. Your link also seems clearly too slow to me.
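The ordering can be illustrated with a toy model. This is a simplified sketch and not QEMU's actual implementation: the class and method names are hypothetical, and it only shows that the old contents of a block are copied to the backup target before a guest write to that block is applied.

```python
class ToyBackup:
    """Toy copy-before-write model (hypothetical, for illustration only)."""

    def __init__(self, disk):
        self.disk = disk          # live disk: list of block contents
        self.target = {}          # backup target: block index -> saved content
        self.waited_writes = 0    # guest writes that had to wait for a transfer

    def guest_write(self, index, data):
        # A block that is not on the target yet must be saved first, so the
        # guest write waits for the (possibly slow) transfer to finish.
        if index not in self.target:
            self.target[index] = self.disk[index]
            self.waited_writes += 1
        self.disk[index] = data

    def background_copy(self):
        # The backup job copies every block that was not saved already.
        for index, content in enumerate(self.disk):
            self.target.setdefault(index, content)

    def image(self):
        return [self.target[i] for i in range(len(self.disk))]


backup = ToyBackup(["a", "b", "c"])
backup.guest_write(1, "B")    # waits: the old "b" goes to the target first
backup.guest_write(1, "BB")   # no wait: block 1 is already saved
backup.background_copy()
print(backup.image())         # state of the disk when the backup started
print(backup.disk)            # live disk with the guest's newer writes
```

In this model every first write to an unsaved block costs one transfer to the target, which is why a slow link to the backup server shows up as slow writes inside the guest.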
 
Check your guest agent.
Hi, can you please be more specific?

I'm also wondering why the changes to the disk have to be written to the backup first. Would it be possible to take a snapshot first, back up the snapshot, and then delete it? That would probably be less prone to such errors.
 
Your QEMU guest agent is not working inside your VM, so I suggest you fix that and try again.
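The hint is in the backup log from the first post, where the thaw command timed out. A small sketch that pulls such guest-agent errors out of a vzdump log; the regular expression is an assumption based on the single error line shown in that log, so other messages may be worded differently:

```python
import re

# Pattern modelled on the one error line in the posted log (an assumption).
QMP_ERROR = re.compile(
    r"^ERROR: VM (\d+) qmp command '(guest-[^']+)' failed - (.+)$"
)


def guest_agent_errors(log_text):
    """Return (vmid, command, reason) for each failed guest-agent command."""
    errors = []
    for line in log_text.splitlines():
        match = QMP_ERROR.match(line.strip())
        if match:
            errors.append((int(match.group(1)), match.group(2), match.group(3)))
    return errors


# Excerpt copied from the backup log in the first post.
LOG_EXCERPT = """\
INFO: issuing guest-agent 'fs-freeze' command
INFO: issuing guest-agent 'fs-thaw' command
ERROR: VM 106 qmp command 'guest-fsfreeze-thaw' failed - got timeout
INFO: started backup task '70085ea0-5066-47ab-88ce-bc25b6c56429'
"""

print(guest_agent_errors(LOG_EXCERPT))
```

If this returns anything for a VM, the agent inside the guest deserves a closer look before the next backup run.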
 
>PBS is connected over IPSec, a slower connection, about 100 Mbps

>INFO: 0% (916.0 MiB of 2.3 TiB) in 3s, read: 305.3 MiB/s, write: 268.0 MiB/s
>INFO: 1% (24.0 GiB of 2.3 TiB) in 2h 38m 12s, read: 2.5 MiB/s, write: 2.5 MiB/s
>INFO: 2% (48.0 GiB of 2.3 TiB) in 5h 40m 25s, read: 2.2 MiB/s, write: 2.2 MiB/s

A TB-sized VM backup over 100 Mbps IPsec?

You do know that the PBS link/backup speed limits the VM's write speed, as we do not have backup fleecing yet?
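For a sense of scale, here is a rough estimate of how long the transfer would take. It assumes that all 2.3 TiB have to cross the link (no deduplication or skipped chunks) and ignores IPsec and protocol overhead; the function name is just for illustration:

```python
MIB = 1024 ** 2
TIB = 1024 ** 4


def transfer_hours(size_bytes, rate_bytes_per_s):
    """Hours needed to move size_bytes at a constant rate."""
    return size_bytes / rate_bytes_per_s / 3600


total = 2.3 * TIB       # backup size from the log
link = 100e6 / 8        # 100 Mbps nominal line rate, in bytes per second
observed = 2.2 * MIB    # rate reported in the log, in bytes per second

print(f"{transfer_hours(total, link):.0f} h at the nominal line rate")
print(f"{transfer_hours(total, observed) / 24:.1f} days at the observed rate")
```

Under these assumptions the backup needs roughly 56 hours even at the full line rate, and close to 13 days at the 2.2 MiB/s seen in the log. For that whole time, guest writes are tied to the speed of the link.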
 
