PBS messing up my SQL server i/o

thomsany · Dec 29, 2025

Hi guys!

I have a 5-node cluster with NVMe drives, all working great.
However, every time I try to back up my SQL servers with PBS to a separate external server with SAS drives, the backup process causes trouble inside the VM: disconnections, SQL Server I/O errors (833), etc. So every time I run a backup on those servers I have to pray PBS behaves and I can get the backup safely without damaging the server or the SQL DBs.
Last week I had an incident where PBS started a backup at midnight while SQL was running its own backup job inside the VM. Those two jobs made the SQL Data folder collapse, crashed the SQL backup and damaged the live database, so we had to restore from backup to recover. It took 6 hours to restore 2 files (1.5TB for both files) over the same network. I also attempted a full server restore (about 2.9TB) and it took 9 hours to restore from the SAS repository to the NVMe servers.
Is there clearly something wrong with the way PBS works, or am I just looking at realistic numbers here?
Also, the way PBS hits the server drives while doing a backup is insane, since it reads/writes in real time, competing with SQL Server, etc.

For now I have halted all use of PBS on SQL-critical VMs until I find a solution. I even tried throttling the bandwidth, which made no difference and caused I/O storms on the SQL servers. So basically PBS doesn't work for this type of server? Or do I have something that is not configured properly?
I need to be able to back up all my servers without them freezing, causing disk errors, etc. while the backup is in progress, but maybe that is just the way PBS works and I need a different solution?
I even disabled the fs freeze because it literally froze the server so people could not work.

Any help/experience is greatly appreciated.

Teo
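
As a rough sanity check on the restore times in the post above, the average throughput can be worked out from the sizes and durations given. This is only a sketch: it assumes decimal units (1 TB = 1,000,000 MB) and treats the quoted sizes as the amount of data actually transferred. `avg_mbps` is a made-up helper name.

```shell
#!/bin/sh
# avg_mbps SIZE_TB HOURS -> average throughput in MB/s
# Assumption: decimal units, 1 TB = 1,000,000 MB.
avg_mbps() {
  LC_ALL=C awk -v tb="$1" -v h="$2" \
    'BEGIN { printf "%.1f\n", tb * 1000000 / (h * 3600) }'
}

avg_mbps 1.5 6   # file restore: prints 69.4
avg_mbps 2.9 9   # full VM restore: prints 89.5
```

Both restores work out to under 100 MB/s on average, which gives a concrete figure to compare against the network link and the read speed of the PBS datastore.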

janus57 · Dec 29, 2025

Hi,

For better help, you should give more details about your configuration (pve/vm[os/appliance]/pbs).

Best regards,

thomsany · Dec 29, 2025

Hi, you are right.
I have 5 servers with NVMe in a cluster. All of them are using a Ceph disk configuration and performance is perfect.
Here are the hardware specs of each Proxmox server (5 of them):
2 x Intel Xeon Gold 6554S, 1.5TB RAM, 2x960GB SSD NVMe OS drive, 24x3.84TB SSD NVMe
PVE: 8.4.1

PBS specs:
AMD EPYC 4344P, 32GB RAM, 2x960GB SSD NVMe OS drive, 8x22TB HDD SAS
PBS: Version 3

The latency shows up every time a backup is run on big VMs (more than 1TB total), and you see lots of SQL 833 application errors (I/O errors).

Thanks much for looking into this.
Teo

MarkusKo · Dec 29, 2025

Your PVE is a bit outdated, you should consider updating to the latest PVE 8.x.
If not already done, try to enable backup fleecing.
Check the virtio driver versions in your VMs: virtio-0.1.285 has some issues, downgrade to virtio-0.1.271.

I don't think this has anything to do with PBS directly. PBS is just a backup storage; from PVE's point of view, PBS and vzdump backups are the same.
PBS just deduplicates the data it receives from PVE.

https://forum.proxmox.com/threads/r...device-system-unresponsive.139160/post-819420

janus57 · Dec 29, 2025

Hi,

What is the OS of your VM, and the full name and version of your application? (Not clear in your post.)

And like @MarkusKo, I suggest using "fleecing" because your PBS has SAS drives.
When PVE backs up to PBS, it sends the data directly to the PBS server, so the slower your PBS is, the more slowdown your VM will suffer; with big VMs it can be bad.

Best regards,

thomsany · Dec 29, 2025

Hi Markus,

But the PVE version installed is 8.4.1, is it that outdated?
I will check the virtio drivers. On the server causing the I/O storm I have version 0.1.271 installed, so I am not sure that is the issue.
I don't know what else could be causing it, but it's driving me crazy.

What else could I send to help find the root of the issue?
Teo

thomsany · Dec 29, 2025

janus57 said:
Hi,

What is the OS of your VM, and the full name and version of your application? (Not clear in your post.)

And like @MarkusKo, I suggest using "fleecing" because your PBS has SAS drives.
When PVE backs up to PBS, it sends the data directly to the PBS server, so the slower your PBS is, the more slowdown your VM will suffer; with big VMs it can be bad.

OS is Windows 2022 Standard, but I have another VM with the same problem with Windows 2019 Standard.
Both of them are running SQL 2019 Enterprise.
I will check whether I can enable fleecing.

Teo

janus57 · Dec 29, 2025

Hi,

thomsany said:
But the PVE version installed is 8.4.1, is it that outdated?

Latest version (you can check with pveversion -v):

Code:

proxmox-ve: 8.4.0 (running kernel: 6.8.12-17-pve)
pve-manager: 8.4.14 (running version: 8.4.14/b502d23c55afcba1)
[…]
proxmox-backup-client: 3.4.7-1
proxmox-backup-file-restore: 3.4.7-1

thomsany said:
OS is Windows 2022 Standard, but I have another VM with the same problem with Windows 2019 Standard.

Also maybe related (but not sure) : https://pve.proxmox.com/wiki/VM_Backup_Consistency

Best regards,

danielb · Dec 29, 2025

PBS backups use a copy-before-write method to ensure coherency at a point in time. If PBS is too slow (which can be because of slower drives, a slow connection, or high latency), it can impact the workload: a guest write to a block that has not been backed up yet must wait until the old contents of that block have been copied to the remote PBS. To help with this, you can use a local "fleecing" storage (in the Advanced tab of a backup job in PVE). With fleecing enabled, writes happening during a backup can be copied to the local, fast storage without waiting for them to be written to the slower PBS. See https://pve.proxmox.com/pve-docs/chapter-vzdump.html#_vm_backup_fleecing for more details.
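
For reference, fleecing can also be requested from the command line through the `--fleecing` option of `vzdump`. The sketch below only prints the command so it can be reviewed before being run on a PVE node. The VM ID `101` and the storage names `pbs-sas` and `local-lvm` are placeholders, not values from this thread, and `fleecing_cmd` is a made-up helper name.

```shell
#!/bin/sh
# Print a vzdump invocation that backs up one VM in snapshot mode
# with fleecing enabled on a fast local storage.
# Arguments (all placeholders): VMID, backup storage, fleecing storage.
fleecing_cmd() {
  printf 'vzdump %s --storage %s --mode snapshot --fleecing enabled=1,storage=%s\n' \
    "$1" "$2" "$3"
}

fleecing_cmd 101 pbs-sas local-lvm
```

The fleecing storage should be the fast one (local NVMe in this setup), since it is what absorbs the old blocks while the slower PBS catches up.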

MarkusKo · Dec 29, 2025

Not sure if backup fleecing on Ceph will help in that case. If I remember correctly, some people on this forum added some SSDs and used a local file system (ZFS / ext4 / LVM) for their SQL servers to circumvent this issue.

danielb · Dec 29, 2025

It depends on how slow PBS is, and which part (HDD, latency, bandwidth?) is the bottleneck. Even on Ceph it might make sense to enable fleecing (I use it).

_gabriel · Dec 29, 2025

thomsany said:
I even disabled the fs freeze because it literally froze the server

Don't forget to re-enable it,
because without a Windows snapshot the copied data isn't consistent (Windows may not boot because the registry is corrupt, SQL may not start because the data is corrupt).
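
If the freeze was turned off through the guest agent options of the VM, it can be switched back on with `qm set`. A minimal sketch, assuming the `freeze-fs-on-backup` agent flag is how it was disabled and using the placeholder VM ID `101`; the command is printed rather than executed so it can be checked first. `refreeze_cmd` is a made-up helper name.

```shell
#!/bin/sh
# Print the qm command that re-enables the guest agent filesystem
# freeze/thaw during backups. The VM ID is a placeholder.
refreeze_cmd() {
  printf 'qm set %s --agent enabled=1,freeze-fs-on-backup=1\n' "$1"
}

refreeze_cmd 101
```

`qm config 101` shows the current `agent` line. Note that `--agent` replaces the whole property string, so any other agent sub-options already set there need to be repeated.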
