Hi guys!
I have a 5-node cluster setup with NVMe drives, and everything is working great.
However, every time I try to back up my SQL servers with PBS to a separate external server with SAS drives, the backup process messes things up inside the server and causes disconnections, SQL Server I/O errors (833), etc. So every time I run a backup on those servers I have to pray that PBS behaves and that I get the backup safely without damaging the server or the SQL databases.
Last week I had an incident where a PBS backup started at midnight while SQL Server was running its own backup job inside the VM. Those two jobs made the SQL Data folder collapse, crashed the SQL backup and damaged the live database, so we had to restore from backup to recover. Restoring 2 files (1.5 TB for both files) took 6 hours, with everything on the same network. I also attempted a full server restore (about 2.9 TB), and it took 9 hours to restore from the SAS repository to the NVMe servers.
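For reference, here is a quick back-of-the-envelope sketch of the average restore throughput those numbers imply. It assumes "TB" means decimal terabytes (10^12 bytes) and that the 6 and 9 hours are wall-clock times for the whole restore; the function name is just for illustration.

```python
# Rough average-throughput check for the two restores described above.
# Assumption: "TB" = decimal terabytes (10**12 bytes); durations are wall-clock hours.

def avg_throughput_mb_s(size_tb: float, hours: float) -> float:
    """Return the average throughput in MB/s (10**6 bytes per second)."""
    size_bytes = size_tb * 10**12
    seconds = hours * 3600
    return size_bytes / seconds / 10**6

restores = {
    "2 files (1.5 TB)": (1.5, 6),
    "full server (2.9 TB)": (2.9, 9),
}

for label, (size_tb, hours) in restores.items():
    print(f"{label}: ~{avg_throughput_mb_s(size_tb, hours):.1f} MB/s")
```

That works out to roughly 69 MB/s and 89 MB/s on average, which is the figure to compare against what the SAS repository and the network should be able to sustain.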
Is there clearly something wrong with the way PBS works, or am I just looking at realistic numbers here?
Also, the way PBS handles the server's drives during a backup is insane, since it reads/writes in real time, competing with SQL Server, etc.
For now I have halted all use of PBS on SQL-critical VMs until I find a solution. I even tried throttling the bandwidth, which made no difference and still caused I/O storms on the SQL servers. So does PBS basically not work for this type of server, or do I have something that is not configured properly?
I need to be able to back up all my servers without them freezing, throwing disk errors, etc. while the backup is in progress, but maybe that is just the way PBS works and I need a different solution?
I even disabled the FS freeze because it literally froze the server so people could not work.
Any help/experience is greatly appreciated.
Teo