Hi,
We have been facing SQL Server database corruption for roughly the last 10 days and have been unable to resolve it.
We are running a 3-node Proxmox cluster with Ceph. Each node has 4 NVMe drives participating in the Ceph configuration. The 3 nodes are connected in a full mesh network using the Routed (with fallback) setup for the Ceph network.
Please find the hardware details of each server below:
Disk model: 3.2TB Micron_7450_MTFD x4 (Participating in Ceph)
CPU: Intel(R) Xeon(R) Silver 4316 CPU @ 2.30GHz x2
Memory: 32GB x12
Server Model: Supermicro SYS-620C-TN12R
Network card participating in Ceph: AOC-A25G-i4SM x2
Storage Controller: Broadcom MegaRAID 9560-16i 8GB
This 3-node cluster hosts 8 Windows Server VMs, all running Windows Server 2022 Standard Edition. The VMs are:
1) 5 Application servers (running IIS)
2) 1 Active Directory domain controller (Microsoft Active Directory)
3) 2 Database servers (SQL Server 2017 with the latest Cumulative Update applied); this is where the corruption occurs
Please note that TPM is enabled on all the VMs.
The problem started on 12 June 2025, when the SQL Server workload went live after database encryption (TDE) was enabled. The application requires 2 database servers: DB-1 is the transactional server and DB-2 is the reporting server. After each day of transactions on DB-1, all the databases are backed up at night and restored on DB-2 for reporting.
DB-1 is hosting 5 databases. All of them are encryption enabled.
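For reference, the nightly DB-1 to DB-2 cycle described above could be scripted so that every step verifies page checksums and integrity. This is a minimal sketch under assumptions, not our current job: it only builds the T-SQL strings (to be run with sqlcmd or a SQL Agent job), `nightly_scripts`, the database name and the backup folder are hypothetical placeholders, and the restore on DB-2 assumes the TDE certificate and its private key have already been restored into DB-2's master database. A `MOVE` clause may also be needed if DB-2 uses different data/log paths.

```python
def nightly_scripts(databases, backup_dir):
    """Build (db1_statements, db2_statements) for the nightly cycle.

    Hypothetical helper: database names and backup_dir are placeholders.
    """
    db1, db2 = [], []
    for name in databases:
        path = f"{backup_dir}\\{name}.bak"
        # On DB-1: check integrity first, then back up with checksum validation,
        # then re-read the backup file and verify its checksums.
        db1.append(f"DBCC CHECKDB (N'{name}') WITH NO_INFOMSGS, ALL_ERRORMSGS;")
        db1.append(f"BACKUP DATABASE [{name}] TO DISK = N'{path}' WITH CHECKSUM, INIT;")
        db1.append(f"RESTORE VERIFYONLY FROM DISK = N'{path}' WITH CHECKSUM;")
        # On DB-2: restore (assumes the TDE certificate already exists in master)
        # and check the restored copy as well.
        db2.append(f"RESTORE DATABASE [{name}] FROM DISK = N'{path}' WITH CHECKSUM, REPLACE;")
        db2.append(f"DBCC CHECKDB (N'{name}') WITH NO_INFOMSGS, ALL_ERRORMSGS;")
    return db1, db2


if __name__ == "__main__":
    db1, db2 = nightly_scripts(["SalesDB"], "D:\\Backup")
    print("\n".join(db1))
    print("\n".join(db2))
```

Running CHECKDB on both sides each night would at least show whether a page is already damaged on DB-1 before the backup, or only appears after the restore on DB-2.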
The problem is that since encryption was enabled, the databases have started getting corrupted. DBCC CHECKDB reports various allocation and page errors. Even when a new database is created with fresh data and DBCC CHECKDB shows no errors at all, the database becomes corrupted by the next day and reports errors like the following:
It happens randomly. So far 2 databases are affected, and we do not know whether the other databases will be hit in the future. Our daily routine has become: find the corrupt databases, create a new database, reconcile the data overnight, and bring the new database to 0 errors. The next day another database gets corrupted and the cycle continues.
The same database has also become corrupted multiple times after being freshly recreated with 0 errors.
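To correlate the daily DBCC CHECKDB results mentioned above, the text output of each run could be reduced to error numbers and page IDs, which makes it easier to see whether the same pages keep coming back. This is a hedged sketch: `summarize_checkdb` and `recurring_pages` are hypothetical helpers, and the parsing assumes the usual "Msg NNNN, Level NN, ..." headers and "(file_id:page_id)" page references in the CHECKDB output; the sample text is illustrative, not taken from our logs.

```python
import re
from collections import Counter

# Header line DBCC CHECKDB prints for each error, e.g. "Msg 8928, Level 16, ...".
MSG_RE = re.compile(r"^Msg (\d+), Level \d+", re.MULTILINE)
# Page IDs appear as (file_id:page_id), e.g. "Page (1:1234)".
PAGE_RE = re.compile(r"\((\d+):(\d+)\)")


def summarize_checkdb(output):
    """Count error numbers and collect the distinct pages named in CHECKDB output."""
    msgs = Counter(int(m.group(1)) for m in MSG_RE.finditer(output))
    pages = sorted({(int(f), int(p)) for f, p in PAGE_RE.findall(output)})
    return msgs, pages


def recurring_pages(outputs_by_day):
    """Return pages reported on more than one day (keys are day labels)."""
    seen = Counter()
    for output in outputs_by_day.values():
        _, pages = summarize_checkdb(output)
        seen.update(pages)
    return sorted(p for p, n in seen.items() if n > 1)


if __name__ == "__main__":
    sample = (  # illustrative text only
        "Msg 8928, Level 16, State 1, Line 1\n"
        "Page (1:1234) could not be processed. See other errors for details.\n"
        "Msg 8939, Level 16, State 98, Line 1\n"
        "Table error: ... page (1:1234). Test failed.\n"
    )
    print(summarize_checkdb(sample))
```

If the same file/page IDs recur after a database is rebuilt, that points more towards the storage path than towards random logical damage; if they differ every day, the pattern is less conclusive.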
We have checked the System and Hardware event logs in the VMs and found nothing related to storage or the I/O subsystem.
The storage of the Windows VMs is configured as follows:
Storage configuration
Is the VM configuration in Proxmox correct?
Is the hardware we chose compatible with running MS SQL Server 2017?
What is going wrong? We need urgent help with this.
The PVE information is attached.