Hello everybody,
I'm new to this, so I'd like to ask for your help with a project of mine. To give you better context, I'll start with a bit of background. If you don't want to read that part, you can skip straight to the "PROBLEMS TO BE SOLVED" section.
Well, for many years my company has run a traditional server setup (a physical server with the operating system installed directly on it) and, honestly, to date we have never had any serious problems. A few disks had to be replaced over the years, but since they are in RAID1, replacing them never meant stopping production. Our current server is a Dell T330-8233 (Xeon E3-1240 v5, 64GB DDR4, 2x 300GB SAS HDD, 2x 480GB SSD, 2x 1GbE NIC), which hosts the various services we run. It has performed well over the years, but it is starting to show performance problems, so we decided to buy 2 "new" servers to replace it, or at least to move the heaviest services off it. We purchased 2x refurbished HPE DL380 G10 (2x Xeon Gold 6136, 128GB DDR4, 2x 480GB SSD, 4x 1.9TB NVMe, 2x 10GbE SFP+, 4x 1GbE NIC). We are a small company working with a small budget, which is why refurbished was the option chosen.
The initial idea was to keep the traditional setup, but when we asked our usual supplier for a quote for Windows Server and the respective CALs, we were presented with a new reality. Their server specialist explained that the traditional approach no longer makes sense and that virtualization is the way to go, for all the reasons you surely know. We were even given a live demonstration on a production system the supplier manages for another company, where VMs were migrated and servers were rebooted in front of us without the end customer noticing. We were also told that the 2 servers we had bought were more than adequate for a cluster that would give us HA. At that moment we immediately "bought" the idea. It all looked great, and in my head I even started planning a setup with one VM per service. But the bad news came quickly. The next day we received the quote for Windows and the virtualization software, and it was like a punch in the stomach. The virtualization platform our supplier implements is VMware, and it is not cheap. I understand that technology has a cost, but the VMware solution is certainly not for everyone. The price of Windows licensing alone made us give up on the idea of several VMs and limited us to only 2 VMs, which is what Windows Server 2025 Standard allows per license (in our case we had to buy 2 licenses, one for each node, since apparently every node that might run a Windows Server VM needs a valid license).
Well... The decision to go down the virtualization path had already been made, so we started looking for alternatives and were presented with 2: Hyper-V and Proxmox. Since we had some time and 2 idle servers, we installed one solution on each. Hyper-V initially gained points for not requiring any additional licensing cost, whereas Proxmox, although free to install, is strongly recommended to run with a paid subscription in a business environment, which in practice would mean paying to use it. From then on, however, only Proxmox gained points: the installation was quick and required no extra drivers, and creating the software RAID was easy. The final decision came down to the performance comparison between ZFS and Storage Spaces. The difference was so large that Hyper-V quickly stopped being an option. Perhaps if the server supported hardware RAID for NVMe the performance gap with Hyper-V would have been smaller. So, Proxmox was the path we decided to follow...
From here on we ran into some obstacles, and this is where I need your help. For anyone who didn't fall asleep or give up halfway through my description, here are my issues.
PROBLEMS TO BE SOLVED:
- Unlike VMware, a 2-node Proxmox cluster cannot keep quorum on its own if one node fails, so in practice you need a third vote. We quickly realized this limitation can be overcome with a QDevice. I have a QNAP that is only used for backups and, although it is weak, it supports virtualization, so I believe we can easily run a small VM there to act as the QDevice (I sketched the setup I have in mind right after this list). Problem 1 solved.
- As quickly as we found a solution to the first problem, we ran into another one. The best solution for HA is Ceph, but Ceph really does require 3 nodes, so in a way we are back at problem 1. We have done a lot of research and come across some possible approaches. This topic is my attempt to understand whether my conclusions are correct, or whether there is another viable solution I'm not aware of. These are the conclusions I reached:
- 2.1. Forget Ceph and use ZFS replication. This is extremely easy to set up, but it has one big problem: the synchronization is not in real time. The minimum replication interval is 1 minute (see the replication-job sketch after this list), and we run a SQL Server where dozens of transactions can happen in one minute; in the event of a node failure, those transactions may not yet be in the replica on the second node, and that data cannot be lost.
- 2.2. Use our Dell server as the third node. Buying a third server like the other 2 would be ideal, but honestly our budget is practically zero, and in reality the Dell will probably sit idle once we move all services to the new servers. This option raises other problems, though, because its hardware is quite different. I read somewhere that for a third Ceph node the hardware difference may not be a problem as long as there is storage for the replicas, but I'm afraid we could end up dragging down the performance of the whole cluster. So let me go through the individual points:
- 2.2.1. Network: the Dell only has 1GbE NICs, but I believe this can easily be solved by buying a cheap refurbished 10GbE SFP+ card.
- 2.2.2. CPU and RAM: since they are not the same as on the other 2 servers, the Dell will probably not be suitable for running VMs (on top of that, for better performance we plan to set the VM CPU type to "host" on the other 2 nodes). However, I think it should be possible to configure this server to be used only for replication and never to run cluster VMs, or, at most, to control which VMs are allowed to run on it so we could pick the least critical one (see the HA group sketch after this list).
- 2.2.3. Storage: this is where I think we have a real problem, and where I need your help again. The HPE NVMe disks are rated at 6,500/2,200 MB/s (we even made an extra effort to buy fast disks), while the Dell SSDs only do 520/460 MB/s. The usable capacity I want in the cluster is 1.9TB (2x 1.9TB NVMe in a mirror). I thought of a few options but I don't know which is best:
- 2.2.3.1. Buy more SSDs. Purchase 2x 480GB SSD and put all 4 disks in RAID0, theoretically reaching 2,080/1,840 MB/s. However, I'm concerned that RAID0 on this node could itself become a problem if a disk fails; I don't want a solution that creates more problems than it solves. An 8-disk RAID10 would already be beyond our budget.
- 2.2.3.2. Use consumer disks. I didn't mention it before, but all the disks we have are enterprise class. I don't know whether consumer disks would be viable for this third node; 8 consumer disks in RAID10 would probably cost about the same as 2 enterprise disks.
- 2.2.3.3. Use some kind of NVMe adapter. I don't know if it is possible to fit this server with a PCIe controller or adapter card that would let me use NVMe disks. If any of you have experience with a similar scenario, I would appreciate recommendations.
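To make point 1 more concrete, this is roughly the QDevice setup I have in mind, based on my reading of the Proxmox docs; the Debian VM on the QNAP, its IP (192.168.1.50) and anything else named below are just placeholders, so treat this as a sketch rather than a tested recipe:

```
# On the small Debian VM running on the QNAP (placeholder IP 192.168.1.50)
apt install corosync-qnetd

# On both Proxmox nodes
apt install corosync-qdevice

# From one cluster node, register the QDevice as the external vote
pvecm qdevice setup 192.168.1.50

# Verify that the cluster now reports the extra expected vote
pvecm status
```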
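For point 2.1, this is the kind of replication job I mean; even at the tightest schedule Proxmox allows, the second node can still be up to a minute behind. VM ID 100, the target node name "pve2" and the bandwidth cap are placeholders I made up for the example:

```
# Replicate VM 100 from this node to "pve2" every minute,
# optionally capped at 100 MB/s
pvesr create-local-job 100-0 pve2 --schedule "*/1" --rate 100

# Check job status and when the last sync ran
pvesr status
```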
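And for point 2.2.2, my understanding is that HA groups could keep the VMs away from the Dell while it still contributes its vote and storage; the group names, node names and VM ID below are placeholders, and this is only how I think it would be configured:

```
# Option A: restrict HA-managed VMs to the two HPE nodes only
ha-manager groupadd fast-only --nodes "pve1,pve2" --restricted 1

# Option B: use priorities instead, so a less critical VM may fall back
# to the Dell (pve3) only when both HPE nodes are down
ha-manager groupadd prefer-fast --nodes "pve1:2,pve2:2,pve3:1"

# Put a VM under HA and pin it to one of the groups
ha-manager add vm:100 --group fast-only
```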
I apologize in advance for the long post and for any descriptive detail that doesn't add much to the actual problem.
I look forward to your feedback.
Best regards,