Suggestions/warnings to a Windows 2012/MS SQL virtualized setup

pietrek

New Member
May 31, 2013
Hi.

I work at a medium-sized company which uses an ERP system with MS SQL as its database backend. The entire setup is (as of now) based around a single HP ProLiant ML350 G6 server running Windows 2008 R2, with 2x Xeon E5504 CPUs, 56GB of RAM and 8x 600GB 15k SAS HDDs hooked up to a battery-backed HP P410i RAID controller.
The system's usual load is about 40 users accessing the database, and from what we've gathered this is well below the hardware's capacity. We recently came up with the idea of upgrading to Windows 2012 R2 and changing the entire setup to make it a bit more "fault-tolerant", or at least "portable". We do realize that having just one server with all of the company's data and business-critical software is not good at all in the first place, but (unluckily) our department is both budget- and staff-limited, so we're trying to improve the situation with what we have right now.

Anyhow, as I said above, we'd like to move to Windows 2012 R2 and virtualize it using Proxmox on the same server that runs our current Windows 2008 setup. The idea is that if something goes terribly wrong, we'll be able to reinstall Proxmox, restore the Windows VM from a backup, and get the entire setup up and running in much less time than it would take to reinstall Windows plus all the software on bare hardware. Now, we *know* that this is *not the right way* of doing such things; we know we should have some sort of backup hardware, shared storage, and so on. We hope to get this right some time in the future, but for now this is all we've got and we have to deal with it.
What I'd like to know is: are there any known obstacles to creating such a setup? Any pitfalls we should test for beforehand, or things to be aware of before even starting? We want to keep the performance impact as low as we can, of course, so we've already familiarized ourselves with this: https://pve.proxmox.com/wiki/Windows_2012_guest_best_practices and this: http://pve.proxmox.com/wiki/Migration_of_servers_to_Proxmox_VE
We'd like to make this transition as simple and as effective as possible, so we're planning to use XFS as our main filesystem on top of hardware RAID (according to these benchmarks http://jrs-s.net/2013/05/17/kvm-io-benchmarking/ we should do fine). We've considered ZFS, which has some nice features that could come in handy at some point, but decided to stick with a good old, proven and supported setup to avoid any hassle in the middle of the whole thing. Besides, we've never had any serious experience with ZFS and we don't feel like exploring uncharted waters with a mission-critical setup :)
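(For reference, the disaster-recovery path we're counting on would look roughly like the sketch below; the VMID, backup path and storage name are just made-up examples, not our actual setup:)

```shell
# After a fresh Proxmox reinstall: restore the Windows VM from the
# most recent vzdump backup and start it. Paths, VMID (100) and the
# storage name ("local") are hypothetical placeholders.
qmrestore /mnt/backup/vzdump-qemu-100.vma.lzo 100 --storage local
qm start 100
```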

So, any tips, warnings or suggestions, guys? I know these questions may sound a bit noob-ish, and the idea may seem a little silly, but we think it's a good one (considering our situation) and it should make our lives a bit easier in the future and in case of any serious failure.


Any constructive input appreciated!
 
Hi,
your setup sounds like a good use case for two nodes with DRBD (plus a small third node for quorum).

DRBD is not for free (performance-wise), but with the right I/O subsystem (SAS RAID-10 plus a fast RAID controller) and a fast network connection (like a direct InfiniBand link), a setup is possible that saves you from big trouble if one node fails.
I've been using DRBD without big trouble on four cluster pairs for years (with different DRBD resources: SAS, SATA, SSDs).

Udo
 
Thanks for your feedback. We don't have enough resources to implement DRBD right now, but we will consider it in the future. Anything else we should be aware of, or do differently?
 

Hi pietrek

Just as a comment: Udo is a master on this forum; he has helped me, as well as many other people.

And regarding DRBD (for the future), so that you know what it involves, please see this link:
http://forum.proxmox.com/threads/18699-ISCSI-Storage-with-LVM-Partition?p=95661#post95661

But for now, while you cannot have a second PVE node configured for "High Availability", I suggest:
1- Set up a very basic PVE node (it could even be a workstation) with the NFS service running.

Why NFS?
2- NFS is a simple protocol that performs well over the network.

And why an NFS share on a PVE node?
3- Because then you can do a restore on that same PVE node, to test whether your VM backups are actually good.
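For example (the storage name, IP address and export path below are just placeholders), registering such an NFS share on the PVE side and test-restoring a backup could look like this:

```shell
# On the PVE node: register the NFS export as a backup storage.
# "backup-nfs", the server IP and the export path are hypothetical.
pvesm add nfs backup-nfs \
    --server 192.168.1.50 \
    --export /srv/nfs/pve-backups \
    --content backup

# Later, verify a backup is actually restorable by restoring it
# under a spare VMID (999 here) on the same node:
qmrestore /mnt/pve/backup-nfs/dump/vzdump-qemu-100.vma.lzo 999
```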

4- Since Windows supports NFS, you can also back up your SQL data from inside the Windows system directly to the NFS share, which is much quicker than backing up the complete VM. This way, in an automated fashion, you can have SQL data backups every day, and also (for example) an automated backup of the whole VM once a week (say on a Sunday, when few people are working).
The idea is that you can restore the VM and the SQL data in a short time, on any PVE node you want.
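As a rough sketch of that schedule (the VMID, storage name, database name and drive letter are all hypothetical examples): a weekly full-VM backup from the PVE host via cron, plus a daily SQL backup from inside the Windows guest via Task Scheduler:

```shell
# On the PVE host (cron, e.g. Sunday 02:00): weekly full backup of
# VM 100 to the NFS-backed backup storage "backup-nfs".
vzdump 100 --storage backup-nfs --mode snapshot --compress lzo

# Inside the Windows guest (Task Scheduler, daily), assuming the NFS
# share is mapped as drive B: and the ERP database is named "erp":
sqlcmd -S localhost -Q "BACKUP DATABASE [erp] TO DISK = 'B:\erp_daily.bak' WITH INIT"
```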

Best regards
Cesar

Re-edited, one more networking tip: if you have a managed switch, bonding two NICs with the 802.3ad (LACP) protocol can roughly double your aggregate network throughput, and that extra bandwidth can be used by your PVE nodes, the PVE cluster communication, your backups, etc.

Another benefit of bonded NICs is that if a network cable breaks, or a NIC port fails on the server or on the switch, everything keeps working; the only difference is that your network speed is reduced.

If you don't have a managed switch, you can configure the bond in "active-backup" mode to avoid losing network connectivity. In that case the speed always stays the same, and nobody will even notice that a cable has failed.
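As a sketch (interface names and addresses are just examples), an 802.3ad bond feeding the Proxmox bridge in /etc/network/interfaces on a Debian-based PVE host might look like this:

```
auto bond0
iface bond0 inet manual
    bond-slaves eth0 eth1
    bond-mode 802.3ad
    bond-miimon 100

auto vmbr0
iface vmbr0 inet static
    address 192.168.1.10
    netmask 255.255.255.0
    gateway 192.168.1.1
    bridge-ports bond0
    bridge-stp off
    bridge-fd 0
```

For the no-managed-switch case, change `bond-mode 802.3ad` to `bond-mode active-backup`.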

Either way, you can always see on the server or on the switch whether such a disconnect has occurred. I personally have a bash script that watches for this type of problem and sends me an email when it happens.
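A hypothetical sketch of such a check (the real script, mail command and addresses would be site-specific): the kernel exposes bond health in /proc/net/bonding/bond0, so it is enough to grep for a downed link. The file path is a parameter here to make the check easy to test against a sample file:

```shell
# Check a bonding status file (normally /proc/net/bonding/bond0)
# for any slave whose link is down; print OK or DEGRADED.
check_bond() {
    local status_file="$1"
    if grep -q "MII Status: down" "$status_file"; then
        echo "DEGRADED"
        return 1
    fi
    echo "OK"
}
```

Run it from cron and pipe a DEGRADED result to mail(1), for example.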
 
I dislike large file systems on a virtualization host, because eventually an fsck will need to run and it will take forever.
It is also really easy to over-provision, and filling up your disk will cause issues for your running VMs.

I would suggest using LVM for the VM storage; as Udo suggested, DRBD would be great too.
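A minimal sketch of that (the device name, volume group name and storage ID are hypothetical): dedicate a partition of the RAID array to an LVM volume group, so each VM disk becomes its own logical volume instead of a file on one big file system:

```shell
# Turn a spare partition of the RAID array into an LVM physical
# volume, build a volume group on it, and register it with Proxmox
# as VM image storage. /dev/sda4 and "vmdata" are example names.
pvcreate /dev/sda4
vgcreate vmdata /dev/sda4
pvesm add lvm vm-lvm --vgname vmdata --content images
```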

My biggest suggestion is this:
Seek a solution to your biggest problem: "we do realize that having just one server with all of the company's data and business-critical software is not good at all in the first place"
That is the problem you need to solve and you will get the "portable" and "fault-tolerant" features for free.

Maybe this will give you some talking points so the boss can find some money to solve your real problem, the one-server Single Point of Failure:
With 40 users, any significant downtime will cost your company more money than a second server would.
Idle users, and the overtime to catch up, cost money too.

There are lots of ways to save money, DRBD is free and provides real time data replication, you would pay multiples of $10k for such features/equipment from SAN vendors. Look how much money you already saved!
Used servers are cheap and a used server is better than nothing at all when the one you have decides to fail.
Used Infiniband gear is cheap and works great for DRBD replication.
Look how much money you saved by seeking advice here instead of hiring a consultant!

Before Proxmox + DRBD I had conversations like this:
Boss: Why is the site down?
Me: Single Point of Failure DB server is down
Boss: How long till it is repaired?
Me: Don't know, I was just getting ready to drive to the datacenter so I can find out, I will keep you updated.

After Proxmox + DRBD:
Boss: Why was the site down for 5 minutes?
Me: Server failed so I started the DB VM on the backup node
Boss: Great, thanks for being on top of things!

Soon I think I will be moving from DRBD to Ceph; the conversation can then change to this:
Boss: Why was the site down for 1 minute?
Me: The server running the DB VM failed, Proxmox auto-started it on another node and it takes about 1 minute to boot up.
Boss: So it fixed itself?
Me: Yep!
Boss: Wow!
Me: Yep, Magical Server Pixie Dust at last!
 
