Windows Server 2016 disk active time suddenly becomes 100%

jsengupta

Active Member
We are running two SQL Server 2017 instances on Windows Server 2016 VMs. All of a sudden the disk active time jumps to 100% and query execution becomes slow. After some time it returns to normal and everything functions correctly again.

We also have a third VM running Windows Server 2016 on the same storage. While the other VMs are at 100% disk utilization, I tested IOPS on the third VM and it gives around 130,000 IOPS. I can also do other tasks like file copies without any issue.
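(For reference, the IOPS number came from a synthetic random-IO test along the lines of the fio run below; fio is only one option, and the file name, size and queue depth here are illustrative, not the exact command used.)

Code:
fio --name=randread --filename=fio-test.dat --size=4G --direct=1 --ioengine=windowsaio --rw=randread --bs=4k --iodepth=32 --numjobs=4 --runtime=60 --time_based --group_reporting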

All the servers have the VirtIO drivers installed.

VM1: 80 GB RAM, 14 vCPU (NUMA), 2 TB storage total
VM2: 48 GB RAM, 8 vCPU (NUMA), 1 TB storage total

Test VM: 12 GB RAM, 2 vCPU (NUMA), 100 GB storage

The host has a hardware RAID 5 array on an H730 Mini PERC controller with 4x Intel D3-S4610 SSDs.

All three VMs run on Proxmox LVM storage on this RAID pool. No LVM-thin pool.


Can anyone help me figure out what the possible issue is?
 
Hi,
please post the VM config from a VM with the issue and one without.
Are all VMs on the same VirtIO version? The latest is currently 208.
Are all VMs on the same patch level (12/21)?
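(The configs can be dumped on the Proxmox host with qm config; the VM IDs below are only placeholders - use your own.)

Code:
# on the PVE host, one call per VM (IDs are examples)
qm config 101
qm config 102
qm config 103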
 
All the VMs are running the latest VirtIO, which is 208.
All of them are running Windows Server 2016 Standard.
All of them are running SQL Server 2017.
The patch level is the same on all the VMs.

VM1 and VM2 have the problem, and clients interact with them.
VM3 is for testing purposes only.


VM-1 (with problem): [screenshot of VM config]

VM-2 (with problem): [screenshot of VM config]

VM-3 (without any problem): [screenshot of VM config]

We don't know whether Hardware NUMA is enabled on the underlying server or not.


Thanks in advance.
 
Well, is it a dual-socket server? Then it has NUMA enabled. Also, first- and second-gen AMD EPYC have NUMA even in single-socket configurations.
So what CPUs does your host have? Is Ceph in use? Because your VM3 has an unused disk with ceph-storage in its name.

Do you have cache enabled on your VirtIO SCSI disks? IO threads?

Also, what does the Process Monitor Disk tab show when your VMs are at 100% disk usage?
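(For reference, cache and IO threads are per-disk options in the VM config; changing them would look roughly like this - VM ID, storage and disk name are placeholders, and iothread=1 also wants the VirtIO SCSI single controller.)

Code:
qm set 101 --scsihw virtio-scsi-single
qm set 101 --scsi0 local-lvm:vm-101-disk-0,iothread=1,cache=none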
 
For now we are not concerned about Ceph.

Host CPU: [screenshot: Intel Xeon E5-2660 v3]

IO threads are not enabled, and no cache is being used on any of the VirtIO SCSI drives.

I cannot provide the exact graph right now, since it is Sunday, there is no such load on the servers, and they are doing fine now. However, when the problem occurs there is a sudden drop in the disk IO graph. Usually the graph shows IO activity around 30 to 40M, but during the problem it falls to 100 to 200k.
 
How much RAM do the hosts have?

Also, you should really think about getting a paid subscription if the systems are that important, and maybe get direct support from the Proxmox team...
 
[screenshot: host memory and swap usage]

Yes, we are thinking about a subscription, but first we really want to focus on the problem we are facing now.
 
Just a first guess... but with this swap usage, consider that there is simply not enough RAM and your host may be getting slowed down by heavy swap/pagefile IO... Maybe you should set swappiness to 10, or even 0, in the host config...
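(A sketch of that change on the PVE host, runtime plus persistent - use 10 or 0 as preferred:)

Code:
sysctl -w vm.swappiness=10                                      # takes effect immediately
echo 'vm.swappiness = 10' > /etc/sysctl.d/99-swappiness.conf    # persists across reboots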
 
I have set the swappiness to 0.

However, do we still need to disable KSM? The host CPU utilization never exceeded 70% during those VM problems.
 
I have set the swappiness to 0.

However, do we still need to disable KSM? The host CPU utilization never exceeded 70% during those VM problems.
I manage 15 servers with about 800 VMs, and I disable KSM on all of them.
KSM also adds CPU load; that is a fact.

https://forum.proxmox.com/threads/ksm-is-using-about-30-of-cpu-time.3857/

Sorry, but this is expected behavior - KSM needs to scan the whole memory all the time, so this needs CPU (how much can be controlled in the KSM configuration).

https://pve.proxmox.com/wiki/Dynamic_Memory_Management

Just install several KVM virtual machines with the same OS (using at least 80% of your physical memory on the host) and wait a few minutes. You will notice higher CPU activity on the host (ksm daemon), and the used memory on the host will be lowered significantly (see the start page showing the overall memory usage).
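(If you do decide to disable KSM on the PVE host, a minimal sketch would be:)

Code:
systemctl disable --now ksmtuned    # stop the KSM tuning daemon
echo 2 > /sys/kernel/mm/ksm/run     # stop KSM and unmerge already-shared pages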
 
I have set the swappiness to 0.

However, do we still need to disable KSM? The host CPU utilization never exceeded 70% during those VM problems.
When PVE shows 70% CPU load with an E5-2660 v3, the CPU is effectively maxed out. This CPU has Hyper-Threading... on a perfect sunny day when everything works perfectly, Hyper-Threading can give you about 10 to 15% extra CPU power... in other words, all physical cores busy corresponds to roughly 50% of the reported scale and HT adds perhaps another 15%, so at a reported 70% your CPUs were already overloaded by at least 5% during the issue... disable KSM and have a look whether it helps...

And if you really care about every last bit of SQL performance, you should also disable Hyper-Threading in the BIOS altogether...
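(Before touching the BIOS you can check whether Hyper-Threading is currently active on the host with something like:)

Code:
lscpu | grep -i 'thread(s) per core'      # 2 = HT active, 1 = disabled
cat /sys/devices/system/cpu/smt/active    # 1 = SMT/HT on, 0 = off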
 
OK, if we build a ZFS RAID-5 (RAIDZ1) pool with the default configuration that Proxmox provides and migrate one of the heavily loaded servers to one of the other hosts, will it solve the problem? Of course, we will disable KSM and set swappiness to 0.

Does a heavily loaded SQL Server run well on a default ZFS RAIDZ1 configuration with Proxmox? Of course, we will use 4 SSDs for the underlying storage.
 
OK, if we build a ZFS RAID-5 (RAIDZ1) pool with the default configuration that Proxmox provides and migrate one of the heavily loaded servers to one of the other hosts, will it solve the problem? Of course, we will disable KSM and set swappiness to 0.

Does a heavily loaded SQL Server run well on a default ZFS RAIDZ1 configuration with Proxmox? Of course, we will use 4 SSDs for the underlying storage.

For your information, SSDs do not break easily.
A SATA SSD of mine with 900k hours on it is still working.
Give up on RAID 5. And ZFS consumes more RAM.

Always use btrfs.
 
For your information, SSDs do not break easily.
A SATA SSD of mine with 900k hours on it is still working.
Give up on RAID 5. And ZFS consumes more RAM.

Always use btrfs.
btrfs? Really? Isn't it still declared experimental or beta in Proxmox?
How does it compare to ZFS from a performance perspective?
 
btrfs? Really? Isn't it still declared experimental or beta in Proxmox?
How does it compare to ZFS from a performance perspective?
btrfs RAID 5 is declared experimental.

I'm using btrfs RAID 0 and I've never had a problem.
 
For now, we are not concerned about RAM and we are ready to give RAM to ZFS. We are thinking of allocating 8 GB + (1 GB x 8) = 16 GB to the ZFS ARC [roughly 8 GB plus 1 GB per TB of raw storage, since we are going to have 4x 2 TB SSDs in RAID-5].

Our concern is: will this setup give the MSSQL servers the same IOPS that we are getting right now from the present hardware RAID?
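(For reference, we would apply that ARC limit on the host roughly like this - a sketch only; 16 GiB = 17179869184 bytes:)

Code:
echo "options zfs zfs_arc_max=17179869184" > /etc/modprobe.d/zfs.conf   # cap ARC at 16 GiB
update-initramfs -u -k all    # then reboot so the new limit applies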
 
For now, we are not concerned about RAM and we are ready to give RAM to ZFS. We are thinking of allocating 8 GB + (1 GB x 8) = 16 GB to the ZFS ARC [roughly 8 GB plus 1 GB per TB of raw storage, since we are going to have 4x 2 TB SSDs in RAID-5].

Our concern is: will this setup give the MSSQL servers the same IOPS that we are getting right now from the present hardware RAID?
Well... every sort of parity or "erasure coding" will lower IOPS... if IOPS are important for your servers, and for database servers IOPS is everything... use a RAID-10-like ZFS pool (striped mirrors). Or, if you really need the RAIDZ1 (RAID-5-like) storage, give ZFS a massive amount of RAM so it can compensate as much as possible from cache...
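(A sketch of such a RAID-10-like pool built from four disks - pool name and device paths are placeholders; in practice use /dev/disk/by-id paths:)

Code:
zpool create -o ashift=12 tank \
    mirror /dev/sda /dev/sdb \
    mirror /dev/sdc /dev/sdd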
 
For now, we are not concerned about RAM and we are ready to give RAM to ZFS. We are thinking of allocating 8 GB + (1 GB x 8) = 16 GB to the ZFS ARC [roughly 8 GB plus 1 GB per TB of raw storage, since we are going to have 4x 2 TB SSDs in RAID-5].

Our concern is: will this setup give the MSSQL servers the same IOPS that we are getting right now from the present hardware RAID?
btrfs gives the best IOPS.

[benchmark screenshot]

btrfs RAID 0 with 4x Samsung SSD 870 EVO, 1 TB each.

Hardware RAID + ZFS is not recommended.
 
btrfs gives the best IOPS.

[benchmark screenshot]

btrfs RAID 0 with 4x Samsung SSD 870 EVO, 1 TB each.

Hardware RAID + ZFS is not recommended.
That says nothing... here is ZFS:
[ZFS benchmark screenshot]

But yes... hardware RAID is not recommended for ZFS... use an HBA if you want to go purely with ZFS... and use it as RAID 10... even with a hardware RAID, RAID 10 would be the better way to go for high IOPS...
 
