Help: Windows Server 2019 VM is too slow

Letizia

New Member
May 5, 2023
Hi everybody,
I am a new Italian Proxmox user! :)
I installed a Proxmox cluster with three nodes. Each node is a DL380 Gen9 with five SSDs: two in RAID1 for the OS and three 2 TB disks for Ceph (dual 10 Gb fiber).
The Windows Server VMs are very slow. I have read the documentation on installation best practices, but I don't know how to solve this problem, so I am asking for help.
Thank you :)
 
Maybe it is adapting to the Italian mentality - just kidding :)
Can you add more info? (Sooner or later somebody will ask.)
Some useful commands:
qm config (VM_ID)
ceph status
cat /etc/network/interfaces

Also, could you describe what exactly is slow? Windows itself? Migration? Internet access?
 
LOL!! :D
Thanks for your reply, here is the info. My cluster was created to host our management software, which uses Microsoft SQL Server, in a safe place.
I noticed the reduced performance when I ran the same software on an older laptop: the same processing takes 10 minutes on the old laptop versus 30 minutes in the VM.
Thanks for the support :)

Code:
root@pveb:~# ceph status
  cluster:
    id:     30e2df28-0540-4q1e-5fdc-badf5170b0fa
    health: HEALTH_OK
 
  services:
    mon: 3 daemons, quorum pvea,pveb,pvec (age 13h)
    mgr: pvec(active, since 13h), standbys: pveb, pvea
    osd: 9 osds: 9 up (since 13h), 9 in (since 8w)
 
  data:
    pools:   2 pools, 33 pgs
    objects: 91.04k objects, 355 GiB
    usage:   1.0 TiB used, 15 TiB / 16 TiB avail
    pgs:     33 active+clean
 
  io:
    client:   6.0 KiB/s rd, 52 KiB/s wr, 0 op/s rd, 6 op/s wr

__________

root@pveb:~# qm config 200
agent: 1
boot: order=scsi0;ide2;net0
cores: 6
ide2: none,media=cdrom
ide3: none,media=cdrom
machine: pc-i440fx-7.2
memory: 16384
meta: creation-qemu=7.2.0,ctime=1680354798
name: WinS2k19
net0: virtio=9A:F8:B1:BD:D1:4D,bridge=vmbr4,firewall=1
numa: 0
onboot: 1
ostype: win10
scsi0: NCPool:vm-200-disk-0,cache=writethrough,discard=on,size=100G
scsi1: NCPool:vm-200-disk-1,cache=writethrough,discard=on,size=100G
scsi2: NCPool:vm-200-disk-2,cache=writethrough,discard=on,size=50G
scsihw: virtio-scsi-pci
smbios1: uuid=1340601d-ffbd-4c79-b514-9ed793e72c90
sockets: 2
tags: Test;
vmgenid: 36239324-8e20-4188-9e0f-5e21606cc40c

_____________


auto lo
iface lo inet loopback

iface eno1 inet manual

iface ens5f0 inet manual

iface ens5f1 inet manual

auto eno2
iface eno2 inet manual
#to_porta_11

iface eno3 inet manual

iface eno4 inet manual

iface eno49 inet manual

iface eno50 inet manual

iface eno51 inet manual

iface eno52 inet manual

auto vmbr0
iface vmbr0 inet static
        address 192.168.65.11/24
        gateway 192.168.65.1
        bridge-ports eno1
        bridge-stp off
        bridge-fd 0


auto vmbr1
iface vmbr1 inet static
        address 192.168.71.11/24
        bridge-ports ens5f0
        bridge-stp off
        bridge-fd 0
#Ceph LAN Public

auto vmbr2
iface vmbr2 inet static
        address 192.168.72.11/24
        bridge-ports ens5f1
        bridge-stp off
        bridge-fd 0
#Ceph LAN Cluster

auto vmbr3
iface vmbr3 inet static
        address 192.168.33.11/24
        bridge-ports eno2
        bridge-stp off
        bridge-fd 0
#Erogazione_VM_B

auto vmbr4
iface vmbr4 inet manual
        bridge-ports eno3
        bridge-stp off
        bridge-fd 0
#Erogazione Rete 10
 
So the same operation in your software takes 10 minutes on an older laptop (hardware specs still missing) and 30 minutes in the VM.
How many vCPUs are assigned to the VM? What vCPU type does the VM use?
What are the hardware specs of the DL380 (CPU model and count)?
 
What SSD model?
What storage controller?
Hi _gabriel,
thanks for your question.
The disk model is Samsung SSD 870 EVO 2TB.
The controller for the two OS disks in RAID1 is the P440 of the DL380 Gen9; the three Ceph disks are connected to an LSI 9211-8i PCIe controller.
 
And please give us a definition of what "slow" means.
Write latency seems extremely bad (24 ms).
The slowness is evident when the business software accesses Microsoft SQL Server. The VM is not responsive enough even though there is little data required.
 
The cluster consists of three identical nodes, each like this:

HPE DL380 Gen9 with:
56 x Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz (2 Sockets)
RAM: 128 GB ECC

VM 200 (Windows Server 2019) runs on pveb with 16 GB of RAM and 12 vCPUs (2 sockets, 6 cores each).

Thanks :)
 
What is the CPU of the tested laptop?
The E5-2680 v4 is 7 years old, so single-threaded performance can be the bottleneck.
I would turn off mitigations on these old CPUs... and make sure the vCPU type is set to host.
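
To make the vCPU type advice concrete, here is a minimal sketch that reports which CPU type a VM is configured with. The `qm config 200` output posted earlier in the thread has no `cpu:` line, which suggests the VM is running with the Proxmox default CPU type rather than `host`. The helper name `cpu_type` is made up for this example; `qm config` and `qm set --cpu host` are the regular Proxmox CLI calls, and the sysfs directory is the standard Linux location for mitigation status.

```shell
#!/bin/sh
# cpu_type: read `qm config` output on stdin and print the configured
# CPU type, or "default" when no cpu: line is present.
# (Hypothetical helper name, not part of Proxmox.)
cpu_type() {
    awk -F': ' '$1 == "cpu" { print $2; found = 1 }
                END { if (!found) print "default" }'
}

VMID=200   # VM ID taken from the thread

# Only query the VM when the Proxmox CLI is actually available.
if command -v qm >/dev/null 2>&1; then
    echo "VM $VMID cpu type: $(qm config "$VMID" | cpu_type)"
    # To switch to the host CPU type (applied after a full stop/start):
    #   qm set "$VMID" --cpu host
fi

# Host-side view of which CPU mitigations are currently active.
if [ -d /sys/devices/system/cpu/vulnerabilities ]; then
    grep -r . /sys/devices/system/cpu/vulnerabilities/
fi
```

Looking at the mitigation status first gives a baseline before deciding whether switching mitigations off is worth it on these hosts.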
 
The hardware specifications of the reference laptop are:

Intel Core i5-8265U 2.20GHz
Ram 16 GB DDR4 2400 MHz
Hard Disk SSD NVMe 512GB
Windows 11
 
A few suggestions:

1. Change your CPU arrangement to 1 socket, 12 cores, NUMA on. You probably don't have any use for this core count; I'm just keeping it the same as your current setup.
2. Move the vdisk off of Ceph and onto a local SSD.
3. Which version of the VirtIO ISO did you use to install the guest drivers, and which Windows version?

Edit: if you have memory ballooning enabled, turn it off.
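
For reference, the suggestions above could be applied from the host shell roughly like this. It is a sketch under assumptions: the storage name `local-lvm` is a guess at what the local SSD storage is called on these nodes, and the `run` wrapper is a made-up dry-run helper that only prints each command unless `APPLY=1` is set. The `qm set` and `qm move-disk` subcommands are the regular Proxmox CLI (older releases spell the latter `qm move_disk`).

```shell
#!/bin/sh
# Dry-run helper: prints each command; set APPLY=1 to actually run it.
# (Hypothetical wrapper for this example, not part of Proxmox.)
run() {
    if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

VMID=200            # VM ID from the qm config output in this thread
TARGET=local-lvm    # assumption: name of a local SSD-backed storage

# 1. One socket, 12 cores, NUMA enabled (same vCPU count as before).
run qm set "$VMID" --sockets 1 --cores 12 --numa 1

# Edit note: disable memory ballooning.
run qm set "$VMID" --balloon 0

# 2. Move the first virtual disk off Ceph onto the local storage.
run qm move-disk "$VMID" scsi0 "$TARGET"
```

Run it once as-is to review the printed commands, then again with `APPLY=1`. The CPU layout change is picked up after the VM has been fully stopped and started.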
 
Of course, I missed the red flag: the SSD model is not a datacenter drive with power loss protection, which is mandatory for Ceph performance because Ceph writes are synchronous only.
 
Thanks to everyone for the valuable advice.

I ran the tests you suggested and found a big difference when the VM's disk is outside Ceph.
The VirtIO driver version is 0.1.229.

How can I check whether this difference is normal or whether I made a mistake in the configuration?

[Attachments: firstImg.jpeg, secondImg.jpeg]
 
I'm sorry @_gabriel, but I didn't understand your message.
Do you mean these drives are not suitable for Ceph?
Thanks :)
 
I did the tests you suggested and I found a big difference with the hard disk of the VM external to the ceph.
I think you have a fundamental misunderstanding of what Ceph does and how it works.

Your experience of Ceph "performance" will be the SLOWEST of:

1. the individual disk mechanisms. In the case of spinning disks, this can be mitigated a LITTLE by moving the DB onto faster media.
2. the Ceph public interface speed
3. the Ceph private interface speed
4. if the Ceph private and public interfaces are shared, it would be at most HALF the interface speed.
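
The list above boils down to "take the minimum", which can be written out as a tiny calculation. This is a deliberately simplified sketch: the function name `ceph_ceiling` and the 500 MB/s disk figure are hypothetical, and 1250 MB/s is just 10 Gb/s converted to megabytes per second. The `ceph osd perf` call at the end is a regular Ceph command that shows per-OSD commit and apply latency, which helps tell whether the disks are the limiting item.

```shell
#!/bin/sh
# ceph_ceiling DISK PUBLIC PRIVATE SHARED
# All speeds in the same unit (e.g. MB/s). SHARED=1 means the public and
# private networks use the same interface, so each gets at most half.
# (Hypothetical helper for illustration only.)
ceph_ceiling() {
    disk=$1; pub=$2; priv=$3; shared=$4
    if [ "$shared" = "1" ]; then
        pub=$((pub / 2))
        priv=$pub
    fi
    min=$disk
    if [ "$pub" -lt "$min" ]; then min=$pub; fi
    if [ "$priv" -lt "$min" ]; then min=$priv; fi
    echo "$min"
}

# Hypothetical disk speed of 500 MB/s, two separate 10 Gb links.
ceph_ceiling 500 1250 1250 0    # prints 500

# Per-OSD latency, only when the Ceph CLI is present on this machine.
if command -v ceph >/dev/null 2>&1; then
    ceph osd perf
fi
```

In this thread the public and cluster networks sit on separate 10 Gb ports (vmbr1 and vmbr2), so by this reading the disks, not the network, are the first thing to check.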
 
three 2Tb disks for the Ceph
The hard disk model is samsung SSD 870evo 2TB.
The vm's with Win srv are very slow
Write latency seems extremely bad (24 ms).
The slowness is evident when the business software accesses Microsoft SQL Server. The VM is not responsive enough even though there is little data required.

Like every week at least once:
You saved money on the wrong end with those consumer SSDs for, in this case, Ceph.

Have a look at this: [1]. Especially the comparison table on page 3. Your 870 EVO is comparable to the 850 EVO listed there.

So, if you want/need performance, get appropriate enterprise SSDs with PLP (power loss protection).

PS.: CrystalDiskMark is not a suitable benchmark (in this case) anyway. Better use fio with proper parameters.

[1] https://forum.proxmox.com/threads/proxmox-ve-ceph-benchmark-2020-09-hyper-converged-with-nvme.76516
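
As a rough idea of what an fio run with "proper parameters" could look like, here is a minimal sketch. The parameter choice is an assumption about what is meant: small 4 KiB synchronous writes with one job at queue depth 1, which is the pattern where drives without PLP tend to struggle. The test file path and the `run` and `fio_sync_write` helper names are made up for this example; the fio options themselves are standard.

```shell
#!/bin/sh
# Dry-run helper: prints the command; set APPLY=1 to actually run it.
# (Hypothetical wrapper for this example.)
run() {
    if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

# fio_sync_write FILE
# 60 seconds of 4 KiB synchronous, direct writes: one job, queue depth 1.
fio_sync_write() {
    run fio --name=sync-write --filename="$1" --size=1G \
        --ioengine=libaio --direct=1 --sync=1 \
        --rw=write --bs=4k --numjobs=1 --iodepth=1 \
        --runtime=60 --time_based
}

# Assumption: TESTFILE points at a file on the storage under test.
fio_sync_write "${TESTFILE:-./fio-sync-test.bin}"
```

Writing to a test file instead of a raw device keeps the run non-destructive. `libaio` is a Linux engine, so inside the Windows guest the I/O engine would need to be changed (fio ships a `windowsaio` engine).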
 
First of all, I would like to thank all of you for your support and interest @_gabriel, @pille99, @alexskysilk, @Neobin :)
I did many tests following your instructions and I found two other interesting points.

I tried a disk attached directly to the P440 controller of the HPE DL380 Gen9, and the Windows Server 2019 VM is also very slow when working with SQL Server together with my management program.
I'm confused about choosing new drives for my infrastructure: SATA, NVMe, ... Any pointers?
Thank you :)
 
The P440 has its own battery-backed write cache (if it is not present, performance is bad), so it is better to set the VM disk cache to writeback.
BBWC is enabled by default on the array/disk, but the hardware controller disables the disk's own write cache, so consumer SSDs are slow in a hardware RAID and will wear out too fast.
IMO, a consumer SSD like your Samsung EVO can be used only as an ext4/LVM-thin datastore, outside of a hardware RAID or in real HBA mode, to get the best lifetime and reliability, plus daily backups.
If you want RAID (hardware or ZFS) or shared storage, you will need datacenter/enterprise disks.
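
Switching the VM disk cache to writeback means re-setting the drive line with a different `cache=` option. A minimal sketch follows, assuming the disk now lives on a local storage: the volume name `local-lvm:vm-200-disk-0` is a guess, and `run` and `with_writeback` are made-up helper names. The `qm set --scsi0` call is the regular Proxmox CLI.

```shell
#!/bin/sh
# Dry-run helper: prints the command; set APPLY=1 to actually run it.
# (Hypothetical wrapper for this example.)
run() {
    if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

# with_writeback: rewrite the cache= option of a drive string on stdin.
# A string without any cache= option is passed through unchanged.
with_writeback() {
    sed -E 's/(^|,)cache=[a-z]+/\1cache=writeback/'
}

VMID=200
# Assumption: drive string as it might look after moving scsi0 to a
# local storage; take the real one from `qm config 200`.
DRIVE='local-lvm:vm-200-disk-0,cache=writethrough,discard=on,size=100G'

run qm set "$VMID" --scsi0 "$(printf '%s\n' "$DRIVE" | with_writeback)"
```

Rewriting the existing drive string keeps the other options (`discard=on`, `size=`) as they were and changes only the cache mode.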
 
I'm confused about choosing a new hard drive for my infrastructure: SATA, NVME, ... Any pointers?
SAS, SATA, and NVMe refer to the host bus connection, not the disk technology, which is what you're after.

HDD: slow, cheap. Can be SAS or SATA (NVMe-attached HDDs will be a thing, but not yet).
SSD: can be any of the above. In order of price per GB, it can be either consumer or enterprise, read- or write-optimized.

What should you buy? It depends on your budget and use case. What is the consequence of a data outage or data loss in dollars? Speed should not be your first priority. Buy the best available within your budget.
 