Slow VM Performance

dparadis

New Member
Jan 23, 2022
Hi!

I've had my homelab server for about a year and need some help.
The more VMs I add, the slower the system gets.
CPU usage is less than 10% at peak and RAM usage is about 60% (high usage due to ZFS).


I started installing a Zabbix VM, and setting up the MySQL database took hours.
I cancelled it halfway through, and just deleting it took an hour.
An Ubuntu 20.04 install takes 2 hours.

I installed fast SATA drives in RAID1, but I suspect they are the culprit:
2x 2TB Barracuda 7200 RPM
Proxmox runs on an SSD

Is there an easy way to find the bottleneck?
Would I be better off with a real RAID controller, or just SSDs?
 
HDDs in general are bad as VM storage because the more guests you run, the more IOPS hit your pool. HDDs can't handle more than maybe 100-200 IOPS, while SSDs can handle 10 to 1000 times that. I would also guess that your pool is the bottleneck. You can check that by looking at the "IO delay" graph: if it goes high, your HDDs can't handle the IOPS. And keep in mind that databases like MySQL do sync writes, which are terribly slow, especially if your pool doesn't have an SSD as a SLOG for sync write caching.
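To check that from the node's shell, something like this should do it (the "IO delay" graph is also on the node's Summary page in the web UI; iostat comes with the sysstat package):

# per-vdev read/write IOPS and bandwidth of the pool, refreshed every 5 seconds
zpool iostat -v 5
# per-disk utilisation and latency
iostat -x 5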

Another factor that might slow your server down the more VMs you add is the physical cores/threads to vCPU ratio. The more vCPUs you assign to guests, the slower your CPU will get, because processes will have to wait in the queue. This can slow down your entire system even if your CPU load is very low. I don't know how many vCPUs you assigned to your guests, but I guess this won't be a big problem in a homelab server with 48 physical threads.
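If you want to check it anyway, a rough comparison would be something like this (the config path is the standard PVE location; add up the cores per VM yourself):

# threads the host has
nproc
# cores/sockets assigned to each VM
grep -E 'cores|sockets' /etc/pve/qemu-server/*.conf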

And then there is the fragmentation of your ZFS pool. Pools will fragment over time and there is no way to defragment them, because ZFS uses copy-on-write. So you might want to check how high your pool's fragmentation is. The higher the fragmentation, the slower your HDDs will be.
You should also make sure never to use more than 80% of your pool's capacity. Beyond that the pool will get slow and fragmentation will increase faster.
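Both are easy to check, for example:

# the FRAG and CAP columns show fragmentation and used capacity
zpool list
# or just those two properties for all pools
zpool get fragmentation,capacity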
 
So currently PVE is installed on an SSD without RAID and the VM storage is on the HDD RAID1.
Is it possible to add a SLOG even though the system is already installed? Or do I need to reinstall everything to put this SLOG in place?

Also, is it possible to do the caching with my SSD instead of using RAM? I could also put around 100GB on the SSD, since only PVE is taking space there.
 
So currently PVE is installed on an SSD without RAID and the VM storage is on the HDD RAID1.
Is it possible to add a SLOG even though the system is already installed? Or do I need to reinstall everything to put this SLOG in place?
Should work. But keep in mind that a SLOG will only help with sync writes and won't help at all with async writes (which should be most of your writes). No SSD caching will really make your HDDs fast. If you want fast storage with SSD speeds, you need to replace your HDDs with (preferably enterprise) SSDs.
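Adding a SLOG to an existing pool is a single command, roughly like this (pool name and device path are placeholders for your setup):

# add an SSD (partition) as SLOG to an existing pool
zpool add tank log /dev/disk/by-id/ata-YOUR_SSD-part1
# a log device can also be removed again later
zpool remove tank /dev/disk/by-id/ata-YOUR_SSD-part1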
Also, is it possible to do the caching with my SSD instead of using RAM? I could also put around 100GB on the SSD, since only PVE is taking space there.
Are you talking about an L2ARC? In general an L2ARC only makes sense if you have already upgraded your RAM to the maximum your server will support. The bigger your L2ARC (SSD read cache) is, the more RAM it will consume. With an L2ARC you are basically sacrificing a bit of very fast RAM read cache to get more of a slower SSD read cache. So you will have a bigger but slower read cache, which might slow down your VMs even more.
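If you still want to try it, that is also just one command, and you can check afterwards whether it actually gets hits (pool name and device are placeholders):

# add an SSD (partition) as L2ARC read cache
zpool add tank cache /dev/disk/by-id/ata-YOUR_SSD-part2
# ARC/L2ARC sizes and hit rates
arc_summary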
 
Just to echo what @Dunuin has been saying: high-speed storage is vital to getting good performance. You can use HDDs for data storage, but the VM disks should be on SSD or NVMe drives.
 
I finally installed SSDs for the VM storage...

With only 5 VMs I'm starting to get IO delays and seeing latency in the VMs.
Two of those VMs are Windows servers that are idle 90% of the time,
a PBX which gets 2-3 calls a day, so it's mostly idle,
a UniFi server with a few devices,
and an empty Zabbix instance.

I've put in 128GB of ECC RAM and allocated more RAM to all the VMs, but no luck.
 
QLC SSDs like the Samsung QVO are terribly slow (down to HDD performance as soon as the cache gets full). Use the forum search for "QVO" and you will find lots of people complaining about IO delay using QVOs. Then they replace the QVOs with better SSDs (TLC or, better, MLC NAND chips, preferably datacenter/enterprise grade SSDs) and everything is fine:
https://forum.proxmox.com/threads/samsung-870-qvo-1tb-terrible-write-performance.82026/#post-472829
https://forum.proxmox.com/threads/proxmox-7-1-6-poor-performance.100632/post-434262
https://forum.proxmox.com/threads/high-io-on-load.105339/
https://forum.proxmox.com/threads/very-poor-performance-with-consumer-ssd.99104/
https://forum.proxmox.com/threads/poor-disk-performance.93370/
https://forum.proxmox.com/threads/abysmal-ceph-write-performance.84361/
...and a lot more... So they are just crappy SSDs not meant for server use because of the terrible write performance, low life expectancy and missing power-loss protection. I would send them back and get at least a good consumer TLC SSD like a normal Samsung 870 EVO, or better, a low-budget enterprise TLC SSD like an Intel S4510.
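If you want to see the difference yourself, a rough sync-write benchmark with fio on the VM storage usually makes it obvious (adjust the path to your storage; it creates a 4GB test file, so delete it afterwards and don't run it on a nearly full pool):

fio --name=synctest --filename=/your/vm/storage/testfile --rw=randwrite --bs=4k --iodepth=1 --numjobs=1 --fsync=1 --size=4G --runtime=60 --time_based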
 
I had 2x 1TB 870 QVO in RAID 1.
I just ordered 4x 500GB EVO that will be set up in RAID 10...

Hopefully I will see a difference from the drives... at least the RAID 10 will help.
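For reference, the ZFS version of RAID 10 is just two mirrored pairs striped together; something like this (pool name and disk IDs are placeholders, and PVE can also create it in the GUI under Disks > ZFS):

zpool create -o ashift=12 vmpool mirror /dev/disk/by-id/ata-EVO_1 /dev/disk/by-id/ata-EVO_2 mirror /dev/disk/by-id/ata-EVO_3 /dev/disk/by-id/ata-EVO_4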
 
Thanks for your help!

If I'm using SATA drives, would I be better off adding them as SATA disks instead of using the SCSI controller?
 
