Evaluation Regarding Proxmox, Storage Models, and Deployment

iamsparticus

Active Member
Apr 5, 2012
OK everyone! I've been an avid Proxmox user since 1.x, and I really love the platform. That being said, it's been a HUGE pain trying to figure things out, and now, with questions about performance and, well, just sensibility, I'm asking for some external eyes.

I'd like to lay out how my current production environment is set up, and openly ask what I should be doing and how to do it.

My Current Setup

Currently, I'm running one cluster of 9 hosts, all Dell PowerEdge rack servers of various ages/hardware. Each host has two drive sets. The primary is 2 smaller drives in RAID1, and the secondary is 4+ drives in RAID10. The primary holds the Proxmox installation, and the secondary is mounted as ext4 with local qcow2 files. Drives are mostly 7200 RPM SAS and SATA.

One backup server is an NFS share, and, using Ayufan's differential backups (https://ayufan.eu/projects/proxmox-ve-differential-backups/), I back up a few times a week over a 1Gb network.
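For reference, a job in this setup looks roughly like the hypothetical example below (the VM ID and storage name are placeholders; the differential behavior comes from Ayufan's patched vzdump, not from any extra flags):

Code:
# back up VM 101 to an NFS-backed storage named "backup-nfs" (both are placeholders)
vzdump 101 --storage backup-nfs --mode snapshot --compress lzo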

All of my VMs are Windows 2008/2012r2 guests with 1 socket/XX cores depending on need, using kvm64 as the CPU type. I use VirtIO for disk/network, and ballooning for memory. Write-back cache is enabled for all drives, and best practices are followed in VM setup.
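For illustration, a guest set up along those lines would show something roughly like this in its config (the VM ID, sizes, bridge, and VLAN tag are placeholders, not my actual values):

Code:
# qm config 101 (sketch; cpu type defaults to kvm64 when no explicit cpu: line is set)
balloon: 2048
cores: 4
sockets: 1
memory: 8192
net0: virtio=XX:XX:XX:XX:XX:XX,bridge=vmbr1,tag=20
ostype: win8
virtio0: local:101/vm-101-disk-1.qcow2,cache=writeback,size=100G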

Guests are mostly file services/AD, software (database driven), and terminal services.

Each server uses 2 network interfaces. vmbr0 is tied to eth0 and sits on the management LAN for my servers. vmbr1 is on eth1 and trunked with VLAN support for my guests.
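The bridge setup is the usual Debian-style config; a sketch (addresses are placeholders):

Code:
# /etc/network/interfaces (sketch; addresses are placeholders)
auto vmbr0
iface vmbr0 inet static
        address 10.0.0.11
        netmask 255.255.255.0
        gateway 10.0.0.1
        bridge_ports eth0
        bridge_stp off
        bridge_fd 0

auto vmbr1
iface vmbr1 inet manual
        bridge_ports eth1
        bridge_stp off
        bridge_fd 0
        # guests attach here with a VLAN tag set on their virtual NIC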

There are no other "tweaks" being done on these hosts. CPU units/vCPU count is default. I turned off "Use tablet for pointer" under options per best practices, and that's it. I've read and tried tons of random tweaks/settings, but nothing of note. If you have some that NEED to be used, please let me know what they are and how they help. Thanks!

My Problems

Well, I do run into issues from time to time.

1. CLUSTER -
My cluster is super finicky. It loves to fall apart. I've had this issue for years, and it's such a pain. I just did a clean install of 4.1 this weekend, and fingers crossed, the cluster is fine so far.

2. IOWAIT - 99.9999% of the time, this is the bottleneck for me. I have only a VERY small number of VMs per host (seriously, fewer than 2-3) because IOWAIT can and will be a HUGE pain. I don't use LVM, which I think would help with this, because, frankly, I need some simple assistance in understanding and deploying it. More on that below.

3. Backup speeds - When backing up to my NFS server, which, BTW, runs Proxmox so it can act as a test server/emergency backup if a VM or host dies, I average 100MB/s on our 1Gb network. Then, that backup process will randomly drop to 5MB/s and STAY THERE. I've often come in the next morning and found a guest still backing up when it should have taken 30 minutes max.

My Questions

First, please don't knock my choices; I was making the best decisions I could based on my understanding. I need feedback, and more so, help.

I don't know/understand LVM well enough. I know how to set it up for local storage. But, with my current config of each host server housing the files of its guest VMs, how do I snapshot to another server? Does that even work with NFS?

When a VM gets hung/locked by a long backup process, how do you resolve it? OR get notified?

With LVM, if it's the solution to my IOWAIT, do I continue with local storage, or is there a better way? If I use network storage, what's best? I'm on a 1Gb network; is that sufficient? I know nothing of NAS/iSCSI/DRBD, but I'm willing to learn. Seriously.

What am I missing? Besides the above, is there anything I SHOULD be doing, or shouldn't be?

If you have suggestions and corrections, please give me direction to investigate/resolve. I'm self taught, and I'm willing to dig in, but I might have to catch up a little on some things. I really want to maximize the equipment I have to serve the best performance, and I feel I'm falling short.
 
Q1: First off, what are you looking for? Just a couple of small VMs? VMs that get taxed heavily and are public? Personal-only VMs? Reliability? Performance? Large storage? It would help to narrow your use case a bit better.

2. IOWAIT - 99.9999% of the time, this is the bottleneck for me. I have only a VERY small number of VMs per host (seriously, fewer than 2-3) because IOWAIT can and will be a HUGE pain. I don't use LVM, which I think would help with this, because, frankly, I need some simple assistance in understanding and deploying it. More on that below.

You mentioned IO-wait. TBH, without knowing exactly which type of drive you are using, I can already tell you it is to be expected based on your description and pveperf results.

This is an example of one of my home machines:
i7-3930K (Q4 2011 - 4 years old)
SanDisk SDSSDP06 (a 2.5-year-old model that was really cheap and not even considered mediocre, much less best of the crop back then)

CPU BOGOMIPS: 76795.56
REGEX/SECOND: 1305090
HD SIZE: 14.15 GB (/dev/dm-0)
BUFFERED READS: 416.79 MB/sec
AVERAGE SEEK TIME: 0.31 ms
FSYNCS/SECOND: 236.47
DNS EXT: 336.08 ms
DNS INT: 273.74 ms (--- Redacted ---)

Now let's compare this to your Server 9, which should net you the highest IO-wait times:

Server9:
CPU BOGOMIPS: 63837.12
REGEX/SECOND: 1147742
HD SIZE: 94.37 GB (/dev/dm-0)
BUFFERED READS: 513.17 MB/sec
AVERAGE SEEK TIME: 6.42 ms
FSYNCS/SECOND: 2998.12
DNS EXT: 51.01 ms
DNS INT: 46.58 ms

So every time an IO (input/output) operation happens, your disk subsystem takes on average 6.4 ms to satisfy the request. In comparison, my slow SSD takes 0.31 ms to fulfill it. We are talking in the area of 20x faster here.
To dumb this down a little:
IOs do not typically come alone. Most of the time they depend on one another: some IO operation (1) happens, and the IO operation behind it (2) depends on the first one finishing. And there are loads of them chaining up across multiple parallel queues.

It starts adding up really quickly.
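To put rough numbers on it (purely illustrative): a chain of 1,000 dependent IOs at 6.4 ms each takes about 6.4 seconds; the same chain at 0.31 ms takes about 0.31 seconds. You can measure this latency yourself with something like ioping (an assumption on my part that the package is installed; the path is a placeholder):

Code:
# average single-request latency on the VM storage path
ioping -c 20 /var/lib/vz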

So the short answer is:
Replace your OS drive with an off-the-rack, nothing-special SSD and you will see big improvements for everything that utilizes that SSD. Using some high-end SSDs will give you "Hyperspeed" (TM).

3. Backup speeds - When backing up to my NFS server, which, BTW, runs Proxmox so it can act as a test server/emergency backup if a VM or host dies, I average 100MB/s on our 1Gb network. Then, that backup process will randomly drop to 5MB/s and STAY THERE. I've often come in the next morning and found a guest still backing up when it should have taken 30 minutes max.

That is most likely due to your NFS server not being able to sustain more than 5 MB/s once you've filled its cache. Again, I'd say you're looking at resource starvation. I'm not sure what "my NFS server, which, BTW, runs Proxmox" means, but I assume it's run by you, using the same underlying disk system?
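One rough way to check (just a sketch; the mount path and size are placeholders) is to write a file bigger than the NFS server's RAM straight to the share, forcing the data to be flushed so the cache can't hide the real disk speed, and see where the rate settles:

Code:
# write ~8 GB to the NFS-mounted backup storage (path is a placeholder)
dd if=/dev/zero of=/mnt/pve/backup-nfs/testfile bs=1M count=8192 conv=fdatasync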

1. CLUSTER - My cluster is super finicky. It loves to fall apart. I've had this issue for years, and it's such a pain. I just did a clean install of 4.1 this weekend, and fingers crossed, the cluster is fine so far.

Can you tell us more about it?
  • CPU type per node
  • Installed RAM per node
  • NIC(s) and speed per node
  • Average CPU utilization per node
  • Perhaps even how many drives you "could" stick into each node if you were to do so
  • Perhaps even what type of VMs you are running
 
Q1: First off, what are you looking for? Just a couple of small VMs? VMs that get taxed heavily and are public? Personal-only VMs? Reliability? Performance? Large storage? It would help to narrow your use case a bit better.

This is in an education setting. Three VMs are DB driven, accessed by 70 users simultaneously. Four are file share/AD, and two are TS. The remainder are lightweight for maintenance/specific duties. All Windows Server 2008r2/2012r2.

The file share and AD are serving about 700 devices currently, but that may ramp much higher in the next few years.

You mentioned IO-wait. TBH, without knowing exactly which type of drive you are using, I can already tell you it is to be expected based on your description and pveperf results.

Mostly SATA/SAS 7200rpm drives ranging from 1TB to 4TB. I'll get model numbers if necessary, but I know qcow2 is affecting performance, hence my question on how much LVM may help. Also, pveperf only tests the OS drive, not the drive serving the guests, but I'm assuming similar speed.

That is most likely due to your NFS server not being able to sustain more than 5 MB/s once you've filled its cache. Again, I'd say you're looking at resource starvation. I'm not sure what "my NFS server, which, BTW, runs Proxmox" means, but I assume it's run by you, using the same underlying disk system?

Is there a way to measure this? Is there a better way than an NFS share? It's an NFS share on ext4 across a RAID5.

Can you tell us more about it?
  • CPU type per node
  • Installed RAM per node
  • NIC(s) and speed per node
  • Average CPU utilization per node
  • Perhaps even how many drives you "could" stick into each node if you were to do so
  • Perhaps even what type of VMs you are running

CPU/RAM:
E5-2630 32GB
X5650 8GB
E5-2407 16GB
E5-2407 32GB
E5620 32GB
E5-2470 16GB
E5-2430 32GB

Missing the two newest servers, but these are the older ones, so anything applied here would apply to them.

The NICs are all 1Gb on 1Gb switches on 10Gb backbone.

Average CPU is low. I'll measure over the next few days if it will help, but memory and CPU usage are always low. Most of these servers could hold up to 6-8 drives.

I know I could dramatically increase speed with SSDs. We can't afford them right now, and the institution I work for doesn't have much funding, much less for infrastructure. I'm trying to maximize what we have. So I know drive performance isn't great, but I want to know what setup would work best on the hardware given. I've seen benchmarks of qcow vs raw, and I chose qcow due to snapshots.

The questions I'm after are: should I consolidate the larger drives into two chassis and create a server for hosting the guest drives only? If so, in what way? DRBD? ZFS? iSCSI? FreeNAS or OpenSAN or whatever? That's where I'm lost.

Or is my current setup adequate, and could tweaks to the storage model increase performance?
 
Your CPUs are "fine"-ish for what you are doing. The only problem is that your storage subsystem is suboptimal.

This is in an education setting. Three VMs are DB driven, accessed by 70 users simultaneously. Four are file share/AD, and two are TS. The remainder are lightweight for maintenance/specific duties. All Windows Server 2008r2/2012r2.

So you DO NOT have a requirement for large storage space?
P.S.: Windows Servers as VMs are not what I'd consider "lightweight" - unless you're not using them :p




The questions I'm after are: should I consolidate the larger drives into two chassis and create a server for hosting the guest drives only? If so, in what way? DRBD? ZFS? iSCSI? FreeNAS or OpenSAN or whatever? That's where I'm lost.

Suggestions
My go-to suggestions are always "RAID", ZFS, Gluster, or Ceph (in that order, worst to best, based on my experience and use cases).

  1. Ceph is what I am most versed in (as I operate it for a living). Without SSDs and multiple 1G NICs per host, I'd not touch it.
  2. Gluster might work (especially for read performance, as it is able to read from all nodes in parallel), but when writing you are striping the data over the network to all nodes in parallel, which is relatively slow.
  3. ZFS at its core is best compared to RAID (for lack of something "similar"). It is fault tolerant, can benefit from SSD- or RAM-based caches, and is generally resilient. The caches are what accelerate it, so fast SSDs and loads of RAM are what you are after here. (This is what I'm least familiar with.)
  4. RAID you probably know about; it's the old and trusted workhorse in most places. It's the old thing that is slowly going off to die, but will work even after it's buried.

What I'd do in your shoes:

IF you cannot even afford some low-grade SSDs for the OS (SATA 3, >= 50K IOPS writes, 120GB - starting at 45 euros/drive), and the VMs are mostly starved for resources, then your only option is the following:

  • Scrap your current 9-Node Proxmox-Cluster
    • Consolidate that Hardware into a 4-Node Proxmox and a single Node NAS.
      • It does not sound like you need the memory, nor the CPU resources, for your use case.
        • With the energy you save, you can now afford a couple of cheap SSDs (half-serious suggestion).
        • The leftover hardware you sell on eBay for cheap, which now enables you to buy a couple of cheap SSDs (half-serious suggestion).
  • Rip out all your Drives
    • Consolidate them into piles of same-size disks (or close to the same size)
  • Create RAID-10 or ZFS-equivalent storage subsystems. (Or RAID-0 if you trust your backup strategies and do not mind an outage should a drive fail.)
    • This depends on the Number of Disks your chassis can hold.
      • You need at least 4 with Plain Raid-10
      • You probably want at least 3 with ZFS
  • Create a NAS-Server for your Backups
    • Stick your 4th or 5th fastest CPU in it
    • Stick all Drives from your "Large Capacity Drive Pile" in it.
    • Stick a reasonable amount of Memory into it (>=16 GB)
    • Stick your Highest Capacity Drives into it
    • install a NAS-System on it
      1. Rockstor
        1. currently evaluating it
          1. already like it the most after 72 hours of putting it through its paces, more than the others
          2. mostly due to the fact that BTRFS on Ceph works a lot better / more lightweight than ZFS on Ceph
          3. like the no-nonsense approach
        2. the stable-update license is reasonably priced
          1. 15 USD (1yr) - 35 USD (5 yr) per NAS
      2. FreeNAS (the previous go-to)
      3. OpenMediaVault (an okay-ish system)
    • See about getting additional 1G NICs (poor man's QoS) or a 10G NIC for this "NAS"
    • Use it in conjunction with NFS for your Proxmox backups
      • depending on the number of disks this chassis can hold, you might even use it as shared storage for your VMs (I highly doubt it based on your disk speeds)
  • Create 3-4 Proxmox Nodes
    • use your fastest CPU's
      • The Node housing your DB VM's should be the fastest.
    • Create a RAID or ZFS "equivalent" for each node (see the ZFS sketch after this list)
      • The Node housing your DB VM's should be the fastest. (the more drives, the faster)
    • Consolidate your Ram into these Nodes
      • The node housing your DB VMs should have the most RAM and also be the fastest.
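As a sketch of the "ZFS equivalent" of RAID-10 referenced in the list above (device names are placeholders, and ashift=12 assumes 4K-sector drives):

Code:
# striped mirrors = the ZFS equivalent of RAID-10
zpool create -o ashift=12 vmpool mirror /dev/sdb /dev/sdc mirror /dev/sdd /dev/sde
# a cheap SSD can later be added as a read cache:
# zpool add vmpool cache /dev/sdf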

This should give you a "reasonable" system based on your constrained budget and available resources, and it will decrease long-term cost (power is not free, regardless of what they tell you).

Hope it helps.
 
I have a couple of 120GB SSDs in my office; I'll break two away and plan a test box.

On the server, stick with qcow2 or go LVM/raw?

Any other feedback? Tweaks to maximize Proxmox? Other changes to the environment on the software side?

Thanks Q-Wulf for the feedback, definitely willing to check out this direction.
 
qcow2 is affecting performance, hence my question on how much LVM may help.

Using LVM or raw has less overhead than qcow. I have a couple of nodes with multi-TB filesystems storing raw files. When such a filesystem needs an fsck, it can take a long time. With LVM you can avoid that.
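For reference, a minimal sketch of moving a node to LVM-backed storage (the device, volume group, and storage names are placeholders, and whether plain LVM fits best depends on your snapshot needs):

Code:
# create a volume group on the RAID-10 array (device name is a placeholder)
pvcreate /dev/sdb
vgcreate vmdata /dev/sdb
# register it in Proxmox as LVM storage; VM disks then become raw logical volumes
pvesm add lvm vmdata-lvm --vgname vmdata --content images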

I think q-wulf is right; your storage is inadequate for your needs. I've moved all of our Windows DB VMs to nodes with SSDs, and it makes a huge difference in performance.

Maybe adding some SSD and using dm-cache or flashcache would be a cheap solution to your problems.
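A hedged sketch of the dm-cache route via LVM (assuming the slow VM-data LV and an SSD sit in the same volume group; all names and sizes are placeholders):

Code:
# carve a cache-data LV and a small metadata LV out of the SSD
lvcreate -L 100G -n cache0 vmdata /dev/sdf
lvcreate -L 1G -n cache0meta vmdata /dev/sdf
# bind them into a cache pool and attach it to the slow LV holding VM data
lvconvert --type cache-pool --poolmetadata vmdata/cache0meta vmdata/cache0
lvconvert --type cache --cachepool vmdata/cache0 vmdata/vmstore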
 
I do raw, but that is due to ceph.


Any other feedback? Tweaks to maximize Proxmox? Other changes to the environment on the software side?
Your performance issues can be broken down as follows:

99% is your sub-par Storage-Subsystem.
0.4% is your LVM/Raw/Qcow question.
0.4% is the (IDE / SCSI / Virtio, the virtual controller and the settings surrounding it)
0.1% is "tuning other Proxmox settings"

To put it into perspective: I could throw in a horse vs. A380 analogy, but I do not actually think it's necessary.
 
Also, here's the pveperf for these servers, so you can gauge their current performance: http://pastebin.com/ydP1fNY8
Hi,
but pveperf shows the system disk and not the VM storage.

How do your reads + fsyncs look on your VM storage? Use something like this:
Code:
pveperf /var/lib/vz
CPU BOGOMIPS:  52800.84
REGEX/SECOND:  1251169
HD SIZE:  16.24 GB (/dev/mapper/pve-data)   
BUFFERED READS:  223.14 MB/sec   
AVERAGE SEEK TIME: 5.70 ms   
FSYNCS/SECOND:  5096.52   
DNS EXT:  62.58 ms   
DNS INT:  0.72 ms

Like q-wulf and e100 already wrote, it looks like your IO subsystem isn't fast enough.

Udo
 
Here's the performance on the vmstorage: http://pastebin.com/BK5yXm6u

I've ordered a couple of SSDs for the OS drives to test, but, just asking, how much benefit will I see upgrading Proxmox system drives to SSD and not the storage drives?
Hi,
the results don't look so bad - do you see iowait during the measurement? Normally, if your IO subsystem is under load, your transfer rate and especially fsyncs drop dramatically.

Perhaps there is another bottleneck?

Look with atop during high iowait and see where the bottleneck is (which HDD). Do you use NFS storage?
Just read your first post - yes, your backup storage is NFS. Is the IO wait gone if you back up to a local HDD?
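If atop is unfamiliar, something along these lines works too (iostat comes from the sysstat package, which is an assumption on my part; the 2-second interval is just an example):

Code:
# extended per-disk stats every 2 seconds; watch %util and await on the VM-storage disks
iostat -x 2
# or run atop with a 2-second interval and press 'd' for disk-related activity
atop 2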

Udo
 
I set it all up clean last Friday. Since then I've run 3 backups, 2 full and one incremental. No issues. One larger VM slowed to 30Mbps and took longer than expected, but no backup stalled and locked the VM (my previous issue). So here's hoping performance is better.

One thing I changed in the environment is that vmbr0 used to be on a trunk port and I used it for some VM traffic too. Now it's a dedicated access port, and backups and the cluster have played fair so far. Maybe there's a correlation, maybe not.

IOWAIT is minimized as well. Same physical hardware, but I moved to RAID10 on the VM storage instead of RAID5 due to the performance impact.

I opened this thread in hopes that when these issues appeared I'd have direction to go, but most of it has resolved. I'm glad for that.

On the SSD front, I do want to re-ask my question: how does an SSD for the Proxmox OS impact VM performance that dramatically?

Thanks!
 
No - the OS doesn't need an SSD... though if the host is also a ceph-mon, an SSD is perhaps not a bad idea.

Correct, the ceph-mon you want to have on SSD drives.

I've ordered a couple of SSDs for the OS drives to test, but, just asking, how much benefit will I see upgrading Proxmox system drives to SSD and not the storage drives?

The last time I used Proxmox without an SSD drive was back in 2013, right around the time 3.0 came out. I used it on an OVH server with 5 HDDs (one for the OS, 4 in RAID-10 for VMs). I had issues with IO wait and switched to a new server (same specs, just 1 SSD + 4 HDDs in RAID-10 for VMs). No more IO issues.

Since then I have never again used an HDD as an OS drive, be it Linux, Windows, or OSX, for professional or personal projects.

The advantage you get is not so much the increase in parallel IO or bandwidth (although it can add up); it's about access time (latency). SSDs are typically 10-20x faster on that front once you statistically stop hitting the drive cache, which on newer HDDs means your OS is using more than 128 MB of space.

At work we use cheap 60 GB drives as OS drives; especially on the Ceph nodes we go with as much commercial-grade hardware as we can. I prefer horizontal scaling over vertical scaling, simply because most of the time I can do more with less (including resources like power and money), but I do not want to start the enterprise-off-the-shelf vs. consumer-self-build discussion again :)
 
