LXC-VM Resource Allocation.

RealEngineer · Jun 25, 2022

Hey Y'all. I've got a question. I hope it isn't a stupid question, but the only way to know is to ask, because google hasn't been forthcoming.

I'm not an IT guy. By education, I'm an engineer. Computers are my hobby. I know enough to install Proxmox and configure multiple LXCs/VMs over the command line, but I probably only know 5-10% as much as a CompSci or cybersecurity professional.

I currently have two applications running on my one node (more applications coming): a Pi-hole running on an LXC, and an Ubuntu server that I am setting up for file storage. I don't have a ton of resources to go around (4 cores / 8 threads, 16 GB RAM), so the Pi-hole only has 1 core and 512 MB of RAM. The server has 3 cores and 4 GB. Under normal circumstances, the Pi-hole runs fine, not even close to max resource utilization, because it is only serving a handful of devices. Whenever I do something intensive on a VM (any VM, not just the Ubuntu server), the Pi-hole claims through its web interface that it is under heavy CPU load, then eventually becomes unresponsive to any and all commands, breaking internet on all devices set to use it. The tasks underway on the VMs are not querying the Pi-hole (file copying, etc.), so it isn't being overloaded with DNS requests.

To my untrained eye, it looks like VMs are taking resources out from underneath LXCs when there is plenty to go around. How can I go about diagnosing/fixing this?

Abd7 · Jun 26, 2022

This does not answer the resource allocation question,
but I believe the problem will be solved if you put Pi-hole on a real 512 MB VM instead of an LXC.

RealEngineer · Jun 26, 2022

Abd7 said:
This does not answer the resource allocation question,
but I believe the problem will be solved if you put Pi-hole on a real 512 MB VM instead of an LXC.

I had considered that as a fix, but it now appears that the whole server is experiencing performance issues. Again, the task on the VM is copying large files. I couldn't even get the server to create a new VM in less than 10 minutes while the task was running. I have noticed I/O delay is hovering around 40-60% while copying. That's a new type of problem for me. Working on how to fix it.

Dunuin · Jun 26, 2022

Are you using an SSD to store your guests? Otherwise it's normal to run into high IO delay easily. HDDs can only offer around 100-200 IOPS. On top of that you get overhead from virtualization, mixed block sizes, and storage engines and filesystems that are far more complex than a simple NTFS, and so on. Server workloads also often use sync writes, which are even worse on HDDs.

Concerning the CPU load that Pi-hole is complaining about: LXCs aren't fully isolated. Applications inside the LXC see the real hardware of the host. When Pi-hole alerts you that, for example, the CPU load is 4.0 while you only have 1 core, you can ignore this, as the 4.0 is not the load of the LXC but the load of the whole server (the PVE host).
But the fact that your internet stops working while workloads run on that VM indicates that your PVE host can't keep up. If the VM uses too many resources, that will slow down the PVE system itself as well as all other LXCs and VMs. If you only have a quad-core CPU, don't assign more than 3 vCPUs to a single guest. So when your internet isn't working, you should check whether that is caused by IO delay or by too much CPU load.
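To tell IO delay apart from real CPU load, one option is to sample /proc/stat on the PVE host twice and compare the counters. This is only a minimal sketch: the function names are made up for illustration, and I am assuming the iowait share it reports is roughly what the PVE summary graphs as IO delay. The sample numbers at the bottom are invented so the snippet runs anywhere.

```python
def read_cpu_times(path="/proc/stat"):
    """Return the aggregate CPU counters from the first line of /proc/stat (Linux only)."""
    with open(path) as f:
        fields = f.readline().split()
    # fields[0] is the literal "cpu"; the rest are ticks per state:
    # user nice system idle iowait irq softirq steal guest guest_nice
    return [int(x) for x in fields[1:]]


def busy_and_iowait_percent(before, after):
    """Compare two samples and return (busy %, iowait %) for the interval between them."""
    deltas = [b - a for a, b in zip(before, after)]
    # Only the first 8 fields are summed: guest time is already counted in user/nice.
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0, 0.0
    idle = deltas[3]
    iowait = deltas[4]
    busy = total - idle - iowait
    return 100.0 * busy / total, 100.0 * iowait / total


# Invented sample values (not measurements) so the example runs on any machine.
before = [100, 0, 50, 800, 50, 0, 0, 0]
after = [150, 0, 70, 1000, 530, 0, 0, 0]
busy, iowait = busy_and_iowait_percent(before, after)
print(f"CPU busy: {busy:.1f}%  IO wait: {iowait:.1f}%")
```

On the host itself you would call read_cpu_times() once, wait a few seconds, call it again, and pass both results to busy_and_iowait_percent(). A high iowait share with a low busy share points at storage rather than at the CPU.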

RealEngineer · Jun 26, 2022

Dunuin said:
Are you using an SSD to store your guests? Otherwise it's normal to run into high IO delay easily. HDDs can only offer around 100-200 IOPS. On top of that you get overhead from virtualization, mixed block sizes, and storage engines and filesystems that are far more complex than a simple NTFS, and so on. Server workloads also often use sync writes, which are even worse on HDDs.

Concerning the CPU load that Pi-hole is complaining about: LXCs aren't fully isolated. Applications inside the LXC see the real hardware of the host. When Pi-hole alerts you that, for example, the CPU load is 4.0 while you only have 1 core, you can ignore this, as the 4.0 is not the load of the LXC but the load of the whole server (the PVE host).
But the fact that your internet stops working while workloads run on that VM indicates that your PVE host can't keep up. If the VM uses too many resources, that will slow down the PVE system itself as well as all other LXCs and VMs. If you only have a quad-core CPU, don't assign more than 3 vCPUs to a single guest. So when your internet isn't working, you should check whether that is caused by IO delay or by too much CPU load.

Thanks for the information. No, not using an SSD for guests, though I do have one lying around. To be clear, you are only talking about storing the VMs/containers/guest OS' themselves on SSDs, not everything, correct? As in, that could solve the IO issue? I get that harddrives can be slow (they are in zraid1, so there is at least some striping.) The breaking internet was due to the pihole not being able to run (no DNS resolving). I don't have any VMs with more than 3 cores doing anything.

Dunuin · Jun 26, 2022

RealEngineer said:
Thanks for the information. No, not using an SSD for guests, though I do have one lying around. To be clear, you are only talking about storing the VMs/containers/guest OS' themselves on SSDs, not everything, correct?

You can store everything on SSDs. But the PVE system doesn't need fast storage, and you may also have cold data in your guests that doesn't need to be that fast. At the very least, all virtual disks that hold your VMs'/LXCs' system partitions and services should be on an SSD. Also keep in mind that for server workloads (like ZFS) enterprise/datacenter SSDs are recommended. Consumer SSDs might die very fast or could be as slow as a HDD (especially the QLC ones).

RealEngineer said:
As in, that could solve the IO issue? I get that harddrives can be slow (they are in zraid1, so there is at least some striping.)

A raidz1 ZFS pool actually delivers even fewer IOPS than a single disk. If you want more IOPS you need to use a striped mirror. Raidz will only increase throughput.

RealEngineer said:
The breaking internet was due to the pihole not being able to run (no DNS resolving). I don't have any VMs with more than 3 cores doing anything.

Then maybe your CPU can't do any work because it's always waiting for the disk to read/write data (which is what IO delay means).

RealEngineer · Jun 26, 2022

Right now, I am copying files on a VM with 3 cores, and everything else ground to a halt again. Resource utilization is as follows:

Dunuin said:
Then maybe your CPU can't do any work because it's always waiting for the disk to read/write data (which is what IO delay means).

If I understand you correctly, the array is way slower than the CPU, a bottleneck. Even though there are plenty of CPU cycles to go around, there is no more harddrive IO to allow anything else to function. If I were to place HDD bandwidth limits on the VMS (say, the media server can only take up 75% of what the array is capable of) would that allow other VMs to continue functioning, assuming none of them totally consume the rest of the bandwidth?

Another thing that is puzzling to me is that I never saw something like this happen when all applications were running on one host OS... copying all these files off the last OS to transfer them didn't freeze up the whole computer, I was still able to use web browsers, other programs, etc. Can you explain why this is? I am guessing one host OS is more able to dynamically allocate bandwidth, vs that's supposed to be my responsibility in proxmox.

Dunuin said:
You can store everything on SSDs. But the PVE system doesn't need fast storage, and you may also have cold data in your guests that doesn't need to be that fast. At the very least, all virtual disks that hold your VMs'/LXCs' system partitions and services should be on an SSD. Also keep in mind that for server workloads (like ZFS) enterprise/datacenter SSDs are recommended. Consumer SSDs might die very fast or could be as slow as a HDD (especially the QLC ones).

I cannot buy enough SSDs to store all my media, but if I understand correctly, if my VMs were on a different drive (not even necessarily a faster one), the copying operation would take up all of the raid Z array's bandwidth and the VM OS' would be free to continue operating as long as the CPU has cycles to spare, correct?

Dunuin said:
A raidz1 ZFS pool actually delivers even fewer IOPS than a single disk. If you want more IOPS you need to use a striped mirror. Raidz will only increase throughput.

... after googling IOPS vs Throughput, and understanding what I read, I still don't understand how you can increase one and not the other. Is that a raid 5 vs 10 thing, or Zraid1 vs Zraid 10 thing? I do see that raid 10 would have a higher write speed...

I chose proxmox knowing it was not the easiest way to accomplish my goals in order to learn more about computers and server management. Thank you for your patience with my questions and willingness to explain things.

Dunuin · Jun 26, 2022

RealEngineer said:
Right now, I am copying files on a VM with 3 cores, and everything else ground to a halt again. Resource utilization is as follows:
View attachment 38432

If I understand you correctly, the array is way slower than the CPU, a bottleneck. Even though there are plenty of CPU cycles to go around, there is no more harddrive IO to allow anything else to function. If I were to place HDD bandwidth limits on the VMS (say, the media server can only take up 75% of what the array is capable of) would that allow other VMs to continue functioning, assuming none of them totally consume the rest of the bandwidth?

The problem is not the bandwidth of your storage, it's the IOPS. Let's say your pool can handle 100 IOPS. Do 100 4K sync writes per second and 400 KB/s would be enough to fully saturate your pool.
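The arithmetic behind that, as a tiny sketch. The 100 IOPS figure is the made-up number from the example above, and the model is deliberately simplified: it assumes every operation costs one IO regardless of its size.

```python
def throughput_kib_per_s(iops, block_size_kib):
    """Data moved per second if every IO transfers one block of the given size."""
    return iops * block_size_kib


pool_iops = 100  # assumed figure from the example, not a measurement

# Small 4K sync writes: the pool is saturated while moving very little data.
print(throughput_kib_per_s(pool_iops, 4))     # 400 KiB/s

# The same number of 1M operations moves 256 times as much data.
print(throughput_kib_per_s(pool_iops, 1024))  # 102400 KiB/s, i.e. 100 MiB/s
```

So a bandwidth limit of, say, 75% of what the array can do sequentially says very little about how many of the available IOPS a guest can still consume.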

RealEngineer said:
Another thing that is puzzling to me is that I never saw something like this happen when all applications were running on one host OS... copying all these files off the last OS to transfer them didn't freeze up the whole computer, I was still able to use web browsers, other programs, etc. Can you explain why this is? I am guessing one host OS is more able to dynamically allocate bandwidth, vs that's supposed to be my responsibility in proxmox.

Did the other host OS use a ZFS raidz? ZFS has really high overhead and does a lot of sync writes. Running virtualization and enterprise-grade filesystems is a totally different workload compared to, for example, a simple bare-metal NTFS Win11 or ext4 Linux installation.

RealEngineer said:
I cannot buy enough SSDs to store all my media, but if I understand correctly, if my VMs were on a different drive (not even necessarily a faster one), the copying operation would take up all of the raid Z array's bandwidth and the VM OS' would be free to continue operating as long as the CPU has cycles to spare, correct?

It's best to benchmark your pool's performance using fio. Running pveperf might also be a good start in case your PVE is also installed on this raidz pool. The interesting value there is "FSYNCS/SECOND", which is the sync write IOPS your pool can handle.
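If you want a quick feel for that number before setting up fio, here is a rough sketch that counts how many small synced writes per second a directory can take. It is not how pveperf or fio measure things, just the same idea in a few lines; the function name and defaults are my own.

```python
import os
import tempfile
import time


def fsyncs_per_second(directory, duration=3.0, block_size=4096):
    """Rewrite one small block, fsync after every write, and return syncs per second."""
    # Random data, so a compressing filesystem cannot optimize the writes away.
    payload = os.urandom(block_size)
    fd, path = tempfile.mkstemp(dir=directory)
    count = 0
    try:
        start = time.monotonic()
        while time.monotonic() - start < duration:
            os.lseek(fd, 0, os.SEEK_SET)  # overwrite the same block so the file stays small
            os.write(fd, payload)
            os.fsync(fd)                  # wait until the data is on stable storage
            count += 1
        elapsed = time.monotonic() - start
    finally:
        os.close(fd)
        os.unlink(path)
    return count / elapsed
```

Point it at a directory on the pool you want to test, for example print(fsyncs_per_second("/tank/scratch")), where /tank/scratch is a hypothetical mount point. Treat the result as a ballpark only and use fio for anything you want to compare seriously.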

RealEngineer said:
... after googling IOPS vs Throughput, and understanding what I read, I still don't understand how you can increase one and not the other. Is that a raid 5 vs 10 thing, or Zraid1 vs Zraid 10 thing? I do see that raid 10 would have a higher write speed...

I chose proxmox knowing it was not the easiest way to accomplish my goals in order to learn more about computers and server management. Thank you for your patience with my questions and willingness to explain things.

IOPS performance is how many small (4K) random read/write operations the storage can handle per second. Throughput is how much data can be read/written sequentially when doing big (for example 1M) read/write operations.
If you want an example, let's compare 8 disks in a raidz1 vs. a striped mirror, where "1x" means 100% of the performance or 100% of the capacity of a single disk:

                                                             8 disk raidz1:   8 disk striped mirror:
Capacity:                                                    7x               4x
IOPS:                                                        <1x              4x
Throughput Write:                                            7x               4x
Throughput Read:                                             7x               8x
Minimal reasonable blocksize when using VMs with ashift=12:  32K              16K
Resilver time:                                               horrible         ok
Disks may fail:                                              1                1-4

So a raidz1 is terrible at IOPS performance, as the IOPS performance won't scale with the number of disks. Only the throughput will scale with the number of disks. A striped mirror has far better IOPS performance, as it's not a single raidz vdev but four mirror vdevs that are striped together and can work in parallel. Because 4 vdevs can read/write in parallel, you theoretically get 4 times the IOPS performance.
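The scaling rule from the comparison can be written down as a rough sketch. These are the theoretical single-disk multipliers only, not benchmark results, and the function names are invented. The raidz1 IOPS value of 1 is an upper bound, since in practice it ends up below a single disk.

```python
def raidz1_estimate(disks):
    """Theoretical multipliers for a single raidz1 vdev, relative to one disk."""
    return {
        "capacity": disks - 1,  # one disk's worth of space goes to parity
        "iops": 1,              # a single vdev: upper bound, real value is below 1x
        "write": disks - 1,     # throughput scales with the data disks
        "read": disks - 1,
    }


def striped_mirror_estimate(disks):
    """Theoretical multipliers for two-way mirrors striped together."""
    vdevs = disks // 2
    return {
        "capacity": vdevs,      # half the disks hold copies
        "iops": vdevs,          # every mirror vdev works in parallel
        "write": vdevs,         # each write lands on both disks of a mirror
        "read": disks,          # reads can be served by either side of a mirror
    }


print(raidz1_estimate(8))
print(striped_mirror_estimate(8))
```

For 8 disks this reproduces the capacity, IOPS and throughput rows of the comparison above.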

RealEngineer · Jun 26, 2022

Dunuin said:
The problem is not the bandwidth of your storage, it's the IOPS. Let's say your pool can handle 100 IOPS. Do 100 4K sync writes per second and 400 KB/s would be enough to fully saturate your pool.

So, it is IOPS not bandwidth, but the same principle of "reserve some for the rest of the system" is correct and feasible, right?

Dunuin said:
Did the other host OS use a ZFS raidz? ZFS has really high overhead and does a lot of sync writes. Running virtualization and enterprise-grade filesystems is a totally different workload compared to, for example, a simple bare-metal NTFS Win11 or ext4 Linux installation.

... it was not. I didn't know that about ZFS. I may need to re-work how my data is stored.

Dunuin said:
It's best to benchmark your pool's performance using fio. Running pveperf might also be a good start in case your PVE is also installed on this raidz pool. The interesting value there is "FSYNCS/SECOND", which is the sync write IOPS your pool can handle.

I think I understand. PVE is on a separate drive, which may explain why it remains functional, since those iops aren't being eaten up when I run a difficult task on a VM on the ZFS pool.

Dunuin said:
IOPS performance is how many small (4K) random read/write operations the storage can handle per second. Throughput is how much data can be read/written sequentially when doing big (for example 1M) read/write operations.

Thanks for explaining, I understand better than I did before.

Dunuin said:
If you want an example, let's compare 8 disks in a raidz1 vs. a striped mirror, where "1x" means 100% of the performance or 100% of the capacity of a single disk:

                                                             8 disk raidz1:   8 disk striped mirror:
Capacity:                                                    7x               4x
IOPS:                                                        <1x              4x
Throughput Write:                                            7x               4x
Throughput Read:                                             7x               8x
Minimal reasonable blocksize when using VMs with ashift=12:  32K              16K
Resilver time:                                               horrible         ok
Disks may fail:                                              1                1-4

So a raidz1 is terrible at IOPS performance, as the IOPS performance won't scale with the number of disks. Only the throughput will scale with the number of disks. A striped mirror has far better IOPS performance, as it's not a single raidz vdev but four mirror vdevs that are striped together and can work in parallel. Because 4 vdevs can read/write in parallel, you theoretically get 4 times the IOPS performance.

This was helpful. ZFS can do striped mirror, I get there will be overhead, but will there be the same IOPS problems?

So... I was totally wrong about the cause of my problem. I don't think I have further questions, I'm going to see if proxmox/ubuntu server supports my raid card, which hopefully will help get rid of the overhead.

Dunuin · Jun 27, 2022

RealEngineer said:
So, it is IOPS not bandwidth, but the same principle of "reserve some for the rest of the system" is correct and feasible, right?

There is not really a way to reserve resources. You can limit the bandwidth, but bandwidth doesn't directly relate to IOPS. The best option is to get storage that is fast enough that it never gets so slow that nothing works anymore.

RealEngineer said:
I think I understand. PVE is on a separate drive, which may explain why it remains functional, since those iops aren't being eaten up when I run a difficult task on a VM on the ZFS pool.

Yup, if it shared the same pool you wouldn't even be able to access the web UI or SSH until the IO load drops.

RealEngineer said:
This was helpful. ZFS can do striped mirror, I get there will be overhead, but will there be the same IOPS problems?

Running HDDs in a striped mirror is still terribly slow, as SSDs can have up to 1000 times the IOPS performance. But it might be multiple times faster than running HDDs in a raidz, depending on how many HDDs you stripe together.

RealEngineer said:
So... I was totally wrong about the cause of my problem. I don't think I have further questions, I'm going to see if proxmox/ubuntu server supports my raid card, which hopefully will help get rid of the overhead.

With that you run into other problems: it's not easy to switch hardware if your server fails; your data might silently corrupt over time, as there is no bit rot protection; you can't use replication (great for backups or an HA cluster); there is no block-level compression or deduplication to save space; and IOPS performance still isn't great, as a HW RAID might use a very big block size, especially if you don't have a HW RAID with RAM cache + BBU, ...
It's best to test both and run some fio benchmarks. Then you can compare what works best for you.

I personally would use a dedicated SSD pool for your VMs/LXCs that run your services and a HDD pool just for your cold data like the files shared by your NAS VM. Then transferring a lot of stuff to the NAS won't slow down other services. And you get the benefit of cheap, big storage for things that can be slow, and expensive but fast storage for the things that have to be fast, like virtual disks storing the OS. With your NAS VM you could then, for example, put a small virtual disk that your Ubuntu is installed on onto your SSD pool, and a second, big virtual disk, where you store the files you want to share, onto the HDD pool.

Neobin · Jun 27, 2022

Dunuin said:
I personally would use a dedicated SSD pool for your VMs/LXCs that run your services and a HDD pool just for your cold data like the files shared by your NAS VM.

+1

Also keep in mind, that we are talking about a 13 year old system [1] with only 16 GB RAM, ZFS, virtualization, large data transfers and this all on only a single pool of, probably also old, spinners.

So as Dunuin said, I also think your best improvement is to use SSD-storage for the host and your guests and use the HDDs only for your cold data.

[1] https://ark.intel.com/content/www/u...l-xeon-processor-x3450-8m-cache-2-66-ghz.html

RealEngineer · Jun 27, 2022

Upgrading may be a possibility in the future, but for now I am stuck with the hardware I have. I am getting the impression enterprise tools like proxmox/zfs may not be the best way to get the job done with my current level of hardware, and that's fine. I understand I'll have to compromise on reliability features to get reasonable performance at this stage, and for now, I am okay with that, I'm not running anything super important that isn't backed up. I look forward to when I'll get to give proxmox another try with the hardware it needs. Thank you for your help, I've learned a good deal, and that was my goal.

RealEngineer · Jul 16, 2022

RealEngineer said:
Upgrading may be a possibility in the future, but for now I am stuck with the hardware I have. I am getting the impression enterprise tools like proxmox/zfs may not be the best way to get the job done with my current level of hardware, and that's fine. I understand I'll have to compromise on reliability features to get reasonable performance at this stage, and for now, I am okay with that, I'm not running anything super important that isn't backed up. I look forward to when I'll get to give proxmox another try with the hardware it needs. Thank you for your help, I've learned a good deal, and that was my goal.

My issue is fixed, figured I'd post to help anyone in the future. Among other things, I read Ars Technica's articles on ZFS, and would highly recommend them to anyone getting started with ZFS. They explain all the terminology in detail, and I found them invaluable.

For one, Dunuin was right about raidz vs mirrored drives. Raidz gives you roughly the IOPS of the lowest-performing drive in each vdev. I had one decent-performing drive and 2 poor-performing drives. I bought another decent-performing drive, and I now have 2 mirrors, one with the faster drives and one with the slower ones. I deliberately choose which data to put where based on workload, but just using ZFS mirrors already improved IOPS significantly.
Second, by following performance recommendations here
https://openzfs.readthedocs.io/en/latest/performance-tuning.html
I learned to tune ZFS datasets to specific applications. I now have OS and FILE datasets on each mirror. OS datasets have 4k blocks, FILE datasets have 1M blocks. This SIGNIFICANTLY helped my iops. I was able to run multiple OS tasks on my faster mirror without depriving my DNS server of IOPS it needed to function, and large, long file-copy ops are now done on my slower pool to avoid hogging all the IOPS resources. I am now experiencing much lower (10-20%) IO delay during file copy (something I only have to do once), without breaking any VMs.
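For anyone who wants to reproduce that layout, here is a dry-run sketch that only prints the zfs commands and does not run them. I am assuming "4k blocks" and "1M blocks" refer to the recordsize dataset property, and the pool names fastpool and slowpool are placeholders for the two mirrors. If your VM disks are zvols rather than datasets, the relevant property would be volblocksize instead.

```python
def zfs_create_command(pool, dataset, recordsize):
    """Build (but do not run) a zfs create command with a per-dataset recordsize."""
    return ["zfs", "create", "-o", f"recordsize={recordsize}", f"{pool}/{dataset}"]


# Hypothetical pool names; the OS/FILE dataset names and sizes follow the post above.
pools = ("fastpool", "slowpool")
datasets = (("OS", "4K"), ("FILE", "1M"))

for pool in pools:
    for dataset, recordsize in datasets:
        # Printed for review only; paste into a root shell once the names are right.
        print(" ".join(zfs_create_command(pool, dataset, recordsize)))
```

Note that recordsize only affects data written after it is set, so existing files keep their old block size until they are rewritten or copied again.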

With just a little bit of tuning, I was able to get the exact same application to work 10x (figuratively) better on the same old anemic hardware. It would ultimately be better if I bought better drives, more RAM, and a newer server, and I will, but at the very least the server functions in the meantime.
