Slow performance with ZFS and LVM

ATX-250

Good day to all.

I have five servers with Proxmox version 5 installed, and all of them have the same problem with hard drive performance inside the guest VMs.
The problem affects both ZFS and LVM storage.
The ZFS problem is the following: inside virtual machines located on ZFS storage, when copying files from a file server (a separate physical machine), after about three gigabytes of data the copying speed starts dropping from 70 MB per second to zero (for a few seconds the copying stops completely, then the speed rises to 50-60 MB/s, some more data is copied, and then the speed drops to zero again, and so on). At the same time, all the VMs on this ZFS pool begin to lag terribly.
During copying, iotop shows a lot of zvol processes with IO between 10 and 39%. When the copying speed drops to zero, the zvol processes disappear from iotop and the IO becomes 0%.
But if I destroy the ZFS pool on the same disks, create an ext4 partition on them, add it as virtual machine storage and attach a disk from it to the guest VM, then when copying the same data (an ISO image) the speed is stable at around 70-80 MB per second. The VM does not hang and does not lag.
The ZFS pool was created with zpool create -o ashift=12 zfs-pool /dev/mapper/cfs-zfs, where cfs-zfs is an encrypted device that sits on top of a five-disk mdadm RAID6.
I also tried different ashift and volblocksize values and different cluster sizes for the file systems inside the virtual machines - it does not affect the speed or the result.
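For clarity, the storage stack looks roughly like this (a sketch only; the device names and the exact mdadm/cryptsetup options shown here are illustrative):
Code:
# sketch of the storage stack described above; device names and options are illustrative
mdadm --create /dev/md0 --level=6 --raid-devices=5 /dev/sd[b-f]   # five-disk RAID6
cryptsetup luksFormat /dev/md0                                    # encryption layer
cryptsetup open /dev/md0 cfs-zfs                                  # exposes /dev/mapper/cfs-zfs
zpool create -o ashift=12 zfs-pool /dev/mapper/cfs-zfs            # pool on the encrypted device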

The LVM problem: I created an LVM storage from a single disk without encryption (I also tried two disks, and tried on top of an mdadm RAID1 - it makes no difference). On the LVM storage, when copying the same data, the speed starts at 70-80 MB/s, then drops to 20-30 MB per second and stays there.
Strangely, if I create a volume on the same LVM storage (lvcreate -L 30G vm-vg), format it as ext4 and connect it to a virtual machine, then there is no speed problem (a sketch follows below).
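The ext4-on-LVM workaround in a bit more detail (a sketch; the LV, VG and mount point names are only examples):
Code:
# LV formatted as ext4 and exposed to Proxmox as a "Directory" storage (names are examples)
lvcreate -L 30G -n test-lv vm-vg
mkfs.ext4 /dev/vm-vg/test-lv
mkdir -p /mnt/ext4-store
mount /dev/vm-vg/test-lv /mnt/ext4-store
# then add /mnt/ext4-store in the Proxmox GUI as a "Directory" storage and create the VM disk there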
There are no speed problems on the Proxmox server itself: if I copy the same ISO image from the same file server into /zfs-pool, the speed is 70-80 MB per second and stays stable.

Guest VM: Win2012R2 (fully updated), 2 cores, 8 GB RAM, VirtIO SCSI HDD, VirtIO Ethernet.
Linux VMs have exactly the same problem.
I will add right away that I tried copying the data, deleting it and copying it again to the same place - it does not change the situation.

The hard drives are connected via an LSI SAS 9260-8i controller. Each disk is set up as a separate single-disk RAID0.
Hard drives: 6 x 300 GB SAS 10k RPM and 2 x 600 GB HP 10k RPM.
CPU: 2 x Intel Xeon E5-2609
Supermicro X9DRi-LN4+
RAM: 128 GB DDR3 ECC
pve-manager/5.4-3/0a6eaa62 (running kernel: 4.15.18-12-pve)

For comparison, here is the configuration of two other servers, to show that this is not weak hardware:
HP DL510 G8
CPU: 4 x Intel Xeon E5-4640
RAM: 368 GB
HP SmartArray RAID controller. Two HP 600 GB SAS 10k drives in RAID1.
External storage: HP P2000 G3, 9 disks in RAID6 for ZFS virtual machines and 6 disks in RAID5 for LVM virtual machines, connected over 8 Gbit/s FibreChannel.
pve-manager/5.3-11/d4907f84 (running kernel: 4.15.18-11-pve)

Supermicro X11DPi-N
CPU: 2 x Intel Xeon Silver 4110
Adaptec RAID controller with 6 x 1 TB SATA drives. Each disk is created as a separate single-disk RAID0.
System: ZFS RAID1 (two disks)
Virtual machine storage: ZFS on top of a four-disk mdadm RAID5.
pve-manager/5.2-1/0fcd7879 (running kernel: 4.15.17-1-pve)

I know that ZFS does not like RAID controllers and that it is desirable to give it an SSD for cache and SLOG, but what about LVM? On Proxmox 4 with LVM storage I had no problems; the problems apparently began with Proxmox version 5. I started using ZFS in Proxmox 5 for VM replication, and while a replica of any of the VMs is being created, it is impossible to use the others (in the same ZFS pool).

Translated from Russian to English with Google Translate ^_^
 
1) I assume the issue is due to the controllers' cache. This behaviour is familiar to me: the speed is really fast while the controller is still filling its cache, but once the cache is full it cannot push more data onto the RAID arrays. I encourage you to try the same experiment with a passthrough-capable controller.

2) I would also double-check that all the VirtIO drivers, especially the disk ones, are set up correctly.

3) I would also set a limit on how much RAM ZFS can use for its ARC (a minimal sketch follows below). It may be that the ARC is saturating the RAM and the machines start lagging due to the lack of it.
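For point 3, a minimal sketch of capping the ARC on the Proxmox host (the 8 GiB value is only an example):
Code:
# cap the ARC at 8 GiB (value in bytes); takes effect after the initramfs is rebuilt and a reboot
echo "options zfs zfs_arc_max=8589934592" > /etc/modprobe.d/zfs.conf
update-initramfs -u
# or apply immediately for a quick test:
echo 8589934592 > /sys/module/zfs/parameters/zfs_arc_max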

PS: feel free to message me in Russian.
 
Vladimir, thanks for the reply.
1) The RAID0 arrays on the LSI controller are built with these parameters: Current Cache Policy: WriteThrough, ReadAhead, Cached, No Write Cache if Bad BBU. I do not have a BBU. I do not think the trouble is with the HW RAID controller cache - there is no speed trouble on ext4. (How I read the policy is sketched at the end of this post.)
2) The VirtIO drivers are from the latest stable ISO (virtio-win-1.4); without them the VM does not see the VirtIO devices at all.
3) When the VMs begin to lag, the free RAM on the Proxmox host is still around 30-40 GB - I mean the server with 128 GB of RAM. On the server with 368 GB of RAM, free RAM is around 100-120 GB. I did not make any ZFS tweaks - everything has the default configuration.
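For reference, this is roughly how the cache policy in point 1 can be read from the LSI controller (a sketch; the MegaCli64 install path may differ on your system, and MegaCli's -LDSetProp option is what changes the policy if needed):
Code:
# read the logical drive cache policy from the LSI controller (install path may differ)
/opt/MegaRAID/MegaCli/MegaCli64 -LDInfo -Lall -aALL | grep -i cache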
 
@ATX-250,
If I were you, I'd look into why EXT4 is fine and ZFS is not.
The only significant differences are:
  1. (assuming EXT4 is done on a hardware RAID) the hardware setup, meaning how disks are made available to the OS. In this case the controller and drive settings play a major role;
  2. VirtIO drivers, especially on Windows VMs, can be tricky. Make sure the drive has VirtIO drivers, and so does the controller.
  3. you may need to try debugging ZFS itself. Try bounding the amount of RAM it can use, and watch how big the ARC actually gets (sketched below).
I still think that the hardware controller is the most likely candidate, given that you have different setups and fresh Proxmox installs. Another likely candidate is the VirtIO drivers. Try copying files from one Linux VM to another Linux VM and see if the issue persists.
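A quick sketch for watching the ARC on the host while the copy runs (the tool may be named arcstat.py on the ZoL version that ships with PVE 5):
Code:
# live ARC size and hit-rate statistics, refreshed every second
arcstat 1
# or read the current, target and maximum ARC sizes directly from the kernel:
awk '$1=="size" || $1=="c" || $1=="c_max" {print $1, $3}' /proc/spl/kstat/zfs/arcstats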
 
Linux guests have the same performance trouble :(
I added one 120 GB Kingston SSD to my server as a log device for the zpool built on the 6-disk HW RAID6 (roughly as in the sketch at the end of this post) - nothing changed, the speed drops after copying around 1-2 GB of the ISO.
But if I make a zpool from this single SSD, the speed drops only after copying around 30 GB. With two SSDs in a ZFS RAID1 (zpool create mirror) it is around 15 GB.
I am now installing Proxmox on a desktop-class PC (Intel Core i3, 16 GB RAM, 2 x 500 GB HDDs in the RAID1 offered by the Proxmox installer) without any HW RAID.
On the desktop PC the result is the same - after copying around 3 GB the speed goes down :(
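For reference, roughly how the SSD was attached and how the pool can be watched during the copy (a sketch; sdX is a placeholder):
Code:
# attach the SSD as a separate log device (SLOG); sdX is a placeholder
zpool add zfs-pool log /dev/sdX
# watch per-vdev throughput while the copy runs, to see where it stalls:
zpool iostat -v zfs-pool 1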
 
OK, but isn't that actually normal?
Here are a couple of things to keep in mind:
  1. even SSDs are very different. If you are not using enterprise SLC drives or PCIe drives, there is a buffering issue at some point: as the buffer fills up, the speed drops. 30GB sounds like a decent limit;
  2. if you copy a huge file (say over 30GB), keep in mind that the data is first cached in the ARC in RAM. This means that while those 30GB are being cached, the speed is very good (RAM speed), but once the data starts getting pushed to the drives, you can hit a bottleneck at some point;
  3. ZFS is not a "regular" file system. You may have compression enabled, and it also needs to create checksums for the data in the system. These factors may also affect performance.
Try doing the following to debug the issue:
  1. try changing the ARC size. By default it is 50% of all available RAM. Try, say, only 8GB and see how it affects the process. If the outcome is worse, try increasing it to 80GB and see if the results are better;
  2. try changing the ZFS settings. Try removing the compression / checksums and see if you get a better result (sketched below);
  3. try copying data onto your SSDs on a non-ZFS system and see if you hit a speed drop at any point. If you do, it means that the SSD buffer is getting full and it will slow down regardless of the file system.
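A sketch for points 2 and 3 (the pool name is taken from the first post, sdX is a placeholder, and the dd test destroys whatever is on that disk):
Code:
# point 2: check and, for the test, disable compression on the VM pool
zfs get compression,checksum zfs-pool
zfs set compression=off zfs-pool
# point 3: raw sequential-write baseline on a spare SSD, bypassing any file system (destroys data on sdX!)
dd if=/dev/zero of=/dev/sdX bs=1M count=30720 oflag=direct status=progress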
 
@Vladimir Bulgaru
It is not one 30 GB file - it is 6 ISO images of 4 to 6 GB each. Again, I don't see any performance trouble with XFS or EXT4 file systems on these disks (SSD, SAS or plain SATA without any RAID). I see the speed trouble on 6 different machines (5 servers and the desktop PC from my previous post). I repeat: the trouble appears only on ZFS and LVM storage and ONLY inside VMs - the Proxmox host does not have this trouble, even on ZFS. Also, if I make an EXT4 file system on an LVM volume (lvcreate -L 50G -n test-lv my-VG-for-VM, then mkfs.ext4 /dev/my-VG-for-VM/test-lv), mount it as /EXT4, add it to Proxmox as a "Directory" storage and create the VM disk on this EXT4 directory, there is no speed trouble inside the VM.
I think this is some kind of Proxmox bug.
@Angelo
I tested ZFS without any RAID controller - the speed trouble is the same.
 
I'll try to map the whole data transfer path (if I miss something, I hope the community will fill in the gaps):
  1. When you initiate a VM-to-VM copy (on the same server), the data is requested from the drives. I expect no read bottlenecks, regardless of the system. The only limitation is the speed of the drives and the controller throughput, but since you get good speed initially, there should be no limitation there.
  2. The data is sent to the local (server) network via the VirtIO network adapter. As far as I know, this is the fastest and most performant option available, so I expect no bottlenecks here either.
  3. The data is routed via Proxmox to another VM. I foresee some heavy CPU usage here, but given your setups, no bottlenecks should occur.
  4. The data is received by the recipient VM via the VirtIO network adapter. Again, the most performant option available, so I expect no bottlenecks here either.
  5. If it's ZFS, I assume it starts filling the ARC cache with the data to be written to the drives. Depending on the internal workings of ZFS, the data will be compressed and checksums will be generated. This is a very CPU-intensive operation, but I assume it should not be a problem (although it would be good to know whether the speed drops happen right after one ISO is copied). I'd try disabling compression and checksums here, just to make sure a CPU bottleneck is not the reason for the speed drop.
  6. The data is pushed through the controller. Depending on the setup and controller type, you may get a bottleneck here. There are several possible causes:
    1. the controller performance (it is basically a mini-computer of its own, with a CPU and possibly a cache);
    2. the queue of operations (for instance, ZFS can try pushing the data to many drives at once and the controller can "choke" at this point);
    3. the controller cache can behave in a weird manner. I notice that at least 2 of the 3 setups use controllers with cache. Although the cache is too small to explain a drop that only appears after several GB of writes, it can still cause odd speed behaviour.
  7. The data is pushed from the controller to the drives. At least for the first setup with 10k SAS drives you should get a few hundred MB/s of overall sequential write throughput if you're using RAID10 on ZFS.
From this I'd map out the following to-do list:
  1. Test the ZFS tuning options. I'd remove the checksums and compression just to be certain they are not the cause.
  2. I'd experiment with a pass-through controller (no cache).
  3. I'd experiment with the most basic configuration possible - create a clean Proxmox instance with only 4 SAS drives in ZFS RAID10 and see how the system fares (see the sketch below).
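A sketch for step 3 (disk names are placeholders):
Code:
# minimal test pool: two mirrored pairs, i.e. the ZFS equivalent of RAID10
zpool create -o ashift=12 testpool mirror /dev/sdb /dev/sdc mirror /dev/sdd /dev/sde
zpool status testpool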
As a different train of thought:
  1. I just noticed that you are not copying files from VM to VM on the same Proxmox instance, but rather from external storage to a VM. Keep in mind that your physical network card, the router or the switch can cause bottlenecks.
  2. Moreover, the physical storage itself can also perform differently depending on the hardware and cause bottlenecks.
The best way to make sure that Proxmox is not the cause is simply to try copying data from one VM to another. If that goes smoothly, the issue may be network related (a quick check is sketched below).
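A quick way to rule out the network path (assuming iperf3 is installed on both ends; the address is a placeholder):
Code:
# on the file server:
iperf3 -s
# inside the VM:
iperf3 -c <file-server-ip> -t 30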
 
Probably not the best recommendation, but it really speeds things up without needing an SSD:
Code:
zfs set sync=disabled rpool
then reboot
 
Hi to all,

I can guess that the problem is only guest related. Also, up to a point, @Vladimir Bulgaru has already explained a possible cause.

"vm.dirty_ratio is the absolute maximum amount of system memory that can be filled with dirty pages before everything must get committed to disk. When the system gets to this point all new I/O blocks until dirty pages have been written to disk. This is often the source of long I/O pauses, but is a safeguard against too much data being cached unsafely in memory."

See here https://lonesysadmin.net/2013/12/22/better-linux-disk-caching-performance-vm-dirty_ratio/

Maybe your old PMX version had different default values for dirty_ratio and dirty_background_ratio compared with PMX 5.x. A different RAM size could also make things behave differently (a sketch for checking this follows below).
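A minimal sketch for comparing and, for a test, lowering these values inside the guest (the 10/5 numbers are only an illustration, not a recommendation):
Code:
# current values inside the guest (and on the host, for comparison)
sysctl vm.dirty_ratio vm.dirty_background_ratio
# lower them temporarily for the test; make persistent via /etc/sysctl.conf if it helps
sysctl -w vm.dirty_ratio=10
sysctl -w vm.dirty_background_ratio=5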

As a basic idea regarding storage/disks, keep the number of storage layers (ZFS, vdisk, LVM, file system) to a minimum. It is certainly better to eliminate the LVM layer; I do not use any LVM inside a guest, only partitions and file systems.

Sorry for my bad English, but maybe our co-forum friend @Vladimir Bulgaru will be kind enough to translate this into Russian for you.

Good luck.
 
But if I make a zpool from this single SSD, the speed drops only after copying around 30 GB. With two SSDs in a ZFS RAID1 (zpool create mirror) it is around 15 GB.

Because, like I said, these "dirty" settings are global. So 30 GB per system with one SSD is the same as 15 GB x 2 SSDs = 30 GB per system.

good luck
 
There are no speed problems on the Proxmox server itself: if I copy the same ISO image from the same file server into /zfs-pool, the speed is 70-80 MB per second and stays stable.


Because ZFS does not use the kernel cache system, but any guest with LVM uses the guest kernel's cache system.
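One way to see that from inside an LVM-backed Linux guest while the copy runs (a sketch; works in any recent guest kernel):
Code:
# watch dirty pages accumulate in the guest page cache and then flush in bursts
watch -n1 'grep -E "Dirty|Writeback" /proc/meminfo'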
 
