Performance issues with OpenMediaVault VM (ZFS/LUKS...)

depart

Active Member
May 8, 2019
Hi

For a home server/NAS I'm using the latest versions of Proxmox (5.4) and OMV (4.1.22-1) on recent hardware (Core i3-8100, 16 GB of RAM, installed on an SSD...). I have only one 8 TB hard drive, with no RAID configuration, for my data storage.
I use my previous server (Intel Atom J1900, 8 GB of RAM, an old 1 TB hard drive and another identical 8 TB drive) as the second node of a cluster, as a kind of backup solution so I can keep working if the main system fails (it powers itself on every night to synchronize with the main node via ZFS replication and shuts itself off when finished).

As I want encryption on my data hard drive, I've done the following:
On the Proxmox host:
- created a LUKS volume on that drive
- created a ZFS pool on the opened LUKS volume
- passed a VirtIO SCSI volume of almost all the available space to my OMV VM
On my OMV VM:
- formatted the space with ext4

For the record here are the commands used to create the thing:
cryptsetup luksFormat -c aes-xts-plain64 -s 512 -h sha512 /dev/disk/by-id/ata-ST8000DM004-2CX188_WCT123456
cryptsetup luksOpen /dev/disk/by-id/ata-ST8000DM004-2CX188_WCT123456 8to2_chiff
zpool create -o ashift=12 -O normalization=formD -O atime=off -m none -R /mnt -O compression=lz4 tank /dev/mapper/8to2_chiff
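
For anyone reproducing this, a quick way to double-check the resulting layers afterwards (same pool name and disk ID as above) would be something like:
# confirm the LUKS header and cipher on the raw disk
cryptsetup luksDump /dev/disk/by-id/ata-ST8000DM004-2CX188_WCT123456
# confirm the pool sits on the opened mapper device and is healthy
zpool status tank
# confirm the pool options were actually applied
zfs get compression,atime,normalization tank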



The whole thing works OK... but I have huge performance issues, whatever number of cores, amount of RAM or cache mode I use for the VM/volume passed to the VM.
When I transfer files (mostly over samba) from my main computer to the OMV shared drive, I reach 113 MB/s for the first few seconds, then after a while the transfer slows down to a few MB/s and sometimes even stops for a few seconds...
I always do my tests with a single big 4 GB file (and not with many little files, as I know that affects the transfer speed).

The CPU usage increases a lot and the I/O delays go through the roof on the Proxmox host. It affects the whole system: for example, a simple "ls" command in an SSH terminal is sometimes slowed down if I'm transferring files to the OMV VM at the same time.

I've made a lot of tests, trying to find the bottleneck, without clearly identifying it:
- the LAN and network cards are not a problem: I can transfer at the full 1 Gb/s speed from many different computers, and even to the Proxmox host itself
- the encryption does not seem to be the problem: AES-NI acceleration is available on this hardware and it is also enabled in the VM (VM CPU type: host)
- samba is not the problem: I tried some "local copies" between the Proxmox host and the VM (scp) with the same kind of problems. I also installed samba on the host and tried to write to the ZFS space over the network, with great performance
- the OMV VM is probably not the problem by itself, as performance is still bad if I mount the VM disk on the host (as in "mount /dev/tank/vm-107-disk-0 /mnt/temp") and try to write files to the mounted space. The results are very inconsistent: if I time a cp command for the same 4 GB file, it can take as little as 38 seconds (great: ~100 MB/s) or as long as 1 min 32 s (really bad: ~41 MB/s)
- the hard drive is not the problem: I can write to it at 100 MB/s without any problem (quick checks sketched below)
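
For reference, the raw checks behind the encryption and disk points look roughly like this (paths are placeholders; the dd run is only a rough sequential test, not a real benchmark):
# CPU AES throughput (AES-NI shows up as very high aes-xts numbers)
cryptsetup benchmark
# raw sequential read speed of the drive itself, below the LUKS/ZFS layers (read-only test)
hdparm -t /dev/disk/by-id/ata-ST8000DM004-2CX188_WCT123456
# rough write test into a dataset on the pool; /dev/urandom rather than /dev/zero,
# because lz4 would compress zeros away and inflate the numbers
dd if=/dev/urandom of=/mnt/tank/ddtest.bin bs=1M count=4096 conv=fsync status=progress
rm /mnt/tank/ddtest.bin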

So I'm a bit confused. I'm about to move all my files directly onto the ZFS pool on the Proxmox host and share them through samba installed on the host instead of the OMV method, but I would lose the automatic ZFS VM replication to my second node that I like so much (so fast).

Do you have ideas? Suggestions? What should I try?
 
Are you running LUKS on the machine with the J1900 CPU?
As I'm not sure that CPU has the AES-NI instructions that speed up the encryption and decryption LUKS can use...
 
The J1900 is not the "main server", only a second node just used for backup in case the first node crashes. The J1900 node is powered off 99% of the time
 
The J1900 is not the "main server", only a second node just used for backup in case the first node crashes. The J1900 node is powered off 99% of the time
Sorry, I reread the part where you mount the VM volume and it is "inconsistent".

Have you looked at htop, atop, iowait or iostat while doing the transfer? Is it pinning a CPU core (htop) or waiting for the disk (atop)?
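
Something like this on the Proxmox host during a transfer would show where the time goes (pool name as in your first post):
# per-device utilisation and wait times, refreshed every 2 seconds
iostat -x 2
# the same view from the ZFS side, per vdev
zpool iostat -v tank 2
# interactive overview of CPU, memory and disk pressure
atop 2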
 
screenshot of atop (on the Proxmox host) during a copy from my PC to the samba share of the OMV VM
 

Attachments

  • atop.png
screenshot of atop (on the Proxmox host) during a copy from my PC to the samba share of the OMV VM
Wow look at the TAP interface go !

Is that disk ZFS by any chance? If so, are you able to test without compression, and also triple-check your ashift=12 setting?
 
the tap interface seems to be a "false positive": yes, it says 2799%, but that is 2799% of a 10 Mb/s interface, which in reality is 279 Mb/s. On a 1 Gb/s interface that is not that much.

The disk is LUKS > ZFS; inside that is a VirtIO SCSI volume passed to the OMV VM, formatted as ext4 inside the VM.

I have the same bottleneck (without the tap107i problem) if I mount the ext4 VM volume on the proxmox host itself and copy files from or to this disk.

On the other hand, if I copy a file to the ZFS "free space" on the hard drive (on the Proxmox host, in the same location (/dev/tank) where "vm-107-disk-0" is), I have zero performance problems.

ashift is ok (zpool get all | grep ashift --> 12)
I don't know about the compression, but as I said, if I read and write directly on the pool, there is no performance problem.
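
For the record, the exact checks look like this; the zvol backing the VM disk also has its own properties (volblocksize in particular) that I haven't looked into:
# pool sector-size setting
zpool get ashift tank
# compression settings and actual ratio on the pool
zfs get compression,compressratio tank
# properties of the zvol backing the OMV disk
zfs get volblocksize,volsize tank/vm-107-disk-0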
 
On the other hand, if I copy a file to the ZFS "free space" on the hard drive (on the Proxmox host, in the same location (/dev/tank) where "vm-107-disk-0" is), I have zero performance problems.

Hi,
What you see is normal (zvol performance < ZFS dataset performance)!
It is not clear to me how you "passed a Virtio SCSI volume of almost all the available space to my OMV VM" (so the PMX server is local but OMV is remote?).
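
If you want to measure the difference yourself, a rough comparison could look like this (a throwaway test zvol; do not write to the real vm-107-disk-0, that would destroy the ext4 filesystem inside it; paths and names are just examples):
# write a few GB of incompressible data into a dataset (file level)
dd if=/dev/urandom of=/mnt/tank/dataset-test.bin bs=1M count=4096 conv=fsync
# create a small throwaway zvol and write the same amount to it (block level)
zfs create -V 10G tank/zvol-test
dd if=/dev/urandom of=/dev/zvol/tank/zvol-test bs=1M count=4096 conv=fsync
# clean up
rm /mnt/tank/dataset-test.bin
zfs destroy tank/zvol-test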
 
I'm sorry if I used the wrong words... it's not exactly easy to describe all the nested layers on that disk :)
By "passed a Virtio SCSI volume of almost all the available space to my OMV VM" I mean that I created a volume for the OMV VM the regular way:
in the VM "Hardware" settings, I chose SCSI as the bus type (which uses the "VirtIO SCSI" controller), specified the storage location (tank) and set a size of 7400 GB in my case.
That is the volume (/dev/tank/vm-107-disk-0) passed to the VM.
Everything is local.
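
If it is clearer, I believe the CLI equivalent of what I did in the GUI is roughly this (VM ID 107, with "tank" already added as a ZFS storage in Proxmox):
# allocate a 7400 GB disk on the "tank" storage and attach it to VM 107 as scsi0
qm set 107 --scsi0 tank:7400
# show the resulting VM configuration
qm config 107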
 
By "passed a Virtio SCSI volume of almost all the available space to my OMV VM" I mean that I created a volume for the OMV VM using the regular way :
In the VM "hardware" settings, I chose SCSI as the type, which uses the "VirtIO SCSI" controller, specified the storage location (tank) and set a size "7400" GB in my case.
That is the volume (/dev/tank/vm-107-disk-0) passed to the VM
Everything is local.


Now it is almost crystal clear ;)!

The second question: I guess that most of the time you use the samba share (provided by the OMV guest), so what is your data usage pattern (do you read/write big files, small ones, or is it mixed)?
 
Yes, most of the time I use the samba share on the OMV (not OVH :) ) VM.
For performance test purposes I always use big files. As I explained in the first post, I mostly use one 4 GB file.
For example, when I was writing to a samba share on my old J1900 computer, which was running Windows (before becoming a Proxmox node) with the same kind of "samba" share, I had a constant 100 MB/s write speed on that 8 TB drive (NTFS at that time). Now, on much faster hardware (4x the CPU performance and 2x the RAM), even with the virtualization layers, I'm a bit disappointed to see such bad and inconsistent performance.

I'm actually in the process of moving the whole content of the "vm-107-disk-0" volume straight onto the ZFS pool, and I will use samba on the host...
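
In case it is useful to someone, the migration itself is basically the following (dataset name and mount points are just what I chose; since the pool was created with -m none, the new dataset needs an explicit mountpoint):
# dataset that will hold the files directly on ZFS
zfs create -o mountpoint=/srv/data tank/data
# mount the ext4 filesystem living inside the VM's disk (with the VM stopped)
mount /dev/tank/vm-107-disk-0 /mnt/temp
# copy everything over, preserving permissions, ACLs and extended attributes
rsync -aHAX --info=progress2 /mnt/temp/ /srv/data/
umount /mnt/temp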
 
For example, when I was writing to a samba share on my old J1900 computer, which was running Windows, with the same kind of "samba" share, I had a constant 100 MB/s write speed on that 8 TB drive (NTFS at that time)

Do not compare apples with oranges! You have added an encryption layer, so it is not the same thing.

For performance test purposes I always use big files. As I explained in the first post, I mostly use one 4 GB file.

I was asking about something else (I understand that you are running tests now, but what will the usage be like after you fix the performance)?

You can create a CT instead of a VM, and in this CT you can install a samba server - this will be better than any VM!
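
A rough sketch of that setup (container ID, template name and paths are only examples, adjust to your storage):
# create a container from a Debian template, root disk on local storage
pct create 108 local:vztmpl/debian-9.0-standard_9.7-1_amd64.tar.gz --hostname nas --memory 1024 --rootfs local-lvm:8
# bind-mount a host directory (a dataset on the encrypted pool) into the CT
pct set 108 -mp0 /srv/data,mp=/srv/data
pct start 108
# then, inside the CT: apt-get install samba, and share /srv/data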
 
Do not compare apples with oranges! You have added an encryption layer, so it is not the same thing.

I was asking about something else (I understand that you are running tests now, but what will the usage be like after you fix the performance)?

You can create a CT instead of a VM, and in this CT you can install a samba server - this will be better than any VM!

(I didn't specify it, but) my NTFS drive was encrypted with BitLocker before, and that processor didn't have AES-NI instructions!

For real-world performance (sorry, I hadn't understood your question), I face almost every kind of situation. For example, I often make sync backups of photos from my main computer to that "NAS" through samba, which means a lot of reads (for comparison purposes) and a lot of writes of sometimes big files (42-megapixel RAW files) and sometimes small ones (thousands of tiny exported JPEGs: thumbnails, 800px-wide versions...). I also transfer video projects and files, so lots of reads/writes on big files too.
I store on this space lots of backups coming from other servers; some are simple rsyncs of web server content (PHP, JS, CSS...), some are zipped dumps of MySQL databases, or sometimes even a raw image of a Raspberry Pi's SD card.
So the real world is just... everything, it depends on the day and time :)

For CT vs VM: yes, that would be a better solution. I will probably create a few of them (for a Nextcloud instance and some other stuff to replace some OMV services), but OMV requires a fully functional VM and doesn't appear to work properly in a container.
 
... and for the CT case:

- disable compression on ZFS (any encrypted block is not compressible)
- set logbias=throughput on ZFS (instead of the default, latency)
- use noatime in /etc/fstab inside the CT
- if most of the time you will use bigger files, you could use a ZFS recordsize of 1M instead of the default 128k, or maybe better, use 2 vdisks in the CT: one for bigger files (1M) and the rest with 128k (small files) - commands sketched below
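
In commands, that would be something like (dataset names are only examples):
# on the host, for the dataset(s) backing the CT data
zfs set compression=off tank/ctdata
zfs set logbias=throughput tank/ctdata
# only on the dataset that holds the big files
zfs set recordsize=1M tank/ctdata
# and inside the CT, add the noatime option to the relevant mount in /etc/fstab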
 
... and for the CT case:

- disable compression on ZFS (any encrypted block is not compressible)
- set logbias=throughput on ZFS (instead of the default, latency)
- use noatime in /etc/fstab inside the CT
- if most of the time you will use bigger files, you could use a ZFS recordsize of 1M instead of the default 128k, or maybe better, use 2 vdisks in the CT: one for bigger files (1M) and the rest with 128k (small files)

Compression could still be useful: as I plan to use a directory (passed to the CT) or a samba share from the host, the files will be stored natively on the ZFS pool. In fact, I have already seen that the compression is useful: after my rsync from the OMV volume to the ZFS pool, the result is smaller, which is nice (something like 200 GB saved on about 6 TB). And the LUKS volume is "outside" the ZFS pool: the LUKS volume has to be opened first, then I can access the ZFS pool inside it.

Performance is great now. I removed my OMV volume and VM and installed some other services in the other CT I have (miniDLNA, Nextcloud...). Goodbye OMV :(
 
