What is the best file system for Proxmox?

Rocha Neto

Hello everyone.

My environment has the following specifications.

Intel(R) Xeon(R) CPU E3-1240 V2
2 x 1 TB SATA (software RAID 1)
1 x 1 TB SATA (/backup)

LVM was created and added successfully to Proxmox.

When creating and testing VMs and CTs, dd showed an average of 180 MB/s.

I was very disappointed with this outcome. I created 4 CTs and ran the test on all of them at the same time. Obviously the 180 MB/s was divided between the 4 CTs, and the result was even more disheartening.

Hoping to solve the problem, I added 2 more 512 GB SSDs (software RAID 1).

I did the following tests:

The /dev/md3 device was formatted with ext4 and added as LVM storage in Proxmox. I created a CT on this LVM and the dd test result was 340 MB/s.

I then found out that with plain LVM I cannot take snapshots.

So I decided to change the LVM to LVM-thin, which would make snapshots possible. But when I ran the test with a new CT on this LVM-thin storage, I got the frustrating result of 270 MB/s. I am sure this value will decrease when I add new VMs or CTs.
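
Roughly, the change looks like this (VG, pool, and storage names here are placeholders, not my exact ones):

Code:
# carve a thin pool out of free space in the volume group
lvcreate -L 800G --thinpool thinpool vg_data
# register it in Proxmox as an LVM-Thin storage
pvesm add lvmthin ssd-thin --vgname vg_data --thinpool thinpool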

Given this scenario, what do you friends recommend?
What is the best file system to use for better read/write speed?
Should I use the SSDs to cache the SATA disks?
Should I use ZFS?

Thank you for the opportunity; I am counting on your help.
 
First, dd is not a benchmark. Please test with a decent tool like fio. Second, throughput (as you 'measured' it) is not what matters for most virtualization purposes; what matters is IOPS (I/O operations per second), which only go up if you use more spinning disks or SSDs.

Without knowing your goal, it is hard to recommend something. Also, with mdadm-based software RAID you will not get support from the Proxmox company when you buy a subscription.
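
A small random 4K read/write job file along these lines is a reasonable starting point (job name, size, and target directory here are only illustrative, not a prescription):

Code:
; 4K random read/write, sync engine, queue depth 1
[random_rw]
rw=randrw
bs=4k
ioengine=sync
iodepth=1
size=1g
; point "directory" at the storage you want to test
directory=/var/lib/vz

Run it with "fio random-rw.fio" from inside the CT or VM whose storage you want to measure.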
 

Thank you very much for your reply.

I have disabled mdadm and created 2 partitions for use by the SSDs.

Could you recommend a tool to perform the tests, since dd is not a benchmark?
 
Please use hardware RAID or ZFS with its built-in software RAID (that is supported); do not create single points of failure.

As I already said, use fio.

I am now using ZFS RAID 1 (2 x 1 TB SATA) with a cache device (1 x 500 GB SSD).

I created one CT and tested it with fio:

[root@CT100 fio-2.0.14]# fio random-rw.fio
random_rw: (g=0): rw=randrw, bs=4K-4K/4K-4K/4K-4K, ioengine=sync, iodepth=1
fio-2.0.14
Starting 1 process
random_rw: Laying out IO file(s) (1 file(s) / 1024MB)
Jobs: 1 (f=1): [m] [99.1% done] [0K/0K/0K /s] [0 /0 /0 iops] [eta 00m:01s]
random_rw: (groupid=0, jobs=1): err= 0: pid=2092: Fri Nov 11 21:14:02 2016
read : io=524704KB, bw=4576.1KB/s, iops=1144 , runt=114642msec
clat (usec): min=1 , max=94476 , avg= 8.16, stdev=367.55
lat (usec): min=1 , max=94476 , avg= 8.23, stdev=367.55
clat percentiles (usec):
| 1.00th=[ 2], 5.00th=[ 2], 10.00th=[ 3], 20.00th=[ 3],
| 30.00th=[ 3], 40.00th=[ 3], 50.00th=[ 3], 60.00th=[ 4],
| 70.00th=[ 4], 80.00th=[ 6], 90.00th=[ 7], 95.00th=[ 19],
| 99.00th=[ 40], 99.50th=[ 44], 99.90th=[ 69], 99.95th=[ 91],
| 99.99th=[ 1400]
bw (KB/s) : min= 184, max=18966, per=100.00%, avg=6202.98, stdev=4922.43
write: io=523872KB, bw=4569.7KB/s, iops=1142 , runt=114642msec
clat (usec): min=5 , max=5670.6K, avg=865.17, stdev=33512.17
lat (usec): min=5 , max=5670.6K, avg=865.26, stdev=33512.17
clat percentiles (usec):
| 1.00th=[ 6], 5.00th=[ 6], 10.00th=[ 7], 20.00th=[ 8],
| 30.00th=[ 8], 40.00th=[ 9], 50.00th=[ 9], 60.00th=[ 9],
| 70.00th=[ 10], 80.00th=[ 15], 90.00th=[ 18], 95.00th=[ 26],
| 99.00th=[ 52], 99.50th=[ 57], 99.90th=[45312], 99.95th=[675840],
| 99.99th=[1302528]
bw (KB/s) : min= 190, max=18441, per=100.00%, avg=6192.81, stdev=4901.26
lat (usec) : 2=0.01%, 4=29.05%, 10=50.21%, 20=13.76%, 50=6.15%
lat (usec) : 100=0.71%, 250=0.04%, 500=0.01%, 750=0.01%, 1000=0.01%
lat (msec) : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%, 50=0.01%
lat (msec) : 100=0.01%, 250=0.01%, 500=0.02%, 750=0.01%, 1000=0.01%
lat (msec) : 2000=0.02%, >=2000=0.01%
cpu : usr=0.31%, sys=1.91%, ctx=1134, majf=4, minf=6
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued : total=r=131176/w=130968/d=0, short=r=0/w=0/d=0

Run status group 0 (all jobs):
READ: io=524704KB, aggrb=4576KB/s, minb=4576KB/s, maxb=4576KB/s, mint=114642msec, maxt=114642msec
WRITE: io=523872KB, aggrb=4569KB/s, minb=4569KB/s, maxb=4569KB/s, mint=114642msec, maxt=114642msec

What do you say about this?

Thanks
 
Again, without knowing the goal for your server, I cannot make suggestions.

You are using 2 SATA disks with a random read/write test, so the cache is not effective at all (random reads).

You could go with ZFS (RAID 1) on the SSDs for your data/containers and ZFS (RAID 1) on the 2 SATA disks for your backups. Compression should be enabled on all pools so that the 512 GB is not such a limitation. You can also combine the slow and the fast storage, keeping data that is not read very often or is huge (e.g. movies, images, etc.) on the slow pool.
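
A rough sketch of such a layout (device names are placeholders for your actual disks):

Code:
# SSD mirror for containers/VM data, SATA mirror for backups
zpool create -o ashift=12 fast mirror /dev/sdc /dev/sdd
zpool create -o ashift=12 slow mirror /dev/sda /dev/sdb
# enable compression on both pools
zfs set compression=lz4 fast
zfs set compression=lz4 slow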
 

Many thanks.

I want to host VPS for some customers; that is what this server will be used for.

Using 1 SSD as cache and 1 SSD for the log, I got very good performance.
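
The devices were attached along these lines (device names are examples):

Code:
# one SSD partition as SLOG (log), one as L2ARC (cache)
zpool add storage log /dev/sde1
zpool add storage cache /dev/sdd1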

I enabled compression and disabled sync. Is that good practice?
 
No, that is very bad practice if you care about your data, and it makes your log device useless. When using a log device you should set sync to standard or always.

I have set it to standard now. Thanks.

zfs set compression=on storage    # enable compression
zfs set sync=standard storage     # honor synchronous writes again
zfs set primarycache=all storage  # cache data and metadata in ARC
zfs set atime=off storage         # do not update access times on reads
zfs set checksum=off storage      # disables data checksums (generally not recommended)
 

I tried to move forward with this, but I did not succeed.

My current setup is as follows:

              capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
storage     49.4G   855G     37    150   145K  1.08M
  mirror    49.4G   855G     37    138   145K   429K
    sda6        -      -     12     16   110K   724K
    sdb6        -      -     12     16   109K   724K
logs            -      -      -      -      -      -
  sde1       356K   464G      0     12    285   681K
cache           -      -      -      -      -      -
  sdd1      1.54G   464G      0      8     39   424K
----------  -----  -----  -----  -----  -----  -----

The log and cache devices are the SSDs.

I am testing with a VM writing a 200 GB file. My problem is that while the write is going on I suffer from overload; the write process consumes all of the server's CPU (E3-1270, 8 x 3.40 GHz).

And when I watch with "zpool iostat 2", the write rate is no more than 2 MB/s.

I would like to use this environment to host VPS for my clients' sites and applications.

Could you help me see how I can improve this?

Thank you.
 
You have not enabled deduplication by any chance?

You can check the I/O delay with iostat -x 5 while writing, to determine whether there is a bad disk.
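
For example, assuming the pool is named storage as above:

Code:
zfs get dedup storage   # should report "off" unless you enabled it explicitly
zpool list storage      # the DEDUP column shows the current dedup ratio
iostat -x 5             # watch await and %util per disk while the write runs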
 

Thanks for your reply, but I have given up on ZFS. When writing to disk, the CPU load went through the roof. I tried every possible tuning, but nothing worked.

So, with my setup of SATA (2 x 1 TB RAID 1) + SSD (2 x 512 GB RAID 0), I set out to test other ways of getting a better result.

I found something I would like to share, for those who want a setup like mine, which uses SSDs to improve the performance of mechanical disks.

I tested bcache, which combines the SATA and SSD devices into a new block device. On that merged device I created an LVM volume group and turned it into a thin pool. The result was very good, but writing to the disk again overloaded the server, consuming all the CPU.

It was then that I dropped bcache and used dm-cache (lvmcache). You create a VG with the SATA disks and the SSDs, and then create three LVs: data, cache data, and cache metadata. The result was surprising.
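
Roughly like this (device names and sizes are placeholders, not my exact ones):

Code:
# VG spanning the SATA array and the SSD array
vgcreate vg_cache /dev/md2 /dev/md3
# data LV on the spinning disks, cache data + metadata LVs on the SSDs
lvcreate -n data -L 900G vg_cache /dev/md2
lvcreate -n cache -L 400G vg_cache /dev/md3
lvcreate -n cache_meta -L 1G vg_cache /dev/md3
# combine cache data + metadata into a cache pool, then attach it to the data LV
lvconvert --type cache-pool --poolmetadata vg_cache/cache_meta vg_cache/cache
lvconvert --type cache --cachepool vg_cache/cache vg_cache/data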

The lvmcache setup gives me amazing write performance, and the load is no more than 10% on the processor. Really fantastic.

Thanks for the help. I believe I have now found something that gives me good enough performance to virtualize VMs for my hosting clients.

In this scenario there is only one thing keeping me from being 100% happy: if I turn my lvmcache volume into a thin pool, I lose a lot of performance, and without a thin pool I have no way to make snapshots, only backups.

It would be great to be able to offer snapshots to my VPS hosting clients.
 
I also tried bcache, flashcache and lvmcache - all with similar performance. lvmcache booted dead slow, and every time I needed to restart my machine the cache was rewritten to disk. This was a big performance killer for me. Have you had a similar experience?

I still can't believe you're going to have fun with 2 (!) spinning disks. Building an SSHD-like setup with 1 TB of data and 512 GB of cache is a total waste - just buy another 2 SSDs and use RAID 5 on 4 SSDs (with or without ZFS), but don't rely on only 2 spinning disks. For decades, people have used as many spinning disks as possible to get performance; this has not changed with SSDs in general, the high watermark is just much higher.
 

I haven't experienced this problem.

Actually, it is 2 x 1 TB SATA plus 2 x 512 GB SSD, with LVM cache.

I used the following layout:

LV data, 900 GB (rotating disks)
LV cache, 900 GB (SSD drives)

To start, I will use this setup, at least for now. My supplier does not have high-end SSDs; the disks in my server are Samsung 850 Evos, which have horrible write performance. The cost of adding more disks is also high, and it is worth mentioning that I am afraid I would not get a favorable outcome, given the quality of the SSDs my supplier offers.

Could you explain the scenario with 4 SSDs in more detail?

PS: I still have 1 more 1 TB SATA disk for backups.
 
The disks in my server are Samsung 850 Evos.

You will not have a lot of fun with the Evos. Please monitor their health via smartctl and check the Disks tab of the host in the Proxmox VE GUI.
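
For example (device name is a placeholder for one of your SSDs):

Code:
smartctl -a /dev/sda | grep -i -E 'health|wear|reallocated'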

Could you explain the scenario with 4 SSDs in more detail?

Just use ZFS in mirror mode or RAID-Z1. I use 6 SSDs in mirror mode:

Code:
root@proxmox4 ~ > zpool status -v
  pool: rpool
state: ONLINE
  scan: scrub repaired 0 in 0h42m with 0 errors on Wed Oct 12 07:52:54 2016
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            sda2    ONLINE       0     0     0
            sdb2    ONLINE       0     0     0
          mirror-1  ONLINE       0     0     0
            sdc     ONLINE       0     0     0
            sdd     ONLINE       0     0     0
          mirror-2  ONLINE       0     0     0
            sde     ONLINE       0     0     0
            sdf     ONLINE       0     0     0

errors: No known data errors


root@proxmox4 ~ > zpool list
NAME    SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
rpool  2,60T   483G  2,13T         -    11%    18%  1.00x  ONLINE  -
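
With 4 SSDs the same idea would be two mirror vdevs striped together, roughly (placeholder device names):

Code:
zpool create -o ashift=12 tank mirror /dev/sdc /dev/sdd mirror /dev/sde /dev/sdf
zfs set compression=lz4 tank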
 
