Proxmox newbie question: LVM-Thin vs ZFS performance

boumacor

New Member
Dec 19, 2023
For the last couple of months I've been using Proxmox to experiment with running Linux containers, and I'm very happy with the software; it's amazing.

After reading online about the difference between ZFS (which I used by default) and LVM-Thin, I started testing LVM-Thin on my system, but I don't understand the results. I'm sure I'm thinking about or doing something wrong and need help.

I'm using an i5-7500 desktop with 24 GB memory, a 256 GB SATA SSD (for Proxmox) and a 500 GB Hitachi SATA hard disk. Proxmox 8.1.3 is installed on the SATA SSD and the HDD is used for the CTs.

When I use the HDD as LVM-Thin storage, create an Ubuntu 22.04 CT and run a test (dd if=/dev/zero of=file.out bs=1024 count=1000000 oflag=direct), the write speed is between 5.5 and 5.6 MB/s. When I read the data back (dd if=file.out of=/dev/null bs=1024), the results are between 56 MB/s and 84 MB/s. This seems very slow.
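
A dd run with a larger block size should give a more realistic picture of sequential throughput, since 1 KiB direct writes mostly measure per-request latency; a rough sketch (bs=1M and count=4096 are arbitrary example values):

Code:
# sequential write with 1 MiB blocks, bypassing the page cache
dd if=/dev/zero of=file.out bs=1M count=4096 oflag=direct
# read it back, also bypassing the cache
dd if=file.out of=/dev/null bs=1M iflag=direct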

After removing the LVM-Thin storage and creating a ZFS pool (single disk, no compression, default ashift 12, and /etc/modprobe.d/zfs.conf changed to "options zfs zfs_arc_min=128" and "options zfs zfs_arc_max=1024"), I tried again. Same test, different numbers: writing 205 MB/s and reading 346 MB/s. I could expect a bit of difference, but this seems very strange.
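
(Side note on those zfs.conf values: the zfs_arc_min/zfs_arc_max module parameters are interpreted as bytes, so a config limiting the ARC would more typically look like the sketch below, with the 128 MiB / 1 GiB figures purely illustrative. After editing the file, the new limits only take effect after update-initramfs -u and a reboot, or by writing the values to /sys/module/zfs/parameters/ directly.)

Code:
# /etc/modprobe.d/zfs.conf -- values in bytes (example: 128 MiB min, 1 GiB max)
options zfs zfs_arc_min=134217728
options zfs zfs_arc_max=1073741824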

Where do I go wrong?
 
Same test, different numbers: writing 205 MB/s and reading 346 MB/s. I could expect a bit of difference, but this seems very strange.
Well, this seems strange, as a 500 GB HDD would not achieve 346 MB/s. Even if this were a newer SATA3 HDD whose interface could handle more than 300 MB/s, a 500 GB platter doesn't have the density to reach such values.
To me, this looks like you are benchmarking some cache.
 
That would make some sense with the default cache settings. That's why I used different settings in zfs.conf. But why is LVM-Thin so slow compared to ZFS?
 
Still using the same ARC settings for the cache. These are the results:

Write: fio --name=random-write --ioengine=posixaio --rw=randwrite --bs=4k --numjobs=1 --size=4g --iodepth=1 --runtime=60 --time_based --end_fsync=1
Read: fio --name=random-read --ioengine=posixaio --rw=randread --bs=4k --numjobs=1 --size=4g --iodepth=1 --runtime=60 --time_based --end_fsync=1

ZFS :
WRITE: bw=4626KiB/s (4737kB/s), 4626KiB/s-4626KiB/s (4737kB/s-4737kB/s), io=279MiB (292MB), run=61740-61740msec
READ: bw=139MiB/s (145MB/s), 139MiB/s-139MiB/s (145MB/s-145MB/s), io=8316MiB (8720MB), run=60001-60001msec

LVM-Thin :
WRITE: bw=3176KiB/s (3252kB/s), 3176KiB/s-3176KiB/s (3252kB/s-3252kB/s), io=313MiB (329MB), run=101077-101077msec
READ: bw=549KiB/s (563kB/s), 549KiB/s-549KiB/s (563kB/s-563kB/s), io=32.2MiB (33.8MB), run=60003-60003msec

Any thoughts?
 
Still using the same ARC settings for the cache. These are the results:

Write: fio --name=random-write --ioengine=posixaio --rw=randwrite --bs=4k --numjobs=1 --size=4g --iodepth=1 --runtime=60 --time_based --end_fsync=1
Read: fio --name=random-read --ioengine=posixaio --rw=randread --bs=4k --numjobs=1 --size=4g --iodepth=1 --runtime=60 --time_based --end_fsync=1

ZFS :
WRITE: bw=4626KiB/s (4737kB/s), 4626KiB/s-4626KiB/s (4737kB/s-4737kB/s), io=279MiB (292MB), run=61740-61740msec
READ: bw=139MiB/s (145MB/s), 139MiB/s-139MiB/s (145MB/s-145MB/s), io=8316MiB (8720MB), run=60001-60001msec

LVM-Thin :
WRITE: bw=3176KiB/s (3252kB/s), 3176KiB/s-3176KiB/s (3252kB/s-3252kB/s), io=313MiB (329MB), run=101077-101077msec
READ: bw=549KiB/s (563kB/s), 549KiB/s-549KiB/s (563kB/s-563kB/s), io=32.2MiB (33.8MB), run=60003-60003msec

Any thoughts?
Not right away, but out of curiosity I would also run the same test from the host, i.e. create a volume in that thin pool just to benchmark it (not running inside a CT). It is odd indeed, but at least this last result does not give you arbitrarily high numbers.
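
A minimal sketch of what that could look like, assuming a VG and thin pool both named local500GB as they appear later in the thread (volume name, size and mount point are arbitrary):

Code:
# create a thin volume in the existing pool, put a filesystem on it and mount it
lvcreate -V 20G -T local500GB/local500GB -n benchvol
mkfs.ext4 /dev/local500GB/benchvol
mkdir -p /mnt/bench && mount /dev/local500GB/benchvol /mnt/bench
# then run the same fio jobs from the host against /mnt/bench, e.g.
fio --name=random-read --ioengine=posixaio --rw=randread --bs=4k --numjobs=1 \
    --size=4g --iodepth=1 --runtime=60 --time_based --end_fsync=1 --directory=/mnt/bench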
 
@tempacc346235 Sure, good idea. I've created an LVM-Thin pool (called local500GB) and added it to the Proxmox storage overview, then created an ext4 filesystem and mounted it. Here are the results:

WRITE: bw=60.8MiB/s (63.7MB/s), 60.8MiB/s-60.8MiB/s (63.7MB/s-63.7MB/s), io=4096MiB (4295MB), run=67398-67398msec
READ: bw=485KiB/s (497kB/s), 485KiB/s-485KiB/s (497kB/s-497kB/s), io=28.4MiB (29.8MB), run=60011-60011msec

The write speed is much faster, and the read speed is even slower. To me this doesn't make any sense. There is nothing running on the machine, just a clean install. Any ideas?
 
It just feels like we are missing something by not looking at this holistically. Can you show lsblk? Do you have any non-LVM (or at least non-thin) partition where you could benchmark that drive "raw"? It would be interesting to know whether it's the physical drive that is slow or something in the layers that makes it so.
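
One way to test the drive "raw", assuming nothing else is using sda at the time, is a read-only run straight against the block device (readonly, so nothing on it gets overwritten):

Code:
# sequential read from the raw device, no filesystem or LVM involved
fio --name=raw-seq-read --filename=/dev/sda --readonly --rw=read --bs=1M \
    --direct=1 --ioengine=libaio --runtime=30 --time_based
# or the quick-and-dirty variant
hdparm -tT /dev/sda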
 
This is the result of lsblk:
root@pve03:~# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
sda 8:0 0 465.8G 0 disk
├─local500GB-local500GB_tmeta 252:4 0 4.7G 0 lvm
│ └─local500GB-local500GB 252:7 0 456.3G 0 lvm
└─local500GB-local500GB_tdata 252:5 0 456.3G 0 lvm
└─local500GB-local500GB 252:7 0 456.3G 0 lvm
sdb 8:16 0 238.5G 0 disk
├─sdb1 8:17 0 1007K 0 part
├─sdb2 8:18 0 1G 0 part /boot/efi
└─sdb3 8:19 0 237.5G 0 part
├─pve-swap 252:0 0 8G 0 lvm [SWAP]
├─pve-root 252:1 0 69.4G 0 lvm /
├─pve-data_tmeta 252:2 0 1.4G 0 lvm
│ └─pve-data 252:6 0 141.2G 0 lvm
└─pve-data_tdata 252:3 0 141.2G 0 lvm
└─pve-data 252:6 0 141.2G 0 lvm
root@pve03:~#

sdb is the 250 GB SSD with Proxmox, and sda is the 500 GB SATA hard disk.
 
Code:
root@pve03:~# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
sda 8:0 0 465.8G 0 disk 
├─local500GB-local500GB_tmeta 252:4 0 4.7G 0 lvm 
│ └─local500GB-local500GB 252:7 0 456.3G 0 lvm 
└─local500GB-local500GB_tdata 252:5 0 456.3G 0 lvm 
└─local500GB-local500GB 252:7 0 456.3G 0 lvm 
sdb 8:16 0 238.5G 0 disk 
├─sdb1 8:17 0 1007K 0 part 
├─sdb2 8:18 0 1G 0 part /boot/efi
└─sdb3 8:19 0 237.5G 0 part 
├─pve-swap 252:0 0 8G 0 lvm [SWAP]
├─pve-root 252:1 0 69.4G 0 lvm /
├─pve-data_tmeta 252:2 0 1.4G 0 lvm 
│ └─pve-data 252:6 0 141.2G 0 lvm 
└─pve-data_tdata 252:3 0 141.2G 0 lvm 
└─pve-data 252:6 0 141.2G 0 lvm 
root@pve03:~#

Can you also post the lvs and lvdisplay output (enclosed in [CODE] ... [/CODE] tags)?
 
Sure, no problem:


Code:
root@pve03:~# lvs
  LV         VG         Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  local500GB local500GB twi-a-tz-- 456.32g             0.00   0.38
  data       pve        twi-a-tz-- 141.22g             0.00   1.13
  root       pve        -wi-ao----  69.37g
  swap       pve        -wi-ao----   8.00g
root@pve03:~# lvdisplay
  --- Logical volume ---
  LV Name                data
  VG Name                pve
  LV UUID                E45q3D-o04B-ZXby-T3JP-zRoD-mRU9-WuROF0
  LV Write Access        read/write
  LV Creation host, time proxmox, 2023-12-19 11:35:36 +0100
  LV Pool metadata       data_tmeta
  LV Pool data           data_tdata
  LV Status              available
  # open                 0
  LV Size                141.22 GiB
  Allocated pool data    0.00%
  Allocated metadata     1.13%
  Current LE             36153
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           252:6

  --- Logical volume ---
  LV Path                /dev/pve/swap
  LV Name                swap
  VG Name                pve
  LV UUID                GcbW11-x2OS-Befo-wWVA-NY2W-zGFZ-HXJV41
  LV Write Access        read/write
  LV Creation host, time proxmox, 2023-12-19 11:35:33 +0100
  LV Status              available
  # open                 2
  LV Size                8.00 GiB
  Current LE             2048
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           252:0

  --- Logical volume ---
  LV Path                /dev/pve/root
  LV Name                root
  VG Name                pve
  LV UUID                fxgNPq-u8np-gA2i-fUPO-mWOg-I7Wl-8xvvPQ
  LV Write Access        read/write
  LV Creation host, time proxmox, 2023-12-19 11:35:33 +0100
  LV Status              available
  # open                 1
  LV Size                69.37 GiB
  Current LE             17759
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           252:1

  --- Logical volume ---
  LV Name                local500GB
  VG Name                local500GB
  LV UUID                Xz5GAq-Y99H-0JPf-GQwa-SfGY-ypLu-ec8zL2
  LV Write Access        read/write
  LV Creation host, time pve03, 2023-12-28 11:08:58 +0100
  LV Pool metadata       local500GB_tmeta
  LV Pool data           local500GB_tdata
  LV Status              available
  # open                 0
  LV Size                456.32 GiB
  Allocated pool data    0.00%
  Allocated metadata     0.38%
  Current LE             116818
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           252:7
root@pve03:~#
 
The write speed is much faster, and the read speed is even slower. To me this doesn't make any sense. There is nothing running on the machine, just a clean install. Any ideas?
Write cache. If you really want to bypass it for your benchmarks, use --direct=1 as an argument for fio. Bear in mind that on an empty drive, writing to disk is a simple matter for the filesystem and will yield the best results; this will change as the filesystem fills up.

All that said, you're benchmarking a single HDD spindle (of low density, at that). It doesn't matter how you test: the disk is slow.
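
For reference, the direct-I/O variant would just be the earlier write job with --direct=1 added (everything else unchanged):

Code:
fio --name=random-write --ioengine=posixaio --rw=randwrite --bs=4k --numjobs=1 \
    --size=4g --iodepth=1 --runtime=60 --time_based --end_fsync=1 --direct=1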
 
Write cache. If you really want to bypass it for your benchmarks, use --direct=1 as an argument for fio. Bear in mind that on an empty drive, writing to disk is a simple matter for the filesystem and will yield the best results; this will change as the filesystem fills up.

All that said, you're benchmarking a single HDD spindle (of low density, at that). It doesn't matter how you test: the disk is slow.

But he has ... READ: bw=485KiB/s ... that is not a speed even a tape drive reaches.
 
But he has ... READ: bw=485KiB/s ... that is not a speed even a tape drive reaches.
What do you expect when running a 4K random read test? That's 485 KB/s / 4K per IO ≈ 121 IOPS, which is about as fast as a non-10K/15K-RPM 2.5" HDD gets when you hit it with random IO. Most of the time the heads are flying around seeking sectors.
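
To put that in perspective: if each random 4K access costs roughly 8 ms of seek plus rotational latency (a typical ballpark for a desktop-class drive, used here only as an estimate), that gives about 1000 ms / 8 ms ≈ 125 IOPS, and 125 × 4 KiB ≈ 500 KiB/s, which is right in the range measured.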

Wiki:
[attached screenshot: 1703791124216.png]
 
That's for a 4K random read job. Like I said, slow. This is why you don't host databases on spinning disks, lol.

As for tape: tape can't take this workload; tape can only do sequential.
I do not want to add more noise to this (I am still thinking myself about what might be wrong there), but fair enough (I can take the nitpicking): tapes are sequential. The thing is, he already has --end_fsync=1 there, so the direct would not have given him any better numbers. 0.5 MB/s is not a normal SATA read speed, not even for randread.
 
the direct would not have given him any better numbers.
Correct. It wouldn't improve anything, it would make it WORSE, which is the point; without caching an HDD is near useless.

0.5 MB/s is not a normal SATA read speed, not even for randread.
SATA is the bus. The bus speed is the LIMIT of what can theoretically travel over it, not the speed you are guaranteed. @Dunuin was clearer as to the reason why you're seeing those speeds, and that it is the expected result.
 
Correct. It wouldn't improve anything, it would make it WORSE, which is the point; without caching an HDD is near useless.

I just meant to say that end_fsync gives him a more real-life result.

SATA is the bus. The bus speed is the LIMIT of what can theoretically travel over it, not the speed you are guaranteed. @Dunuin was clearer as to the reason why you're seeing those speeds, and that it is the expected result.

I am dropping words tonight for some reason; I wanted to say SATA HDD. Basically, I would expect something like 100 MB/s sequential and a few MB/s random from a spinning drive (without cache).

@boumacor Can you run the same test with --rw=read, just so we have this out of the way?
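
That would be the earlier job with only the rw mode changed, along these lines (the job name is arbitrary):

Code:
fio --name=seq-read --ioengine=posixaio --rw=read --bs=4k --numjobs=1 \
    --size=4g --iodepth=1 --runtime=60 --time_based --end_fsync=1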
 
