Is a ZFS pool without RAIDZ more dangerous than ext4?

I want to create a pool with two hard disks and get as much space as possible, so I set the "RAID Level" to "Single Disk".
Is this more dangerous than ext4?

[Attachment: Screenshot 2023-07-20 220949.png]
 
The nice part of ZFS is that it will report to you when files are corrupted. Ext4 will silently give you the wrong bytes, which you might not notice until all your backups also contain the corrupted files. IMHO: ZFS single-drive is safer (and has more overhead) than ext4.
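To see that reporting in practice, a minimal sketch (the pool name "tank" is a placeholder):

# verify all data against its checksums, then show any errors found
zpool scrub tank
zpool status -v tank   # CKSUM column shows errors; -v lists affected files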
 
Also keep in mind that with two disks this is basically a RAID0: lose one disk and all data is gone. Two small disks in a RAID0 are less reliable than one big disk.
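For illustration, a minimal sketch of the CLI equivalent of both choices; the pool name and device paths are placeholders:

# "Single Disk" with two disks = a stripe (RAID0): full capacity, no redundancy
zpool create tank /dev/sdb /dev/sdc
# a mirror halves the usable capacity but survives the loss of one disk
zpool create tank mirror /dev/sdb /dev/sdc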
 
I have one 18 TB HDD in PVE and the same HDD in PBS.

I have ECC RAM on the PVE host, so which filesystem is best for performance, ZFS or ext4?
 
Ext4 is an outdated filesystem: limited filesystem size, limited inode count, no reflink block cloning, no metadata special device support, and it regresses under parallel I/O load. Inode table generation can take hours on RAID volumes, which is mostly unknown to users because the mkfs.ext4 prompt comes back as if it were already done. It should go into a museum, since all of these shortcomings are easily covered by xfs. That said, ext3/4 is sensitive to power outages, as is zfs, which I have not seen with xfs; I have supported roughly 5000 xfs filesystems on a daily basis for 15 years, so that is my enterprise filesystem. Bit rot detection is nice in zfs, but bit rot is not really a filesystem's job, any more than a power outage is: it is a problem of disks returning wrong data, and checksumming against bit rot is a workaround rather than a fix at the place where the corruption happens.
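To make that deferred inode table initialization visible, a hedged example (not from the post above; the device path is a placeholder): mkfs.ext4 normally hands the work to a background kernel thread, so forcing it up front shows the real cost.

# by default ext4 defers inode table init (ext4lazyinit), so mkfs returns quickly
# while the array keeps writing for hours afterwards; this forces it up front
mkfs.ext4 -E lazy_itable_init=0,lazy_journal_init=0 /dev/sdX1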
 
If anybody wants checksums with xfs, take an external storage system like a NetApp E-Series or, even much faster, a DDN SFA, which uses 63 data blocks per checksum block, so the performance loss is 1/64 (or checksumming can be disabled per schema set). The latter supports all erasure-coding RAID 1/5/6 schemas on disk groups ranging from 2 to 900 disks plus virtual spares; it is like dRAID but much more flexible.
 
ZFS is a high-tech filesystem, that cannot be denied, but it has its pros and also its cons.
What I really like about zfs is, first, the ability to build (d)raid(z) pools out of NVMe drives with really good read/write performance, while mdadm gets horrible write performance; its reads can be about 30% better, but that is no advantage against being a factor of 7 slower on writes. For 16-24 NVMe drives you can get much better performance with two current hardware RAID controllers (e.g. PERC 12 H965i), but that comes at a luxury price.
Second is checksumming of data, which with xfs can otherwise only be achieved with an external RAID storage system.
Third, if you use virtualization, only the changed blocks of virtual disks are replicated by zfs send/recv (see the sketch below).
Fourth, I like its snapshot implementation: you could take one every minute for 10 years as insurance against any kind of mistake by hardware, software (OS + applications), users, and even virus manipulation, even if it is perhaps never needed.
And not to forget, the zfs community is always helpful and so responsive - nicest ever!!
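A minimal sketch of that incremental send/recv replication; the pool, dataset, snapshot and host names are placeholders I made up:

# take a new snapshot and send only the blocks changed since the previous one
zfs snapshot tank/vmdata@today
zfs send -i tank/vmdata@yesterday tank/vmdata@today | ssh backuphost zfs receive tank/vmdata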
The cons of zfs start with performance when you work with millions of files and tens to hundreds of TB: it is advertised as the zettabyte filesystem, yet it struggles really hard long before the PB range. If you generate 1 TB of new data and remove e.g. 500 GB every day, zfs send/recv becomes a pity, and you fall back to rsync over an NFS mount so you do not hit the 100% single-core ssh limit, even when running parallel rsyncs (a sketch follows below).
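A rough sketch of that parallel rsync over NFS workaround; host, paths and the parallelism of 4 are placeholders, and it assumes top-level directory names without spaces:

# replicate over an NFS mount instead of ssh, one rsync per top-level directory, 4 in parallel
mount -t nfs backuphost:/backup /mnt/backup
ls /data | xargs -P4 -I{} rsync -a --delete /data/{}/ /mnt/backup/{}/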
Second, there are these endless zfs kernel tuning parameters which you can tweak up or down, but nothing really changes performance remarkably. The biggest effect comes from changing the recordsize of datasets: if you go down to 64k, 32k, 16k or 8k, metadata performance gets better, but at the same time throughput drops further and further; if you take a bigger recordsize of 256k/.../1M/.../16M, metadata performance falls into a black hole. So in my opinion the default of 128k is the best compromise in a mixed environment with different file sizes and types, or recordsize=1M if you also use a zfs special device (example below).
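A hedged example of adjusting the per-dataset recordsize; the dataset names are placeholders, and note the setting only applies to blocks written after the change:

zfs get recordsize tank/data
zfs set recordsize=1M tank/data    # larger records: better throughput, slower metadata
zfs set recordsize=16K tank/db     # smaller records: better for small random I/O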
Third, again performance over NFS: metadata operations (find ...) are slow, reads could be better, writes even more so, and then you are told to add a SLOG ... but what is zfs actually doing? Without a SLOG, sync data is written slowly to the pool. With a SLOG, it is first written there (and hopefully never needed), but that data must still be written, unchanged and just as slowly, to the pool ... and while that write is still running, reads are slowed down at the same time - think of a computed results file of around 500 GB - for quite a long time as well.
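If one does want to try a separate log device anyway, a minimal sketch of adding and removing one; the pool name and device path are placeholders:

# add a fast, power-loss-protected SSD as a separate intent log (SLOG)
zpool add tank log /dev/disk/by-id/nvme-EXAMPLE
# a SLOG only helps synchronous writes; it can be removed again if it does not
zpool remove tank /dev/disk/by-id/nvme-EXAMPLE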
Fourth, there is this "zpool cannot import" problem: if you google "cannot import" every day for 30 days with the "last 24h" filter, you find 5-10 posts each week. Are all these people doing something wrong (and why, given that the zfs documentation is really good), or are they all using weak hardware?? Hands up - I don't know.
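Not from the post above, just a hedged sketch of the usual first diagnostic steps in those threads; the pool name is a placeholder:

# scan for importable pools using stable by-id device names
zpool import -d /dev/disk/by-id
# import read-only and without mounting datasets, to get data off safely
zpool import -d /dev/disk/by-id -o readonly=on -N tank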
 

I really enjoyed reading this post despite the lack of formatting. :) ZFS has had its issues since development really picked up and people started trying to use all its new features. Also, we somehow forget it was never designed for NVMe drives (arguably neither was XFS, but XFS is not as intricate).
 
And there is nobody who could provide any numbers on xfs+special vs. zfs+special efficiency, see
https://forum.proxmox.com/threads/best-practice-zfs-sw-raid-hdd.152415/page-2 ?
Unfortunately, until today I have not had a zfs fileserver with a special device and a SLOG together; hopefully in a future project.

Nevertheless the future remains exciting: bcachefs development is rapid while being mostly a one-man show - sincere congratulations;
xfs gets 16k atomic writes (perhaps with switchable CoW on/off, as reflink block cloning has been the default since the RHEL 8.0 beta years ago);
ext5 is coming (I don't know any of its features yet); btrfs looks like it is standing still at the moment; and in the end, as you all know,
zfs development is always mega active and always good for new surprises :)
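Since reflink block cloning came up twice (missing in ext4, default in xfs), a small illustrative example; the file names are placeholders:

# on xfs with reflink enabled, this clones extents instead of copying data:
# the copy is instant and only blocks that later diverge consume new space
cp --reflink=always big-image.raw big-image-clone.raw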
 
I think I have a funny example of zfs performance: take an application directory consisting of lots of subdirectories and files, 31 GB in size on a "normal" filesystem like xfs/ext4, which occupies 21 GB in an lz4-compressed dataset. Now you run "time tar" on that directory and pipe it through cat to /dev/null.
Doing that with xfs on an 8+1 HDD RAID5 or a 21+2 HDD RAID6, after "echo 3 > /proc/sys/vm/drop_caches", it takes 3:30 to 3:45 min, so the number of disks does not change much here. Doing the same on a zfs dataset with the default recordsize=128k on a raidz2 pool of 4x6 HDDs, iotop shows 200-250 MB/s "Actual disk read" and about 700 MB/s "Total disk read" ... you think, hey, this should run fast, since only the compressed 21 GB has to be read ... but in the end, time reports 13:10 min. What and why the hell is zfs reading so much extra on top of what it should really read?!
When you tar the application directory into a tar file, ls -l shows the same size on both filesystem types, even though it occupies less space with lz4.
You cannot take iostat, zfs iostat or iotop to evaluate zfs performance; you have to take the measured "time" against the amount of data and compute data/time yourself afterwards!! That is why I told my colleagues (we all have fun teasing each other) that zfs is a vinyl-record or tape-drive filesystem: for one file "side A" must be read, and for the next "side B", then "A" again, over and over until it is finally done ...
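A hedged sketch of how that measurement could be reproduced; the directory path is a placeholder, and the timings above are of course the poster's own:

# drop the page cache so we really measure disk reads, then time a full sequential read
sync
echo 3 > /proc/sys/vm/drop_caches
time tar cf - /srv/appdir | cat > /dev/null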
 
