1 SSD and 2 HDD - best storage setup?

alexc · Dec 5, 2019

If I have only these disks in a server: 1 x NVMe SSD and 2 x SATA HDDs, and no RAID card, what would be the best setup to maximize VM storage and server speed (no containers, only VMs)?

1. HDDs in a mirror (apparently an md-based one), and the SSD as a single disk for VMs (yes, we'll do backups to the HDDs, and we can afford to go back to the previous backup if the SSD fails).

2. HDDs in a ZFS mirror with the SSD as cache (maybe even two SSD partitions, one as ZIL/SLOG and one as L2ARC). I don't have much experience with ZFS under PVE, and it looks like a somewhat risky setup.

3. - ?

Please advise how I can make better use of these disks.

ness1602 · Dec 5, 2019

I would install Proxmox on SATA RAID1 or ZFS RAID1, and I would host the VMs on the NVMe, with daily backups.

alexc · Dec 5, 2019

Yes, I think this is the best approach. The problem is how the mirror should be created out of the HDDs. I can:

1. Use md. Time-consuming, but a proven solution. Neither recommended by PVE itself nor supported in the ISO-based installer.

2. Use ZFS. Rumor has it I should put boot on a non-ZFS partition and the rest on ZFS, for the rare chance that a new kernel won't be able to boot from a ZFS-based boot partition (I never faced such a problem myself but have read about it).

Maybe I can use a small portion of the SSD as L2ARC for the ZFS pool made of the HDDs. Is it worth it?

Frankly, I do like the ZFS RAID1 idea, but I'd hate it if it fails to boot after a kernel update.

DANILO MONTAGNA · Dec 5, 2019

Depending on the NVMe size, the best option is to use it for VM storage. If it's too small, use it entirely as ZIL/L2ARC cache and keep the VMs on the HDDs.

alexc · Dec 5, 2019

DANILO MONTAGNA said:
Depending on the NVMe size, the best option is to use it for VM storage. If it's too small, use it entirely as ZIL/L2ARC cache and keep the VMs on the HDDs.

The NVMe is large enough to hold all the VM data. The problem is that there is only one of it.

What would you recommend as a robust mirror technology: ZFS, MD, or maybe something else?

DANILO MONTAGNA · Dec 5, 2019

alexc said:
What would you recommend as a robust mirror technology: ZFS, MD, or maybe something else?

If you are going to use the NVMe as VM storage, install Proxmox on MD/XFS, and make daily VM backups to these disks.

alexc · Dec 5, 2019

DANILO MONTAGNA said:
If you are going to use the NVMe as VM storage, install Proxmox on MD/XFS, and make daily VM backups to these disks.

XFS, even though Debian tends to use ext3/4?

DANILO MONTAGNA · Dec 5, 2019

alexc said:
XFS, even though Debian tends to use ext3/4?

XFS is better than ext3/4. It also supports online partition change/extend without a reboot.

alexc · Dec 5, 2019

DANILO MONTAGNA said:
XFS is better than ext3/4. It also supports online partition change/extend without a reboot.

I only mean that PVE hasn't migrated to XFS so far (and doesn't appear to be planning to), and I suspect they are aware of any FS "goodness".

OK, anyway, you vote for MD, not ZFS?

DANILO MONTAGNA · Dec 5, 2019

alexc said:
I only mean that PVE hasn't migrated to XFS so far (and doesn't appear to be planning to)

I'm using Proxmox clusters on top of XFS without any problems. It's also better than ext4 when you store large files. I think it's a customer decision rather than a Proxmox option.

As I said, I would go for MD, but it's up to you! If you use ZFS without an SSD for ZIL/L2ARC cache, the ARC will use physical RAM to store cached data, and by default it uses up to 50% of the host's available RAM. Using a ZIL on a single SSD is also a risk: you can lose data if the data written to the SSD was not yet committed to the HDDs when the SSD failed or the host lost power.

In your case I think it will not be a problem, since you are comfortable restoring VMs from an old vzdump backup (maybe your RPO is about 24 hours) in case of a disaster.

LnxBil · Dec 5, 2019

DANILO MONTAGNA said:
XFS is better than ext3/4. It also supports online partition change/extend without a reboot.

Partition change is - as the word says - the partition, so no XFS involved. kpartx or partprobe are the tools to do that. ext4 has online grow and offline shrink. XFS has only online grow.

alexc said:
3. - ?

If you don't go down the ZFS road, where you cannot use the NVMe as efficiently as in my proposed solution, you can go with md-raid on the HDDs and then add a caching layer such as flashcache or bcache in write-through mode (write-back is faster but very dangerous with only one cache disk), so that every read is automatically cached on the NVMe until reboot. I used this stack for quite some time a few years ago and it worked fine.
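
A minimal sketch of the md-raid plus bcache stack described above. The device names (`/dev/sda2`, `/dev/sdb2`, `/dev/nvme0n1p1`, `/dev/md0`, `bcache0`) are assumptions for illustration only, and the script only prints the commands unless `DRY_RUN=0` is set:

```shell
#!/bin/sh
# Dry-run by default: commands are printed, not executed.
# Set DRY_RUN=0 to actually run them (destructive on real disks).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical device names; adjust to the real layout.
MD_DEV=/dev/md0
HDD_A=/dev/sda2
HDD_B=/dev/sdb2
NVME_PART=/dev/nvme0n1p1

setup_cached_mirror() {
    # 1. Mirror the two HDD partitions with md.
    run mdadm --create "$MD_DEV" --level=1 --raid-devices=2 "$HDD_A" "$HDD_B"
    # 2. Make the mirror the backing device and the NVMe partition the cache.
    run make-bcache -B "$MD_DEV" -C "$NVME_PART"
    # 3. Write-through: writes reach the HDDs before they are acknowledged.
    run sh -c 'echo writethrough > /sys/block/bcache0/bcache/cache_mode'
}

setup_cached_mirror
```

The filesystem or LVM then goes on top of `/dev/bcache0` rather than on `/dev/md0` directly.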

DANILO MONTAGNA · Dec 5, 2019

LnxBil said:
Partition change is - as the word says - the partition, so no XFS involved

Yep, I meant filesystem extend but wrote partition. I didn't know ext4 supports online extend; I thought only XFS could do that, because I switched to using only XFS a long time ago.

Thanks for the clarification!!

alexc · Dec 6, 2019

Thank you for your recommendations! You see, I'd prefer MD because it is easy to recover (while ZFS recovery is hard to understand if something goes wrong). Sadly, the PVE installer won't handle an MD mirror out of the box, so I need to set it up manually (or install Debian and add PVE as a package).

By the way, a ZFS mirror on just two HDDs: how slow can it be compared with the same HDDs under MD? Rumor has it ZFS is quite slow, and deadly slow if not tuned properly.

LnxBil · Dec 6, 2019

alexc said:
By the way, a ZFS mirror on just two HDDs: how slow can it be compared with the same HDDs under MD? Rumor has it ZFS is quite slow, and deadly slow if not tuned properly.

That depends heavily on how you use it. Transparent compression will actually increase throughput and storage efficiency if the data is compressible (e.g. operating systems, logs, etc.). Resilvering disks is much faster because only the used data is mirrored, not everything as with normal RAID1 (md included). So setting up a RAID1 (or, more precisely, a mirrored vdev) in ZFS directly yields a synced mirror. Having snapshots is great. ZFS combines volume management (like LVM) and the filesystem in one thing, which is great because free space can be used by all filesystems/zvols. It supports copy-on-write (CoW), so clones are very fast. You have incremental replication, and the best feature of all is the prevention of silent data corruption through self-healing based on checksumming. Every flipped bit on disk will be found and corrected. You already mentioned that ZFS has Proxmox VE support, and that is because of its awesome features.

Of course all of those features cost CPU cycles and memory, but the features outweigh the "potential slowness", however large it turns out to be. Just for comparison: raw throughput measured in fio will be slower for ZFS (depending on the data used and the setup), but you cannot compare only that. If you take the virtualisation use case into account, you will need snapshots and may want to transfer them to other machines, and there ZFS has a huge advantage because of its incremental replication. You also have snapshot and rollback capability for the hypervisor, which is also very nice.
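
To illustrate the incremental replication mentioned above, here is a small sketch that only prints the commands one would typically run. The dataset name, target host, and target dataset are hypothetical placeholders, not values from this thread:

```shell
#!/bin/sh
# Hypothetical names, for illustration only.
DATASET=rpool/data/vm-100-disk-0
TARGET_HOST=backup-host
TARGET_DATASET=backup/vm-100-disk-0

# replication_plan PREV CUR
# Prints the snapshot and send/receive commands for snapshot CUR.
# With an empty PREV a full stream is sent; otherwise only the
# blocks changed between PREV and CUR are sent (zfs send -i).
replication_plan() {
    prev=$1
    cur=$2
    echo "zfs snapshot ${DATASET}@${cur}"
    if [ -z "$prev" ]; then
        echo "zfs send ${DATASET}@${cur} | ssh ${TARGET_HOST} zfs receive ${TARGET_DATASET}"
    else
        echo "zfs send -i ${DATASET}@${prev} ${DATASET}@${cur} | ssh ${TARGET_HOST} zfs receive ${TARGET_DATASET}"
    fi
}

replication_plan "" monday        # first run: full copy
replication_plan monday tuesday   # later runs: incremental
```

After the first full transfer, each later run moves only the difference between two snapshots, which is where the advantage over copying whole image files comes from.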

Don't get me wrong, I also used md for over a decade, but ZFS knocks it out of the park feature-wise, and that's what matters now.

alexc said:
You see, I'd prefer MD because it is easy to recover (while ZFS recovery is hard to understand if something goes wrong)

I really don't see that. If you have a failed disk, the commands are very similar and are just a disk replace. The rest is taken care of - like in mdadm, just faster, because you only need to mirror used blocks, not everything as with mdadm.
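
For comparison, a sketch of what a disk replacement could look like in both worlds. The array, pool, and partition names are assumptions, and the script prints the commands instead of running them unless `DRY_RUN=0` is set. A bootable pool or array needs additional partitioning and bootloader steps that are not shown here:

```shell
#!/bin/sh
# Dry-run by default: commands are printed, not executed.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical names; adjust to the real system.
MD_DEV=/dev/md0
POOL=rpool
FAILED=/dev/sdb2
NEW=/dev/sdc2

# md: mark the member failed, remove it, add the new one (full resync).
replace_md_disk() {
    run mdadm --manage "$MD_DEV" --fail "$FAILED"
    run mdadm --manage "$MD_DEV" --remove "$FAILED"
    run mdadm --manage "$MD_DEV" --add "$NEW"
}

# ZFS: one replace command, then watch the resilver (used blocks only).
replace_zfs_disk() {
    run zpool replace "$POOL" "$FAILED" "$NEW"
    run zpool status "$POOL"
}

replace_md_disk
replace_zfs_disk
```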

alexc · Dec 6, 2019

Funny thing is that I prefer to keep VM data in .qcow2 files, not in LVM-thin or (not tried yet) ZFS. The reason is simple, and it matters more to me than the extra layers: I can easily copy these files, even to a different host server, even to an external HDD, and it will still be a "file", not just a data structure somewhere in a volume management system. So I'd like to play with ZFS (and I think I will give it a try), but I would like to know the magic way to convert a VM disk stored in ZFS into a plain qcow2/raw file.

LnxBil · Dec 6, 2019

alexc said:
but I would like to know the magic way to convert a VM disk stored in ZFS into a plain qcow2/raw file.

It's not magic. In Linux (as in unix), everything is a file and so you just have to read it:

Code:

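# Copy the zvol block device into a raw image file
dd bs=128k if=/dev/rpool/proxmox/vm-100-disk-0 of=file.raw
# Or convert the zvol directly into a qcow2 image
qemu-img convert -f raw -O qcow2 /dev/rpool/proxmox/vm-100-disk-0 file.qcow2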

It has been asked and answered a lot in this forum.
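
As a small addition, here is a sketch of how such a raw copy could be verified byte for byte afterwards. To keep it runnable anywhere, a temporary file filled with random data stands in for the zvol; `copy_raw` and the temporary paths are made up for this example. On a real host the VM should be stopped (or a snapshot used as the source) so the device does not change during the copy:

```shell
#!/bin/sh
# copy_raw SRC DST
# Copy a block device (or any file) to a raw image and compare the result.
# Returns 0 only if the copy succeeded and both sides are identical.
copy_raw() {
    dd bs=128k if="$1" of="$2" 2>/dev/null && cmp -s "$1" "$2"
}

# Stand-in for /dev/rpool/proxmox/vm-100-disk-0: 256 KiB of random data.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
src="$workdir/fake-zvol"
dd if=/dev/urandom of="$src" bs=1k count=256 2>/dev/null

if copy_raw "$src" "$workdir/file.raw"; then
    echo "copy verified"
else
    echo "copy differs from source" >&2
fi
```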

alexc · Dec 6, 2019

Thank you very much. This was a point I missed (and it was quite an obstacle to my considering ZFS).

I've heard many times that although ZFS stores disks as raw, it is more efficient thanks to its native snapshot/compression support. One day I'd like to check whether deduplication is really that great when storing the data of many similar VMs, but I suspect it is a bad idea in the long run.

The main problem for me is ZFS settings: I know I'd really need a guru to tune it properly. And a virtualization host is not the place where you want to play with disk/FS settings.

Seems like I should also try deploying ZFS on a single SSD, just to check.

LnxBil · Dec 6, 2019

alexc said:
One day I'd like to check whether deduplication is really that great when storing the data of many similar VMs, but I suspect it is a bad idea in the long run.

Yeah, I try it from time to time on newly created pools, but it is still very slow. I hope to see a huge performance gain from using allocation classes to move the dedup table onto SSDs. You normally cannot have enough RAM to fit everything.

alexc said:
This was a point I missed (and it was quite an obstacle to my considering ZFS).

I fully understand. I also always want to be able to extract my data just in case.

alexc said:
The main problem for me is ZFS settings: I know I'd really need a guru to tune it properly. And a virtualization host is not the place where you want to play with disk/FS settings.

I haven't tuned anything except to change the amount of ARC to use. That's all. Everything else is "stock PVE".
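
For reference, a sketch of how such an ARC limit is commonly expressed: the ZFS kernel module takes `zfs_arc_max` in bytes. The 8 GiB figure is only an example value, and `arc_max_bytes` and `arc_conf_line` are helper names made up here:

```shell
#!/bin/sh
# arc_max_bytes GIB: convert a size in GiB to bytes.
arc_max_bytes() {
    echo $(( $1 * 1024 * 1024 * 1024 ))
}

# arc_conf_line GIB: print the module option line for that limit.
arc_conf_line() {
    echo "options zfs zfs_arc_max=$(arc_max_bytes "$1")"
}

# Example: cap the ARC at 8 GiB (example value only).
arc_conf_line 8
```

The printed line would typically go into a file such as `/etc/modprobe.d/zfs.conf`. With the root filesystem on ZFS the initramfs usually has to be refreshed (`update-initramfs -u`) before the setting applies at boot.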

alexc said:
Seems like I should also try deploying ZFS on a single SSD, just to check.

Yes, it also works without any problem. On most of my laptops, I just use one disk (no room for proper mirroring). On every PVE instance, however, I always use at least two disks (for mirroring).

alexc · Dec 6, 2019

LnxBil said:
Yes, it also works without any problem. On most of my laptops, I just use one disk (no room for proper mirroring). On every PVE instance, however, I always use at least two disks (for mirroring).

I suspect ZFS over SSD (or SSD under ZFS) will be much slower than the SSD itself, even if the SSD is an NVMe one.

You see, my ordinary way to use an SSD would be to create LVM, create a partition on it, format it as xfs (OK, let's give it a try instead of ext4), mount it, and add the mount path to PVE as storage. Then I'd put the VM disks on it as qcow2 files. Quite a stack of layers, but I'm used to doing it this way.

Now, I can put ZFS on the SSD instead:

I just create a ZFS pool (I hope the term is right?), add it to PVE as a ZFS pool, and PVE will manage the rest. I'll save a few abstraction layers, but ZFS is slower by itself, so I'm really trying to understand whether it will be too slow or OK for running VMs. I'd hate to waste the NVMe SSD's speed for nothing, and I hope that on a single non-mechanical disk it can be good anyway.

LnxBil · Dec 7, 2019

alexc said:
I suspect ZFS over SSD (or SSD under ZFS) will be much slower than the SSD itself, even if the SSD is an NVMe one.

Hmm ... every disk is faster without a filesystem, but that is probably not what you meant. You still haven't explicitly said what "slower" means for you. What is the workload that you want to compare? Here is a benchmark that shows that ZFS is slower, but not by much, and xfs has not one of the features that ZFS has. Using a filesystem without any checksumming is kinda strange to me nowadays. I can see bit rot in the monthly scrubs on my disks. I would have lost over 1 MB of data in the more than 3 years that I have had ZFS on my NAS. Maybe not much, but losing one bit in the wrong file ... and it's gone for good.

With respect to speed, nothing beats using just block devices with LVM directly on the disk, but you will not have good snapshots (lvm snapshots are just weird and completely different than normal virtualization snapshots like in qcow2, zfs or even vmdk and hyper-v), no CoW cloning, etc. I'm repeating myself :-/

If you want speed, do LVM, but if you want a modern filesystem perfectly tailored for (single hypervisor) virtualization, use ZFS.
