LVM thin - adding raid1

sancho_sk

Dec 9, 2022
Hi, all.
I am running my small home system on Proxmox successfully - no excessive load, just HomeAssistant, NextCloud, Frigate and a few similar machines.
It all runs from a 2 TB NVMe SSD, and so far everything is OK.
The SSD is set up as LVM thin-provisioned storage.
Now I'm starting to see 9% wear on the SSD - which is expected, as it has already been running for 3+ years and it's a basic (read: lowest-cost) consumer-grade drive.
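(For reference, the wear figure comes from the drive's SMART data, roughly like this - the device name below is just an example from my side:)

Bash:
# NVMe health summary; "Percentage Used" is the wear indicator
smartctl -a /dev/nvme0 | grep -i -E "percentage used|data units written"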

What I'd like to achieve is to add a mirror to the LVM setup.
I found plenty of online tutorials on how to do this for "normal" LVM, but nothing for thin.

Is this even possible?
If yes, can someone suggest some how-to, please?
 
LVM-thin is just a (fixed-size) layer on top of LVM. I would expect that mirroring the underlying LVM is what you are looking for. Note that it will only give you protection against one drive disappearing, not against data corruption on one of the drives (for which you need ZFS or Btrfs).
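An untested sketch of what that could look like on a default Proxmox install (volume group pve, thin pool data; /dev/nvme1n1 below just stands in for the second SSD):

Bash:
# add the second SSD to the existing volume group (device name is an example)
pvcreate /dev/nvme1n1
vgextend pve /dev/nvme1n1

# convert the thin pool's hidden data and metadata LVs to raid1
lvconvert --type raid1 --mirrors 1 pve/data_tdata
lvconvert --type raid1 --mirrors 1 pve/data_tmeta

# watch the mirror sync progress
lvs -a -o name,segtype,sync_percent,devices pve

Recent LVM versions accept lvconvert directly on the _tdata/_tmeta sub-LVs; check the lvmthin man page for your version before relying on it.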
 
The drive failure protection is exactly what I'm looking for...
The problem is that the "regular" how-tos for adding a RAID1 disk to LVM only work with regular LVM, not with thin.
 
On all the forums I've read that using ZFS on consumer-grade SSDs is strongly discouraged, so I used LVM instead...
That seems... rather strange, but I too did see something about that where Proxmox is concerned.
I'm not sure why LVM wouldn't have the exact same issue, if not a worse one.

Can you share where you read this so I can take a look?
I vaguely recall something about many small transient writes...
 
That seems... rather strange, but I too did see something about that where Proxmox is concerned.
I'm not sure why LVM wouldn't have the exact same issue, if not a worse one.
ZFS does a lot of sync writes, which are terribly slow and cause horrible wear unless you use an SSD with power-loss protection that can cache sync writes in its DRAM.
ZFS also has considerable overhead/write amplification because of its copy-on-write mechanism, so you usually want an SSD with a higher TBW/DWPD rating to compensate for that.
And the checksumming relies on data not being corrupted while in RAM, so ideally you also want ECC RAM.

When using only consumer hardware, it might be better to choose a lighter filesystem/RAID/volume-management stack.

LVM-thin on top of mdadm RAID1 would also be another unsupported but lightweight option.
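A rough, untested sketch of that variant (disk names and the storage ID below are just examples):

Bash:
# build the mdadm RAID1 from two blank disks (example device names)
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb /dev/sdc

# put LVM with a thin pool on top of the array
pvcreate /dev/md0
vgcreate vgdata /dev/md0
lvcreate --type thin-pool -l 90%FREE -n data vgdata

# register it as a storage in PVE (storage ID "mythin" is arbitrary)
pvesm add lvmthin mythin --vgname vgdata --thinpool data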
 
Thanks, @Dunuin . This was exactly my way of thinking - using ZFS on consumer SSDs will be slow, my memory is also not ECC, and while the data is valuable, the instant changes are not. So as long as I can recover a journal or have a copy, I would be fine.
The whole idea of RAID1 is to avoid downtime when one of the SSDs reports SMART errors or wearout.
I do have on-site and off-site backups of the critical data every 24 and 48 hours, so instant data loss is not a big threat.
And my whole house has a 19 kWh battery backup, so the probability of an unexpected power outage is relatively small.
 
I suppose with the original question answered, I'm now going off on a tangent with my use case. Let's see what we observe.

With copies set to 2, even if a bad block somehow ended up on both vdevs, I have yet another level of built-in fault tolerance :)

My setup uses the following: a mirror of 2x SSD vdevs (low-end consumer grade), with 256 GB of ECC RAM on the host.

rpool recordsize 128K
rpool copies 2
rpool compression zstd
rpool compressratio 2.27x
rpool atime off
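(Those values map straight onto pool properties; for reference, e.g.:)

Bash:
# query the properties listed above (compressratio is read-only and just reported)
zfs get recordsize,copies,compression,compressratio,atime rpool

# how they can be set; note copies only applies to data written after the change
zfs set copies=2 rpool
zfs set compression=zstd rpool
zfs set atime=off rpool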

This is what I see writes wise:

Bash:
root@X99:~# zpool iostat rpool 5 25

              capacity     operations     bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
rpool       5.68G   224G      0     31  37.9K   399K
rpool       5.68G   224G      0     35      0   429K
rpool       5.68G   224G      0     35      0   430K
rpool       5.68G   224G      0     35      0   413K
rpool       5.68G   224G      0     30      0   376K
rpool       5.68G   224G      0     36      0   432K
rpool       5.68G   224G      0     31      0   373K
rpool       5.68G   224G      0     32      0   341K
rpool       5.68G   224G      0     32      0   440K
rpool       5.68G   224G      0      0      0      0
rpool       5.68G   224G      0     32      0   350K
rpool       5.68G   224G      0     39      0   645K
rpool       5.68G   224G      0     38      0   702K
rpool       5.68G   224G      0     40      0   493K
rpool       5.68G   224G      0     30      0   371K
rpool       5.68G   224G      0     36      0   445K
rpool       5.68G   224G      0     29      0   378K
rpool       5.68G   224G      0     30      0   338K
rpool       5.68G   224G      0     34      0   427K
rpool       5.68G   224G      0     36      0   378K
rpool       5.68G   224G      0     33      0   424K
rpool       5.68G   224G      0     33      0   419K
rpool       5.68G   224G      0     34      0   416K
rpool       5.68G   224G      0     30      0   371K
rpool       5.68G   224G      0     33      0   424K

On your RAID1 / mirror over LVM, what kind of writes/iostat values do you observe?
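Something like this on your side should give comparable numbers (iostat comes from the sysstat package; device names below are just examples):

Bash:
# per-device write rates, 5-second samples, 25 iterations
iostat -dmx nvme0n1 nvme1n1 5 25

# plus the raid/LVM layer itself
cat /proc/mdstat
dmsetup status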
 
With copies set to 2, even if a bad block somehow ended up on both vdevs, I have yet another level of built-in fault tolerance
I don't see the point. If a 2-disk mirror isn't safe enough, you could create a 3-disk mirror with copies=1 to have 3 copies of everything, so the same data could corrupt on 2 vdevs while you only lose 66% of the raw capacity. With a 2-disk mirror and copies=2 you lose 75% of the raw capacity (2 disks x 2 copies = 4 copies of everything) and only 1 instead of 2 disks is allowed to fail. There is really no need to use anything other than copies=1 unless there is no space in the case to add another disk.
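An untested example of what I mean (device paths are placeholders):

Bash:
# 3-way mirror from scratch
zpool create tank mirror /dev/disk/by-id/ssd-1 /dev/disk/by-id/ssd-2 /dev/disk/by-id/ssd-3

# or grow an existing 2-way mirror by attaching a third disk to one of its members
zpool attach rpool /dev/disk/by-id/ssd-1 /dev/disk/by-id/ssd-3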
 
@sancho_sk - Once you are done with the setup, please share your observations on the writes/iostat values of your RAID1 / mirror over LVM, as I remain... somewhat sceptical.

To expand on this point: as shown above, capacity is of no concern here:
capacity 224G, alloc 5.68G

Even if it were, though, also factor this into any capacity maths:
rpool compressratio 2.27x

A ratio of 2.27x effectively means that even with 2 copies, the total data written to storage is lower than with copies=1 and compression off.

Corruption as such is not possible in ZFS, as all writes are performed as a single atomic operation.
Subsequent CHKSUM or READ errors, however, are indeed possible.

"only 1 instead of 2 disks is allowed to fail" - true, if talking of both vdevs being totally FOOBARed at same point in time but what is often more common than not are some set of WRITE, CHKSUM and or READ errors being present prior to this on a given vdev. Normally long before SMART values mark a vdev as failed or otherwise.
 
Even if it were, though, also factor this into any capacity maths:
rpool compressratio 2.27x

A ratio of 2.27x effectively means that even with 2 copies, the total data written to storage is lower than with copies=1 and compression off.
But you would get the same 2.27x compression ratio with a 3-disk mirror and copies=1, and it's even more reliable than your 2-disk mirror with copies=2. ;)

Corruption as such is not possible in ZFS, as all writes are performed as a single atomic operation.
There could still be bit rot. Atomic operations only prevent data from being corrupted by a power outage/kernel crash while it is being written; they won't help against corruption of data at rest. For that you have the checksumming of ZFS, which can repair such corruption as long as you have parity data, mirrors, or a single disk with copies=2+. But even here it would be better to have a 2-disk mirror with copies=1 instead of a single disk with copies=2, as the single disk with copies=2 won't protect you against a failed disk while both waste the same amount of space.
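(That repair path is what a scrub exercises, for example:)

Bash:
# read and verify every block, repairing from the mirror/extra copies where possible
zpool scrub rpool

# afterwards, check for READ/WRITE/CKSUM errors and repaired data
zpool status -v rpool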
 
Well, this was a frustrating endeavor.
I've bought another 2TB SSD (same vendor) so that I can move things around without losing machines.
However, after a week of work, I am getting back to ZFS.
Reason?
Setting up the LVM-thin RAID was relatively easy thanks to the guides above. So far, so good.
However, the RAID sync took a few hours.
And then, when I started to move some VM disks to the RAID, the move often stopped at random percentages and dmesg filled up with
Code:
md: requested-resync of RAID array mdX
and
Code:
md: mdX: requested-resync interrupted
No idea what caused these, but even after an overnight run the system was not able to copy a 600 GB disk image from an ext4 directory volume to the LVM-thin RAID.
So, after the 3rd attempt to create and copy the partition for my NextCloud, and being almost a week without NextCloud backups of our phones and tablets, I had to cut my losses, reformat the array back to ZFS, and I'm already in the process of copying the data there.
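For anyone hitting the same interrupted-resync messages, the md state can at least be inspected while it happens (md0 below is a placeholder for the actual array name):

Bash:
# overall sync state and progress of all md arrays
cat /proc/mdstat

# per-array details and the current sync action
mdadm --detail /dev/md0
cat /sys/block/md0/md/sync_action

# resync speed limits, in case the sync is being throttled
cat /proc/sys/dev/raid/speed_limit_min /proc/sys/dev/raid/speed_limit_max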

Thanks to all who contributed with advice and support.

If anyone is successfully running LVM-thin provisioning on RAID1, it would be great if you could share your experience.
 
