How to deal with degraded btrfs-arrays?

I am playing around with btrfs at the moment. What I haven't understood yet is how to deal with degraded arrays:

First I tried to install PBS in a VM, rootfs on ext4, so nothing special. Installed kernel 5.11 and btrfs-progs from backports.
Made three disks and put them into a raid1-array with btrfs (I really had to test this raid1 with uneven disks, nice!).
Detached one of the three disks while the system was running, and nothing really happened.
Unlike other implementations, btrfs doesn't seem to have any notifications built in, so I have to monitor myself whether a disk is failing or an array is degraded? (One way to script such a check is sketched below.)
When trying to boot, the system hangs because it can't mount the btrfs volume, even if I put degraded into fstab and /dev/sdb instead of the UUID.
In the rescue shell I can log in and mount just fine with /dev/sdb and the degraded option.
What I didn't manage is to pass degraded as a rootflag in GRUB, which might be what is needed to boot the system in a degraded state.
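Regarding the monitoring question above: btrfs keeps per-device error counters that `btrfs device stats` can report, so a periodic check is easy to script. A minimal sketch, assuming the filesystem is mounted at /mnt/pool and a working `mail` setup (both are examples, not something btrfs provides):
Code:
#!/bin/sh
# -c makes btrfs return a non-zero exit status if any error counter
# is non-zero, so this can run from cron and only mail on problems.
btrfs device stats -c /mnt/pool \
  || echo "btrfs on /mnt/pool reports device errors" | mail -s "btrfs alert" root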

After this I installed pve-7-beta in a VM with three disks as btrfs raid1, directly configured in the installer.
It seems that the installer just creates a root volume for the system, no subvolume (or only the root subvolume? This is all still very new to me, I don't know if I have the naming correct).
When detaching a disk here I end up in the initramfs shell and don't really know how to proceed further at the moment.

If I want my systems to boot in a degraded state, like other implementations do, should there be a permanent boot entry for that?

What have others experienced with btrfs so far?
 
So about booting a degraded btrfs: yes, you'll need to use the `rootflags` grub option there, or wait for the initramfs to pop up and then mount it manually to `/root` via `mount -o degraded /dev/sdXY /root` and hit Ctrl+D.
You can of course add a custom grub entry to boot in degraded state, but I would recommend against *generally* adding the option.
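For a one-off degraded boot you can also edit the entry interactively at the GRUB menu (press `e`) instead of changing the config permanently; a minimal sketch, where kernel version and UUID are placeholders:
Code:
# appended to the "linux" line of the boot entry:
linux /boot/vmlinuz-<version> root=UUID=<fs-uuid> ro rootflags=degraded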

Another thing to note, when using 3 disks, if you're using regular raid1, only *data* exists on all 3 disks, metadata will only be duplicated once, which IMO is rather... well... bad, and I'd recommend using `raid1c3` for this case :-/
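If you create the filesystem yourself rather than through the installer, the profiles can be chosen at mkfs time; a sketch with example device names:
Code:
# data mirrored once (raid1), metadata kept on all three disks (raid1c3)
mkfs.btrfs -d raid1 -m raid1c3 /dev/sdb /dev/sdc /dev/sdd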

As for creating more subvolumes during installation, I don't know. If you could do things like use different compression settings for different subvolumes like you can in ZFS, it would probably make sense to make a few more. But currently I don't really see a lot of benefits, apart from maybe adding things like `/home` I guess, but you can do that after the installation anyway.
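Adding something like a separate /home later is just a subvolume plus an fstab entry; a minimal sketch, where the device, subvolume name and mount paths are examples:
Code:
# mount the top-level volume (subvolid 5) and create a subvolume in it
mount -o subvolid=5 /dev/sdb /mnt/top
btrfs subvolume create /mnt/top/@home
# then in /etc/fstab:
# UUID=<fs-uuid>  /home  btrfs  defaults,subvol=@home  0  0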
 
So about booting a degraded btrfs: yes, you'll need to use the `rootflags` grub option there, or wait for the initramfs to pop up and then mount it manually to `/root` via `mount -o degraded /dev/sdXY /root` and hit Ctrl+D.
In the rescue shell it worked, in the initramfs it didn't - or it was kind of late that day, I'll try that again.

You can of course add a custom grub entry to boot in degraded state, but I would recommend against *generally* adding the option.
Thank you, having it set generally looks wrong to me, too. But that btrfs won't boot degraded, as almost every other RAID implementation does, also feels weird.

Another thing to note, when using 3 disks, if you're using regular raid1, only *data* exists on all 3 disks, metadata will only be duplicated once, which IMO is rather... well... bad, and I'd recommend using `raid1c3` for this case :-/
Good point. I was using raid1 for data and metadata here, and at the moment it's only for testing things out. A nice thing with btrfs is that it's possible to "migrate" the metadata from raid1 to raid1c3 online, one of the areas where btrfs really shines.
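That online conversion is a balance with a convert filter; a minimal sketch, assuming the filesystem is mounted at /mnt/pool:
Code:
# rewrite only the metadata (-m) into the raid1c3 profile, while mounted
btrfs balance start -mconvert=raid1c3 /mnt/pool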
As for creating more subvolumes during installation, I don't know. If you could do things like use different compression settings for different subvolumes like you can in ZFS, it would probably make sense to make a few more. But currently I don't really see a lot of benefits, apart from maybe adding things like `/home` I guess, but you can do that after the installation anyway.
Nothing that must be added to the installer, I just wondered, because I read somewhere* that you should always use a subvolume, even if it's only one, and not the root (sub)volume. But I don't know whether this is really important.
*somewhere: somewhere on the internet. A bit of a problem with btrfs seems to be that besides the official wiki the information is scattered all over the place, sometimes for different versions, sometimes with best practices mixed into personal preferences, which makes finding really good information difficult when starting with btrfs.
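For reference, the pattern usually described is to keep the top-level volume empty and mount a named subvolume as /, so the root can be snapshotted and swapped out; a minimal sketch of the fstab side, with placeholder names:
Code:
UUID=<fs-uuid>  /  btrfs  defaults,subvol=@rootfs  0  0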

Thanks for taking the time to answer!

One thing I don't understand, if this is how it's supposed to be: when the root fs is not on btrfs, and only an extra mounted volume is degraded, should booting still fail, respectively get stuck in an early init state?
 
I've just set up a pve7 in a VM again to test a bit more:
PVE root installed on an ext4 disk, so nothing special.
Added disks and made a btrfs raid1 which is mounted via fstab. Detached a disk, so the array becomes degraded, and rebooted.
The system hangs for a while, then the emergency shell appears because a dependency for the local file system failed.
Things I tried:
Modified fstab to use /dev/sdb instead of the UUID and added degraded.
Then it mounts just fine in the emergency shell.
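The modified fstab entry looked roughly like this (mount point is an example):
Code:
/dev/sdb  /mnt/pool  btrfs  degraded  0  0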
Rebooted again, emergency shell again.
Trying rootflags=degraded results in an unknown mount option, because rootflags only applies to the root fs, which isn't btrfs here. So rootflags isn't an option to get it going.
What I scratch my head about is: when fstab is modified so that it mounts just fine from the emergency shell, why doesn't it mount fine during boot?
 
As for creating more subvolumes during installation, I don't know. If you could do things like use different compression settings for different subvolumes like you can in ZFS, it would probably make sense to make a few more.
Different mount options per subvolume are currently not supported in btrfs (except for generic options) [1] [2]
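Per-subvolume compression specifically can be approximated with a property instead of a mount option, if that's the main ZFS-style per-dataset setting you're after; a sketch, the path is an example:
Code:
# newly created files below this subvolume inherit zstd compression
btrfs property set /mnt/pool/subvol compression zstd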
 
Added disks and made a btrfs raid1 which is mounted via fstab. Detached a disk, so the array becomes degraded, and rebooted.
The system hangs for a while, then the emergency shell appears because a dependency for the local file system failed.
If it's just for data and not critical for system startup (iow. it's not your /etc or /usr mountpoint ;-)), you probably want to add nofail to the fstab entry (see man 5 systemd.mount). When using it as storage for PVE you can use the is_mountpoint storage option to tell PVE to check that it's actually mounted.
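Putting both together, a sketch; storage name and mount point are examples:
Code:
# fstab entry: boot continues even if the array can't be mounted
# /dev/sdb  /mnt/pool  btrfs  nofail  0  0

# tell PVE the directory storage is only valid while something is mounted there
pvesm set pool-storage --is_mountpoint yes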

I can only speculate, but I think the idea behind not mounting something degraded by default is that the tools triggering the initial mount (ie systemd from fstab) might not know all the disks required for the mount and some might take a bit longer to appear? Eg. if you're using a bcache layer or mix a hard disk which is there immediately with an LVM disk which only appears after lvm has been initialized... I guess with ZFS people would just say "don't" ;-)
 
`nofail` is indeed something which helps the system boot (the btrfs mount here is only used for non-system files).
But it doesn't mount the filesystem. With the options
Code:
subvol=blah,degraded,nofail
in fstab, when I log in and type `mount /mnt/blah`, it mounts without issues (in degraded state).
This behaviour isn't really logical to me.
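One way to see why systemd skipped the mount at boot would be to look at the generated mount unit; a sketch, assuming the /mnt/blah mount point from above (the unit name is derived from the path):
Code:
systemd-escape -p --suffix=mount /mnt/blah   # -> mnt-blah.mount
systemctl status mnt-blah.mount
journalctl -b -u mnt-blah.mount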
When you have access to the hardware it isn't a problem, but I wonder how these things are supposed to work when you have e.g. a root server where you might not have IPMI. At the moment ZFS or Linux RAID seems to be easier to handle than btrfs, doesn't it?
 
