Best practices - boot from raidz or standard disk

rcd

Jul 12, 2019
On a system with ZFS/raidz, is it best to boot directly from the raidz, or is it better to add a small standard disk to boot from? I can imagine that if things go belly up for any reason, it's easier to recover with a separate boot drive. I realize the best option may be to boot from a mirrored pair, but what if that is not an option?
 
Having the VM disks stored on a different RAID than the system itself can definitely be an advantage.

If you cannot run the system itself on a mirror, you can at least back up important files such as /etc/network/interfaces and /etc/pve/* in case you need to reinstall.
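A minimal sketch of such a config backup in POSIX shell, assuming `tar` is available. `backup_pve_config` is a hypothetical helper name, and getting the archive off the host (USB stick, another machine) is left to you.

```shell
#!/bin/sh
# Sketch: archive the config files mentioned above so they survive a reinstall.
# backup_pve_config is a hypothetical helper, not a Proxmox tool.
# Usage: backup_pve_config DEST_DIR [PATH...]
# With no PATH arguments it archives /etc/network/interfaces and /etc/pve.
backup_pve_config() {
    if [ "$#" -lt 1 ]; then
        echo "usage: backup_pve_config DEST_DIR [PATH...]" >&2
        return 2
    fi
    dest_dir=$1
    shift
    # Fall back to the two locations recommended in this thread.
    if [ "$#" -eq 0 ]; then
        set -- /etc/network/interfaces /etc/pve
    fi
    mkdir -p "$dest_dir" || return 1
    archive="$dest_dir/pve-config-$(uname -n)-$(date +%Y%m%d-%H%M%S).tar.gz"
    # tar notes on stderr that it strips the leading "/" from member names;
    # that message is expected.
    tar -czf "$archive" "$@" || return 1
    # Print the archive path so the caller can copy it off the host.
    printf '%s\n' "$archive"
}
```

For example, run `backup_pve_config /root/config-backups` by hand or from cron; after a reinstall you would extract the archive and copy back the files you need.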
 
Hijacking this post because it raises another question:
If you have PVE on a mirrored pool (and your VMs on a separate RAID), can you `dd` your PVE SSD to a third SSD every week?

This way, if something goes wrong, you can boot from the 3rd drive instead of the mirrored drives.
 
We had problems with booting from ZFS in the past. Most of the time the scenario was that the server would not come up when rebooting after an upgrade. In that case we had to boot from another device and fix ZFS in order to boot.

Maybe things are more stable now, but we decided to have a separate ext4 drive just for the system.
 
If you have PVE on a mirrored pool (and your VMs on a separate RAID), can you `dd` your PVE SSD to a third SSD every week?
`dd` ing a running system sounds like a bad idea as you will most likely get an inconsistent state.
 
OK, so what would be the best solution if you want a real backup PVE install?

RAID is good, but if anything goes wrong, it goes wrong on all the RAID members...

What if I install Debian, then PVE, with an mdadm RAID 1 of two SSDs for the install, and each time I want a backup, I remove one SSD from the mdadm RAID (so it is no longer part of the running system), `dd` it to a third SSD, and then add the SSD back into the RAID?

Just writing it out, it seems dirty, haha.
 
Reading it makes me feel dirty ;)

In general, if you have a simple single-node installation, you should be fine backing up your config files. A reinstall is done quickly, and after restoring the configs everything should be working again (after a reboot).

Most likely faster than `dd`ing a backup back.
 
I do have a simple single-node install; however, the server is not hosted at my place, and I cannot do a remote reinstallation if Proxmox does not even boot.
But I can boot from a backup SSD via the BIOS...

Easier and less dirty:
  1. Install Proxmox (from Debian, directly, ZFS or not, whatever) on mirrored devices.
  2. Install Proxmox a second time on the third device.
  3. Rsync /etc/pve/* and /etc/network/interfaces from the mirrored devices to the third device.
Thoughts?
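A rough sketch of step 3 in POSIX shell. Assumptions: the spare install's root filesystem is already mounted at SPARE_ROOT (with the installer's default LVM layout both disks carry a volume group named `pve`, which can complicate mounting it), and `sync_config_to_spare` is a hypothetical helper. Because /etc/pve is a FUSE mount provided by pmxcfs at runtime, files written into the spare's on-disk /etc/pve would be hidden once it boots, so the sketch stages them in a separate directory instead. It uses `cp` rather than rsync only to stay dependency-free.

```shell
#!/bin/sh
# Sketch: stage the primary node's config on a spare install's root filesystem.
# sync_config_to_spare is a hypothetical helper; SPARE_ROOT must already be mounted.
# The staged files still have to be copied into place after booting the spare.
# Usage: sync_config_to_spare SPARE_ROOT [PVE_DIR] [INTERFACES_FILE]
sync_config_to_spare() {
    if [ "$#" -lt 1 ]; then
        echo "usage: sync_config_to_spare SPARE_ROOT [PVE_DIR] [INTERFACES_FILE]" >&2
        return 2
    fi
    spare_root=$1
    pve_dir=${2:-/etc/pve}
    interfaces=${3:-/etc/network/interfaces}
    [ -d "$spare_root" ] || return 1
    stage="$spare_root/root/config-from-primary"
    # Start from a clean staging directory so files deleted on the primary do
    # not linger (similar in spirit to rsync --delete).
    rm -rf "$stage"
    mkdir -p "$stage/network" || return 1
    # -R recurse, -P keep symlinks as symlinks, -p keep mode and timestamps.
    cp -RPp "$pve_dir" "$stage/pve" || return 1
    cp -p "$interfaces" "$stage/network/interfaces" || return 1
    printf '%s\n' "$stage"
}
```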
 
So let's say you want to reinstall Proxmox for whatever reason, but there's an existing, functioning raidz pool that you want to keep. What do you do?

Add a new drive to the server, boot a Proxmox installer, and tell it to install on the new drive? Will it automatically know not to overwrite, or perhaps even import, the existing pool?
 
Pretty much. If you are afraid that you might accidentally select one of the drives of the existing pool during installation, you can also disconnect them while installing.

IIRC, after you import the old pool, you should make sure that it has the cachefile property set and then recreate the initramfs.
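A sketch of those steps in POSIX shell. `import_existing_pool` and the `DRY_RUN` switch are hypothetical; the `zpool`, `zfs` and `update-initramfs` invocations are standard. If the pool was never exported, `zpool import` may refuse because the pool was last used by another system; adding `-f` overrides that check, so only use it when you are sure no other host is using the pool.

```shell
#!/bin/sh
# Sketch: bring an existing pool into a freshly installed PVE host.
# import_existing_pool is a hypothetical helper name.
# Usage: import_existing_pool POOL   (set DRY_RUN=1 to only print the commands)

# Print the command instead of running it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

import_existing_pool() {
    if [ "$#" -ne 1 ]; then
        echo "usage: import_existing_pool POOL" >&2
        return 2
    fi
    pool=$1
    # Search for the pool's disks via stable /dev/disk/by-id names.
    # Add -f here only if zpool says the pool was last used by another system.
    run zpool import -d /dev/disk/by-id "$pool" || return 1
    # Record the pool in the cache file so it is imported again at boot...
    run zpool set cachefile=/etc/zfs/zpool.cache "$pool" || return 1
    # ...and rebuild the initramfs so it picks up the updated cache file.
    run update-initramfs -u -k all
}
```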
 
I tried replacing the boot drive and installing a fresh Proxmox. Now what?

Code:
root@pve2:~# zpool status
no pools available
root@pve2:~# zpool import
no pools available to import
root@pve2:~# zpool list
no pools available
root@pve2:~# zfs list
no datasets available

What do I need to do to get it to appear? I didn't explicitly export it first; in case of a boot disk failure, I wouldn't get the chance anyway.
 
Do you see the disks on which the pool should be?

What does lsblk show?
 
Yes, the disks are there.

Code:
root@pve2:~# lsblk
NAME               MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda                  8:0    0 55.9G  0 disk
├─sda1               8:1    0 1007K  0 part
├─sda2               8:2    0  512M  0 part /boot/efi
└─sda3               8:3    0 55.4G  0 part
  ├─pve-swap       253:0    0  6.9G  0 lvm  [SWAP]
  ├─pve-root       253:1    0 13.8G  0 lvm  /
  ├─pve-data_tmeta 253:2    0    1G  0 lvm
  │ └─pve-data     253:4    0 25.9G  0 lvm
  └─pve-data_tdata 253:3    0 25.9G  0 lvm
    └─pve-data     253:4    0 25.9G  0 lvm
sdb                  8:16   0  3.7T  0 disk
├─sdb1               8:17   0    2G  0 part
└─sdb2               8:18   0  3.7T  0 part
sdc                  8:32   0  3.7T  0 disk
├─sdc1               8:33   0    2G  0 part
└─sdc2               8:34   0  3.7T  0 part
sdd                  8:48   0  3.7T  0 disk
├─sdd1               8:49   0    2G  0 part
└─sdd2               8:50   0  3.7T  0 part
sde                  8:64   0  3.7T  0 disk
├─sde1               8:65   0    2G  0 part
└─sde2               8:66   0  3.7T  0 part
sdf                  8:80   0  3.7T  0 disk
├─sdf1               8:81   0    2G  0 part
└─sdf2               8:82   0  3.7T  0 part

I just remembered that the pool was encrypted. I have the encryption key, but trying to use it still doesn't show the pool:

Code:
zfs load-key poolz/pool_poolz_encryption.key
cannot open 'poolz/pool_poolz_encryption.key': dataset does not exist
 
Actually, isn't there a guide for ZFS somewhere? I have so many questions but find it quite hard to find the answers, for example:

- When I create a VM, the installers always want to create ext4 file systems for it. Since this is now on ZFS, wouldn't it make more sense to just create a dataset as the file system?
- Same goes for data disks. I want to have my data on separate disks for the VMs, and again I am forced into using ext4. I see some suggestions to just add data directly on the host file system and bind mount it into VMs, but it seems a bit hackish. Surely there must be a better way?

Probably I will have many more questions as I dig deeper into this rabbit hole, so if there's a decent guide somewhere I'd really appreciate a pointer.
 
I just remembered that the pool was encrypted. I have the encryption key, but trying to use it still doesn't show the pool:
As the error suggested, the pool needs to be imported for this to work. How did you set up the pool initially? Did you create the partitions yourself? The 2G partition at the beginning does seem a bit unusual to me.

You could try `zpool import -D` and see what it returns.
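A sketch of what else could be checked, in POSIX shell. With native ZFS encryption, datasets are encrypted but the pool's labels are not, so `zpool import` should list the pool even without the key; the key only matters after the import. Also, `zfs load-key` expects a dataset name, which is why passing the key file path produced "dataset does not exist". If `blkid` reports something other than `zfs_member` on a data partition (for example `crypto_LUKS`), that partition is not a bare ZFS member. The helper names, the `DRY_RUN` switch and the key path `/root/pool_poolz_encryption.key` are assumptions.

```shell
#!/bin/sh
# Sketch: look for traces of a pool that "zpool import" does not list, then
# import it and load its key. All helper names are hypothetical.
# Set DRY_RUN=1 to print the commands instead of running them.

# Print the command instead of running it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Read-only checks on one data partition, e.g. /dev/sdb2 from the lsblk output.
inspect_partition() {
    if [ "$#" -ne 1 ]; then
        echo "usage: inspect_partition PARTITION" >&2
        return 2
    fi
    run blkid "$1"    # a ZFS member reports TYPE="zfs_member"
    run zdb -l "$1"   # dumps the ZFS labels (pool name, GUIDs) if there are any
}

# Scan for importable pools, including destroyed ones.
scan_for_pools() {
    run zpool import -d /dev/disk/by-id   # search by stable device IDs
    run zpool import -D                   # also list destroyed pools
}

# Import POOL, then load its key from KEYFILE (an absolute path).
import_encrypted_pool() {
    if [ "$#" -ne 2 ]; then
        echo "usage: import_encrypted_pool POOL KEYFILE" >&2
        return 2
    fi
    pool=$1
    keyfile=$2
    run zpool import "$pool" || return 1
    # load-key takes a dataset (here the pool's root dataset); -L overrides the
    # keylocation property. Name the encryption root instead if it is a child.
    run zfs load-key -L "file://$keyfile" "$pool" || return 1
    run zfs mount -a
}
```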

- When I create a VM, the installers always want to create ext4 file systems for it. Since this is now on ZFS, wouldn't it make more sense to just create a dataset as the file system?
VMs expect a block device that they can partition and format with their own file system. That's why, for VMs, PVE creates datasets of the type `volume`, which are exposed as block devices. They will not show up in the file system. For containers, `filesystem` datasets are created, which can be seen in the file system.
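To see this split on a host, a small sketch in POSIX shell that classifies the output of `zfs list -H -o name,type` (`-H` drops the header and separates columns with tabs). `classify_guest_datasets` is a hypothetical helper; on Linux, zvols appear as device nodes under /dev/zvol/.

```shell
#!/bin/sh
# Sketch: classify datasets by ZFS type. Feed it the output of
#   zfs list -H -o name,type
# (tab-separated, no header). classify_guest_datasets is a hypothetical helper.
classify_guest_datasets() {
    awk -F '\t' '
        # "volume" datasets (zvols) are block devices, which is what VMs get.
        $2 == "volume"     { printf "VM disk (block device): /dev/zvol/%s\n", $1; next }
        # "filesystem" datasets are mounted into the host tree (containers, etc.).
        $2 == "filesystem" { printf "file system dataset: %s\n", $1 }
    '
}
```

Usage: `zfs list -H -o name,type | classify_guest_datasets`. Snapshots and any other types are skipped.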

- Same goes for data disks. I want to have my data on separate disks for the VMs, and again I am forced into using ext4. I see some suggestions to just add data directly on the host file system and bind mount it into VMs, but it seems a bit hackish. Surely there must be a better way?
Bind mounts work for containers because they use the PVE host's kernel and can therefore access file systems on the host via a bind mount.
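A sketch of adding such a bind mount with `pct` in POSIX shell, assuming container ID 101 and a host directory /tank/share (both hypothetical). `add_bind_mount` and the `DRY_RUN` switch are hypothetical as well; the `pct set <ctid> -mp0 <host_dir>,mp=<path>` form is the bind-mount syntax from the PVE docs.

```shell
#!/bin/sh
# Sketch: bind-mount a host directory into a container with pct.
# add_bind_mount is a hypothetical helper; DRY_RUN=1 only prints the command.
# Usage: add_bind_mount CTID HOST_DIR CT_PATH [SLOT]

# Print the command instead of running it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

add_bind_mount() {
    if [ "$#" -lt 3 ]; then
        echo "usage: add_bind_mount CTID HOST_DIR CT_PATH [SLOT]" >&2
        return 2
    fi
    ctid=$1
    host_dir=$2
    ct_path=$3
    slot=${4:-mp0}   # pick a free mpN slot if mp0 is already in use
    run pct set "$ctid" "-$slot" "$host_dir,mp=$ct_path"
}
```

In an unprivileged container, host UIDs/GIDs are shifted (by 100000 by default), so files may show up as owned by nobody inside the container unless you adjust permissions or the ID mapping.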

To make data stored on the PVE host itself available to a VM, you will need some kind of network share on the host, such as Samba or NFS.
In the future, `virtio-fs` will probably make that situation much better.
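A sketch of the NFS variant in POSIX shell, assuming the Debian `nfs-kernel-server` package is installed on the host; the directory /tank/share and the client network 192.168.1.0/24 are hypothetical, as are `add_nfs_export` and the `DRY_RUN` switch. The options `rw,sync,no_subtree_check` are standard exports(5) options; Samba would be configured separately.

```shell
#!/bin/sh
# Sketch: export a host directory over NFS so VMs can mount it.
# add_nfs_export is a hypothetical helper; DRY_RUN=1 only prints exportfs.
# Usage: add_nfs_export DIR CLIENTS [EXPORTS_FILE]
# EXPORTS_FILE defaults to /etc/exports (overridable for testing).

# Print the command instead of running it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

add_nfs_export() {
    if [ "$#" -lt 2 ]; then
        echo "usage: add_nfs_export DIR CLIENTS [EXPORTS_FILE]" >&2
        return 2
    fi
    dir=$1
    clients=$2
    exports_file=${3:-/etc/exports}
    line="$dir $clients(rw,sync,no_subtree_check)"
    # Append the export only if an identical line is not already present.
    if ! grep -qxF "$line" "$exports_file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$exports_file" || return 1
    fi
    # Re-export everything listed in /etc/exports.
    run exportfs -ra
}
```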

if there's a decent guide somewhere I'd really appreciate a pointer.
Have you seen our official documentation already? https://pve.proxmox.com/pve-docs/pve-admin-guide.html
 
zpool: I think I have tried everything, including -D, -f and many other options; always the same result. I've given up on this and will start over. It annoys me a bit, as I would have liked to know that I'd be able to reuse a ZFS filesystem with a new installation, but on the other hand I can't waste more time on it, as there wasn't anything important on the filesystem. Yes, I set up the partitioning myself, but I see nothing in the documentation saying that should be an issue or how to possibly get around it.

VM installers: I guess that makes sense; I don't suppose the installers have any way of knowing they're running in a VM with the underlying media being "ready to use". Anyway, shouldn't there at least be a way to add a data drive "as is" as a secondary drive? Is that perhaps what the "virtio-fs" you mentioned will provide?

Bind mounts: So essentially, if you want some sort of large data storage shared between different virtual machines, the best way would be to bind-mount a dataset on the host filesystem into a container, and from there share it through some other means: NFS/SMB/SSHFS/WebDAV?

Admin guide: Yes, I use your pve-admin-guide a lot. I find it very good as a reference, but things are not always completely explained. More examples would be nice, like you find in most man pages. ;)
 
Too bad that the pool disappeared somehow. The partitioning scheme isn't necessarily bad, just unusual. If you add a disk to a zpool vdev by passing the full disk, e.g. /dev/sda, ZFS partitions it itself and it will look like this:
Code:
sda     259:0    0 931.5G  0 disk 
├─sda1 259:1    0 931.5G  0 part 
└─sda2 259:2    0     8M  0 part

VM installers: I guess that makes sense; I don't suppose the installers have any way of knowing they're running in a VM with the underlying media being "ready to use". Anyway, shouldn't there at least be a way to add a data drive "as is" as a secondary drive? Is that perhaps what the "virtio-fs" you mentioned will provide?
The underlying media isn't "ready to use" for a VM. The concept of a virtual machine makes it necessary to simulate physical hardware. An operating system expects raw block device storage and cannot use the file system of an underlying virtualization host.
You can add additional disks to the VM; if the storage is ZFS, they will be ZFS volumes.
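A sketch of adding such a disk from the shell, assuming VM ID 100 and a ZFS storage with the ID `local-zfs` (the name the installer uses on ZFS installs; yours may differ). `add_vm_data_disk` and the `DRY_RUN` switch are hypothetical; the `STORAGE:SIZE` form asks `qm set` to allocate a new volume of that many GiB. Inside the guest you then partition and format the new disk as usual.

```shell
#!/bin/sh
# Sketch: allocate an extra data disk for a VM on a ZFS-backed storage.
# add_vm_data_disk is a hypothetical helper; DRY_RUN=1 only prints commands.
# Usage: add_vm_data_disk VMID STORAGE SIZE_GIB [SLOT]

# Print the command instead of running it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

add_vm_data_disk() {
    if [ "$#" -lt 3 ]; then
        echo "usage: add_vm_data_disk VMID STORAGE SIZE_GIB [SLOT]" >&2
        return 2
    fi
    vmid=$1
    storage=$2
    size_gib=$3
    slot=${4:-scsi1}   # pick a free bus slot; scsi0 usually holds the OS disk
    # "STORAGE:SIZE" tells qm to allocate a new volume of SIZE GiB on STORAGE.
    run qm set "$vmid" "--$slot" "$storage:$size_gib" || return 1
    # On a ZFS storage the new disk is a zvol, i.e. a dataset of type "volume".
    run zfs list -t volume
}
```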

Some people pass disk controllers directly through to the VM via PCI passthrough so that the VM has raw access to the physical disks. But this only works in small single-node setups.
Once you start to cluster your virtualization hosts you usually want to be able to move VMs between nodes. This makes it necessary to have this kind of abstraction.

All the solutions for exchanging data between the host and a VM use some kind of protocol that the guest needs to be aware of, which usually means installing additional software.

Bind mounts: So essentially, if you want some sort of large data storage shared between different virtual machines, the best way would be to bind-mount a dataset on the host filesystem into a container, and from there share it through some other means: NFS/SMB/SSHFS/WebDAV?
A bind mount makes a part of a file system available at another location in the file system. Containers need this if they should be able to access a directory structure outside the directory they are confined to.

The host itself doesn't need any bind mounts, as services installed on it can already access the data in its original location.

Admin guide: Yes, I use your pve-admin-guide a lot. I find it very good as a reference, but things are not always completely explained. More examples would be nice, like you find in most man pages. ;)
We are happy if you open enhancement requests in our bug tracker[0] for specific areas of the documentation that you would like to see improved.

[0] https://bugzilla.proxmox.com