Step by Step Install w EFI, ZFS, SSD cache, File Server

lampbearer

Active Member
Dec 10, 2017
Install Proxmox on a Dell R510 server (12 SCSI/SATA bays) with the following criteria:
UEFI boot
ZFS mirrored boot drives
Large Drive support (> 2TB)
SSD-backed caching (a Sun F20 flash accelerator with 96GB of cache on 4 chips)
Home File Server

There were lots of “gotchas” in the process:
Debian 9 doesn’t completely support a good UEFI strategy on Raid/Mirror Drives
Proxmox pretty strongly insists upon /dev/sdX drive naming whereas ZFS strongly encourages /dev/disk/by-id naming
The documentation on SSD-backed caching doesn’t make clear whether sharing an SSD (on separate partitions) across multiple drive sets is really acceptable or not.

I ran into and solved problems with ZFS ghosts and various boot issues
One note: My server came with an H700 drive controller (cache-backed/battery-backed). This is not recommended for ZFS – you end up having a hardware RAID controller underneath ZFS and that is begging for problems. I bought an H200 Integrated (it goes in the dedicated slot on my Dell). I had lots of concerns about whether I would need to cross-flash to get 6Gb/s (SATA3) working and whether it would recognize large drives. I should not have worried – all is good: it supports up to 6Gb/s with Dell firmware and the backplane firmware is fine as-is.

Standard disclaimer – this worked for my hardware. It was correct as of 2018. It may erase your drives, lose your data, etc. You are responsible for reading and understanding whether this makes sense for you to try and for protecting yourself. I’m just trying to save someone else the 100+ hours of reading I did to get to this point.

Instructions:

1. Download Proxmox

2. Create Proxmox Install Media
Download Rufus:

Click the Rufus file just downloaded
Insert a USB device (min 8GB)
Device: (USB letter)
Partition: (GPT or MBR for BIOS, UEFI)
Click ISO image and select proxmox file (other items will populate – run with defaults)
Click Start
Message about Grub needing to download files (allow it to)
Message about burning as mixed-ISO or DD (choose DD – mixed ISO doesn’t work right)
Wait for burn

3. Clear ZFS from previously used disks
If you have used ZFS on the disks before, it is highly likely that the partition scheme needs to be zapped in order not to cause problems.
Using the rescue procedure below but stopping as soon as it is fully booted, you should use these commands:
ls -ll /dev/disk/by-id (to show which disk is which and get the letters – be very careful, this is going to blow away the partition table)
sgdisk --zap-all /dev/sdX
zpool labelclear -f /dev/sdXX

If ZFS metadata for old pools still exists, it needs to be wiped manually.
dd if=/dev/zero of=/dev/sdX count=1 bs=512k

Check with
zpool import
If it still shows stuff, you may need to erase the end of the disk as well
Find the size in sectors by
diskinfo -c /dev/sdX (on FreeBSD) or blockdev --getsz /dev/sdX (on Debian/Proxmox)
mediasize in sectors is the number you want
dd if=/dev/zero of=/dev/sdX seek={mediasize - set last 4 digits to 0000} bs=512
(seek is counted in blocks of bs bytes, so bs has to be 512 to match a count of 512-byte sectors)
reboot after this
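
As a rough sketch of the arithmetic for the tail wipe: ZFS keeps two of its four labels in the last 512 KiB of the device, so zeroing the final 1 MiB should cover them. The helper name tail_wipe_seek is made up for illustration, and it assumes the size comes from blockdev --getsz in 512-byte sectors. It only prints a number; the destructive dd line is left commented out.

```shell
# Hypothetical helper: given a device size in 512-byte sectors, print the
# dd seek value that starts a wipe 1 MiB (2048 sectors) before the end.
tail_wipe_seek() {
    sectors=$1
    echo $((sectors - 2048))
}

# Example use (destructive, so commented out; /dev/sdX is a placeholder):
# dd if=/dev/zero of=/dev/sdX bs=512 seek="$(tail_wipe_seek "$(blockdev --getsz /dev/sdX)")"
```

Double-check the device name with ls -ll /dev/disk/by-id before uncommenting anything like this.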

4. Install Proxmox (to the system)
Insert Install media flash into target computer
Insert 2nd flash (min 16GB) into target computer or use hard drive
Set Boot mode to UEFI
Use computer UEFI setup to set boot drive to install media (USB) – on the Dell this is F11
Boot computer from flash

Select “Install Proxmox VE”
My boot drives are 80GB 2.5-inch Toshibas plugged into the 2 internal connectors (slots 12 and 13). They are small enough that I did not need ashift=12 (4K block size). But one of the reasons to use EFI is that it does support booting on drives > 2TB.
Select Options and choose RAID1; use ashift=9 for small drives with 512-byte sectors
You can install as either ext4 (single drive) or RAID1.
If you have lots of drives in your system, they scroll off the screen, so use tab, enter, and the arrow keys. The fastest way is to tab to each dropdown and press-and-hold up-arrow without pressing enter, since the list stops at the top at “do not use” and there are so many drives to do. Note that RAID1 does not work for UEFI without extra steps, because the EFI partition must be FAT and therefore represents a single point of failure. Proxmox didn’t think supporting this was logical, but there are workarounds: put the EFI partition on all boot disks and do some things to keep the copies in sync.

The Proxmox install process is going to build 3 partitions: 1 is a small BIOS boot partition, 2 is the ZFS partition, and 9 is a Solaris reserved partition that we will repurpose as the EFI system partition.
Fill in your geo location, ip address, email, and password

Complete Install
Reboot computer


Upon reboot, use F11 to select the flash drive and follow the procedure in the step below (Rescue Procedure) to get into chroot – this is so you can complete EFI setup.

5. Boot up the Proxmox Rescue Environment
Rescue Procedure
This procedure can be used for maintenance to the debian environment from outside the actual environment.
You can use the Proxmox Install disk as a rescue disk
Boot into the Proxmox Install disk
Select “Install Proxmox VE (Debug mode)”
When the booting pauses at a command prompt, press Ctrl-D
Let it finish booting and then click “Abort” in the GUI – you will be at a command prompt again
zpool export -a

zpool import -N -R /mnt rpool

zfs mount rpool/ROOT/pve-1

zfs mount -a

If needed, you can chroot into your installed environment:
I’m not a Linux expert, so here is what I learned: in chroot mode you boot a live environment off removable media (like the Proxmox install flash drive), attach to a different set of files (like the ones you just installed on your system), and edit one environment from the other. That is what chroot does.

mount --rbind /dev /mnt/dev

mount --rbind /proc /mnt/proc

mount --rbind /sys /mnt/sys

chroot /mnt /bin/bash --login



Do whatever you need to do to fix your system. (see next step)

When done, cleanup:



mount | grep -v zfs | tac | awk '/\/mnt/ {print $3}' | xargs -i{} umount -lf {}

umount /mnt

zpool export rpool

reboot
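
If the cleanup one-liner looks opaque, here is a small sketch of what the front half of the pipeline does: it drops the ZFS mounts, reverses the list so the most recently mounted entries come first, and keeps only the mount points under /mnt. The function name mnt_targets is made up; it just prints the list and unmounts nothing.

```shell
# Hypothetical helper: reads `mount` output on stdin and prints the non-ZFS
# mount points under /mnt in reverse mount order.
mnt_targets() {
    grep -v zfs | tac | awk '/\/mnt/ {print $3}'
}

# Dry run on the live system (prints only):
# mount | mnt_targets
```

Running the dry run first is a cheap way to see what xargs umount -lf is about to be handed.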

6. Fix the system to boot with UEFI
verify the drive letters with ls -ll /dev/disk/by-id
Identify the /dev/sdX of your boot drive(s) – mine were sdb and sdc
This next step re-formats the Solaris reserved partition as FAT, as required for an EFI system partition
mkfs.vfat /dev/sdb9
verify /boot/efi already exists or mkdir to create it
mount /dev/sdb9 /boot/efi
grub-probe -d /dev/sdb9
update-grub
grub-install -d /usr/lib/grub/x86_64-efi /dev/sdb
find /boot/efi -type f

This step adds the entry into your NVRAM (the computer flash used during boot by EFI). It may or may not work on your hardware, but you can also manually enter an item in your UEFI by using a special key when your system boots and pointing UEFI to the proper hard drive and directory.
efibootmgr -c --disk /dev/sdb --part 9
(should show a new boot entry named Linux)

7. Exit out of Rescue Mode
mount | grep -v zfs | tac | awk '/\/mnt/ {print $3}' | xargs -i{} umount -lf {}

zpool export rpool (this step may fail – that is okay, it can be fixed below upon final reboot)

reboot

8. Reboot the system from UEFI into Proxmox
If you were unable to export the rpool in the above step, the reboot will exit into busybox during the boot process. In fact there is no “if” about this – it always happens. ZFS can normally unmount but for some reason not in proxmox. You will have to do this step. Busybox is just a limited command environment with a subset of commands available. The boot will drop you into a screen with a prompt and you can type the 2 commands below.
Enter this command:
zpool import -N -f rpool
exit


9. Install boot information to the 2nd Mirror Drive
Install efi to the other RAID1 boot drive
mkfs.vfat /dev/sdc9
verify /boot/efi already exists or mkdir to create it
mount /dev/sdc9 /boot/efi
grub-probe -d /dev/sdc9
update-grub
grub-install -d /usr/lib/grub/x86_64-efi /dev/sdc
find /boot/efi -type f


umount /boot/efi
you can use the UEFI on the server (F11) to make a boot record for the backup drive – to tell which drive is which, in UEFI Boot manager, go to add, hit Fn-F1 for help, and after “SATA,Internal,Direct,0xD” the 0xD represents the slot number – D=13, C=12
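
Nothing keeps the two EFI partitions in sync after this, so any later GRUB or kernel change has to be copied across by hand. A minimal sketch of one way to do that, assuming the second partition is mounted at a scratch directory (/mnt/efi2 is a placeholder, and sync_esp is a made-up name). Whether a plain file copy is enough for every firmware is an assumption, not something tested here.

```shell
# Hypothetical helper: copy everything from one mounted EFI partition to another.
# Plain recursive copy, since FAT does not store Unix ownership or permissions.
sync_esp() {
    src=$1
    dst=$2
    cp -r "$src"/. "$dst"/
}

# Example (placeholder mount point; sdb9 is mounted at /boot/efi as in step 6):
# mkdir -p /mnt/efi2 && mount /dev/sdc9 /mnt/efi2
# sync_esp /boot/efi /mnt/efi2
# umount /mnt/efi2
```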

10. Fix the raid references on the boot array
This step is problematic no matter what choice you make. ZFS wants to definitively identify disks – if you move things around in your system (which is kind of the whole point of a hotswap bay), you cannot count on the fact that /dev/sdb is always the same disk. Supposedly “newer versions of zfs include some magic that kind of ties /dev/sdX to a particular UUID (disk identifier)”. I’ll simply say, I didn’t find that to be the case – I have seen it do odd things on booting and I wanted to lock it in (as per ZFS recommendations). However, be warned – if you do lock it down to a unique device identifier, you will not be able to use the proxmox installer to get back in if anything ever glitches – and that is a big problem. I’m not smart enough to know how to build my own live disk and it is not cool that I can’t fix my system. Things break and mine has at times. Choose wisely – I can’t help you more than that with the choice.

Proxmox uses by-name (sdb) which is really bad – if sda goes offline the server won’t boot and it cannot find the other flash drives by name so any hiccups in usb (which I experienced several) cause issues.

zpool status
You should see your mirror set (if you used one) and it will reference /dev/sdX

To fix this try the following:
nano /etc/default/zfs
change #ZPOOL_IMPORT_PATH="/dev/disk/by-vdev:/dev/disk/by-id" to
ZPOOL_IMPORT_PATH="/dev/disk/by-id"
update-initramfs -u

If you reboot at this point, the previous steps should be sufficient to now show long and unique drive names when you do zpool status.


Alternatively, you can do this: remove the disk and then re-attach it.
DO NOT use /dev/disk/by-id names in the following – they don’t work. There is a bug in proxmox’s implementation of zfs. It works fine on plain debian 9 but not in proxmox bare-metal
Check what it is currently using by this:
zpool status
zpool offline rpool sdc2
zpool detach rpool sdc2
zpool attach rpool sdb2 ata-mylongdrivenamec-part2
(wait for resilvering to complete – check with zpool status)
zpool offline rpool sdb2
zpool detach rpool sdb2
zpool attach rpool ata-mylongdrivenamec-part2 ata-mylongdrivenameb-part2
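
To avoid typing the long names by hand, something like the following can look up the ata-* name for a given sdX partition. This is only a sketch: byid_for is a made-up helper, and it assumes your disks show up with an ata- prefix in /dev/disk/by-id as mine did.

```shell
# Hypothetical helper: reads `ls -l /dev/disk/by-id` output on stdin and prints
# the ata-* name(s) whose symlink points at the given kernel device.
byid_for() {
    awk -v dev="$1" '$NF ~ ("/" dev "$") && $(NF-2) ~ /^ata-/ {print $(NF-2)}'
}

# Example on a live system:
# ls -l /dev/disk/by-id | byid_for sdc2
```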


11. Set up SSD Caching
My system has 48GB of RAM. You can read a lot on when a ZIL (write cache) makes sense and when an L2ARC (2nd level cache) makes sense. Basically, the ZIL is not really for data security – it is always secure in ZFS, but does help with write speeds.

For the ZIL, you want to buffer the fastest writes you can have, and the limiting factor is the Ethernet speed. Mine has 2 Gb ports (that’s gigabit, not gigabyte). 1 Gb/s = 128 MB/s. Times 2 = 256 MB/s, and ZFS writes every 5 seconds, so it makes sense to buffer twice as long to give a margin for error. That is basically 2.6GB of ZIL cache. You want the cache to be bullet-proof, so you need some redundancy. My SSD has 4 chips so I did a RAID 0+1 (2 mirrors of 2 striped chips) – this balances speed and reliability.
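
The sizing above is just a product, so here it is as a tiny sketch you can rerun with your own numbers. The function name zil_size_mb is made up, and the inputs are the ones from this post (2 ports, 128 MB/s per port, 5 second flush, 2x margin).

```shell
# Hypothetical helper: log device size in MB =
#   ports * MB/s per port * flush interval in seconds * safety margin.
zil_size_mb() {
    ports=$1
    mb_per_port=$2
    flush_secs=$3
    margin=$4
    echo $((ports * mb_per_port * flush_secs * margin))
}

zil_size_mb 2 128 5 2   # prints 2560, i.e. roughly 2.6GB
```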

L2ARC is different – it is a read cache layered on top of memory. Don’t even bother with less than 32GB of RAM (48GB is a better minimum). The L2ARC keeps some file tables in actual system RAM at about a 1:10 ratio. The best recommendations I could find said 5GB per TB of disk; I pushed that a bit, to 4GB per TB of disk. You can see how I partitioned my SSD below. I did a RAID0 stripe – on the read cache you are only buffering, so there is no need to mirror or make it redundant.

Set up the ZIL for the boot volume
ZFS likes to have a write cache – and the cache is battery backed-up. It’s a bit more “iffy” on whether an L2ARC is needed. Primary memory provides most of what you need unless there is a whole lot coming off in sequential reads. The ZIL should be RAID 0+1 while the L2ARC should be RAID 0 strictly for speed.

a. Partition the SSD’s

I searched and searched on this – 96GB of SSD was plenty for me to do my system – I have 4 separate disk sets – 2 drives mirrored for boot, a set of 3 large drives mirrored for secure data, 2 smaller drives mirrored for less critical data, and a bunch of old 750’s in raidz2 for speed. I needed 4 ZIL’s of 2.6GB. I read some stuff about “race conditions” (things completing in the wrong order) and there are certainly conceptual issues where raw speed is compromised by more than one thing talking to the same chip at the same time. But to the best of my knowledge, using separate partitions on the SSD cannot possibly introduce corruption to anything. Worst-case, something has to wait and it gets slightly slower (not a big deal since the SSD is much faster than the spindle drives).

I ended up with 8 partitions.

get device names of SSD

ls -ll /dev/disk/by-id/ata-MARVELL*
device names are sdp,sdq,sdr,sds
cfdisk /dev/sdp
(choose gpt)
(on free space, select new, 1.3G)
(on new partition, select type, Solaris /usr & Apple ZFS)
do this 4 times for each SSD drive
(on free space, do a 9, 4, 4, and whatever is left)

(make sure and write, yes)

b. Add the ZIL (log) and L2ARC (cache) to the existing drive arrays
Here it is for the rpool (boot drives)
zpool add rpool log mirror /dev/disk/by-id/ata-MARVELL_CHIP1-part1 /dev/disk/by-id/ata-MARVELL_CHIP2-part1 mirror /dev/disk/by-id/ata-MARVELL_CHIP3-part1 /dev/disk/by-id/ata-MARVELL_CHIP4-part1
zpool add rpool cache /dev/disk/by-id/ata-MARVELL_CHIP1-part8 /dev/disk/by-id/ata-MARVELL_CHIP2-part8 /dev/disk/by-id/ata-MARVELL_CHIP3-part8 /dev/disk/by-id/ata-MARVELL_CHIP4-part8


Here I create my 6-disk (4+2) array and add log and cache to it
Again, these drives have 512-byte sectors – if you are using large drives, you should add -o ashift=12 to the following command.
zpool create raid750 raidz2 /dev/disk/by-id/ata-750DISK1 /dev/disk/by-id/ata-750DISK2 /dev/disk/by-id/ata-750DISK3 /dev/disk/by-id/ata-750DISK4 /dev/disk/by-id/ata-750DISK5 /dev/disk/by-id/ata-750DISK6
zpool add raid750 log mirror /dev/disk/by-id/ata-MARVELL_CHIP1-part4 /dev/disk/by-id/ata-MARVELL_CHIP2-part4 mirror /dev/disk/by-id/ata-MARVELL_CHIP3-part4 /dev/disk/by-id/ata-MARVELL_CHIP4-part4
zpool add raid750 cache /dev/disk/by-id/ata-MARVELL_CHIP1-part7 /dev/disk/by-id/ata-MARVELL_CHIP2-part7 /dev/disk/by-id/ata-MARVELL_CHIP3-part7 /dev/disk/by-id/ata-MARVELL_CHIP4-part7
zpool add storage log mirror /dev/disk/by-id/ata-MARVELL_CHIP1-part2 /dev/disk/by-id/ata-MARVELL_CHIP2-part2 mirror /dev/disk/by-id/ata-MARVELL_CHIP3-part2 /dev/disk/by-id/ata-MARVELL_CHIP4-part2
zpool add storage cache /dev/disk/by-id/ata-MARVELL_CHIP1-part5 /dev/disk/by-id/ata-MARVELL_CHIP2-part5 /dev/disk/by-id/ata-MARVELL_CHIP3-part5 /dev/disk/by-id/ata-MARVELL_CHIP4-part5


c. Test the speed
test speed with the following: (this is a 10G write test)
mkdir /speed
dd if=/dev/zero of=/speed/junk bs=1024000 count=10240

writes at 1.4G/s
dd if=/speed/junk of=/dev/null bs=1024000
reads at 4G/s
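
One caveat with /dev/zero as a source: if compression is enabled on the dataset, a stream of zeros compresses to almost nothing, so the numbers can look far better than the disks really are. A rough sketch of building an incompressible test file instead (make_random_file is a made-up helper, and the path and size in the example are placeholders). /dev/urandom can itself be the bottleneck, so create the file first and time a separate copy; a dedicated tool such as fio is still the better choice for real numbers.

```shell
# Hypothetical helper: write COUNT MiB of random (incompressible) data to FILE.
make_random_file() {
    file=$1
    count_mib=$2
    dd if=/dev/urandom of="$file" bs=1M count="$count_mib" iflag=fullblock 2>/dev/null
}

# Example (placeholder path and size):
# make_random_file /speed/random.bin 1024
# dd if=/speed/random.bin of=/speed/copy.bin bs=1M
```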

12. Connect to Proxmox and update
Use a browser at https://192.168.1.XXX:8006
Use putty to 192.168.1.XXX port 22 (SSH)
Update OS and reboot
apt-get update
apt-get upgrade -y

Do updates through the proxmox web interface as well

13. Create folders within storage to hold major classes of data
(raid750 is the name of my zpool above)
zfs create raid750/vm
zfs create raid750/media
zfs create raid750/template

a. Create storage in Proxmox:

Server View/Datacenter/Storage
Add template, Directory, /raid8t/template, VZDump Backup File, ISO Image, Container Template
Add vm, ZFS, storage/vm, Disk Image, Container and tick “Thin provisioning”

Thin Provisioning – this allows virtual machines that are defined at a particular size to only use the size they actually use – you might define a 32G disk but have 3G in use and it prevents hogging 32G on the physical disk.

b. Install some templates and ISO’s (not needed if you restored a zpool)

Server View, Datacenter, Server (click template on the left)
Can click templates and start downloading some stuff

Might need to update template list:
Server View/Datacenter/server/Shell (on the right)
pveam update

Or you can download from these places:
http://download.proxmox.com/images/system/
https://jenkins.linuxcontainers.org/view/LXC/view/LXC templates/
https://www.turnkeylinux.org/ (this shows screenshots but LXC is broken – get the actual images from the mirror below)
http://mirror.turnkeylinux.org/turnkeylinux/images/proxmox/

Download to PC then upload to proxmox – Use the first command and select Template for tar.xz or tar.gz files and ISO for .iso files

14. Install fileserver

a. Install the container:

i. Click on Create CT within proxmox
Walk through the screen using the following
512M RAM, 2 CPU, 8GB drive, assign 192.168.1.100/24 for address, mount turnkey-linux-fileserver, pw=yourpassword

Do not start it yet

b. Add mountpoints (you need your disk set which is visible within proxmox to be visible within the container)

i. In /etc/pve/lxc/100.conf

mp0:/raid8t/media,mp=/mnt/media
In the proxmox interface, select the container
Edit options
Set the container to start on boot.
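
The same two settings can probably be applied from the Proxmox shell with pct set instead of editing the file and clicking through the GUI. This is a sketch I have not run on this exact setup; mp_spec is a made-up helper that only builds the mount point string, and the container ID and paths are the ones used above.

```shell
# Hypothetical helper: build the value for a bind-mount mount point entry.
mp_spec() {
    host_path=$1
    ct_path=$2
    echo "${host_path},mp=${ct_path}"
}

# Example (run on the Proxmox host; commented out here):
# pct set 100 -mp0 "$(mp_spec /raid8t/media /mnt/media)"
# pct set 100 -onboot 1
```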

c. Configure the file server

i. pct start 100
pct console 100
login using the credentials on the container (root/yourpassword)
skip backup and migration
your@emailaddress for notifications
install

ii. Access locations:
https://192.168.1.100 (WebDAV, CGI)
https://192.168.1.100:12320 (Web Shell)
https://192.168.1.100:12321 (Webmin)
\\192.168.1.100 (ports 139/445) SMB/CIFS (Samba)
root@192.168.1.100 (port 22) (SSH/SFTP)
TKLBAM (backup and migration not initialized)

Login: root/yourpassword

iii. Add some samba users for windows machines to access the server
Navigate to http://192.168.1.100 and choose “Samba”
Create some unix users:

system/users and groups/create new user
user name, automatic, real name, normal password (enter it)
fix groups (especially users)

Create some file shares
Servers/Samba Windows File Sharing
Create a new file share
Choose a share name
Navigate to a directory
save
Click on it again
Define rights for users or groups
Test with Windows
 
Hi,

Congratulations on this tutorial, it is very good overall, with some observations:

1. If I install any server, I also want to be able to re-install it in a short time if I need to (KISS = keep it simple GULETZ/stupid), so I think that from this point of view it is not very good. But if I needed to do this setup, I would try this:
- I would use 2xSSD for the Proxmox install only (without ZFS, so the UEFI part will be easy)
- Then I would add the rest of the HDDs in a ZFS pool
- The other 2xSSD I would use for ZIL/L2ARC
- I think that UEFI is not a must-have, so I can live without it. In any case I do not see UEFI as critical/security stuff, but maybe for others it is good to have this pain ;)

So in this case, I could make a PMX image of only the 2xSSD after I finish the installation/configuration. In case of problems, I could restore this image in a very short time (10-15 minutes) on other SSDs/HDDs. In your case, .... it will take hours.

2. In any case I would not dare to use the reserved SUN partition for UEFI, because if you need to replace only one HDD with a new one .... and you have no luck, then you will not be able to use this new disk because it is smaller by a few MB (compared with your original HDDs)
 
Thanks for your guide, but there are some problems and further questions:

c. Test the speed
test speed with the following: (this is a 10G write test)
mkdir /speed
dd if=/dev/zero of=/speed/junk bs=1024000 count=10240

writes at 1.4G/s
dd if=/speed/junk of=/dev/null bs=1024000
reads at 4G/s

This is totally bogus and therefore useless for ZFS. Zeros are not actually written to disk nor read from disk. Real speed tests are done with fio.

apt-get upgrade -y

You should NEVER EVER run this, you ALWAYS HAVE TO RUN dist-upgrade. This has been mentioned a thousand times.

My system has 48GB of RAM. You can read a lot on when a ZIL (write cache) makes sense and when an L2ARC (2nd level cache) makes sense. Basically, the ZIL is not really for data security – it is always secure in ZFS, but does help with write speeds.

ZIL does also not help with write speeds in general, only in special SYNC cases - and also depending on your VM cache strategy -, so most of the time you will bypass ZIL completely and have the speed of the disks.

1 Gb/s = 128 GB/s.

Wrong SI prefix: 1 Gb/s = 128 MB/s

9. Install boot information to the 2nd Mirror Drive
Install efi to the other RAID1 boot drive
mkfs.vfat /dev/sdc9
verify /boot/efi already exists or mkdir to create it
mount /dev/sdc9 /boot/efi
grub-probe -d /dev/sdc9
update-grub
grub-install -d /usr/lib/grub/x86_64-efi /dev/sdc
find /boot/efi -type f

How do you sync the information on an OS update? Because that is the main reason why UEFI is still not possible with ZFS in Proxmox VE.


But one of the reasons to use EFI is that it does support booting on drives > 2TB.

As does 'legacy' with GPT disks, so this is no real reason at all.
 
> This next step re-formats the Solaris reserved partition for FAT as required by an EFS parition

Thanks for your notes, but why would you want to overwrite two vdevs? http://www.giis.co.in/Zfs_ondiskformat.pdf Yea I know there are two more at the front, but one could use the part in L1 that's reserved for legacy boot, but format it uEFI. Its kinda small, but there is enough room and you could keep all 4 vdev.

sgdisk -Z -a1 -n1:34:2047 -t1:EF02 -c1:GRUB -a9 -n9:-8M:0 -t9:bf07 -c9:Reserved -a2 -n2:2047:0 -t2:bf01 -c2:ZFS /dev/sda
 
This is a great guide with lots of useful tricks, (some other tricks not so much)... but as noted, I would never want to do this in production - not even in my home, what if something went haywire, ie upgrade to newest rev of PVE, and you need to quickly get back up, there are a lot of commands there, therefore lots of room for mistakes along the way. Or as you said "works today in 2018", would you want to go troubleshooting a year from now when things change and have to reverse engineer what went wrong??

I don't understand your comments about booting EFI; the 80GB disks you use make UEFI pointless and completely not worth the hassle. Just set the system to legacy mode and be done with it.

The best thing to do is close to what you have done, a small pair of drives for OS, and as many disks as you can in a single pool of mirrored vdevs, with a single ZIL allocation (which may frequently go unused), the ZIL drive is a good thing, but in my experience a lot of traffic is going straight to the spinners at a slow speed. Understandably you are re-purposing hardware that was laying around or cheap (great find on the F20), and it is a great learning experience, but you might also be learning how to fix something that went terribly wrong. If you are using Samsung 750 ssds, there is no need to put a ZIL in front of these, that is just a waste of iops on your zil, and definitely waste ZIL on your OS mirror.

My typical goal is to reserve the front bays for mass storage and get as many spindles as I can afford up there. My standard install now is to put my PVE OS on a $25 dual port m.2 PCI card, with a pair of cheap m.2 SATA SSDs like the Sam 850 Evo m.2. No UEFI needed, I have had no drive letter issues to date, and I have a couple of systems where we swap a lot of different drives in/out. This eliminates about 50 of the commands you ran, there is no need to boot rescue mode, etc., and if I need to blow out my PVE install I have no problems formatting that OS mirror and being back online in 10 minutes or less.

For better performance you can put another dual port NVME card, and raid a pair of proper NVME drives in a separate pool for high demand VMs... or of course the Sun F40 looks good for ~$90.

Another option in a "home server environment" for fixing the PVE boot mirror issue is to do your base install with default LVM on a single disk, set up all storage, and then boot from a rescue disk or clonezilla and dd the drive to a secondary. Set a daily cron job to rsync the PVE conf directories, and you can boot from that 2nd drive if the primary dies - obviously not production grade, but it sounds like you are in a home lab.

Here is a hint for tuning/benching your ZFS, run in separate windows:
Code:
fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=/mnt/vmdatapool/fiotest --bs=4k --iodepth=64 --size=48G --readwrite=randrw --rwmixread=75

arcstat 2 250

zpool iostat -v yourpool 2 250

On a final note, my thanks goes out to you for taking the time to do this, I am sure it was countless long nights.... I will be keeping your notes, some of it may come in handy some day.
 
Thank you very much for the very fine additions and clarifications -- in all of my reading, I've never seen the note about dist-upgrade. I'll look into that.

I fixed the GB into MB (that was a typo - shame on me for it -- and I really appreciate the correction)

My personal reasons for pursuing UEFI have a lot to do with "how life ought to be" and that's pretty unique, I suppose. BIOS is really, really old and dying. The EFI user interfaces are more user-friendly and cleaner and they at least allow the possibility of swapping out boot drives for large drives. I get the whole "small SCSI with fast RPM" and I even went out and bought some old Toshiba 80GB 2.5inch drives specifically to use as boot drives and you are absolutely correct that makes UEFI vs BIOS a moot point. In fact, I am back on BIOS because I had a random corruption and could not get back in to boot on UEFI -- I think the corruption was caused during a kernel upgrade as well as a cable that got jostled when the server got moved downstairs, and I just got tired of the hassles of doing it the way I had.

No, UEFI wouldn't sync properly on updates -- that would have to be manual and it is good to point that out to the community. I just wanted to play with keeping "current" and I didn't see anyone else even try to get where I was headed -- the next nearest guide I found was 20 pages long with a full-manual install of Debian and didn't include proxmox at all.

I will certainly defer to anyone like you on Linux stuff -- I am far from an expert and I really appreciate those who take the time to educate people like me.
 
> This next step re-formats the Solaris reserved partition for FAT as required by an EFS parition

Actually I only had 3 on mine I think (I'm assuming you are using VDEVs and partitions interchangeably) and you could certainly use the BIOS-targeted partition instead of the Solaris partition. Originally when I was experimenting, I didn't realize Proxmox would create BIOS partitions on a UEFI system -- it doesn't when you install to a single drive, but it behaves differently when installing to a mirror through the installation interface. By then I knew I didn't care about the Solaris partitions (I'll never use Sun stuff -- can someone tell me why I should care -- maybe other things need that and I ended up doing something unwise).

I didn't understand the command at the end -- I get parts of it -- and I experimented with a guide that involved manually creating partitions, but part of this guide was really to let proxmox do as much as it could itself. I'm trying to keep this system mostly "stock" because I really want maximum reliability on my OS and my data these days -- it is for family use more than hobby use and I want the high reliability and low down time. Thanks for the feedback -- I appreciate it.
 
To guletz -- Cost was a big issue to me -- I really would have liked to boot to SSD, but kind of felt I didn't need speed there -- the server's BIOS takes nearly 5 minutes enumerating drives anyway. Conceptually I love everything you wrote -- I thought long and hard about wading into UEFI (as lots of posters here have commented on). That was more of a stupid "purist" approach and maybe shows too many of my mid-career Microsoft roots -- I got pretty used to accepting 2 fixes to 1 break and always running the latest and greatest in drivers and tech on principle. The whole reason I'm on Proxmox now is that MS finally wore me out with Media Center, and Win10's approach to software RAID and their whole licensing mess on motherboard changes. It has gotten me back into command lines and Linux, which is bullet-proof. I will absolutely accept the "slap with a wet noodle" on trying to be cute when I didn't need to be with UEFI.

When it comes to drives -- I've gotten pretty paranoid on keeping identical hardware on mirrors. If one dies and I can't get an identical model, I'd probably replace the pair on principle -- again, maybe that is MS paranoia or HW raid quirks -- ZFS is all new to me -- maybe I will learn to relax again.

To total impact -- thanks for all your great personal experience and advice -- wish I had found your comments when I was researching -- I would probably have done a lot differently.

Speaking of which: Can anyone advise on configuring snapshots, scrubs, and backing up the rpool/boot drives? I like totalimpact's suggested approach of a cron clone -- there are definitely times where timed clones rather than mirrors have saved my bacon.
 
I dont backup my boot devices, just rsync out my /etc/pve, and any /etc/lvm , /etc/zfs, but even those can be recovered easily.

The gui is simple enough for backups, you can use a local drive, remote nfs, etc:
https://pve.proxmox.com/wiki/Backup_and_Restore

If you have a remote (or local) zfs target for backups, then maybe pve-zsync is for you, it is 100% cli, and restore takes a bit of work, but there are several benefits, this is the only kind of snapshotting you would configure:
https://pve.proxmox.com/wiki/PVE-zsync

I didn't notice, but I think you have consumer spinners; I would scrub every 1-2 weeks, less often if you have enterprise drives, ymmv.
 
Hi,
And thx. a lot for your nice words!

Speaking of which: Can anyone advise on configuring snapshots, scrubs, and backing up the rpool/boot drives? I like totalimpact's suggested approach of a cron clone -- there are definitely times where timed clones rather than mirrors have saved my bacon.

As our forum colleague @totalimpact wrote, I use the same idea (more or less):

I dont backup my boot devices, just rsync out my /etc/pve, and any /etc/lvm , /etc/zfs, but even those can be recovered easily.

.... and something more: a good clonezilla image (2-3 images/year) when I use a non-ZFS dedicated boot device (in case of boot problems, I could restore the last clonezilla image, apply the latest updates, and then restore my PMX-related files).
Regarding ZFS scrub, I run 1 scrub/week, 1 smartctl -t long per week, and a daily short smartctl test. In case of ZFS errors I receive mails (I use a bash script that checks many ZFS parameters, like: bad scrub results, a bad vdev disk, scrub not started this week, zpool capacity > 80%).
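
This is not the script described above, only a minimal sketch of one of those checks (the capacity threshold). pools_over is a made-up name; it assumes the input comes from zpool list -H -o name,capacity, and sending the mail is left out.

```shell
# Hypothetical helper: reads `zpool list -H -o name,capacity` output on stdin
# and prints the pools whose used capacity exceeds the given percentage.
pools_over() {
    awk -v limit="$1" '{ cap = $2; sub(/%/, "", cap); if (cap + 0 > limit + 0) print $1 }'
}

# Example for a cron job:
# zpool list -H -o name,capacity | pools_over 80
```

If it prints nothing, no pool is over the limit; anything it does print could be piped to whatever mail command you already use.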

If you have a remote (or local) zfs target for backups, then maybe pve-zsync is for you, it is 100% cli, and restore takes a bit of work, but there are several benefits, this is the only kind of snapshotting you would configure:
https://pve.proxmox.com/wiki/PVE-zsync

I do the same ... ;)
 
I have a question about using your raidz as the VM storage. Do your drives ever spin down? I currently have my VM storage on the SSD I use for the proxmox host as I thought that to be the preferred method of hosting the vm's.

I currently use zfs on proxmox to pass a virtio HDD volume to a windows vm for sharing but I think I would rather move to something like the turnkey fileserver container. Is it a bad idea to mount the zfs media store on other containers directly? It sounds like your recommendation would be to have the other containers mount the cifs share via fstab rather than bind mount directly.
 
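IIRC, ZFS commits pending writes to disk about every 5 seconds by default (the transaction group timeout, zfs_txg_timeout, rather than the ZIL itself), so as long as anything is writing to the pool, a time-based spin-down would never really spin the disks down.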
 
I'm far from an expert on your question. It's a testament to the robustness of Proxmox that from the time I first posted this thread to now I have done exactly zero maintenance of the install -- the combination of the Dell server and this configuration has been absolutely bullet proof and that is living in the country with power hiccups, storms and the like (no UPS).

Regarding your question -- I looked into spin-down when I first did the setup. I don't think Debian configures spin-down out of the box, and there were various writeups on device driver hooks and cron configurations that would force a spin-down. Supposedly electricity costs about $1 per year per watt. This server is pretty thrifty -- fully maxed out on drives, it pulls 217W on average, so that comes to about $217 a year, or about $18 a month, in electricity. Spinning drives up and down is the single biggest thing that will wear out a drive (I'm using consumer 7200 RPM drives, not server-grade 10,000 RPM drives). Drives are ballpark $100 apiece and I've gotten 10 years of continuous use out of drives before -- usually Western Digital. Maybe I've just been lucky. The electricity savings from spin-down would be outweighed by premature hardware failure and the hassle (my time) of restoring data and having to re-read how to do that. So honestly I kind of abandoned the whole spin-down thing and just decided to have my server merrily humming along all the time.
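
The arithmetic behind those numbers, as a small sketch you can rerun with your own wattage. The $1 per watt per year figure is the rule of thumb quoted above, not a measured rate, and the function names are made up for this example.

```sh
#!/bin/sh
# monthly_cost WATTS DOLLARS_PER_WATT_YEAR -> dollars per month, two decimals
monthly_cost() {
    awk -v w="$1" -v r="$2" 'BEGIN { printf "%.2f\n", w * r / 12 }'
}

# Price per kWh implied by a dollars-per-watt-year figure:
# one watt running for a year is 24 * 365 / 1000 = 8.76 kWh.
implied_kwh_price() {
    awk -v r="$1" 'BEGIN { printf "%.3f\n", r / (24 * 365 / 1000) }'
}

monthly_cost 217 1      # prints 18.08
implied_kwh_price 1     # prints 0.114
```

So the rule of thumb corresponds to paying roughly 11 cents per kWh. If your rate is higher, scale the second argument of monthly_cost accordingly.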

Mounting across systems was kind of challenging from what I remember. I'm using Emby for a media server, and I hooked that into Samba on the TurnKey fileserver so that the fileserver was the only thing that actually mounted the raw ZFS drives -- that's specifically why I went with that sort of setup.
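
For comparison, the two approaches discussed above look roughly like this. Every name here is hypothetical: container ID 101, the dataset path /tank/media, the share //fileserver/media, and the credentials file. This is a sketch of the general shape, not the configuration used on this server.

```sh
# Option A: bind mount a host ZFS path straight into a container.
# Run on the Proxmox host.
pct set 101 -mp0 /tank/media,mp=/mnt/media

# Option B: let only the fileserver container touch the ZFS data, and have
# other guests mount its Samba share. Inside the guest, an /etc/fstab line
# along these lines (shown as a comment because it is not a shell command):
#   //fileserver/media  /mnt/media  cifs  credentials=/root/.smbcredentials,ro,_netdev  0  0
# After adding the line, mount it:
mount /mnt/media
```

Option A is simpler but means several containers write to the same files with their own user ID mappings. Option B keeps one owner of the data at the cost of a network hop. Depending on how the container is set up, mounting CIFS from inside it may need extra container features enabled, so a VM is sometimes the easier client.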
 
Two years later and this is still a very comprehensive article on Proxmox ZFS. Thank you so much for your contributions to the community, and for a great, clear guide to your setup using ZFS. I was very intrigued to see your specific RAID configuration, as I've only used simpler setups with more identical drives and have since suffered in performance after ignoring caches.

Cheers!
 
Thanks so much for that - it is nice to know that it has helped someone. So - as my latest freebie suggestion, I recently started playing around with my home WiFi setup like I did with these dirt-cheap Dell servers and Proxmox (why oh why didn't I figure out to buy cast-away enterprise stuff sooner?). For about $250 you can buy a 24-port PoE managed switch, a wireless LAN controller, and 4 access points with mounts. That's about $20K worth of hardware at original prices. Insane range, reliability, and a whole lot of learning how to step back in time and use DOS-style command-line stuff (c'mon guys, web GUIs were around in routers long before the 2012 manufacture date on this hardware). But hey, now my WiFi robustness matches my server robustness, and I'll never need to run heat to my closet where all this stuff is. Expert tip: you need more than 15W per device, so get the right switch the first time so you don't have to reorder it!
 
Is this still needed in 6.3? I am just testing Proxmox on a new server, and in trying to follow your steps I see that the partition layout built by the installer is not the one described in your post. It seems that it just works out of the box (I mean the ZFS RAID1 mirror). I am using 2x SATA DOMs, which are hot swappable, and I booted from each of them in turn: everything is replicated correctly and recovered after a crash as well.