Hampton's tutorial on ZFS RAIDZ + SSD

stefanzman

Renowned Member
Jan 12, 2013
46
0
71
USA - Kansas City
www.ice-sys.com
Hello,

I wanted to get some feedback on Hampton's published tutorial - http://ghost2-k4kfh.rhcloud.com/proxmox-with-zfs-raidz-ssd-caching/

We are looking to create very similar systems with a single 60gb SSD and 4 physical drives, and this procedure seems like a good fit. So I am curious as to whether others in the forum are in agreement with the detailed plan that Hampton laid out here.

- Is it still correct for PVE 5.0 or have recent changes rendered the procedure invalid?
- Are there problems using the single SSD, and is it necessary to manually reclaim space?
- He mentions vm.swappiness early on. Is it OK to have the swap file on the SSD?
- Is his partitioning optimal / recommended for the 64gb SSD:
  • 8GB ZFS Log partition
  • 16GB ZFS cache partition
  • 8GB Linux swap partition
  • 16GB "VZ data" partition (set in Proxmox installer)
  • 16GB Linux / (root) partition
- Is it OK to have the PVE OS on Ext4 on the SSD, or would another filesystem be preferred?
- Is his way of creating and adding the ZFS physical disk array the best approach?

I was generally very impressed and appreciative of Hampton's work, but just want to make sure it is current and accurate.

Thanks much for any input.
 
Hi,
At first sight, that tutorial is a good read and has some good parts, but it also has some bad parts:
- he uses sdX device names for pool creation (very bad), and without compression
- a 64 GB SSD is very risky if you want to host both the slog and the cache for zfs
- proxmox writes a lot of data in various places (like /var/log), so your SSD will not live as long as you think (there are many posts about this)
- you will need some tuning options for zfs, and I would recommend not putting the zil and the cache on the same small (< 64 GB) SSD
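If a pool has already been created from the tutorial as written, the missing compression can still be turned on after the fact; it only applies to data written from then on. A minimal sketch, assuming the pool is named rpool:

```shell
# Enable LZ4 compression on an existing pool; only newly written
# blocks are compressed, existing data stays uncompressed
zfs set compression=lz4 rpool

# Confirm the property took effect
zfs get compression rpool
```

This needs root and a live ZFS pool, so it is meant as a pattern rather than something to paste blindly.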
 
Thank you, Guletz.

For this particular DELL server, it is difficult to set up a multi-disk array for the PVE OS separate from the 4-disk physical array. It is a 1U chassis without many expansion options, so I may be stuck with a single disk (albeit not ideal). Would a 120gb or 240gb SSD be workable? I plan to run 3 or 4 smaller VMs on the server, and it will not be extraordinarily active. Do you think a 120g drive can last 12 months or more? The server has 32g RAM. Does this need to be upgraded?

I did some searches in the forum on sdX, SSDs, and ZFS tuning. There are quite a few posts, and I am not entirely sure which ones are relevant to this setup. Can you point me to some good posts on these topics?
 
Guletz - on SSDs, I found this TOPIC.

Within it, you said:

“I have several servers that use consumer SSDs (including 120 G Kingston) for zfs cache and zil, and all are usable even now. I also have some proxmox nodes with consumer SSDs for the proxmox os. But /tmp and /var/log are symlinked to zfs datasets (spinning HDD), and after 1/2 year of usage, the health status is 99.0 %.”

Should I do the same with a 120g SSD drive, symlinking /tmp and /var/log to an area on the ZFS RAID10 array on the physical disks?

Will the zfs cache and zil be created on the physical disk array?
 
Did some more looking around and found the ZFS Tips and Tricks Wiki.

From that I assume that Hampton's "zpool create" command should not have referenced the /dev/sdX devices as targets, and should also have included the "compression=on" switch.

Is an alignment shift (ashift) value of 12 advisable?

Based on the reply above, we would be symlinking /tmp and /var/log from the SSD to handle the PVE OS logging. But there would still be 2 partitions on the SSD for the ZIL and L2ARC.
 
Should I do the same with a 120g SSD drive, symlinking /tmp and /var/log to an area on the ZFS RAID10 array on the physical disks?

Yes. Even better, /var/lib/rrdcached could also be symlinked somewhere on a ZFS dataset. You will need to change /etc/default/rrdcached to match your dataset. I use mypool/var-lib-rrdcached, myzpool/tmp, myzpool/var-log.
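A sketch of that symlink setup, assuming a pool named mypool, datasets that do not exist yet, and that the services writing to these paths are stopped first (all dataset names here are illustrative, following the naming in the quote above):

```shell
# Create the datasets on the spinning-disk pool
# (default mountpoints: /mypool/<name>)
zfs create mypool/tmp
zfs create mypool/var-log
zfs create mypool/var-lib-rrdcached

# /tmp needs the sticky bit, like the original directory
chmod 1777 /mypool/tmp

# Copy the existing contents over, then replace each directory
# with a symlink (do this with the writing services stopped)
for d in tmp var-log var-lib-rrdcached; do
  src="/${d//-//}"          # tmp -> /tmp, var-log -> /var/log, etc.
  cp -a "$src/." "/mypool/$d/"
  mv "$src" "$src.old"
  ln -s "/mypool/$d" "$src"
done
```

Afterwards, point /etc/default/rrdcached at the new rrdcached location, and remove the *.old directories once everything is confirmed working.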
Is an alignment shift (ashift) value of 12 advisable?

Most HDDs use 4k sectors, hence ashift=12.
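If you want to verify that for your specific drives before creating the pool, the reported sector sizes are easy to check (lsblk ships with util-linux on any PVE install):

```shell
# Physical/logical sector size per block device:
# PHY-SEC 4096 -> ashift=12, PHY-SEC 512 -> ashift=9
lsblk -o NAME,PHY-SEC,LOG-SEC
```

Note that some 4k drives lie and report 512-byte logical sectors (512e); the PHY-SEC column is the one that matters for ashift.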

Will the zfs cache and zil be created on the physical disk array?

No. The zfs cache/zil should be created on a dedicated SSD. It could also live on the OS SSD, but that will shorten the SSD's life, especially if the SSD is small. The zfs zil is useful when you have many SYNCHRONOUS writes (an NFS server, for example), so I guess that in your case (a few VMs) it will not help.

The zfs cache can be useful, but it depends... after a few days of usage you can use the arcstat/arc_summary commands to see whether it is worthwhile in your case.
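For reference, this is how those checks look in practice; both tools come with the ZFS userland packages on PVE (on older ZFS-on-Linux releases they are named arcstat.py / arc_summary.py):

```shell
# Rolling one-line ARC statistics, refreshed every 5 seconds;
# watch the hit% column to judge cache effectiveness
arcstat 5

# Full report; the L2ARC section shows whether a cache
# device is actually being hit
arc_summary | less
```

If the ARC hit rate is already high without an L2ARC device, adding one buys little and mostly burns SSD write endurance.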
 
From that I assume that Hampton's "zpool create" command should not have referenced the /dev/sdX devices as targets

Find your HDD by serial number with this:

ls -l /dev/disk/by-id

- and then create your pool like this (replace mirror with the layout for your case):

zpool create -f -o ashift=12 -o cachefile=none -O atime=off -O compression=lz4 rpool mirror /dev/disk/by-id/ata-HGST_HDNxxxxxxxxx_Pyyyyyyyy /dev/disk/by-id/ata-ST4000VN0001-tttttttt_zzzzzzzz
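Afterwards it is worth confirming that the pool actually picked up the stable device names and the properties; a quick sanity check, assuming the pool name rpool from the command above:

```shell
# Devices should be listed by their /dev/disk/by-id names,
# not as sdX
zpool status rpool

# Properties set at creation time
zpool get ashift rpool       # reported on recent OpenZFS releases
zfs get compression,atime rpool
```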