[SOLVED] ZFS questions when working with VM storage

killmasta93

Hi,
I was wondering if someone could shed some light on an issue I'm having.
Currently I have Proxmox running on ZFS with a few VMs. One VM runs Ubuntu (Zentyal) with ext4 for the OS, but I created another virtual disk for it, and inside that VM there is also a ZFS pool, which I use for snapshots.
Now here is the issue: the storage usage inside the VM shows up differently on the host.
I have troubleshot this for a while but cannot get it working; I think there is an alignment issue.
I first created the virtual disk with a volblocksize of 4K:

Code:
rpool/data/vm-145-disk-2                                        volblocksize  4K

On my Zentyal VM the usage is 39.4 GB, and I created the pool with ashift=12:

zpool create -f -o ashift=12 -o autotrim=on data2 /dev/sdb

NAME    USED   AVAIL  REFER  MOUNTPOINT
data2  39.4G  83.6G  38.0G  /data2

but the host shows this:

rpool/data/vm-145-disk-2 63.9G 1.73T 63.9G -


I then tried an 8K block size and ashift 13 on the VM, but I get this:

On the VM:

data3 40.5G 82.6G 38.5G /data3

and on the host:

rpool/data/vm-145-disk-3 57.7G 1.68T 57.7G
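(For reference, the guest and host views above can be compared with something like the following; the dataset names are the ones used in this setup:)

Code:
# inside the VM: guest view of the pool
zfs list data3

# on the Proxmox host: the zvol backing that virtual disk
zfs list -t volume rpool/data/vm-145-disk-3
zfs get volblocksize,volsize,used,referenced rpool/data/vm-145-disk-3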

Any ideas?
 
It is generally not a good idea to run ZFS on top of ZFS, because the overhead will multiply.

Can you tell us more about what your rpool looks like?
 
Thanks for the reply. I did read a lot about that, but the snapshots are what really help inside the VM.
My rpool is a raidz1:
Code:
root@prometheus2:~# zfs get all rpool
NAME   PROPERTY              VALUE                  SOURCE
rpool  type                  filesystem             -
rpool  creation              Wed Aug 12 15:05 2020  -
rpool  used                  5.42T                  -
rpool  available             1.66T                  -
rpool  referenced            175K                   -
rpool  compressratio         1.12x                  -
rpool  mounted               yes                    -
rpool  quota                 none                   default
rpool  reservation           none                   default
rpool  recordsize            128K                   default
rpool  mountpoint            /rpool                 default
rpool  sharenfs              off                    default
rpool  checksum              on                     default
rpool  compression           on                     local
rpool  atime                 off                    local
rpool  devices               on                     default
rpool  exec                  on                     default
rpool  setuid                on                     default
rpool  readonly              off                    default
rpool  zoned                 off                    default
rpool  snapdir               hidden                 default
rpool  aclinherit            restricted             default
rpool  createtxg             1                      -
rpool  canmount              on                     default
rpool  xattr                 on                     default
rpool  copies                1                      default
rpool  version               5                      -
rpool  utf8only              off                    -
rpool  normalization         none                   -
rpool  casesensitivity       sensitive              -
rpool  vscan                 off                    default
rpool  nbmand                off                    default
rpool  sharesmb              off                    default
rpool  refquota              none                   default
rpool  refreservation        none                   default
rpool  guid                  2047715220241130019    -
rpool  primarycache          all                    default
rpool  secondarycache        all                    default
rpool  usedbysnapshots       0B                     -
rpool  usedbydataset         175K                   -
rpool  usedbychildren        5.42T                  -
rpool  usedbyrefreservation  0B                     -
rpool  logbias               latency                default
rpool  dedup                 off                    default
rpool  mlslabel              none                   default
rpool  sync                  disabled               local
rpool  dnodesize             legacy                 default
rpool  refcompressratio      1.00x                  -
rpool  written               175K                   -
rpool  logicalused           3.63T                  -
rpool  logicalreferenced     44K                    -
rpool  volmode               default                default
rpool  filesystem_limit      none                   default
rpool  snapshot_limit        none                   default
rpool  filesystem_count      none                   default
rpool  snapshot_count        none                   default
rpool  snapdev               hidden                 default
rpool  acltype               off                    default
rpool  context               none                   default
rpool  fscontext             none                   default
rpool  defcontext            none                   default
rpool  rootcontext           none                   default
rpool  relatime              off                    default
rpool  redundant_metadata    all                    default
rpool  overlay               off                    default
 
How many drives does your raidz1 consist of, and what ashift and volblocksize do you use on the host? (The commands after the list below can be used to check.) With raidz it is normal that everything is way too big if you don't increase the volblocksize.

For raidz1 the volblocksize should look something like this:
3 disks: 4 * block size (set by ashift; so would be 16K for ashift of 12)
4 disks: 16 * block size (so would be 64K for ashift of 12)
5 disks: 8 * block size (so would be 32K for ashift of 12)
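
The relevant host-side settings can be checked with something like this (the dataset name is taken from the first post):

Code:
zpool status rpool                               # vdev layout: number of disks and raidz level
zpool get ashift rpool                           # ashift of the pool
zfs get volblocksize rpool/data/vm-145-disk-2    # volblocksize of the zvol backing the virtual disk
lsblk -o NAME,PHY-SEC                            # physical sector size of the underlying disks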
 
Thank you so much for the reply. Currently my host has 8 disks:

Code:
root@prometheus2:~# lsblk -o NAME,PHY-SEC
NAME     PHY-SEC
sda          512
├─sda1       512
├─sda2       512
└─sda3       512
sdb          512
├─sdb1       512
├─sdb2       512
└─sdb3       512
sdc          512
├─sdc1       512
├─sdc2       512
└─sdc3       512
sdd          512
├─sdd1       512
├─sdd2       512
└─sdd3       512
sde          512
├─sde1       512
├─sde2       512
└─sde3       512
sdf          512
├─sdf1       512
├─sdf2       512
└─sdf3       512
sdg          512
├─sdg1       512
├─sdg2       512
└─sdg3       512
sdh          512
├─sdh1       512
├─sdh2       512
└─sdh3       512

The volblocksize by default is 8K and the ashift is 12.
I tried 4K and 8K but can't get it to show the correct values:

rpool/data/vm-145-disk-1  volblocksize  8K  default
rpool/data/vm-145-disk-2  volblocksize  4K  -


Code:
root@prometheus2:~# zpool get all | grep ashift
rpool  ashift  12  local
 
Look at this spreadsheet. With a raidz1 of 8 disks and an ashift of 12 you would lose:
volblocksize 4K/8K = 50% raw capacity lost
volblocksize 16K = 33% raw capacity lost
volblocksize 32K/64K = 20% raw capacity lost
volblocksize 128K = 16% raw capacity lost
volblocksize 256K/512K = 14% raw capacity lost
volblocksize 1M = 13% raw capacity lost

So right now, with a volblocksize of 8K, you lose 50% of your raw storage due to parity and padding overhead. I would at least increase it to 32K, or instead use a striped mirror for better write IOPS but less capacity.
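
For anyone wondering where those percentages come from, here is a rough sketch of the raidz allocation math (the same idea that spreadsheet is based on), assuming ashift=12 (4K sectors), 8 disks and single parity; it ignores compression and metadata:

Code:
#!/bin/bash
# Rough estimate of raidz1 overhead per volblocksize for 8 disks, ashift=12.
# Per block: parity sectors = ceil(data / (disks - parity)) * parity,
# then the allocation is padded up to a multiple of (parity + 1) sectors.
ndisks=8; nparity=1; sector=4096
for vbs in 4096 8192 16384 32768 65536 131072 262144 524288 1048576; do
    data=$(( vbs / sector ))
    parity=$(( (data + ndisks - nparity - 1) / (ndisks - nparity) * nparity ))
    alloc=$(( data + parity ))
    alloc=$(( (alloc + nparity) / (nparity + 1) * (nparity + 1) ))  # padding
    lost=$(( (100 * (alloc - data) + alloc / 2) / alloc ))          # rounded percent
    echo "volblocksize $(( vbs / 1024 ))K: ${lost}% of raw capacity lost"
done

Running it gives the same percentages as the list above (to the nearest percent).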
 
Thanks for the reply. So I would need to use a 32K block size for that rpool/data/vm-145-disk-2 so that the VM shows the correct storage?
As for the ZFS pool inside of the VM, would it need to be ashift 12 and the same block size?
 
Thanks for the reply. So I would need to use a 32K block size for that rpool/data/vm-145-disk-2 so that the VM shows the correct storage?
Correct. But volblocksize can only be set at the creation of zvols. So you need to destroy and recreate all virtual disks (this can easily be done by backing up the VMs and restoring them afterwards) after changing the volblocksize for your rpool (Datacenter -> Storage -> rpool -> Edit -> Block size). But it will always be a bit bigger unless you use a volblocksize of 1M, because with a volblocksize of 32K you will still lose 7% of raw storage due to bad padding. It should still be way better than now.
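
For reference, a rough CLI equivalent of that procedure might look like this; the ZFS storage is assumed to be named "rpool" as in the GUI path above, while the backup storage "local" and the dump file name are placeholders that need to be adjusted:

Code:
# 1. back up the VM (here to a backup storage called "local"; adjust to your setup)
vzdump 145 --storage local --mode stop

# 2. change the block size used for newly created zvols on the ZFS storage
pvesm set rpool --blocksize 32k

# 3. restore the VM so its disks are recreated with the new volblocksize
qmrestore /var/lib/vz/dump/vzdump-qemu-145-<timestamp>.vma.zst 145 --force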
As for the ZFS pool inside of the VM, would it need to be ashift 12 and the same block size?
I think so.
 
So I think I solved this issue: I had to create the virtual disk with a 64K volblocksize, and on the VM I had to use ashift 16, and now it seems to be showing the data correctly.
 
I was curious: how come the default volblocksize on Proxmox is 8K when the VMs are normally 4K (Windows NTFS or ext4)? Wouldn't there always be an alignment issue for space, or is 8K better?
 
ZFS was initially made for Solaris, and there 8K is the default block size. That's why 8K is the default blocksize for ZFS.
 
Do these block sizes matter for IOPS? What I have been seeing is that normally there is going to be an alignment issue, as Windows and Linux use 4K by default (except when installing MSSQL, where NTFS should be 64K).
 
Yes, if you for example use a volblocksize of 64K, everything with a lower block size than 64K will get terrible overhead. So your 64K might be okay for stuff like videos and photos but terrible for VMs, databases and so on.
 
Thanks for the reply. But if the VM's data storage for MSSQL is 64K NTFS, shouldn't the volblocksize on Proxmox be the same?
Or is it that the lower the volblocksize on Proxmox ZFS, the better?
 
If your VM only writes 64K blocks, that's fine. But if it for example also uses 4K for some stuff, you get 16 times higher write amplification and 16 times the read overhead. So your read/write performance for 4K will be 16 times worse too, and your SSD will die 16 times faster.
 
Thanks for the reply. So if I understood correctly, it's always best to leave the 8K volblocksize on Proxmox and leave Windows NTFS and Linux ext4 at their default 4K, and if I change the Windows NTFS storage to 64K I would still leave the 8K volblocksize on Proxmox?
 
You can't just choose any volblocksize. What's possible and what's not depends on your pool layout. With your 8-disk raidz1, everything below a 32K volblocksize basically means too much wasted capacity. If you really want to use an 8K volblocksize you would need to create two separate striped mirrors (raid10) of 4 drives each. If you don't want 2 pools, the lowest useful volblocksize would be 16K with a striped mirror of 8 disks. All of this is of course only viable if you use an ashift of 12. As soon as you increase your ashift you also need to increase your volblocksize; increasing the volblocksize won't help if you also increase the ashift. If you increase your ashift from 12 to 16 you also need to increase the volblocksize by a factor of 4.
So your 64K volblocksize + ashift 16 raidz1 is basically the same as a 16K volblocksize with ashift 12, so you still lose 33% raw capacity.

I made a benchmark yesterday where I tested reads/writes with 4K/16K/32K/4096K block sizes on my 32K raidz1 pool. Here you can see, for example, how much the overhead increases if you try to read/write with a block size that is smaller than the volblocksize.
I need to do more benchmarks, but I will most likely delete my raidz1 pool and create a striped mirror of 4 drives with a volblocksize of 16K for normal VMs, plus a mirror of 2 drives with a volblocksize of 4K for my VMs that heavily utilize DBs, so the volblocksize is as low as possible. I hope that will decrease my high write amplification so the drives live longer, and hopefully also give better performance.
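
The post doesn't say which tool was used, but a typical way to run such a test is fio; a minimal sketch (hypothetical test file, run inside the VM) could look like this, repeated with --bs=16k, 32k and 4m to compare:

Code:
fio --name=blocksize-test --filename=/data3/fio-test.bin --size=1G \
    --rw=randwrite --bs=4k --direct=1 --ioengine=libaio --iodepth=16 \
    --runtime=60 --time_based --group_reporting

Watching zpool iostat -v 1 on the host while the test runs shows how much actually hits the disks compared to what the guest writes, which is the write amplification discussed above.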
 
Interesting, I did not know that with an 8-disk raidz1 everything below 32K is a waste. So it also depends on how many disks the server has and whether it's a raid10 or a raidz1.
Let's say I have a raid10 striped mirror (4 disks) and I'm running an MSSQL VM; would the recommended virtual disk block size be 4K?
 
For striped stuff it should be "data bearing disks * block size of your drives = volblocksize". So if using ashift=12 for that pool and 4 drives as a striped mirror, that would be 2 * 4K = 8K volblocksize.
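
As an illustration only (pool and device names are placeholders), a 4-disk striped mirror with ashift=12 could be created like this; on a real system /dev/disk/by-id/... names are preferable to sdX:

Code:
# two mirrored pairs striped together ("raid10"), 4K sectors
zpool create -o ashift=12 tank mirror /dev/sda /dev/sdb mirror /dev/sdc /dev/sdd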
 
Quick question: when running SSDs, is there a rule of thumb when configuring Proxmox? I see that when configuring a VM there is an option for SSD.
 
