Native ZFS for Linux on Proxmox

Re: Repartition Proxmox to make a ZFS slice

How is this going for everyone?

I am VERY interested in this, as my Proxmox server doubles as the house's file server. I am interested in, and somewhat concerned about, performance, but testing so far (on a test box) is not scaring me off.

You could always create a block device (zvol) in a zpool, make that an LVM volume for the KVM guests to use, and then use ZFS on the rest of the pool, if indeed guest VMs don't like being on a ZFS file system.
Then use a ZFS file system for storing backups etc.
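
The zvol-backed LVM idea above could look roughly like the sketch below. The pool name `tank`, the zvol name `vmstore`, the 100G size, the volume group name `vmdata`, and the `/dev/zvol/...` device path are all assumptions for illustration, not details from this thread. The script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch: carve a zvol out of a pool, hand it to LVM for KVM guests,
# and keep a normal ZFS file system for backups.
# All names and sizes are hypothetical placeholders.
POOL="${POOL:-tank}"
ZVOL="${ZVOL:-vmstore}"
SIZE="${SIZE:-100G}"
VG="${VG:-vmdata}"
DRY_RUN="${DRY_RUN:-1}"

run() {
    # Print the command in dry-run mode, execute it otherwise.
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

make_zvol_vg() {
    # Create a fixed-size block device (zvol) inside the pool.
    run zfs create -V "$SIZE" "$POOL/$ZVOL"
    # Turn the zvol into an LVM physical volume and volume group.
    run pvcreate "/dev/zvol/$POOL/$ZVOL"
    run vgcreate "$VG" "/dev/zvol/$POOL/$ZVOL"
    # Use an ordinary ZFS file system on the rest of the pool for backups.
    run zfs create "$POOL/backups"
}

make_zvol_vg
```

The volume group could then be added to Proxmox as LVM storage, while the backups file system is used as a plain directory.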

I didn't find KVM guests to be much of a problem on ZFS. You have to set the write-back cache option for the drive image, and there are some tuning issues to get the fsync rate acceptable, but that's about it.

My main problem is that I can't figure out how to get live migration working. I had a bad experience with DRBD recently on Proxmox, so I am looking for alternatives. I can have a ZFS volume replicate pretty quickly to another server using zfs send and zfs recv, similar to what is described in this blog post: http://cuddletech.com/blog/pivot/entry.php?id=984. What I can't figure out is how to suspend the running VM and start it on the other server without rebooting the suspended VM. I know that's not a ZFS issue, just part of the overall picture.

What I am trying to achieve is a method of live migration that uses reasonably fast replication rather than shared storage, which could be seen as a single point of failure. It doesn't matter to me that a VM is suspended for the time it takes ZFS to replicate recent changes. It does matter if I have to use the current offline migration, which takes a very long time to export a backup archive, transfer it, and import it.
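
A minimal sketch of the incremental replication step described above, assuming a dataset `tank/vm-101`, a peer reachable over SSH as `node2`, and snapshot names supplied by the caller (all hypothetical). It only prints the commands, and it does not cover suspending or resuming the VM, which is the open question here.

```shell
#!/bin/sh
# Sketch: build the snapshot + incremental send/receive commands for one dataset.
# Dataset, host, and snapshot names are hypothetical placeholders.

replication_cmds() {
    dataset="$1"   # e.g. tank/vm-101
    prev="$2"      # last snapshot that already exists on the target
    next="$3"      # snapshot to create now
    target="$4"    # SSH host of the receiving server

    # 1. Take a new point-in-time snapshot on the source.
    echo "zfs snapshot $dataset@$next"
    # 2. Send only the blocks changed between the two snapshots and apply
    #    them remotely. 'zfs recv -F' rolls the target back to its most
    #    recent snapshot before receiving.
    echo "zfs send -i $dataset@$prev $dataset@$next | ssh $target zfs recv -F $dataset"
}

replication_cmds tank/vm-101 rep-0001 rep-0002 node2
```

Because only the delta since the previous snapshot is transferred, the time the VM would need to stay suspended depends on how much data changed, not on the size of the disk image.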
 
Re: Repartition Proxmox to make a ZFS slice

Great! I think I will go ahead with this. I may go with the idea of creating a block device in my zpool for VMs and using the rest as ZFS for file storage. I will do some performance testing beforehand.

So OpenVZ works too, but without quotas?
 
Re: Repartition Proxmox to make a ZFS slice

No idea, I don't use OpenVZ, only KVM.
 
Re: Repartition Proxmox to make a ZFS slice

Cool. I was hoping for Nemesiz to respond :)

So Erk, if you don't use OpenVZ, then you could use FreeBSD with native ZFS and use VirtualBox with the VirtualBox Web package (phpvirtualbox) that is in ports. I have never used phpvirtualbox so not sure how it is.

Just putting that out there.
 
Re: Repartition Proxmox to make a ZFS slice

I tried that; it was very messy. VirtualBox has some nice features, but overall the OSE version that FreeBSD uses is a pain.

I like Proxmox; my only problem at the moment is achieving some kind of semi-live migration without using shared storage. I don't consider ZFS replication of a snapshot to be shared storage, even though the snapshot might be only 30 seconds old, but it's good enough for my purposes.

I tried DRBD recently and the cluster lost the plot. Each server had an identical second 1TB drive with two partitions: the first on each was the DRBD partition where I would put VMs, and the second was a normal ext4 partition where nightly local vzdump backups would go. When the nightly backup started, it corrupted both the DRBD partition and the ext4 partition. The machines rebooted themselves because DRBD has some sort of reboot failsafe. One of them couldn't fsck the ext4 partition, so it just sat at a prompt waiting for manual intervention; the other would get to the point where DRBD loads, then reboot itself in an infinite loop because it couldn't see the other DRBD machine. I decided DRBD was not ready for prime time.

I don't like NFS or iSCSI as they are a single point of failure, and I eventually intend to replicate to a DR site several km away. When I tested offline migration, it took about half an hour to move one VM of about 80GB, which was not fast enough. The data doesn't change that quickly (it's mostly email), so even across several VMs ZFS replication should do fine; I'm just trying to work out how to save a memory snapshot. I have posted another thread on that question.
 
Re: Repartition Proxmox to make a ZFS slice

DKMS is good stuff. It's the same as building manually, except that it recompiles the needed modules when the kernel changes.

REQUEST to Proxmox team: please add this line to pve-header /DEBIAN/postinst

ln -sf /usr/src/linux-headers-2.6.32-13-pve /lib/modules/2.6.32-13-pve/build
+ln -sf /usr/src/linux-headers-2.6.32-13-pve /lib/modules/2.6.32-13-pve/source
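
Until something like that lands in the package, the same two links could be created by hand for whatever kernel is running. The sketch below only prints the `ln` commands; the `link_headers` helper is hypothetical and it assumes the headers live under /usr/src/linux-headers-<version>, as in the lines above.

```shell
#!/bin/sh
# Sketch: print the build/source symlink commands for a given kernel version.
# link_headers is a hypothetical helper, not part of any Proxmox package.

link_headers() {
    kver="$1"   # kernel version string, e.g. the output of 'uname -r'
    for name in build source; do
        echo "ln -sf /usr/src/linux-headers-$kver /lib/modules/$kver/$name"
    done
}

link_headers "$(uname -r)"
```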

ZFS has good features such as sub file systems. They make it easy to manage compression, snapshots, disk size, checksums, synchronization, and so on per file system. I have created sub file systems for all VPSes.

NAME USED AVAIL REFER MOUNTPOINT
data_zfs 219G 237G 33.3K /media/data_zfs
data_zfs/images 181G 237G 62.6K /media/data_zfs/images
data_zfs/images/108 4.49G 237G 4.49G /media/data_zfs/images/108
data_zfs/images/113 51.2G 237G 45.4G /media/data_zfs/images/113
data_zfs/images/116 16.0G 237G 16.0G /media/data_zfs/images/116
data_zfs/images/200 52.4G 237G 48.3G /media/data_zfs/images/200
data_zfs/images/201 21.5G 237G 21.5G /media/data_zfs/images/201
data_zfs/images/202 4.66G 237G 4.66G /media/data_zfs/images/202
data_zfs/images/203 6.77G 237G 6.77G /media/data_zfs/images/203
data_zfs/images/204 20.2M 237G 20.2M /media/data_zfs/images/204
data_zfs/images/205 19.8M 237G 19.8M /media/data_zfs/images/205
data_zfs/images/206 1.33G 237G 1.33G /media/data_zfs/images/206
data_zfs/images/300 4.29G 237G 4.29G /media/data_zfs/images/300
data_zfs/images/301 4.26G 237G 4.26G /media/data_zfs/images/301
data_zfs/images/302 4.44G 237G 4.44G /media/data_zfs/images/302
data_zfs/images/303 4.29G 237G 4.29G /media/data_zfs/images/303
data_zfs/images/304 4.30G 237G 4.30G /media/data_zfs/images/304
data_zfs/images/305 53.3K 237G 53.3K /media/data_zfs/images/305
data_zfs/images/306 956M 237G 956M /media/data_zfs/images/306
data_zfs/private 37.8G 237G 71.9K /media/data_zfs/private
data_zfs/private/100 287M 237G 287M /media/data_zfs/private/100
data_zfs/private/101 3.55G 237G 3.55G /media/data_zfs/private/101
data_zfs/private/102 5.96G 237G 5.87G /media/data_zfs/private/102
data_zfs/private/103 11.9G 237G 11.9G /media/data_zfs/private/103
data_zfs/private/104 485M 237G 485M /media/data_zfs/private/104
data_zfs/private/105 436M 237G 436M /media/data_zfs/private/105
data_zfs/private/106 5.59G 237G 5.59G /media/data_zfs/private/106
data_zfs/private/107 354M 237G 354M /media/data_zfs/private/107
data_zfs/private/109 1.76G 237G 1.76G /media/data_zfs/private/109
data_zfs/private/110 273M 237G 273M /media/data_zfs/private/110
data_zfs/private/111 5.46G 237G 5.46G /media/data_zfs/private/111
data_zfs/private/112 603M 237G 603M /media/data_zfs/private/112
data_zfs/private/114 465M 237G 465M /media/data_zfs/private/114
data_zfs/private/115 327M 237G 327M /media/data_zfs/private/115

Now I can set the disk size for OpenVZ. To see how much disk space is used inside an OpenVZ container, you need to recalculate the quota with 'vzctl quotainit <ctid>'. But it works only when the VPS is off.

example:

root@test:/# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/simfs 4.0G 128K 4.0G 1% /
tmpfs 256M 0 256M 0% /lib/init/rw
tmpfs 256M 0 256M 0% /dev/shm
root@test:/# du -hsx /
257M /

# vzctl quotainit 110
vzquota : (warning) Quota file exists, it will be overwritten

root@test:/# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/simfs 4.0G 257M 3.8G 7% /
tmpfs 256M 0 256M 0% /lib/init/rw
tmpfs 256M 0 256M 0% /dev/shm
root@test:/# du -hsx /
257M /
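
One way the per-container layout above might be created is sketched below. The thread does not show the exact commands used, so treating the ZFS `quota` property as the disk-size limit is an assumption, as are the container ID 117 and the 4G size. The function only prints the commands.

```shell
#!/bin/sh
# Sketch: commands to give one OpenVZ container its own ZFS file system.
# The dataset layout follows the listing above; CTID and size are placeholders.

ct_dataset_cmds() {
    ctid="$1"   # OpenVZ container ID
    size="$2"   # disk size limit, e.g. 4G
    ds="data_zfs/private/$ctid"

    # Child file system, mounted under the parent's mountpoint by default.
    echo "zfs create $ds"
    # Cap the space this file system (and its snapshots) may use.
    echo "zfs set quota=$size $ds"
    # Enable compression for this container only.
    echo "zfs set compression=on $ds"
}

ct_dataset_cmds 117 4G
```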
 
Re: Repartition Proxmox to make a ZFS slice

I'm running 2 vz and 3 KVM guests on ZFS without any issues.
I admit I'm close to insanity because I use ZFS for compression and deduplication.

While I got a fabulous 1.71 ratio for the vz and KVM images, I must say that performance is really low (like days/KB), but it's OK for my testing. Don't try that at home ;-)

The only thing I haven't been able to solve yet is that PVE seems to start before ZFS, so after a reboot the ZFS file system can't be mounted because Proxmox has already started writing blank data into the mount point.

I haven't yet been able to find out what to move, since I have rc2.d/S01zfs-mount, which is pretty much the first thing to load.
 
Re: Repartition Proxmox to make a ZFS slice

Be careful with dedup: you need RAM, maybe a lot of RAM, depending on the block size you use.

For each ZFS block, dedup needs around 300 bytes of memory.

I don't use dedup on my ZFS Nexenta SAN for my VMs; it adds too much overhead. But compression works really well.
(I'm at around 40000 IOPS.)
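
To make the block-size dependence concrete, here is a rough back-of-the-envelope sketch using the figure of about 300 bytes per block quoted above. The function name and the example sizes are made up for illustration, and real memory use will vary.

```shell
#!/bin/sh
# Sketch: estimate dedup table memory from data size and block size.
# Uses the ~300 bytes per block figure from the post above as a default.

dedup_ram_mib() {
    data_gib="$1"                 # amount of data stored, in GiB
    block_kib="$2"                # ZFS block size, in KiB
    bytes_per_block="${3:-300}"   # memory needed per block, in bytes

    # Number of blocks = data size / block size (both converted to KiB).
    blocks=$(( data_gib * 1024 * 1024 / block_kib ))
    # Total memory in MiB, rounded down.
    echo $(( blocks * bytes_per_block / 1024 / 1024 ))
}

dedup_ram_mib 1024 128   # 1 TiB of data in 128 KiB blocks -> 2400
dedup_ram_mib 1024 8     # 1 TiB of data in 8 KiB blocks   -> 38400
```

With these assumptions, 1 TiB of data needs about 2.3 GiB of memory at a 128 KiB block size but about 37.5 GiB at 8 KiB, which is why small blocks make dedup so expensive.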
 
Re: Repartition Proxmox to make a ZFS slice

By default ZFS starts at number 20. 01 is too early, I believe.

Check when the vz and pve services start.

Can you give some details about your Nexenta setup?
 
Re: Repartition Proxmox to make a ZFS slice

Yesterday I upgraded Proxmox. After the reboot, ZFS wasn't mounted. I changed /etc/default/zfs to enable the mount option and changed /etc/runlevel.conf (I'm using the file-rc startup system):

<sort>  <off->  <on-levels>  <command>
13      0,1,6   2,3,4,5      /etc/init.d/zfs-mount
13      0,1,6   2,3,4,5      /etc/init.d/zfs-share
21      -       2,3,4,5      /etc/init.d/pvestatd
25      -       2,3,4,5      /etc/init.d/pve-manager

/etc/init.d/pvestatd is responsible for creating the private, images, and other directories.
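
A quick way to confirm the ordering is to compare the sort numbers in the runlevel.conf entries above. The sketch below is a hypothetical helper, not part of file-rc; it assumes the four-column layout shown and runs against a temporary copy of those entries rather than the live /etc/runlevel.conf.

```shell
#!/bin/sh
# Sketch: check that one init script has a lower sort number than another
# in a file-rc style runlevel.conf (<sort> <off-> <on-levels> <command>).

starts_before() {
    conf="$1"; first="$2"; second="$3"
    awk -v a="/etc/init.d/$first" -v b="/etc/init.d/$second" '
        # Only consider entries whose first field is a numeric sort value.
        $1 ~ /^[0-9]+$/ && $4 == a { sa = $1 }
        $1 ~ /^[0-9]+$/ && $4 == b { sb = $1 }
        # Succeed only if both were found and the first sorts lower.
        END { exit !(sa != "" && sb != "" && sa + 0 < sb + 0) }
    ' "$conf"
}

# Demo with the entries from the post above, written to a temporary file.
conf="$(mktemp)"
cat > "$conf" <<'EOF'
13 0,1,6 2,3,4,5 /etc/init.d/zfs-mount
13 0,1,6 2,3,4,5 /etc/init.d/zfs-share
21 - 2,3,4,5 /etc/init.d/pvestatd
25 - 2,3,4,5 /etc/init.d/pve-manager
EOF

if starts_before "$conf" zfs-mount pvestatd; then
    echo "zfs-mount starts before pvestatd"
else
    echo "zfs-mount does NOT start before pvestatd"
fi
rm -f "$conf"
```

If pvestatd ran first, it would create its directories in the empty mount point, which matches the mount failure described earlier in the thread.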
 
