Host migration practices without a SAN

Gurn_Blanston

Member
May 13, 2016
Hello Everyone,

I have been playing with PVE for over a year now, first in a lab on older Dell Precisions with at most eight 7200 RPM spindles per host, until I arrived at what I thought was a workable setup. Then I got two higher-end Supermicro 4U chassis with lots of RAM, 40 CPU cores per host, and 36 disk slots per host. 24 of the disk slots are controlled by three LSI 9300 HBAs to provide direct IT-mode access to the disks for ZFS. There is a third host, but it is mainly for quorum and to provide a place for ISOs and ad hoc vzdump backups. Each host has Mellanox Infiniband HCAs, although they are used and not especially new. I mean to use the IB for storage traffic and have a Mellanox IS5030 switch for this purpose.

In the lab, I used ZFS datasets for the images in .raw format because this seemed like the best-performing option while also giving me native ZFS snapshots. Also, by using local storage over SAS instead of iSCSI, I expected to get much better storage performance for the VMs.

The weakness of this approach is that migrating a virtual disk requires actually moving the image from one host's storage to another host's storage over some sort of network connection.

In my lab environment I experimented with a few options.
  • DRBD
  • NFS
  • PVE-Zsync
I should mention that one of the essential requirements of this cluster is that we use ZFS. This isn't "my" requirement but rather my customer's. Having played with ZFS for over a year now, I have to say I think it is a pretty good one and I have gotten reasonably familiar with its features and configuration.

However, using zpools instead of LVM or bare disks makes DRBD pretty complicated. I was able to get it working, but it requires creating zvol block devices that you then turn into DRBD block devices. Once you have DRBD working, you have to make the DRBD block device into an LVM PV, and then add that PV's volume group to the PVE web GUI. It worked in my lab, but only over a 1 Gb/s Ethernet connection, so it wasn't especially fast. Also, how much I/O am I losing to overhead? My PV is a DRBD block device, which is made out of a zvol block device, which sits on a zpool, while DRBD is designed to use an actual physical disk as its backing device. In the end I decided that DRBD was not going to be a high-performance solution and that it would be nightmarish to troubleshoot and support. So scratch DRBD.
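
For anyone curious, the layering I ended up testing looked roughly like this; the pool, resource, and host names are just examples, and this is a from-memory sketch rather than a tested recipe:

Code:
# on each host: back DRBD with a zvol instead of a raw disk
zfs create -V 200G tank/drbd0

# /etc/drbd.d/r0.res (identical on both hosts)
resource r0 {
        device    /dev/drbd0;
        disk      /dev/zvol/tank/drbd0;
        meta-disk internal;
        on pve1 { address 10.0.0.60:7789; }
        on pve2 { address 10.0.0.61:7789; }
}

# once /dev/drbd0 is up and primary, layer LVM on top for PVE
pvcreate /dev/drbd0
vgcreate drbdvg /dev/drbd0

So every VM write goes through LVM, then DRBD replication, then the zvol, then the zpool itself.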

PVE-Zsync also worked in general, but you have to do some manual config editing at the host's console if you want to use it to migrate a virtual machine between hosts. Under the hood it uses ZFS send/receive plus cron and ZFS snapshots to periodically synchronize the virtual machines. I believe the send stream goes over SSH, so it isn't especially fast compared to other options.
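
For reference, setting up a replication job is roughly this; the VM ID, target host, and pool names are examples, so treat it as a sketch of the idea rather than my exact commands:

Code:
# replicate VM 100 to the other host, keeping 7 snapshots;
# "create" also drops a schedule into /etc/cron.d/pve-zsync
pve-zsync create --source 100 --dest 10.0.0.61:tank/replica --name vm100 --maxsnap 7 --verbose

# run a one-off sync of the same job by hand
pve-zsync sync --source 100 --dest 10.0.0.61:tank/replica --name vm100 --verbose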

NFS: This seems to be the networked storage solution of choice for PVE. However, in my new environment it is extremely unstable and I can't figure out why. I have another forum thread going on this subject and have just about given up on NFS. It works fine over 1 Gb Ethernet, as it did in my lab, but when I try to push it at Infiniband speeds or at 10 Gb/s Ethernet (I have both options), it writes to the NFS export for a short time, then stops for ten minutes or more, then works again for a short time, then stops again. During these lapses, PVE's storage system is unable to "see" the export mounted on /mnt/pve/nfssharename. I can ping the IP of the NFS host, but if I try to "ls" the mount, my terminal locks up until the share mysteriously comes back. This is a big blow to my design! It happens on Infiniband as well as on 10 Gb Ethernet cabled directly between the hosts.
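
For anyone comparing notes, the client-side checks during one of these lapses look roughly like this (the IP is a placeholder for my NFS host); ping answers, but anything that touches the mount itself just hangs:

Code:
# the server itself still answers
ping -c 3 10.0.0.61

# is the export still advertised, or does this hang too?
showmount -e 10.0.0.61

# look for "nfs: server ... not responding" messages
dmesg | tail -n 20

# per-mount NFS details, if the mount is still listed
nfsstat -m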

So, does anyone have a better idea of how to achieve VM migrations? I have just found out that PVE can do native ZFS send/receive if you are careful about how you name your storage, but this will try to use the vmbr0 bridge interface rather than my desired high-speed IB interface.
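
If I understand it correctly, the naming requirement just means the ZFS storage entry in /etc/pve/storage.cfg has to be defined identically on both hosts, something like this (the storage and pool names are examples):

Code:
zfspool: vmpool
        pool tank/vmdata
        content images,rootdir

With matching entries, a migration can send the dataset with ZFS send/receive instead of going through a shared NFS store.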

I have not tried Ceph, but I can imagine that it isn't the fastest thing in the world. My zpools are built on pretty fast SSDs and, in terms of just copying files around on the local host, they work great. I get write speeds on the order of 1000 MB/s, and I want to keep that benefit for backups, migrations, etc.

How about going back to iSCSI? Can I make each host an iSCSI server and put the VMs on storage added to the PVE GUI as iSCSI? Will the hosts be smart enough to see the block devices whether they are on the same host or on the other host? How much of a performance penalty is iSCSI going to add? Does anyone have experience with this in actual production?
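
To make that concrete, what I am imagining is each host exporting a LUN and the cluster adding the peer's target as storage, roughly like this in /etc/pve/storage.cfg (the portal and target name are made up), with LVM layered on top of the LUN so PVE can carve out VM disks:

Code:
iscsi: peer-iscsi
        portal 10.0.0.61
        target iqn.2016-05.local.pve:tank-lun0
        content none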

Thanks to anyone who has read this far and even more thanks to anyone who can offer some fatherly advice.

GB
 
How do you have your InfiniBand NICs configured on the hosts?

For optimal TCP performance you should have something like this:
Code:
auto ib<n>
iface ib<n> inet static
        address w.x.y.z
        netmask w.x.y.z
        pre-up echo connected > /sys/class/net/ib<n>/mode
        mtu 65520
 
Here is what is in my /etc/network/interfaces:

Code:
auto ib0
iface ib0 inet static
  address  10.0.0.60
  netmask  255.255.255.0
  pre-up modprobe ib_ipoib
  pre-up echo connected > /sys/class/net/ib0/mode
  mtu 65520

As I said, I get the same issue with 10GBASE-T Ethernet, so I am no longer blaming this on Infiniband per se. Do you have experience with NFS on IPoIB? If so, has it been solid?

I could spend another month iterating through all the variables, and I don't have a month, so what I need is an alternative: ideally something faster than the SSH-based transfers that also sidesteps whatever weirdness is going on with NFS. I don't think I have the deep system knowledge or experience needed to trace the root of my NFS issue any time soon. If CIFS/SMB can't work without an authentication server, then it isn't going to like my isolated IB network. I still have to verify whether that is the case, but I can't imagine otherwise. I am going to investigate this angle just in case I can find a way to do without a Samba/AD server. I am not interested in putting one in the storage network because of the complication it adds, although it might actually work.

Is anyone using iSCSI in a "two host" cluster with no SAN? From what I have read about ZFS over iSCSI, you would need a SAN; in fact, only Solaris is recommended according to the "Storage Model" docs, so I don't think iSCSI is going to help me. It looks like I have painted myself into a corner made out of NFS.

GB
 
My NFS servers are the PVE hosts themselves. Could that be the underlying issue? Is sharing some of my ZFS filesystems via the "sharenfs" property a bad practice? I am exploring another avenue for doing VM migrations between cluster hosts. I believe it was Fabian who turned me on to the idea that if you use the same zpool and dataset names on each host, then PVE will use ZFS send/receive to migrate the images from one host to another. By default this goes over SSH, which is in turn limited by how fast a single CPU core can encrypt/decrypt. That limitation can be eliminated by adding a line to datacenter.cfg:

Code:
migration_unsecure: 1

I still need a way to do VZDump backups, though. Right now, doing these to a ZFS dataset shared out by NFS hits continuous timeouts in the worst way, making it pretty much unusable. Sometimes it looks like some sort of intermittent hardware fault is causing the timeouts, but I have none of these issues when I use any other mechanism to access the zpools: SSH transfers are fairly fast, local copies are super fast, and there are no I/O errors under those circumstances. Is NFS on ZFS with the PVE kernel a less-than-bulletproof combination? I mean that ZoL is still not entirely ported over from Solaris (feature- and performance-wise), and maybe it just isn't stable at IB network speeds? I believe I have this problem with non-ZFS storage as well, but I have to confirm because I can't remember for sure. On one of the hosts I have a four-disk hardware RAID 10 volume that I could try backing up to over NFS. I will confirm tomorrow.
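
For the record, the sharing side is just the ZFS sharenfs property on the backup dataset; mine looks roughly like this (the dataset name and subnet are examples):

Code:
# export the backup dataset to the storage network only
zfs set sharenfs='rw=@10.0.0.0/24,no_root_squash' tank/backup

# check what is actually being exported
zfs get sharenfs tank/backup
exportfs -v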

I am thinking out loud here, not bashing ZoL. I am just trying to decide whether it is worth throwing more time and effort at making NFS work. I could theoretically try other NFS servers, but I would rather not buy additional hardware. It was attractive to just have the two big hosts back up to each other on dedicated backup zpools shared over NFS, rather than relying on a third, separate NFS server.
 
I have just discovered that there might be an issue with NFS and IPoIB with kernel 4.4.10. Try the latest kernel, which is 4.4.13, to see whether this solves your problems.
 
