Hello Everyone,
I have been playing with PVE for over a year now, first in a lab on older Dell Precisions with at most eight 7200 RPM spindles per host, until I arrived at what I thought was a workable setup. Then I got two higher-end Supermicro 4U chassis with lots of RAM, 40 CPU cores per host, and 36 disk slots per host. Twenty-four of the disk slots are controlled by three LSI 9300 HBAs to provide direct IT-mode access to the disks for ZFS. There is a third host, but it is mainly there for quorum and to provide a place for ISOs and ad hoc vzdump backups. Each host has Mellanox InfiniBand HCAs, although they are used and not especially new. I intended to use the IB for storage traffic and have a Mellanox IS5030 switch for that purpose.
In the lab, I used ZFS datasets for the VM images in .raw format because this seemed like the best option for performance while also giving me native ZFS snapshots. Also, by using local storage over SAS instead of iSCSI, I expected to get much better storage performance for the VMs.
The weakness of this approach is that migrating a virtual disk requires actually copying it from one host's local storage to the other's over some sort of network connection.
In my lab environment I experimented with a few options:
- DRBD
- NFS
- PVE-Zsync
However, because I am using zpools instead of LVM or bare disks, DRBD becomes pretty complicated. I was able to get it working, but it requires that you create zvol block devices that you then turn into DRBD block devices. Once you have DRBD working, you then have to make the DRBD block device into an LVM PV. Then you add that PV's volume group to the PVE web GUI. It worked in my lab, but only over a 1 Gb/s Ethernet connection, so it wasn't especially fast. Also, how much I/O am I losing to overhead? My PV is a DRBD block device, which is backed by a zvol, which sits on a zpool, while DRBD is designed to use an actual physical disk as its backing device! In the end, I decided that DRBD was not going to be a high-performance solution and that it would be nightmarish to troubleshoot and support. So scratch DRBD.
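For anyone curious, the layering I ended up with looked roughly like the sketch below. This is from memory, and the pool name, zvol size, DRBD resource name, and VG name are all just examples:

    # 1. Create a zvol on each host to act as the DRBD backing device
    zfs create -V 500G tank/drbd0

    # 2. Point a DRBD resource at the zvol (abbreviated /etc/drbd.d/r0.res)
    #    resource r0 {
    #        device  /dev/drbd0;
    #        disk    /dev/zvol/tank/drbd0;
    #        ...
    #    }
    drbdadm create-md r0
    drbdadm up r0

    # 3. Turn the replicated DRBD device into an LVM PV and VG
    pvcreate /dev/drbd0
    vgcreate drbdvg /dev/drbd0

    # 4. Finally, add the "drbdvg" volume group as LVM storage in the PVE web GUI

Every layer in that stack is another place to lose performance and another thing to debug when it breaks.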
PVE-Zsync also worked in general, but you have to do some manual config editing at the host's console if you want to use it to migrate a virtual machine between hosts. Under the hood, it uses ZFS send/receive plus cron and ZFS snapshots to periodically synchronize the virtual machines. I believe the send stream is piped over SSH for network transport, so it isn't especially fast compared to other options such as...
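For comparison, the manual version of what pve-zsync automates is roughly the following. This is a sketch; the pool name, disk name, snapshot name, and the 10.10.10.2 address (meant to be the other node's IPoIB address) are all made-up examples:

    # Snapshot the dataset/zvol backing the VM disk, then pipe the
    # stream over SSH to the other node's storage-network IP
    zfs snapshot tank/vm-100-disk-0@migrate1
    zfs send tank/vm-100-disk-0@migrate1 | ssh root@10.10.10.2 zfs receive tank/vm-100-disk-0

Aiming the SSH session at the IB interface's IP at least keeps the transfer off vmbr0, although throughput is still limited by SSH encryption overhead.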
NFS: This seems to be the networked storage solution of choice for PVE. However, in my new environment it is extremely unstable, and I can't figure out why. I have another forum thread going on this subject and have just about given up on NFS. It works OK over 1 Gb Ethernet, as it did in my lab, but when I try to push it at InfiniBand speeds or at 10 Gb/s Ethernet (I have both options), it writes to the NFS export for a short time, then stops for ten minutes or more, then works again for a short time, then stops again... During these lapses, PVE's storage system is unable to "see" the export mounted on /mnt/pve/nfssharename. I can ping the IP of the NFS host, but if I try to do an "ls" of the mount, my terminal locks up until the share mysteriously comes back. This is a big blow to my design! It happens on InfiniBand as well as on 10G Ethernet cabled directly between hosts. So:
Does anyone have a better idea on how to achieve VM migrations? I have just found out that PVE can do native ZFS send/receive if you are careful about how you name your storage, but this will try to use the vmbr0 bridge interface rather than my desired high-speed IB interface.
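One thing I plan to try, based on my reading of the admin guide: migration traffic can apparently be pinned to a dedicated network with the migration option in /etc/pve/datacenter.cfg, along these lines (the 10.10.10.0/24 subnet standing in for my IPoIB network is an example):

    migration: secure,network=10.10.10.0/24

If that works as described, migrations should go over whichever interface carries that subnet instead of vmbr0, but I would appreciate confirmation from anyone who has actually used it.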
I have not tried Ceph, but I can imagine that it isn't the fastest thing in the world. My zpools are built on pretty fast SSDs, and for just copying files around on the local host they work great: I get write speeds on the order of 1000 MB/s, and I want to enjoy that benefit for backups, migrations, etc.
How about going back to iSCSI? Can I make each host an iSCSI server and host the VMs on storage added to the PVE GUI as iSCSI? Will the hosts be smart enough to see the block devices whether they are on the same host or on the other host? How much of a performance penalty is iSCSI going to add? Does anyone have experience with this in actual production?
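What I am imagining is something like the ZFS-over-iSCSI storage type, where each host exports zvol-backed LUNs and the other host attaches to them. A rough, untested sketch of the /etc/pve/storage.cfg entry I have in mind (the storage ID, portal address, target IQN, pool name, and LIO target portal group are all invented):

    zfs: host2-zfs-iscsi
        pool tank
        portal 10.10.10.2
        target iqn.2003-01.org.linux-iscsi.host2:tank
        iscsiprovider LIO
        lio_tpg tpg1
        sparse 1
        content images

I have not tested this, so corrections are welcome if the approach or the options are off.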
Thanks to anyone who has read this far and even more thanks to anyone who can offer some fatherly advice.
GB