Greetings,
For several months, our small PVE environment was limited to 1 Gbit Ethernet, so that was generally the limiting factor for backup performance. Then we picked up some old used InfiniBand gear cheap on eBay. When it worked, iperf showed 15+ Gbit of throughput from memory buffer to memory buffer. In practice, however, this never translated into fast vzdump backups. For example, we have a Windows Server 2012 VM with a 100 GB OS zvol and a 1 TB database zvol. A vzdump backup of this server using LZO compression takes 3 hours and produces a roughly 400 GB backup file.
When I watch iostat during a backup, the disks appear mostly idle, with brief surges of the throughput I would expect. Searching through older PVE posts, I found some discussion of using local storage to "stage" the vzdump process. I tried that, but vzdump does not seem to do much with the local staging target.
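For reference, the staging I am talking about is the tmpdir setting in /etc/vzdump.conf; what I tried was along these lines (the SSD path here is just an example from my setup):

```
# /etc/vzdump.conf -- local "staging" attempt (path is an example)
tmpdir: /mnt/local-ssd/vzdump-tmp
# bandwidth limit in KB/s; 0 should mean unlimited
bwlimit: 0
```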
With or without compression, the backup still takes about 3 hours. We now have 10 Gbit Ethernet, so I can no longer blame InfiniBand.
If I copy this same 400 GB .lzo file from one host to another over NFS (at 10G), I get around 400 MB/s of throughput. That is pretty close to the maximum sequential write performance of the target zpool, and exactly the kind of speed I want for backup and recovery!
ZFS offers some alternative backup mechanisms, so I am now experimenting with zfs send/receive. So far, though, SSH seems to be another bottleneck, keeping us from getting much more than about 150 MB/s of throughput. There are options like mbuffer that sidestep the SSH transport, but they all seem to require synchronized commands on both the sending and receiving servers, which makes them complicated for a noob like me to automate. Testing zfs send/receive through mbuffer gets about 250 MB/s (compressed stream rate, not actual data), which is better but still nowhere near as fast as a bulk NFS file transfer.
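To make that concrete, these are the kinds of pipelines I have been testing; the pool/dataset names, snapshot name, host name, and port are placeholders for our setup:

```
# Plain ssh transport -- tops out around 150 MB/s here
zfs send rpool/data/vm-101-disk-1@backup1 | \
    ssh root@nas "zfs receive -F tank/backups/vm-101-disk-1"

# mbuffer transport -- two coordinated commands.
# First, on the receiving NAS:
#   mbuffer -s 128k -m 1G -I 9090 | zfs receive -F tank/backups/vm-101-disk-1
# Then, on the PVE host:
zfs send rpool/data/vm-101-disk-1@backup1 | mbuffer -s 128k -m 1G -O nas:9090
```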
I might get better zfs send/receive throughput with a 128k recordsize/volblocksize, though NFS seems to like 16k best for bulk file transfers on our system. I may try changing the record/block size to see whether it makes a big difference, but I don't expect that it will. The application vendor for the server being backed up requires a 16k block size for their Pervasive-based database, so we are pretty much stuck with it anyway. I suppose we could create a special 128k dataset on the receiving end, which might cut down on IP overhead.
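If I do try the special dataset, I assume it would just be something like this on the NAS (dataset name made up):

```
# Hypothetical 128k landing dataset for the bulk backup files
zfs create -o recordsize=128k -o compression=lz4 tank/backups-128k
```

As I understand it, recordsize would only affect files written into that dataset (like the .lzo dumps copied over NFS); a zvol received via zfs send keeps its own volblocksize.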
Proxmox itself uses zfs send/receive to migrate VMs from one host to another, as long as the zpools and datasets have the same names on each host. There is also a setting that lets this migration run in the clear, without SSH (the "insecure" option). How is PVE doing that? Is there CLI syntax I can use to make at-will zfs copies from a PVE host to a non-PVE Linux host? In our case the target runs Ubuntu 16.04 LTS (Debian-family, like PVE) but is not part of the PVE cluster; it is essentially just a NAS that happens to use ZFS.
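My guess is that the insecure mode just pushes the stream over a plain TCP socket, in which case something netcat-based along these lines would be the non-PVE equivalent (again, names and port are placeholders):

```
# On the Ubuntu NAS, start a listener first
# (or "nc -l 9090", depending on the netcat flavor):
#   nc -l -p 9090 | zfs receive -F tank/backups/vm-101-disk-1
# Then, on the PVE host:
zfs send rpool/data/vm-101-disk-1@backup1 | nc nas 9090
```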
There is also pve-zsync, which is handy for keeping a few snapshots' worth of replicas, but I believe it too goes over SSH, which seems to be a throughput killer.
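For anyone unfamiliar, the kind of job I mean looks something like this, going from the wiki (the VM ID, address, target pool, job name, and snapshot count are placeholders):

```
# Replicate VM 101 to the NAS, keeping the last 7 snapshots
pve-zsync create --source 101 --dest 192.168.1.50:tank/replicas --maxsnap 7 --name dbserver
```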
Lastly, if any of you know how to make vzdump run at line speed, that would be my preferred option, since PVE already has a handy backup-scheduling interface built in. For it to be useful to us, though, it needs to sustain 300 MB/s or more, and I cannot get it past about 100 MB/s, if that.
Let me summarize what I want to learn:
- How do I perform zfs send/receive in a single command on a single host yet avoid the penalty of SSH?
- Can vzdump be made to go faster? Closer to line speed, assuming my zpools can handle the uptake?
- If neither option is optimal for getting the most out of 10G, have any of you found a package that is?
GB