Low file transfer performance in Linux guest (Ceph storage)

Hi All,

I have been a big ESX user, but I'm now setting foot in the Proxmox world with Ceph storage.
I got my (test) cluster running. It doesn't follow best practices yet, but I'm not too worried about that right now.
It's an X5670 cluster with 120GB RAM and 1x 2TB OSD per machine.

My Ceph cluster seems to be performing fine:
rados bench -p storage 10 write --no-cleanup = 390MB/sec
rados bench -p storage 10 seq = Unknown. This command fails for me (no such file or directory)
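(Possibly the seq test simply can't find the benchmark objects: as far as I understand it only reads back what a previous "write --no-cleanup" run left in the pool, and the write and seq runs need to use the same run name. Something along these lines might work, with "bench1" just as an example run name:

rados bench -p storage 10 write --no-cleanup --run-name bench1
rados bench -p storage 10 seq --run-name bench1
rados -p storage cleanup --run-name bench1
)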

In a Windows VM I also get decent speeds all around: reading/writing in the tens of MB/sec.

In a Linux VM with VirtIO SCSI + NIC I also get decent speeds as far as I can tell: over 280MB/sec write, and a wget of a 1000MB file downloads at 90MB/sec+. Bonnie++ test with twice the RAM size:
Version 1.97 ------Sequential Output------ --Sequential Input- --Random-
Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
www 15968M 386 99 284606 33 83850 15 912 99 127820 14 6103 17

I hit a problem when I started to transfer files to the Linux VM. I tried SCP, SFTP (via sshd & WinSCP) and FTP (via proftpd with FlashFXP).
All of these transfer methods give very poor performance from multiple sources (local + internet). Speeds are no higher than 1000KB/sec right from the start and vary between 500KB/sec and 1500KB/sec for short moments.

Does anyone have an idea what this issue could be, or other test methods to narrow down the problem?
 
You have 1 OSD per node, probably just an HDD? And 3 nodes, so just 3 OSDs?
Which network speed?

Ceph needs low latency and, ideally, many OSDs. A slow filesystem would be no surprise.

Replace the HDDs with enterprise SSDs, use more OSDs per node and at least 10 GBit in the backend network, and then it would fire up.
 
I understand your reaction and agree that that is the best practice to get the most performance out of Ceph, and that the current setup is not optimal. However, I don't think this is the reason for the drop to just 500KB/sec, especially since all other benchmarks show pretty decent performance. Scaling this up to 12 OSDs would still only give about 6MB/sec.

I installed FileZilla FTP Server on the Windows guest and did the same test. Results average 15MB/sec, which is still not great but expected with this low number of OSDs.

The Ceph network is 10Gbit and iperf tests give over 6Gbit/sec (0.2x ms latency). The OSDs are all 2TB SSDs (MX500).
 
Since KVM disk I/O is single-threaded and Ceph is multi-threaded, I tried adding 4 new virtual disks (VirtIO Block) and did a RAID 0 in Windows, which gave me much better speeds.
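The same test on a Linux guest would look roughly like this (a sketch, assuming the extra VirtIO disks show up as /dev/vdb to /dev/vde inside the guest):

mdadm --create /dev/md0 --level=0 --raid-devices=4 /dev/vdb /dev/vdc /dev/vdd /dev/vde
mkfs.ext4 /dev/md0
mount /dev/md0 /mnt

That stripes the writes across 4 RBD images and therefore across more parallel Ceph operations.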

The next step is Ceph cache tiering + SSD/NVMe journaling, or if you are really brave, bcache.
 
I hit a problem when I started to transfer files to the Linux VM. I tried SCP, SFTP (via sshd & WinSCP) and FTP (via proftpd with FlashFXP).
All of these transfer methods give very poor performance from multiple sources (local + internet). Speeds are no higher than 1000KB/sec right from the start and vary between 500KB/sec and 1500KB/sec for short moments.
Hi,
the question is whether the bottleneck is the network (AES?), the file I/O (on the read and/or write side), or both together.

What does iperf look like between your Windows source and the Linux VM? How fast is an scp from Windows to Linux to /dev/null?
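For example, something along these lines (hypothetical addresses and file names, using iperf3 or plain iperf):

iperf3 -s                                        (on the Linux VM)
iperf3 -c <linux-vm-ip>                          (on the Windows source: raw network throughput)
scp bigfile.bin user@<linux-vm-ip>:/dev/null     (network + SSH encryption, no disk write on the VM)
dd if=/dev/zero of=/tmp/ddtest bs=1M count=4096 conv=fdatasync   (disk write inside the VM, no network)

If scp to /dev/null is already slow, the guest storage is not the bottleneck.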

BTW, consumer SSDs are not the best choice (a bad choice for Ceph), but perhaps that is not the issue here.

Udo
 
The only downside to consumer SSDs, as far as I've noticed, is the low replication performance when Ceph needs to move data around. Regarding power failures: chances are slim (dual feed, UPS, etc.) and the MX500 SSDs have some intelligence for handling sudden power loss.
This cluster does not host mission-critical data (just some VDIs, really) and will not be written to constantly, so I don't see an endurance bottleneck.

Regarding the issue: I changed the NIC in the Linux guest from VirtIO to Intel E1000 and I am now getting 26MB/sec+ :)
Something is not right with the Debian 9.5 guest + VirtIO NIC.
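One thing worth checking before giving up on VirtIO (just a sketch, assuming the guest interface is called ens18; adjust to your interface name):

ethtool -k ens18                                 (show the current offload settings)
ethtool -K ens18 tso off gso off gro off         (temporarily disable segmentation/receive offloads)

Broken offloading on the virtio NIC is a common cause of exactly this kind of low, erratic TCP throughput, but that is only a guess here.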
 
The only downside to consumer SSDs, as far as I've noticed, is the low replication performance when Ceph needs to move data around. Regarding power failures: chances are slim (dual feed, UPS, etc.)
Hi,
the problem with consumer SSDs is the very low write speed and non-linear performance.
and the MX500 SSDs have some intelligence for handling sudden power loss.
Are you sure? For that the SSD needs a cache capacitor, and this is one of the differences compared to enterprise SSDs.
This cluster does not host mission-critical data (just some VDIs, really) and will not be written to constantly, so I don't see an endurance bottleneck.

Regarding the issue: I changed the NIC in the Linux guest from VirtIO to Intel E1000 and I am now getting 26MB/sec+ :)
Something is not right with the Debian 9.5 guest + VirtIO NIC.
If you get more speed with e1000 than with VirtIO, something must be totally wrong!!

Udo
 
