VM storage traffic on Ceph

holr

Well-Known Member
Jan 4, 2019
Hello,
I think I have misunderstood how some of the different networks function within Proxmox. I have a cluster of 9 nodes. Each node has two network cards: a 40Gbit/s card dedicated to Ceph storage, and a 10Gbit/s card for all other networking (management/corosync, user traffic). I had assumed that, when using Ceph, all virtual machine reads/writes to virtual hard disks would go via Ceph, i.e. over the 40Gbit/s cards. However, I am seeing different results.

Ceph vs. main networks Video

In the above video, I have a Windows 10 VM running the benchmarking tool CrystalDiskMark. I also have iptraf-ng running on the node that hosts the Windows 10 VM. There are two network interfaces: eno50, which is the 10Gbit/s card, and ens3f0, which is the 40Gbit/s card.

What I observe in the video is that while CrystalDiskMark is exercising the virtual hard disk, it is eno50 (the 10Gbit/s card) that carries the bulk of the traffic; ens3f0 (40Gbit/s, Ceph) seems lightly used in comparison.

I believed that when using Ceph storage, the majority of the VM hard drive traffic would go via the Ceph network, which is why I put Ceph on a dedicated 40Gbit/s network. However, given what the video shows, I think this assumption is wrong.

Could a knowledgeable person please explain why the Ceph network (ens3f0) is so lightly used compared to the main network (eno50) in this case? Did I make a mistake putting Ceph on the 40Gbit/s cards? Should Ceph instead be on the 10Gbit/s and the main network on the 40Gbit/s?

Thank you!
 
Can you post your Ceph configuration (obtainable from the Web UI), your network configuration (ip a), and the configuration of the VM in question?
 
Hi Shanreich, thank you for replying. Here are the Ceph configuration, the network configuration, and the VM configuration, in that order.

Ceph Configuration
Code:
[global]
     auth_client_required = cephx
     auth_cluster_required = cephx
     auth_service_required = cephx
     cluster_network = 192.168.11.101/24
     fsid = 46ee4d0a-1bc4-45c4-a642-405c33d1374f
     mon_allow_pool_delete = true
     mon_host = 192.168.10.101 192.168.10.102 192.168.10.103
     ms_bind_ipv4 = true
     ms_bind_ipv6 = false
     osd_pool_default_min_size = 2
     osd_pool_default_size = 3
     public_network = 192.168.10.101/24

[client]
     keyring = /etc/pve/priv/$cluster.$name.keyring

[mds]
     keyring = /var/lib/ceph/mds/ceph-$id/keyring

[mds.pve1]
     host = pve1
     mds_standby_for_name = pve

[mds.pve2]
     host = pve2
     mds_standby_for_name = pve

[mds.pve3]
     host = pve3
     mds_standby_for_name = pve

[mon.pve1]
     public_addr = 192.168.10.101

[mon.pve2]
     public_addr = 192.168.10.102

[mon.pve3]
     public_addr = 192.168.10.103

The network configuration (ip a)
Code:
root@pve1:~# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eno1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether f4:03:43:58:66:64 brd ff:ff:ff:ff:ff:ff
    altname enp2s0f0
3: ens3f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 3c:fd:fe:a6:0e:28 brd ff:ff:ff:ff:ff:ff
    altname enp8s0f0
    inet 192.168.11.101/24 scope global ens3f0
       valid_lft forever preferred_lft forever
    inet6 fe80::3efd:feff:fea6:e28/64 scope link
       valid_lft forever preferred_lft forever
4: eno2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether f4:03:43:58:66:65 brd ff:ff:ff:ff:ff:ff
    altname enp2s0f1
5: eno49: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 48:df:37:20:ec:04 brd ff:ff:ff:ff:ff:ff
    altname enp4s0f0
6: eno3: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether f4:03:43:58:66:66 brd ff:ff:ff:ff:ff:ff
    altname enp2s0f2
7: ens3f1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 3c:fd:fe:a6:0e:29 brd ff:ff:ff:ff:ff:ff
    altname enp8s0f1
8: eno4: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether f4:03:43:58:66:67 brd ff:ff:ff:ff:ff:ff
    altname enp2s0f3
9: eno50: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master vmbr0 state UP group default qlen 1000
    link/ether 48:df:37:20:ec:05 brd ff:ff:ff:ff:ff:ff
    altname enp4s0f1
10: vmbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 48:df:37:20:ec:05 brd ff:ff:ff:ff:ff:ff
    inet 192.168.10.101/24 scope global vmbr0
       valid_lft forever preferred_lft forever
    inet6 fe80::4adf:37ff:fe20:ec05/64 scope link
       valid_lft forever preferred_lft forever
11: vmbr1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether de:5b:10:76:84:f8 brd ff:ff:ff:ff:ff:ff
    inet 10.10.10.1/24 scope global vmbr1
       valid_lft forever preferred_lft forever
    inet6 fe80::dc5b:10ff:fe76:84f8/64 scope link
       valid_lft forever preferred_lft forever
12: tap100i0: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master fwbr100i0 state UNKNOWN group default qlen 1000
    link/ether 5e:69:10:01:c2:7e brd ff:ff:ff:ff:ff:ff
13: fwbr100i0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 06:0d:9a:60:15:60 brd ff:ff:ff:ff:ff:ff
14: fwpr100p0@fwln100i0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master vmbr0 state UP group default qlen 1000
    link/ether 06:83:33:a9:eb:fe brd ff:ff:ff:ff:ff:ff
15: fwln100i0@fwpr100p0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master fwbr100i0 state UP group default qlen 1000
    link/ether 86:fe:d7:e5:92:70 brd ff:ff:ff:ff:ff:ff
17: tap102i0: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master fwbr102i0 state UNKNOWN group default qlen 1000
    link/ether e2:a7:fd:4f:1b:b8 brd ff:ff:ff:ff:ff:ff
18: fwbr102i0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 5a:20:6d:6e:ee:56 brd ff:ff:ff:ff:ff:ff
19: fwpr102p0@fwln102i0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master vmbr0 state UP group default qlen 1000
    link/ether 5a:17:d7:69:2f:a2 brd ff:ff:ff:ff:ff:ff
20: fwln102i0@fwpr102p0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master fwbr102i0 state UP group default qlen 1000
    link/ether 36:4c:8d:8e:c1:0c brd ff:ff:ff:ff:ff:ff
21: tap108i0: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master fwbr108i0 state UNKNOWN group default qlen 1000
    link/ether 72:37:c4:07:0e:70 brd ff:ff:ff:ff:ff:ff
22: fwbr108i0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 5a:cf:6d:e4:1c:69 brd ff:ff:ff:ff:ff:ff
23: fwpr108p0@fwln108i0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master vmbr0 state UP group default qlen 1000
    link/ether 66:05:88:54:3b:db brd ff:ff:ff:ff:ff:ff
24: fwln108i0@fwpr108p0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master fwbr108i0 state UP group default qlen 1000
    link/ether 66:1d:99:07:ff:06 brd ff:ff:ff:ff:ff:ff
25: tap104i0: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master fwbr104i0 state UNKNOWN group default qlen 1000
    link/ether 3a:23:d4:4f:28:2c brd ff:ff:ff:ff:ff:ff
26: fwbr104i0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 7e:74:aa:53:ce:68 brd ff:ff:ff:ff:ff:ff
27: fwpr104p0@fwln104i0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master vmbr0 state UP group default qlen 1000
    link/ether 52:cd:01:17:6c:43 brd ff:ff:ff:ff:ff:ff
28: fwln104i0@fwpr104p0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master fwbr104i0 state UP group default qlen 1000
    link/ether fa:9d:d2:75:51:49 brd ff:ff:ff:ff:ff:ff
29: tap132i0: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master fwbr132i0 state UNKNOWN group default qlen 1000
    link/ether 5e:54:1c:31:09:1a brd ff:ff:ff:ff:ff:ff
30: fwbr132i0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether b2:bd:d8:e0:5d:51 brd ff:ff:ff:ff:ff:ff
31: fwpr132p0@fwln132i0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master vmbr0 state UP group default qlen 1000
    link/ether 86:d8:5e:16:de:d4 brd ff:ff:ff:ff:ff:ff
32: fwln132i0@fwpr132p0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master fwbr132i0 state UP group default qlen 1000
    link/ether 46:8c:77:66:24:f1 brd ff:ff:ff:ff:ff:ff
33: tap179i0: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master fwbr179i0 state UNKNOWN group default qlen 1000
    link/ether da:e5:9f:c9:70:36 brd ff:ff:ff:ff:ff:ff
34: fwbr179i0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether d6:20:15:f3:d8:37 brd ff:ff:ff:ff:ff:ff
35: fwpr179p0@fwln179i0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master vmbr1 state UP group default qlen 1000
    link/ether 86:3f:28:63:1c:33 brd ff:ff:ff:ff:ff:ff
36: fwln179i0@fwpr179p0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master fwbr179i0 state UP group default qlen 1000
    link/ether 3e:04:7c:d6:1f:f3 brd ff:ff:ff:ff:ff:ff
42: tap158i0: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master vmbr0 state UNKNOWN group default qlen 1000
    link/ether 6a:5e:cc:bf:df:aa brd ff:ff:ff:ff:ff:ff
43: tap188i0: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master fwbr188i0 state UNKNOWN group default qlen 1000
    link/ether ae:0b:a3:0c:8f:eb brd ff:ff:ff:ff:ff:ff
44: fwbr188i0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether ee:79:41:85:1d:da brd ff:ff:ff:ff:ff:ff
45: fwpr188p0@fwln188i0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master vmbr1 state UP group default qlen 1000
    link/ether 7a:98:8c:a1:44:b0 brd ff:ff:ff:ff:ff:ff
46: fwln188i0@fwpr188p0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master fwbr188i0 state UP group default qlen 1000
    link/ether ce:78:fc:a7:cf:21 brd ff:ff:ff:ff:ff:ff


And finally, the VM configuration
Code:
root@pve1:/etc/pve/nodes/pve1/qemu-server# cat 102.conf
agent: 1
boot: order=virtio0;net0;ide0
cores: 2
cpu: host
machine: pc-i440fx-6.1
memory: 8192
meta: creation-qemu=6.1.0,ctime=1641307976
name: VMNAME
net0: virtio=52:C5:98:78:B6:6F,bridge=vmbr0,firewall=1
numa: 0
onboot: 1
ostype: win10
parent: PARENT
scsihw: virtio-scsi-pci
smbios1: uuid=31aa7bc7-08d0-4a39-a95d-94a13a45f117
sockets: 1
startup: order=2
usb0: host=064f:03e9
usb1: host=0529:0001,usb3=1
virtio0: ceph_vm:vm-102-disk-0,cache=writeback,discard=on,size=1124G
vmgenid: d5018a6d-9117-4524-85b1-f387d03f7412

Thank you.
 
You are seeing this traffic pattern because Ceph uses the cluster (private) network only for OSD replication and heartbeat traffic [1]. Traffic between the cluster and its clients (in this case the VM) still runs over the public network. This might be a bit counterintuitive, but it is expected behaviour.
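You can see this on any of the nodes. The commands below are only a rough sketch (exact output depends on your Ceph version), using the interface names and subnets from your post:

Code:
# Monitors always listen on the public network - in your case 192.168.10.x,
# which sits on vmbr0 / eno50 (the 10Gbit/s card):
ceph mon dump | grep addr

# Each OSD reports both a public and a cluster address; only the cluster
# address (192.168.11.x on ens3f0) is used for replication between OSDs:
ceph osd dump | grep "^osd"

# Watching both interfaces while the benchmark runs makes the split visible:
iptraf-ng -d eno50
iptraf-ng -d ens3f0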

It might be smarter to run both the Ceph public and cluster networks over the dedicated 40G link, although that is of course still not optimal. The two networks could then share the 40G link while being separated via VLANs, which makes splitting them onto separate physical links easier later on. An optimal setup would require additional networks [2]. Additionally, it might be hard to swap your network configuration, depending on how reliant you are on the running cluster: it would involve removing all monitors/OSDs and rebuilding them. A rough sketch of what the changed configuration could look like follows the links below.

[1] https://docs.ceph.com/en/latest/_images/ditaa-3bf285dacff79a5fe5eea8dd9ca2bd41baa26061.png
[2] https://forum.proxmox.com/attachments/proxmox-ceph_small-png.22723/
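Just to illustrate, here is a minimal sketch of the relevant [global] settings, assuming both networks are moved onto the 40G subnet (192.168.11.0/24, taken from ens3f0 in your ip a output). This is not a drop-in change: the monitor addresses are baked into the monmap, so the monitors would have to be recreated on the new network.

Code:
[global]
     # Both client (public) and replication (cluster) traffic on the 40G link
     cluster_network = 192.168.11.0/24
     public_network = 192.168.11.0/24
     # Monitor addresses assumed to follow your existing .101-.103 scheme
     mon_host = 192.168.11.101 192.168.11.102 192.168.11.103

[mon.pve1]
     public_addr = 192.168.11.101
# ...analogous public_addr entries for mon.pve2 and mon.pve3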
 
Oh dear, it looks like a schoolboy error on my part! I sincerely appreciate you taking the time to look into the issue. It sounds like modifying the networks in-situ is not without risk; I think it will be a rebuild at some point. Thanks again.
 
