[SOLVED] Proxmox 5 to 6 migration OK, but Ceph Nautilus performance is down. Why?

atec666
Mar 8, 2019
Hello,

Happy New Year!
The upgrade from PVE 5 to 6 went flawlessly... but now the Ceph cluster performance is just awful.
Before: 63 MB/s... now: 16 MB/s. Why? (Tested with dd, as recommended by Ceph, inside a CT; CT I/O is extremely slow now.)

Combined with the Debian Stretch to Buster migration (LXC/CT) this is not acceptable: it took me 50 minutes for a simple DNS server (just the bind9 package on a vanilla Debian!).
(Borgbackup takes much longer too: from 4 min I am now at 6 min for a 137 GB backup.)

Code:
ceph -s

  cluster:
    id:     2ergggr7-2ggr38f-43ggr36-b9gregd3-ze44r6f64z1
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum NonettE,YssoN,VicheL (age 61m)
    mgr: VicheL(active, since 64m), standbys: NonettE, YssoN
    osd: 6 osds: 6 up (since 62m), 6 in

  data:
    pools:   1 pools, 128 pgs
    objects: 77.04k objects, 294 GiB
    usage:   881 GiB used, 10 TiB / 11 TiB avail
    pgs:     128 active+clean

  io:
    client:   341 B/s rd, 268 KiB/s wr, 0 op/s rd, 21 op/s wr


ceph tell osd.* version

osd.0: {
    "version": "ceph version 14.2.6 (ab28c28c7c0effdd172fa297c0e2e9172fa29) nautilus (stable)"
}
osd.1: {
    "version": "ceph version 14.2.6 (ab28c28c7c0effdd172fa297c0e2e9172fa29) nautilus (stable)"
}
osd.2: {
    "version": "ceph version 14.2.6 (ab28c28c7c0effdd172fa297c0e2e9172fa29) nautilus (stable)"
}
osd.3: {
    "version": "ceph version 14.2.6 (ab28c28c7c0effdd172fa297c0e2e9172fa29) nautilus (stable)"
}
osd.4: {
    "version": "ceph version 14.2.6 (ab28c28c7c0effdd172fa297c0e2e9172fa29) nautilus (stable)"
}
osd.5: {
    "version": "ceph version 14.2.6 (ab28c28c7c0effdd172fa297c0e2e9172fa29) nautilus (stable)"
}
First, Alwin: thank you for your quick answer! ;-)
I think my PG count is too low... can you confirm that?

BUT I don't understand PGs very well... I know, I'm bad at this :-(

But reading the benchmark doc... with the fio command...
"Note: This command will destroy any data on your disk." ==> No, I don't want to do that just to test performance, sorry... I prefer dd... in an LXC (CT).

Code:
ceph osd df tree
ID CLASS WEIGHT   REWEIGHT SIZE    RAW USE DATA    OMAP    META    AVAIL   %USE VAR  PGS STATUS TYPE NAME 
-1       10.91574        -  11 TiB 904 GiB 898 GiB 346 MiB 5.7 GiB  10 TiB 8.09 1.00   -        root default
-7        3.63858        - 3.6 TiB 301 GiB 299 GiB  93 MiB 1.9 GiB 3.3 TiB 8.09 1.00   -            host NonettE
3   hdd  1.81929  1.00000 1.8 TiB 155 GiB 154 GiB  56 MiB 968 MiB 1.7 TiB 8.34 1.03  66     up         osd.3
4   hdd  1.81929  1.00000 1.8 TiB 146 GiB 145 GiB  37 MiB 987 MiB 1.7 TiB 7.84 0.97  62     up         osd.4
-5        3.63858        - 3.6 TiB 301 GiB 299 GiB 122 MiB 1.9 GiB 3.3 TiB 8.09 1.00   -            host VicheL
2   hdd  1.81929  1.00000 1.8 TiB 150 GiB 149 GiB  50 MiB 974 MiB 1.7 TiB 8.06 1.00  64     up         osd.2
5   hdd  1.81929  1.00000 1.8 TiB 151 GiB 150 GiB  71 MiB 953 MiB 1.7 TiB 8.12 1.00  64     up         osd.5
-3        3.63858        - 3.6 TiB 301 GiB 299 GiB 131 MiB 1.9 GiB 3.3 TiB 8.09 1.00   -            host YssoN
0   hdd  1.81929  1.00000 1.8 TiB 172 GiB 171 GiB  85 MiB 939 MiB 1.7 TiB 9.21 1.14  73     up         osd.0
1   hdd  1.81929  1.00000 1.8 TiB 130 GiB 129 GiB  46 MiB 978 MiB 1.7 TiB 6.97 0.86  55     up         osd.1
                     TOTAL  11 TiB 904 GiB 898 GiB 346 MiB 5.7 GiB  10 TiB 8.09                           
MIN/MAX VAR: 0.86/1.14  STDDEV: 0.66

and

Code:
pveversion -v
proxmox-ve: 6.1-2 (running kernel: 5.3.13-1-pve)
pve-manager: 6.1-5 (running version: 6.1-5/9bf06119)
pve-kernel-5.3: 6.1-1
pve-kernel-helper: 6.1-1
pve-kernel-4.15: 5.4-12
pve-kernel-5.3.13-1-pve: 5.3.13-1
pve-kernel-4.15.18-24-pve: 4.15.18-52
pve-kernel-4.15.18-20-pve: 4.15.18-46
ceph: 14.2.6-pve1
ceph-fuse: 14.2.6-pve1
corosync: 3.0.2-pve4
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
libjs-extjs: 6.0.1-10
libknet1: 1.13-pve1
libpve-access-control: 6.0-5
libpve-apiclient-perl: 3.0-2
libpve-common-perl: 6.0-9
libpve-guest-common-perl: 3.0-3
libpve-http-server-perl: 3.0-3
libpve-storage-perl: 6.1-3
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve3
lxc-pve: 3.2.1-1
lxcfs: 3.0.3-pve60
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.1-2
pve-cluster: 6.1-2
pve-container: 3.0-16
pve-docs: 6.1-3
pve-edk2-firmware: 2.20191127-1
pve-firewall: 4.0-9
pve-firmware: 3.0-4
pve-ha-manager: 3.0-8
pve-i18n: 2.0-3
pve-qemu-kvm: 4.1.1-2
pve-xtermjs: 3.13.2-1
qemu-server: 6.1-4
smartmontools: 7.0-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.2-pve2

and

Code:
ceph osd pool get cephPool1 pg_num
pg_num: 128


On the PVE hypervisor (with two 120 GB Kingston SSDs in mdadm RAID)

Code:
turner@YssoN:~# dd if=/dev/zero of=/tmp/1G bs=1G count=1 oflag=direct
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 18.8554 s, 56.9 MB/s

dd if=/dev/zero of=/tmp/333M bs=333M count=1 oflag=direct
1+0 records in
1+0 records out
349175808 bytes (349 MB, 333 MiB) copied, 4.11905 s, 84.8 MB/s


On an LXC container (CT), so with Ceph in the data path (6 OSDs, each a 2 TB 7200 rpm HDD, dedicated gigabit network)

Code:
alex@web1:~# dd if=/dev/zero of=/tmp/333M bs=333M count=1 oflag=direct
1+0 records in
1+0 records out
349175808 bytes (349 MB, 333 MiB) copied, 4.68064 s, 74.6 MB/s
root@web1:~# dd if=/dev/zero of=/tmp/1G bs=1G count=1 oflag=direct
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 14.1002 s, 76.2 MB/s

So not so bad... Now, could Ceph still be rebalancing after the upgrade? For hours?
 
i think my PGs number is too low ... did you confirm that ?
Yup, hence my comment. ;) See the calculator to get the number of PGs needed for each pool. The calculator is explained well enough on the pgcalc page to easily get the correct result. https://ceph.io/pgcalc/
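For reference, the pgcalc rule of thumb (roughly 100 PGs per OSD, divided by the pool's replica size, rounded up to the next power of two) can be sketched in shell for this cluster's numbers (6 OSDs, replica 3, both taken from the thread):

```shell
#!/bin/sh
# pgcalc rule of thumb: (OSDs * 100) / replica size, rounded up to a power of two
osds=6
replica=3
target=$(( osds * 100 / replica ))   # 6 * 100 / 3 = 200

# round up to the next power of two
pg=1
while [ "$pg" -lt "$target" ]; do
    pg=$(( pg * 2 ))
done

echo "$target -> $pg"                # 200 -> 256
```

Which lands on the 256 discussed below; the official calculator also lets you weight multiple pools against each other.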

But reading the benchmark doc... with the fio command...
"Note: This command will destroy any data on your disk." ==> No, I don't want to do that just to test performance, sorry... I prefer dd... in an LXC (CT).
Not fio, the rados benchmark. The benchmark will tell you how good the bandwidth and latency of Ceph are on a particular pool. Sadly, dd is not an adequate benchmark tool, and with it you test different layers of the whole stack.
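The rados benchmark can be run directly against a pool without touching raw disks (a sketch, using the pool name cephPool1 from the output above; the benchmark objects are removed by the final cleanup call):

```shell
# 60 seconds of 4 MiB object writes; keep the objects so the read test
# below has data to read back
rados bench -p cephPool1 60 write --no-cleanup

# sequential read benchmark against the objects written above
rados bench -p cephPool1 60 seq

# remove the leftover benchmark objects from the pool
rados -p cephPool1 cleanup
```

Compare the reported bandwidth and latency before and after a change rather than dd numbers from inside a container.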

349175808 bytes (349 MB, 333 MiB) copied, 4.68064 s, 74.6 MB/s
What was the result before? As said before, it's best to use a rados benchmark to compare Ceph performance.

So not so bad ... now, is ceph rebalancing after upgrade? For hours?
Depending on the length of the downtime of one node, there might be some recovery traffic. Rebalancing is less likely with 3 nodes, as by default a pool has a replica count of 3.
 
Thank you, Alwin.

For 2 OSDs per node (3 nodes), so 6 OSDs (2 TB each): is 256 PGs (instead of 128) good enough?
Note: the replica count is 3.
 
For 2 OSDs per node (3 nodes), so 6 OSDs (2 TB each): is 256 PGs (instead of 128) good enough?
If you stick with one pool, then it should be enough. And it's not a huge issue: with Ceph Nautilus the PG count can be increased or decreased later.
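A minimal sketch of such a resize on the pool from this thread (pool name cephPool1 taken from the earlier output; in Nautilus, pgp_num follows pg_num automatically, and the autoscaler lines are optional):

```shell
# raise the PG count on the pool
ceph osd pool set cephPool1 pg_num 256

# optionally let Nautilus manage the PG count itself
ceph mgr module enable pg_autoscaler
ceph osd pool set cephPool1 pg_autoscale_mode on
```

Expect some backfill traffic while the PGs split, so do this during a quiet period.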
 