Very Slow Performance after Upgrade to Proxmox 7 and Ceph Pacific

digitalchild

New Member
Jun 6, 2020
Hi all,

I have three servers running Proxmox and Ceph, in place since the release of version 6.

This week I decided to upgrade from 6.4 to 7 and, following your guide, I first upgraded Ceph to Octopus, then PVE to 7, then Octopus to Pacific.
Everything seems fine, with no errors or warnings, except that VM performance is embarrassingly slow in terms of disk reads and writes.

I tried rebooting, disabling HA, and powering off all the VMs except the one I'm testing, but none of these steps has addressed the issue.

Just to give you an idea of the performance, here is a dd command writing just 10 megabytes on a simple CentOS VM:

Code:
[cdi@serverg]$ dd if=/dev/zero of=/tmp/test1.img bs=1M count=10 oflag=dsync                                                                               
10+0 records in                                                                 
10+0 records out                                                               
10485760 bytes (10 MB) copied, 4.95152 s, 2.1 MB/s                             
[cdi@serverg]$
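A side note on interpreting this: with `oflag=dsync`, dd issues one synchronous write per block, so the run above is really measuring sync write latency rather than streaming throughput. Using the figures from the run above (10 blocks, 4.95152 s elapsed), the per-write latency works out to roughly half a second per 1 MiB write, which is extremely high for SSD-backed storage:

```shell
# dd with oflag=dsync issues one synchronous 1 MiB write per block, so
# elapsed time divided by block count gives the per-write sync latency
awk 'BEGIN { printf "%.0f ms per 1 MiB sync write\n", 4.95152 / 10 * 1000 }'
```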

Here is the distribution of the SSDs across the three hosts (two per host):

Code:
ID  CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE  DATA     OMAP     META     AVAIL    %USE   VAR   PGS  STATUS
 0    ssd  0.46579   1.00000  477 GiB  268 GiB  267 GiB  1.1 MiB  676 MiB  209 GiB  56.13  1.14   74      up
 1    ssd  0.46579   1.00000  477 GiB  201 GiB  201 GiB  4.7 MiB  681 MiB  276 GiB  42.21  0.86   55      up
 2    ssd  0.46579   1.00000  477 GiB  223 GiB  222 GiB    3 KiB  732 MiB  254 GiB  46.77  0.95   61      up
 3    ssd  0.46579   1.00000  477 GiB  246 GiB  245 GiB  1.1 MiB  747 MiB  231 GiB  51.60  1.05   68      up
 4    ssd  0.46579   1.00000  477 GiB  242 GiB  241 GiB      0 B  772 MiB  235 GiB  50.71  1.03   66      up
 5    ssd  0.46579   1.00000  477 GiB  227 GiB  227 GiB  4.1 MiB  629 MiB  250 GiB  47.64  0.97   63      up
                       TOTAL  2.8 TiB  1.4 TiB  1.4 TiB   11 MiB  4.1 GiB  1.4 TiB  49.18

Here are the package versions, the same on all three servers:
Code:
proxmox-ve: 7.0-2 (running kernel: 5.11.22-4-pve)
pve-manager: 7.0-11 (running version: 7.0-11/63d82f4e)
pve-kernel-5.11: 7.0-7
pve-kernel-helper: 7.0-7
pve-kernel-5.4: 6.4-6
pve-kernel-5.11.22-4-pve: 5.11.22-8
pve-kernel-5.4.140-1-pve: 5.4.140-1
pve-kernel-5.4.34-1-pve: 5.4.34-2
ceph: 16.2.5-pve1
ceph-fuse: 16.2.5-pve1
corosync: 3.1.5-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown: not correctly installed
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.22-pve1
libproxmox-acme-perl: 1.3.0
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.0-4
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.0-6
libpve-guest-common-perl: 4.0-2
libpve-http-server-perl: 4.0-2
libpve-storage-perl: 7.0-11
libqb0: 1.0.5-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.9-4
lxcfs: 4.0.8-pve2
novnc-pve: 1.2.0-3
proxmox-backup-client: 2.0.9-2
proxmox-backup-file-restore: 2.0.9-2
proxmox-mini-journalreader: 1.2-1
proxmox-widget-toolkit: 3.3-6
pve-cluster: 7.0-3
pve-container: 4.0-9
pve-docs: 7.0-5
pve-edk2-firmware: 3.20200531-1
pve-firewall: 4.2-3
pve-firmware: 3.3-1
pve-ha-manager: 3.3-1
pve-i18n: 2.5-1
pve-qemu-kvm: 6.0.0-4
pve-xtermjs: 4.12.0-1
qemu-server: 7.0-13
smartmontools: 7.2-pve2
spiceterm: 3.2-2
vncterm: 1.7-1
zfsutils-linux: 2.0.5-pve1


Here is the Ceph config:
Code:
[global]
     auth_client_required = cephx
     auth_cluster_required = cephx
     auth_service_required = cephx
     cluster_network = 192.168.170.31/24
     fsid = c01cf831-1afa-430f-b8d7-96cd37903e0b
     mon_allow_pool_delete = true
     mon_host = 192.168.169.31 192.168.169.32 192.168.169.33
     osd_pool_default_min_size = 2
     osd_pool_default_size = 3
     public_network = 192.168.169.31/24

[client]
     keyring = /etc/pve/priv/$cluster.$name.keyring

[mon.geocluster1]
     public_addr = 192.168.169.31

[mon.geocluster2]
     public_addr = 192.168.169.32

[mon.geocluster3]
     public_addr = 192.168.169.33

Please let me know which files you need to better understand my infrastructure.

Thank you in advance for your help.

Digitalchild
 

digitalchild

New Member
Jun 6, 2020
Hi All,

this morning I upgraded Ceph from 16.2.5 to 16.2.6 on all three nodes.

After rebooting each node, performance seems to be partially restored.

The simple benchmark on the Linux VM shows an increase from 2 MB/s to about 11 MB/s, but that is still far from the 80-90 MB/s of the previous Proxmox/Ceph release.

VMs are still unusable at the moment.
Thanks a lot for your support.

Digitalchild
 

fabian

Proxmox Staff Member
Staff member
Jan 7, 2016
does ceph -s show anything out of the ordinary? did you follow the upgrade guidelines and use the check script?
 

Whatever

Active Member
Nov 19, 2012
Enabling rbd_cache should help a lot. For example, in my setup:

Code:
[client]   
keyring = /etc/pve/priv/$cluster.$name.keyring   
rbd_cache_size = 134217728
 

digitalchild

New Member
Jun 6, 2020
@fabian I followed the upgrade guidelines and found no errors in any step of the upgrade

here is my ceph -s output:

Code:
  cluster:
    id:     c01cf831-1afa-430f-b8d7-96cd37903e0b
    health: HEALTH_OK
 
  services:
    mon: 3 daemons, quorum geocluster1,geocluster2,geocluster3 (age 7m)
    mgr: geocluster1(active, since 9m), standbys: geocluster2, geocluster3
    osd: 6 osds: 6 up (since 7m), 6 in (since 5M)
 
  data:
    pools:   2 pools, 129 pgs
    objects: 120.04k objects, 466 GiB
    usage:   1.4 TiB used, 1.4 TiB / 2.8 TiB avail
    pgs:     129 active+clean
 
  io:
    client:   12 MiB/s rd, 341 B/s wr, 3 op/s rd, 0 op/s wr
 

digitalchild

New Member
Jun 6, 2020
Just to give you an idea of what happens after a few minutes: I started a backup of this VM to local storage, and the speed decreases constantly.

Here is the backup log. It starts at around 150 MB/s and ends at 10 MB/s:


Code:
INFO: include disk 'ide0' 'geovmosdpool:vm-105-disk-0' 150G
INFO: creating vzdump archive '/var/lib/vz/dump/vzdump-qemu-105-2021_09_29-11_19_26.vma.zst'
INFO: starting kvm to execute backup task
INFO: started backup task 'f8870159-3c63-4299-9558-5bd48393c98d'
INFO:   0% (1007.8 MiB of 150.0 GiB) in 3s, read: 335.9 MiB/s, write: 143.3 MiB/s
INFO:   1% (1.5 GiB of 150.0 GiB) in 6s, read: 189.7 MiB/s, write: 139.3 MiB/s
INFO:   2% (3.1 GiB of 150.0 GiB) in 15s, read: 179.5 MiB/s, write: 152.2 MiB/s
INFO:   3% (4.6 GiB of 150.0 GiB) in 24s, read: 169.3 MiB/s, write: 101.8 MiB/s
INFO:   4% (6.0 GiB of 150.0 GiB) in 33s, read: 161.5 MiB/s, write: 107.3 MiB/s
INFO:   5% (7.6 GiB of 150.0 GiB) in 42s, read: 174.0 MiB/s, write: 109.0 MiB/s
INFO:   6% (9.1 GiB of 150.0 GiB) in 51s, read: 176.9 MiB/s, write: 36.7 MiB/s
INFO:   7% (10.6 GiB of 150.0 GiB) in 1m, read: 167.5 MiB/s, write: 18.2 MiB/s
INFO:   8% (12.3 GiB of 150.0 GiB) in 1m 5s, read: 348.7 MiB/s, write: 45.1 MiB/s
INFO:  11% (16.7 GiB of 150.0 GiB) in 1m 8s, read: 1.5 GiB/s, write: 119.1 MiB/s
INFO:  13% (20.8 GiB of 150.0 GiB) in 1m 11s, read: 1.4 GiB/s, write: 106.6 MiB/s
INFO:  14% (21.2 GiB of 150.0 GiB) in 1m 14s, read: 122.0 MiB/s, write: 119.6 MiB/s
INFO:  15% (22.7 GiB of 150.0 GiB) in 1m 24s, read: 153.3 MiB/s, write: 141.5 MiB/s
INFO:  16% (24.0 GiB of 150.0 GiB) in 1m 34s, read: 137.0 MiB/s, write: 136.2 MiB/s
INFO:  17% (25.6 GiB of 150.0 GiB) in 1m 45s, read: 141.0 MiB/s, write: 130.8 MiB/s
INFO:  18% (27.0 GiB of 150.0 GiB) in 2m 9s, read: 62.4 MiB/s, write: 61.0 MiB/s
INFO:  19% (28.5 GiB of 150.0 GiB) in 3m 45s, read: 16.0 MiB/s, write: 15.8 MiB/s
INFO:  20% (30.0 GiB of 150.0 GiB) in 5m 37s, read: 13.5 MiB/s, write: 12.7 MiB/s
INFO:  21% (31.5 GiB of 150.0 GiB) in 7m, read: 18.6 MiB/s, write: 17.3 MiB/s
INFO:  22% (33.0 GiB of 150.0 GiB) in 8m 33s, read: 16.6 MiB/s, write: 15.4 MiB/s
 

fabian

Proxmox Staff Member
Staff member
Jan 7, 2016
that does not really show anything (the initial faster speed might also be caused by holes/zero blocks).

does a rados benchmark look okay? e.g.,

Code:
# write
rados bench 600 write -b 4M -t 16 --no-cleanup
# read (uses data from write)
rados bench 600 seq -t 16

and then again with -t 1 to simulate a single guest accessing Ceph.
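For comparing the runs quickly, the headline figure in each rados bench summary can be pulled out with a one-liner. The sample below uses made-up placeholder numbers; only the parsing is the point:

```shell
# placeholder rados bench summary (values are invented for illustration);
# the awk line extracts the headline bandwidth figure for comparison
sample='Total time run:         600.013
Total writes made:      1234
Write size:             4194304
Bandwidth (MB/sec):     8.22
Average Latency(s):     7.78'
printf '%s\n' "$sample" | awk -F': *' '/^Bandwidth/ {print $2}'
```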
 

digitalchild

New Member
Jun 6, 2020
Enabling rbd_cache should help a lot. For example, in my setup:

Code:
[client]  
keyring = /etc/pve/priv/$cluster.$name.keyring  
rbd_cache_size = 134217728

I tried to set up the cache, as you mentioned, but the performance didn't change. :(
 

digitalchild

New Member
Jun 6, 2020
I'm also trying to copy the VMs out of the ceph storage pool, to let my users continue working from the local storage, but after about 10% of the transfer, the server reboots (!!!).

Ceph log shows an error:
Code:
Connection Error (599) Too Many Redirections

Here is the backup Log:
Code:
()
ide0
create full clone of drive ide0 (geovmosdpool:vm-105-disk-0)
transferred 0.0 B of 150.0 GiB (0.00%)
transferred 1.5 GiB of 150.0 GiB (1.00%)
transferred 3.0 GiB of 150.0 GiB (2.00%)
transferred 4.5 GiB of 150.0 GiB (3.00%)
transferred 6.0 GiB of 150.0 GiB (4.01%)
transferred 7.5 GiB of 150.0 GiB (5.01%)
transferred 9.0 GiB of 150.0 GiB (6.01%)
transferred 10.5 GiB of 150.0 GiB (7.01%)
transferred 12.0 GiB of 150.0 GiB (8.01%)
transferred 13.5 GiB of 150.0 GiB (9.01%)
transferred 15.0 GiB of 150.0 GiB (10.01%)
transferred 16.5 GiB of 150.0 GiB (11.01%)
transferred 18.0 GiB of 150.0 GiB (12.02%)
transferred 19.5 GiB of 150.0 GiB (13.02%)
transferred 21.0 GiB of 150.0 GiB (14.02%)
 

digitalchild

New Member
Jun 6, 2020
that does not really show anything (the initial faster speed might also be caused by holes/zero blocks).

does a rados benchmark look okay? e.g.,

Code:
# write
rados bench 600 write -b 4M -t 16 --no-cleanup
# read (uses data from write)
rados bench 600 seq -t 16

and then again with -t 1 to simulate a single guest accessing Ceph.
Please find, attached to this message, the output of the rados benchmarks:
write and read, sequential and random, with 1 and 16 concurrent operations.

Writes in particular look very painful.
 

Attachments

  • Rados_Bench_Read_Rand_1.txt (50.9 KB)
  • Rados_Bench_Read_Rand_16.txt (51.1 KB)
  • Rados_Bench_Read_Seq_1.txt (11.9 KB)
  • Rados_Bench_Read_Seq_16.txt (6.9 KB)
  • Rados_Bench_Write.txt (51.3 KB)

fabian

Proxmox Staff Member
Staff member
Jan 7, 2016
yeah, that does not look very good at all. can you check your network performance over the NICs used for Ceph? e.g., with iperf3?

the following commands might be interesting to see whether it's one or a few disks misbehaving that cause the slowdown:

https://docs.ceph.com/en/latest/rados/operations/control/#osd-subsystem

Runs a simple throughput benchmark against OSD.N, writing TOTAL_DATA_BYTES in write requests of BYTES_PER_WRITE each. By default, the test writes 1 GB in total in 4-MB increments. The benchmark is non-destructive and will not overwrite existing live OSD data, but might temporarily affect the performance of clients concurrently accessing the OSD.

Code:
ceph tell osd.N bench [TOTAL_DATA_BYTES] [BYTES_PER_WRITE]


To clear an OSD’s caches between benchmark runs, use the ‘cache drop’ command

Code:
ceph tell osd.N cache drop


To get the cache statistics of an OSD, use the ‘cache status’ command

Code:
ceph tell osd.N cache status
 

digitalchild

New Member
Jun 6, 2020
12
1
1
40
Here is the output of the bench. Only osd.1 is reasonably fast; the others are really slow.

All of these OSDs are Dell enterprise 480 GB SSDs, two per server (three Dell PowerEdge R630s).

Code:
root@geocluster2:~# ceph tell osd.5 cache drop
root@geocluster2:~# ceph tell osd.5 bench
{
    "bytes_written": 1073741824,
    "blocksize": 4194304,
    "elapsed_sec": 32.134856483,
    "bytes_per_sec": 33413618.155352008,
    "iops": 7.9664273632411975
}
root@geocluster2:~# ceph tell osd.5 cache drop
root@geocluster2:~# ceph tell osd.5 bench
{
    "bytes_written": 1073741824,
    "blocksize": 4194304,
    "elapsed_sec": 52.243891488999999,
    "bytes_per_sec": 20552485.532707252,
    "iops": 4.9000943977134828
}
root@geocluster2:~# ceph tell osd.4 cache drop
root@geocluster2:~# ceph tell osd.4 bench
{
    "bytes_written": 1073741824,
    "blocksize": 4194304,
    "elapsed_sec": 29.980100913000001,
    "bytes_per_sec": 35815150.426475152,
    "iops": 8.538997274988926
}
root@geocluster2:~# ceph tell osd.4 cache drop
root@geocluster2:~# ceph tell osd.4 bench
{
    "bytes_written": 1073741824,
    "blocksize": 4194304,
    "elapsed_sec": 53.891790579000002,
    "bytes_per_sec": 19924033.186947115,
    "iops": 4.7502596824043071
}
root@geocluster2:~# ceph tell osd.3 cache drop
root@geocluster2:~# ceph tell osd.3 bench
{
    "bytes_written": 1073741824,
    "blocksize": 4194304,
    "elapsed_sec": 29.874876499999999,
    "bytes_per_sec": 35941297.49791602,
    "iops": 8.5690730805196811
}
root@geocluster2:~# ceph tell osd.3 cache drop
root@geocluster2:~# ceph tell osd.3 bench
{
    "bytes_written": 1073741824,
    "blocksize": 4194304,
    "elapsed_sec": 49.727775184999999,
    "bytes_per_sec": 21592396.201225709,
    "iops": 5.1480284216942094
}
root@geocluster2:~# ceph tell osd.2 cache drop
root@geocluster2:~# ceph tell osd.2 bench
{
    "bytes_written": 1073741824,
    "blocksize": 4194304,
    "elapsed_sec": 34.220024866000003,
    "bytes_per_sec": 31377587.485824358,
    "iops": 7.4809998240052122
}
root@geocluster2:~# ceph tell osd.2 cache drop
root@geocluster2:~# ceph tell osd.2 bench
{
    "bytes_written": 1073741824,
    "blocksize": 4194304,
    "elapsed_sec": 33.012587646,
    "bytes_per_sec": 32525224.484488446,
    "iops": 7.754617806551086
}
root@geocluster2:~# ceph tell osd.1 cache drop
root@geocluster2:~# ceph tell osd.1 bench
{
    "bytes_written": 1073741824,
    "blocksize": 4194304,
    "elapsed_sec": 4.6336774829999996,
    "bytes_per_sec": 231725627.84944263,
    "iops": 55.247694933281572
}
root@geocluster2:~# ceph tell osd.1 cache drop
root@geocluster2:~# ceph tell osd.1 bench
{
    "bytes_written": 1073741824,
    "blocksize": 4194304,
    "elapsed_sec": 4.3644711479999998,
    "bytes_per_sec": 246018770.10735598,
    "iops": 58.655445601309772
}
root@geocluster2:~# ceph tell osd.0 cache drop
root@geocluster2:~# ceph tell osd.0 bench
{
    "bytes_written": 1073741824,
    "blocksize": 4194304,
    "elapsed_sec": 21.074421547,
    "bytes_per_sec": 50950002.191298582,
    "iops": 12.147427127670904
}
root@geocluster2:~# ceph tell osd.0 cache drop
root@geocluster2:~# ceph tell osd.0 bench
{
    "bytes_written": 1073741824,
    "blocksize": 4194304,
    "elapsed_sec": 29.808478291,
    "bytes_per_sec": 36021356.525408149,
    "iops": 8.5881606400986072
}
root@geocluster2:~#
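For readability, the bytes_per_sec figures in the JSON above can be converted to MiB/s; the values below are copied from osd.5's and osd.1's first runs, and they show the roughly 7x gap between the slowest and fastest OSDs:

```shell
# bytes_per_sec from `ceph tell osd.N bench` converted to MiB/s
# (figures taken from the first runs of osd.5 and osd.1 above)
awk 'BEGIN {
    printf "osd.5: %.1f MiB/s\n", 33413618.155352008 / 1048576
    printf "osd.1: %.1f MiB/s\n", 231725627.84944263 / 1048576
}'
```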
 

digitalchild

New Member
Jun 6, 2020
Here is an example of the iperf3 output, which looks similar on all three hosts:

Code:
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   114 MBytes   955 Mbits/sec    0    430 KBytes       
[  5]   1.00-2.00   sec   113 MBytes   947 Mbits/sec    0    475 KBytes       
[  5]   2.00-3.00   sec   112 MBytes   943 Mbits/sec    0    499 KBytes       
[  5]   3.00-4.00   sec   112 MBytes   940 Mbits/sec    0    525 KBytes       
[  5]   4.00-5.00   sec   111 MBytes   933 Mbits/sec    0    550 KBytes       
[  5]   5.00-6.00   sec   113 MBytes   948 Mbits/sec    0    550 KBytes       
[  5]   6.00-7.00   sec   112 MBytes   936 Mbits/sec    0    577 KBytes       
[  5]   7.00-8.00   sec   112 MBytes   942 Mbits/sec   33    437 KBytes       
[  5]   8.00-9.00   sec   112 MBytes   942 Mbits/sec    0    491 KBytes       
[  5]   9.00-10.00  sec   112 MBytes   936 Mbits/sec    0    491 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  1.10 GBytes   942 Mbits/sec   33             sender
[  5]   0.00-10.00  sec  1.09 GBytes   940 Mbits/sec                  receiver
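For context, and assuming these are single 1 GbE links as the ~940 Mbit/s figure suggests: the raw payload ceiling of such a link is only about 117 MB/s, and with osd_pool_default_size = 3 each client write is additionally replicated to two more OSDs over the cluster network, so Ceph client throughput well below that figure would be expected even on a healthy cluster, though nowhere near as low as 2 MB/s:

```shell
# the 940 Mbit/s measured by iperf3, expressed as a byte-throughput ceiling
awk 'BEGIN { printf "%.1f MB/s\n", 940 / 8 }'
```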
 

fabian

Proxmox Staff Member
Staff member
Jan 7, 2016
those bench numbers really don't look good... what does SMART say about the SSDs?
 
Oct 7, 2019
If you have enough free space in your cluster, I would suggest fully stopping and removing one of those slow OSDs. Wait until Ceph rebalances to the remaining OSDs, then run some fio or dd benchmarks on that disk and compare with the results you got from the Ceph benches. This will help you pinpoint the problem as either something Ceph-related or the disk/controller itself.
 

digitalchild

New Member
Jun 6, 2020
12
1
1
40
Good morning,

yesterday I finally migrated all the VMs off the cluster so I could test it fully.

I made the following tests:
- Rebooted into Dell diagnostics and checked all three R630 servers. Everything is OK.
- Rebooted into a WinPE distribution and checked all the SSDs for performance and errors. No errors were detected, and all SSDs performed at around 400-500 MB/s reading and 300-400 MB/s writing.
- Checked SMART data: no errors were reported; only one SSD is flagged as "critical" in terms of expected life, but it is healthy and error-free.

I have now decided to upgrade the Ceph network to see whether the LAN is the bottleneck after the Pacific upgrade, and have purchased Mellanox 40 Gb/s cards and a switch, even though I'm still puzzled that the servers performed well before the Ceph version upgrade.

Thanks a lot for your support, and please allow me a few days to give you an update on this issue.

In the meantime, any suggestion and/or idea would be very much appreciated.

D.
 
Mar 27, 2021
I'm also trying to copy the VMs out of the ceph storage pool, to let my users continue working from the local storage, but after about 10% of the transfer, the server reboots (!!!).

Ceph log shows an error:
Code:
Connection Error (599) Too Many Redirections

Here is the backup Log:
Code:
()
ide0
create full clone of drive ide0 (geovmosdpool:vm-105-disk-0)
transferred 0.0 B of 150.0 GiB (0.00%)
transferred 1.5 GiB of 150.0 GiB (1.00%)
transferred 3.0 GiB of 150.0 GiB (2.00%)
transferred 4.5 GiB of 150.0 GiB (3.00%)
transferred 6.0 GiB of 150.0 GiB (4.01%)
transferred 7.5 GiB of 150.0 GiB (5.01%)
transferred 9.0 GiB of 150.0 GiB (6.01%)
transferred 10.5 GiB of 150.0 GiB (7.01%)
transferred 12.0 GiB of 150.0 GiB (8.01%)
transferred 13.5 GiB of 150.0 GiB (9.01%)
transferred 15.0 GiB of 150.0 GiB (10.01%)
transferred 16.5 GiB of 150.0 GiB (11.01%)
transferred 18.0 GiB of 150.0 GiB (12.02%)
transferred 19.5 GiB of 150.0 GiB (13.02%)
transferred 21.0 GiB of 150.0 GiB (14.02%)


We have experienced this reboot twice over the last two days while trying to move VM disks out of Ceph to local disks to troubleshoot the Ceph performance issue.
 
