[SOLVED] Huge increase in I/O load on NVMe disks (at equal VM load) after upgrade from Ceph 12 to 15

lucaferr

Renowned Member
Jun 21, 2011
Hi! Last night we upgraded our production 9-node cluster from PVE 5.4 to PVE 6.4 and from Ceph 12 to 14 and then to 15 (Octopus), following the official tutorials. Everything went smoothly and all running VMs stayed online during the upgrade, so we're very happy with the operation. Now the cluster and Ceph are HEALTH_OK and stable, with no rebalancing or recovery in progress.
But our monitoring system (which is Zabbix-based) is telling us that the OSDs (all NVMe SSDs, 4 on each of the 9 nodes for a total of 36) frequently spike to 100% I/O activity. Analysing the data and comparing it with the data from a few days ago, we realised that although the read/write bandwidth from the VMs to Ceph and the IOPS are similar (the VMs have the same load they had a few days ago), the individual NVMe SSDs are doing a much higher number of reads and writes (by a factor of 50x!).
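(For reference, the graphs can be cross-checked directly on a node with plain iostat, assuming the sysstat package is installed; the 5-second interval is just an example:

# iostat -x 5

The r/s, w/s and %util columns for the nvme devices should roughly match what Zabbix reports.)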
I'm afraid this will greatly accelerate SSD wear and, under high VM load, also slow down performance (for now client-side performance remains good, but August is not a busy month and the VMs are very underutilised).
Curiously, traffic on the 10 Gb/s network dedicated to Ceph did not increase at all (so what data is Ceph continuously reading and writing on the OSDs? Is it only moving data between the OSDs within each node? Maybe doing some kind of internal format conversion?)
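(One check that might help narrow this down, just as a sketch and again assuming sysstat is available, is per-process disk I/O on a node:

# pidstat -d 5

If the ceph-osd processes dominate the kB_wr/s column while client IOPS stay flat, the writes are being generated internally by the OSDs rather than by VM traffic.)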
Do you have any ideas? Thank you very much!
 
possibly this (https://docs.ceph.com/en/latest/releases/octopus/#v15-2-0-octopus):
5. Upgrade all OSDs by installing the new packages and restarting the ceph-osd daemons on all OSD hosts:

# systemctl restart ceph-osd.target

Note that the first time each OSD starts, it will do a format conversion to improve the accounting for “omap” data. This may take a few minutes to as much as a few hours (for an HDD with lots of omap data).

?
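(To check whether that conversion is still pending, something along these lines should work, assuming default Octopus settings; adjust if you have changed the option:

# ceph health detail | grep -i omap
# ceph config get osd bluestore_fsck_quick_fix_on_mount

If no BLUESTORE_NO_PER_POOL_OMAP warning is reported and the cluster is HEALTH_OK, the omap format conversion should already be finished.)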
 
Hi Fabian, the first OSD restart did take 5 to 10 minutes, during which the upgraded OSDs were down and the CPU on the node was very high... but then, once all the OSDs came back up on all nodes and Ceph returned to HEALTH_OK, I assumed the upgrade process had completed. Did I assume wrong? Also, more than 24 hours have passed since the update and these are fast NVMe drives with 2 TB capacity each, so I would guess that any adjustment process would have finished within 24 hours... unless it had to do some sort of indexing or other optimisation. Has this happened to anyone else?

PS: ceph versions shows everything already upgraded to 15.2.13:
(screenshot of the ceph versions output attached)
 
I apologise, the monitoring system was reporting incorrect I/O data because the iostat output it parses changed between PVE 5 and PVE 6. So false alarm, PVE 6 with Ceph Octopus works perfectly!
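(For anyone with a similar setup: the sysstat version shipped with Debian 10 / PVE 6 changed the column layout of iostat -x compared to Debian 9 / PVE 5, so a Zabbix item that parses fields by position can silently pick up the wrong values. A quick way to compare, assuming sysstat is installed:

# iostat -x 1 3

then check the column headers against the fields the monitoring template expects.)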
I'm marking the topic as [SOLVED].
 
