After enabling one-way Ceph pool mirroring, pool usage grows constantly and the pool could fill up shortly

My experience tells me it's one of two issues:
1. The issue is due to some old problem (journal data accumulated in the past)
2. The issue is due to the VM writing too much / a bad network / etc.
The solution for problem 1 is to disable journaling on the VMs where journal_data has accumulated; your prompt will come back after some time (seconds to minutes, depending on how much journal data has accumulated), then re-enable it. See the sketch below.
The solution for problem 2 is to use snapshot-based mirroring.
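Toggling the feature looks roughly like this (a sketch; <pool-name> and <image-name> are placeholders):
Bash:
# Disabling journaling purges the accumulated journal_data objects;
# with image-mode mirroring you may need `rbd mirror image disable` first.
rbd feature disable <pool-name>/<image-name> journaling
# Re-enable once the command returns (seconds to minutes).
rbd feature enable <pool-name>/<image-name> journaling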
 
Thanks @NetUser for quick response.
I have checked all VM disk images in the pool and found that only one VM has this journaling issue (among the 47 VMs currently being mirrored).
I cleaned up all the journal data for this VM with
Bash:
rbd -p <pool-name> --image vm-1011-disk-0 journal reset
and then enabled the journaling feature again.
After that, space usage went down significantly (about 600 GB reclaimed).
We will continue to monitor this VM to see if the journal data keeps increasing.
 
Just out of curiosity, how did you check which ones are accumulating journal_data?
I've personally found a way with a for loop, checking how many journal_data objects are present in my pool with rbd info on every image, but I believe there could be a better solution.
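For reference, my loop was roughly along these lines (a sketch; it assumes journal objects are named journal_data.<pool_id>.<image_id>.<n>, that the journal id matches the image id shown by rbd info, and that jq is available):
Bash:
POOL=<pool-name>
# List all journal_data objects in the pool once.
rados -p "$POOL" ls | grep '^journal_data\.' > /tmp/journal_objects
# Count how many of them belong to each image.
for img in $(rbd -p "$POOL" ls); do
  id=$(rbd info "$POOL/$img" --format json | jq -r .id)
  echo "$img: $(grep -c "\.$id\." /tmp/journal_objects) journal_data objects"
done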
 
@hainh Yeah, add
Code:
rbd skip partial discard = false
to the ceph conf.

In our case, AioDiscard events generated by fstrim (and, I guess, the ext4 journal) caused JournalMetadata not to update the rbd image properly. Thus journal_data objects were added and never removed.

There's some more info here: https://tracker.ceph.com/issues/57396

I don't see anyone mentioning the VM configuration here, but from my testing, as soon as I had
* a VM with discard support (for virtio-blk that means 4.18+; virtio-scsi had it earlier)
* discard enabled in QEMU (see the example below),

journaling would stop replaying on the local cluster as soon as an unaligned discard event came through.
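For context, in Proxmox the second point corresponds to the per-disk discard=on option in the VM config; an illustrative line (storage and disk names are placeholders):
Code:
# /etc/pve/qemu-server/<vmid>.conf
scsi0: <storage_name>:vm-<vmid>-disk-0,discard=on,size=32G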

Happy new year :)
 
@hainh thanks. Does it work with the setting? Note that it's a client-side config, not server-side, so you need to set it in the conf on the Proxmox host.
If you could confirm the fix, that would be awesome!

I should note that there may be hidden dragons here; nothing confirmed, but see at least this issue: https://tracker.ceph.com/issues/18352. If you find anything, report it, I guess :)

This should soon be fixed upstream and backported to stable releases. The commit and tests are pushed, and I guess right now the wheels just need to turn :)
 
@isodude oh, I didn't know it's client-side. If it's the Proxmox host, should it go in /etc/ceph/ceph.conf or /etc/pve/storage.cfg?
And which option should I put there? You mentioned two.
 
@hainh
The file is
Code:
/etc/pve/priv/ceph/<storage_name>.conf
and the config is
Code:
rbd discard granularity bytes = 0

A good place to add it is under [global].
 
@isodude on my Proxmox host under /etc/pve/priv/ceph/ I can only see keyring and secret files (which are used for authentication).
Or do you mean we can create <storage_name>.conf and put it there? I doubt it would have any effect.
 
@hainh I might actually be wrong here. I tried it out now and rbd_discard_granularity_bytes cannot be set to < 4096. Setting rbd_skip_partial_discard = false sets rbd_discard_granularity_bytes = 0 internally.
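So the working knob is the skip-partial-discard one; in conf-file form, a minimal sketch (assuming the same <storage_name>.conf file as above):
Code:
[global]
rbd skip partial discard = false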

It's easy to see the effect with
Code:
rbd journal inspect --image $IMAGE --verbose 
Entry: tag_id=1, commit_tid=1
{
    "event_type": "AioDiscard",
    "offset": 113151025152,
    "length": 4096,
    "discard_granularity_bytes": 0,
    "timestamp": "2023-01-11T18:21:34.764330+0100"
}
 
Just out of curiosity, how did you check which ones are accumulating journal_data?
I've personally found a way with a for loop, checking how many journal_data objects are present in my pool with rbd info on every image, but I believe there could be a better solution.
This should make it possible to check with one command: check whether active_set != minimum_set.
Code:
rbd --format json --verbose mirror pool status
 
@isodude I can't see any active_set or minimum_set field in the output. Here is one example:
JSON:
{
      "name": "csi-vol-e531e023-7c53-11ed-a59c-763d294f4792",
      "global_id": "cd9abe46-d63a-4e81-b852-3f39d1e20c60",
      "state": "up+replaying",
      "description": "replaying, {\"bytes_per_second\":0.0,\"bytes_per_snapshot\":1613010944.0,\"local_snapshot_timestamp\":1671100623,\"remote_snapshot_timestamp\":1671100623,\"replay_state\":\"idle\"}",
      "daemon_service": {
        "service_id": "19654137",
        "instance_id": "19654956",
        "daemon_id": "DELL-MEC7",
        "hostname": "DELL-MEC7"
      },
      "last_update": "2023-01-12 11:57:41"
}

Ceph version is Octopus.
 
Btw, you are running snapshot-based and not journal-based mirroring on that pool.

Code:
    {
      "name": "<image>",
      "global_id": "0af1a493-f8dd-483c-8b38-779f1f10be15",
      "state": "up+stopped",
      "description": "local image is primary",
      "daemon_service": {
        "service_id": "1271169854",
        "instance_id": "1271169859",
        "daemon_id": "<mirror-id>",
        "hostname": "<mirror-client>"
      },
      "last_update": "2023-01-12 08:48:58",
      "peer_sites": [
        {
          "site_name": "<remote-cluster>",
          "mirror_uuids": "bdde9b90-df26-4e3d-84b3-66605dc45608",
          "state": "up+replaying",
          "description": "replaying, {\"bytes_per_second\":3991.2,\"entries_behind_primary\":0,\"entries_per_second\":0.77,\"non_primary_position\":{\"entry_tid\":4391671,\"object_number\":4607,\"tag_tid\":23},\"primary_position\":{\"entry_tid\":4391671,\"object_number\":4607,\"tag_tid\":23}}",
          "last_update": "2023-01-12 08:48:40"
        }
      ]
    }

primary_position.entry_tid - non_primary_position.entry_tid should at least be positive.

For each image:
Code:
rbd --pool $POOL --image $IMAGE journal status --format json
{
  "minimum_set": 1151,
  "active_set": 1151,
  "registered_clients": [
    {
      "id": "",
      "data": "00000000  02 01 0d 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|\n00000010  00 00 00                                          |...|\n00000013\n",
      "commit_position": {
        "object_positions": [
          {
            "object_number": 4604,
            "tag_tid": 23,
            "entry_tid": 4391856
          },
          {
            "object_number": 4607,
            "tag_tid": 23,
            "entry_tid": 4391855
          },
          {
            "object_number": 4606,
            "tag_tid": 23,
            "entry_tid": 4391854
          },
          {
            "object_number": 4605,
            "tag_tid": 23,
            "entry_tid": 4391853
          }
        ]
      },
      "state": "connected"
    },
    {
      "id": "bdde9b90-df26-4e3d-84b3-66605dc45608",
      "data": "00000000  02 01 2a 00 00 00 01 00  00 00 0e 00 00 00 31 35  |..*...........15|\n00000010  31 32 36 61 31 36 38 64  63 30 65 35 01 00 00 00  |126a168dc0e5....|\n00000020  02 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|\n00000030\n",
      "commit_position": {
        "object_positions": [
          {
            "object_number": 4604,
            "tag_tid": 23,
            "entry_tid": 4391856
          },
          {
            "object_number": 4607,
            "tag_tid": 23,
            "entry_tid": 4391855
          },
          {
            "object_number": 4606,
            "tag_tid": 23,
            "entry_tid": 4391854
          },
          {
            "object_number": 4605,
            "tag_tid": 23,
            "entry_tid": 4391853
          }
        ]
      },
      "state": "connected"
    }
  ]
}

Here you have minimum_set vs. active_set.
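To check all images in one go, a rough sketch like this should work (assumes jq is installed; images without journaling are skipped):
Bash:
POOL=<pool-name>
for img in $(rbd -p "$POOL" ls); do
  rbd --pool "$POOL" --image "$img" journal status --format json 2>/dev/null |
    jq -r --arg img "$img" \
      'select(.active_set != .minimum_set) |
       "\($img): minimum_set=\(.minimum_set), active_set=\(.active_set)"'
done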
 
Hi @isodude, did you finally find any solution to prevent journal data accumulation in case of high write load on the VM?
Are there any settings that can be set to speed up the mirroring process (journal mode)?
 