After enabling one-way Ceph pool mirroring, pool usage grows constantly and the pool could fill up shortly

My experience tells me it's one of two issues:
1. The issue is due to some old problem (journal data accumulated in the past)
2. The issue is due to the VM writing too much / a bad network / etc.
The solution for problem 1 is to disable journaling on the VMs where journal_data has accumulated; your prompt will come back after some time (seconds to minutes, depending on how much journal data has accumulated), then re-enable it. See the sketch below.
The solution for problem 2 is to use snapshot-based mirroring.
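Toggling the feature looks roughly like this (a sketch; <pool-name> and <image-name> are placeholders):
Bash:
# Disabling journaling purges the accumulated journal_data objects;
# with image-mode mirroring you may need `rbd mirror image disable` first.
rbd feature disable <pool-name>/<image-name> journaling
# Re-enable once the command returns (seconds to minutes).
rbd feature enable <pool-name>/<image-name> journaling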
 
Thanks @NetUser for quick response.
I have checked all VM disk images in the pool and found that only one VM has this journaling issue (among the 47 VMs currently being mirrored).
I cleaned up all the journal data for this VM with
Bash:
rbd -p <pool-name> --image vm-1011-disk-0 journal reset
and then enabled the journaling feature again.
After that, space usage went down significantly (about 600 GB reclaimed).
We will continue to monitor this VM to see if the journal data keeps increasing.
 
Just out of curiosity, how did you check which ones are accumulating journal_data?
I've personally found a way with a for loop, checking how many journal_data objects are present in my pool with rbd info on every image, but I believe there could be a better solution.
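For reference, my loop was roughly along these lines (a sketch; it assumes journal objects are named journal_data.<pool_id>.<image_id>.<n>, that the journal id matches the image id shown by rbd info, and that jq is available):
Bash:
POOL=<pool-name>
# List all journal_data objects in the pool once.
rados -p "$POOL" ls | grep '^journal_data\.' > /tmp/journal_objects
# Count how many of them belong to each image.
for img in $(rbd -p "$POOL" ls); do
  id=$(rbd info "$POOL/$img" --format json | jq -r .id)
  echo "$img: $(grep -c "\.$id\." /tmp/journal_objects) journal_data objects"
done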
 
@hainh Yeah, add
Code:
rbd skip partial discard = false
to the ceph conf.

In our case, AioDiscard events generated by fstrim (and, I guess, the ext4 journal) caused JournalMetadata not to update the rbd image properly. Thus journal_data objects were added and never removed.

There's some more info here: https://tracker.ceph.com/issues/57396

I don't see anyone mentioning the VM configuration here, but from my testing, as soon as I had
* a VM with discard support (for virtio-blk that means 4.18+; virtio-scsi had it earlier)
* discard enabled in QEMU (see the example below),

journaling would stop replaying on the local cluster as soon as an unaligned discard event came through.
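For context, in Proxmox the second point corresponds to the per-disk discard=on option in the VM config; an illustrative line (storage and disk names are placeholders):
Code:
# /etc/pve/qemu-server/<vmid>.conf
scsi0: <storage_name>:vm-<vmid>-disk-0,discard=on,size=32G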

Happy new year :)
 
@hainh thanks. Does it work with the setting? Note that it's a client-side config, not server-side, so you need to set it in the conf on the Proxmox host.
If you could confirm the fix, that would be awesome!

I should note that there may be hidden dragons here; nothing confirmed, but see at least this issue: https://tracker.ceph.com/issues/18352. If you find anything, report it, I guess :)

This should soon be fixed upstream and backported to stable releases. The commit and tests are pushed, and I guess right now the wheels just need to turn :)
 
@isodude oh, I didn't know it's client-side. If it's the Proxmox host, should it go in /etc/ceph/ceph.conf or /etc/pve/storage.cfg?
And which option should I put there? You mentioned two.
 
@hainh
The file is
Code:
/etc/pve/priv/ceph/<storage_name>.conf
and the config is
Code:
rbd discard granularity bytes = 0

A good place to add it is under [global].
 
@isodude on my Proxmox host under /etc/pve/priv/ceph/ I can only see keyring and secret files (which are used for authentication).
Or do you mean we can create <storage_name>.conf and put it there? I doubt it would have any effect.
 
@hainh I might actually be wrong here. I tried it out now and rbd_discard_granularity_bytes cannot be set to < 4096. Setting rbd_skip_partial_discard = false sets rbd_discard_granularity_bytes = 0 internally.
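So the working knob is the skip-partial-discard one; in conf-file form, a minimal sketch (assuming the same <storage_name>.conf file as above):
Code:
[global]
rbd skip partial discard = false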

It's easy to see the effect with
Code:
rbd journal inspect --image $IMAGE --verbose 
Entry: tag_id=1, commit_tid=1
{
    "event_type": "AioDiscard",
    "offset": 113151025152,
    "length": 4096,
    "discard_granularity_bytes": 0,
    "timestamp": "2023-01-11T18:21:34.764330+0100"
}
 
Just out of curiosity, how did you check which ones are accumulating journal_data?
I've personally found a way with a for loop, checking how many journal_data objects are present in my pool with rbd info on every image, but I believe there could be a better solution.
This should make it possible to check with one command: check whether active_set != minimum_set.
Code:
rbd --format json --verbose mirror pool status
 
@isodude I can't see any active_set or minimum_set field in the output. Here is one example:
JSON:
{
      "name": "csi-vol-e531e023-7c53-11ed-a59c-763d294f4792",
      "global_id": "cd9abe46-d63a-4e81-b852-3f39d1e20c60",
      "state": "up+replaying",
      "description": "replaying, {\"bytes_per_second\":0.0,\"bytes_per_snapshot\":1613010944.0,\"local_snapshot_timestamp\":1671100623,\"remote_snapshot_timestamp\":1671100623,\"replay_state\":\"idle\"}",
      "daemon_service": {
        "service_id": "19654137",
        "instance_id": "19654956",
        "daemon_id": "DELL-MEC7",
        "hostname": "DELL-MEC7"
      },
      "last_update": "2023-01-12 11:57:41"
}

Ceph version is Octopus.
 
Btw, you are running snapshot-based and not journal-based mirroring on that pool.

Code:
    {
      "name": "<image>",
      "global_id": "0af1a493-f8dd-483c-8b38-779f1f10be15",
      "state": "up+stopped",
      "description": "local image is primary",
      "daemon_service": {
        "service_id": "1271169854",
        "instance_id": "1271169859",
        "daemon_id": "<mirror-id>",
        "hostname": "<mirror-client>"
      },
      "last_update": "2023-01-12 08:48:58",
      "peer_sites": [
        {
          "site_name": "<remote-cluster>",
          "mirror_uuids": "bdde9b90-df26-4e3d-84b3-66605dc45608",
          "state": "up+replaying",
          "description": "replaying, {\"bytes_per_second\":3991.2,\"entries_behind_primary\":0,\"entries_per_second\":0.77,\"non_primary_position\":{\"entry_tid\":4391671,\"object_number\":4607,\"tag_tid\":23},\"primary_position\":{\"entry_tid\":4391671,\"object_number\":4607,\"tag_tid\":23}}",
          "last_update": "2023-01-12 08:48:40"
        }
      ]
    }

primary_position.entry_tid - non_primary_position.entry_tid should at least be positive.

For each image:
Code:
rbd --pool $POOL --image $IMAGE journal status --format json
{
  "minimum_set": 1151,
  "active_set": 1151,
  "registered_clients": [
    {
      "id": "",
      "data": "00000000  02 01 0d 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|\n00000010  00 00 00                                          |...|\n00000013\n",
      "commit_position": {
        "object_positions": [
          {
            "object_number": 4604,
            "tag_tid": 23,
            "entry_tid": 4391856
          },
          {
            "object_number": 4607,
            "tag_tid": 23,
            "entry_tid": 4391855
          },
          {
            "object_number": 4606,
            "tag_tid": 23,
            "entry_tid": 4391854
          },
          {
            "object_number": 4605,
            "tag_tid": 23,
            "entry_tid": 4391853
          }
        ]
      },
      "state": "connected"
    },
    {
      "id": "bdde9b90-df26-4e3d-84b3-66605dc45608",
      "data": "00000000  02 01 2a 00 00 00 01 00  00 00 0e 00 00 00 31 35  |..*...........15|\n00000010  31 32 36 61 31 36 38 64  63 30 65 35 01 00 00 00  |126a168dc0e5....|\n00000020  02 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|\n00000030\n",
      "commit_position": {
        "object_positions": [
          {
            "object_number": 4604,
            "tag_tid": 23,
            "entry_tid": 4391856
          },
          {
            "object_number": 4607,
            "tag_tid": 23,
            "entry_tid": 4391855
          },
          {
            "object_number": 4606,
            "tag_tid": 23,
            "entry_tid": 4391854
          },
          {
            "object_number": 4605,
            "tag_tid": 23,
            "entry_tid": 4391853
          }
        ]
      },
      "state": "connected"
    }
  ]
}

Here you have minimum_set vs. active_set.
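To check all images in one go, a rough sketch like this should work (assumes jq is installed; images without journaling are skipped):
Bash:
POOL=<pool-name>
for img in $(rbd -p "$POOL" ls); do
  rbd --pool "$POOL" --image "$img" journal status --format json 2>/dev/null |
    jq -r --arg img "$img" \
      'select(.active_set != .minimum_set) |
       "\($img): minimum_set=\(.minimum_set), active_set=\(.active_set)"'
done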
 
Hi @isodude, did you finally find any solution to prevent journal data accumulation in case of high write load on the VM?
Are there any settings that can be set to speed up the mirroring process (journal mode)?
 