[Feature request] Ability to enable rxbounce option in Ceph RBD storage for Windows VMs

Oct 10, 2024
Currently, at our university, we have a large standalone Ceph cluster that serves as storage for our virtual machines. We use this Ceph cluster for multiple purposes, including virtual machine storage, S3 object storage, and CephFS for Samba.

Overall, the performance is stable, and the cluster operates smoothly for most workloads. However, we are encountering significant issues with Windows virtual machines when using RBD as the primary disk storage (both for boot and data disks).

Issue Description​

During high I/O workloads on Windows VMs, we experience severe performance degradation accompanied by numerous errors in the PVE kernel logs. The errors appear as:

Code:
[Thu Oct 10 13:23:42 2024] libceph: read_partial_message 00000000d9278a57 data crc 623151463 != exp. 3643241286
[Thu Oct 10 13:23:42 2024] libceph: osd39 (1)158.*.*.*:7049 bad crc/signature
[Thu Oct 10 13:23:42 2024] libceph: osd76 (1)158.*.*.*:6978 bad crc/signature
[Thu Oct 10 13:23:42 2024] libceph: read_partial_message 00000000eeebfe2b data crc 3093401367 != exp. 937192709
[Thu Oct 10 13:23:42 2024] libceph: read_partial_message 00000000113dffc8 data crc 830701436 != exp. 4041960045
[Thu Oct 10 13:23:42 2024] libceph: osd80 (1)158.*.*.*:6805 bad crc/signature

These errors typically occur during I/O-intensive operations (e.g., large file transfers or database workloads) on Windows VMs. Interestingly, Linux VMs do not exhibit the same problem even under similar workloads, indicating a Windows-specific issue.

Root Cause Analysis​

After investigating Ceph’s RBD documentation and reviewing kernel logs, we identified that these errors are likely related to how Windows handles large I/O operations. Specifically, the issue arises because Windows may map a temporary “dummy” page into the destination buffer, making the buffer unstable during data transfer. This instability causes libceph to miscalculate checksums, resulting in bad crc/signature errors and performance degradation.

Solution: Enable rxbounce​

According to the Ceph RBD documentation, enabling the rxbounce option can resolve these issues. The rxbounce option forces the RBD client to use a bounce buffer when receiving data, which stabilizes the destination buffer during I/O operations. This is particularly necessary when dealing with Windows' I/O:

rxbounce: Use a bounce buffer when receiving data (introduced in kernel version 5.17). The default behavior is to read directly into the destination buffer, but this can cause problems if the destination buffer isn't stable. A bounce buffer is needed if the destination buffer may change during read operations, such as when Windows maps a temporary page to generate a single large I/O. Otherwise, libceph: bad crc/signature or libceph: integrity error messages can occur, along with performance degradation.
Source: Ceph RBD Documentation
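For anyone who wants to confirm this behaviour on their own cluster first, the option can be tried by hand on a single image before touching any Proxmox code. A minimal sketch, assuming a pool named vm-pool and an image vm-100-disk-0 (adjust the names and the /dev/rbdX path to your setup):

Code:
# map one image manually with rxbounce enabled (pool/image names are placeholders)
rbd map vm-pool/vm-100-disk-0 --options rxbounce

# unmap it again after testing (the device path depends on your system)
rbd unmap /dev/rbd0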

Implementation​

To enable this option, I modified the RBDPlugin.pm file to pass the rxbounce map option through an @options variable. After applying the change, the errors stopped appearing, and the performance of Windows VMs improved significantly. Here’s a brief overview of the changes made:

Perl:
sub map_volume {
    my ($class, $storeid, $scfg, $volname, $snapname) = @_;

    my ($vtype, $img_name, $vmid) = $class->parse_volname($volname);

    my $name = $img_name;
    $name .= '@'.$snapname if $snapname;

    my $kerneldev = get_rbd_dev_path($scfg, $storeid, $name);

    return $kerneldev if -b $kerneldev; # already mapped

    # features can only be enabled/disabled for image, not for snapshot!
    $krbd_feature_update->($scfg, $storeid, $img_name);

    # added this option variable as proof of concept
    my @options = (
        '--options', 'rxbounce',
    );
    my $cmd = $rbd_cmd->($scfg, $storeid, 'map', @options, $name);
    run_rbd_command($cmd, errmsg => "can't map rbd volume $name");

    return $kerneldev;
}
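
To double-check that the flag actually reached the kernel client after the change, the active map options of a mapped device can be read from sysfs (the device name /dev/rbd0 below is just an example):

Code:
# list the currently mapped RBD devices
rbd showmapped

# the active krbd map options (rxbounce among them) are visible here
cat /sys/block/rbd0/device/config_info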

Outcome​

With rxbounce enabled, the performance of Windows VMs is now on par with Linux VMs, and the kernel logs are free from the bad crc/signature errors. This solution has stabilized our Windows virtual machine workloads, making the Ceph cluster reliable for mixed OS environments.
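
A quick, rough way to see whether the messages really stopped is to count them in the kernel ring buffer before and after the change:

Code:
# number of crc errors currently in the kernel log; should stop growing with rxbounce enabled
dmesg | grep -c 'bad crc/signature'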

Suggestion​

It would be useful to have a field in the RBD storage configuration for adding custom map options.
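
Purely as an illustration of what such a field could look like in /etc/pve/storage.cfg; the krbd-map-options key below is a made-up name and does not exist in the current plugin:

Code:
# hypothetical storage.cfg entry - only the krbd-map-options line is invented
rbd: ceph-vm
        pool vm-pool
        content images
        krbd 1
        krbd-map-options rxbounce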

PVE version​

Code:
proxmox-ve: 8.2.0 (running kernel: 6.5.11-8-pve)
pve-manager: 8.2.7 (running version: 8.2.7/3e0176e6bb2ade3b)
proxmox-kernel-helper: 8.1.0
pve-kernel-5.15: 7.4-9
proxmox-kernel-6.8: 6.8.12-2
proxmox-kernel-6.8.12-2-pve-signed: 6.8.12-2
proxmox-kernel-6.8.8-4-pve-signed: 6.8.8-4
proxmox-kernel-6.5.13-6-pve-signed: 6.5.13-6
proxmox-kernel-6.5: 6.5.13-6
proxmox-kernel-6.5.11-8-pve-signed: 6.5.11-8
pve-kernel-5.15.131-2-pve: 5.15.131-3
pve-kernel-5.15.108-1-pve: 5.15.108-2
pve-kernel-5.15.74-1-pve: 5.15.74-1
pve-kernel-5.15.30-2-pve: 5.15.30-3
ceph-fuse: 17.2.7-pve3
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx9
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-4
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.5.1
libproxmox-backup-qemu0: 1.4.1
libproxmox-rs-perl: 0.3.4
libpve-access-control: 8.1.4
libpve-apiclient-perl: 3.3.2
libpve-cluster-api-perl: 8.0.7
libpve-cluster-perl: 8.0.7
libpve-common-perl: 8.2.3
libpve-guest-common-perl: 5.1.4
libpve-http-server-perl: 5.1.1
libpve-network-perl: 0.9.8
libpve-rs-perl: 0.8.10
libpve-storage-perl: 8.2.5
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 6.0.0-1
lxcfs: 6.0.0-pve2
novnc-pve: 1.4.0-4
proxmox-backup-client: 3.2.7-1
proxmox-backup-file-restore: 3.2.7-1
proxmox-firewall: 0.5.0
proxmox-kernel-helper: 8.1.0
proxmox-mail-forward: 0.2.3
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.7
proxmox-widget-toolkit: 4.2.3
pve-cluster: 8.0.7
pve-container: 5.2.0
pve-docs: 8.2.3
pve-edk2-firmware: 4.2023.08-4
pve-esxi-import-tools: 0.7.2
pve-firewall: 5.0.7
pve-firmware: 3.13-2
pve-ha-manager: 4.0.5
pve-i18n: 3.2.3
pve-qemu-kvm: 8.1.5-6
pve-xtermjs: 5.3.0-3
qemu-server: 8.2.4
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.6-pve1
 
Amazing work... we have the same issue and were still wondering how to enable rxbounce in PVE hyperconverged Ceph!
 
Very nice find, yet the forum is not the place for a feature request; those go here.
To have endless discussions there about why something is not worth their time, and to end up DIY-ing it and posting it here anyway?

@janlostak Please mark your post as "TUTORIAL", you can do so by editing the "thread" option in the top right corner.
 
This issue happens often when KRBD is enabled and PBS backup jobs run towards a fast PBS.
So I believe this is a more common issue than most people think and should be filed as a "bug"...
 
So, just for my understanding:
you edit "/usr/share/perl5/PVE/Storage/RBDPlugin.pm" and add these lines?
Code:
    # added this option variable as proof of concept
    my @options = (
        '--options' , 'rxbounce',
    );

What has to be done after the edit? Reboot the nodes? Anything else?
 
As far as I understand, you need to stop and start the VM for which you want to activate it. I would need to check for myself.
 
You also need to add @options to the command arguments:

my $cmd = $rbd_cmd->($scfg, $storeid, 'map', @options, $name);

then run:

systemctl restart pvedaemon

and shut down the VM. After a successful shutdown, start the VM again and the option will be in effect on the RBD device.
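
Put together, the steps from this thread look roughly like this (VM ID 100 and /dev/rbd0 are only examples):

Code:
# 1) edit /usr/share/perl5/PVE/Storage/RBDPlugin.pm as shown above, then restart the daemon
systemctl restart pvedaemon

# 2) power-cycle the VM so its disks get re-mapped with the new option
qm shutdown 100
qm start 100

# 3) confirm rxbounce is now listed for the re-mapped device
cat /sys/block/rbd0/device/config_info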
 
This issue happens often when KRBD is enabled and PBS backup jobs run towards a fast PBS.
So I believe this is a more common issue than most people think and should be filed as a "bug"...
Yes, I posted it in the hope that it would gain more attention and eventually be fixed. That’s why I didn’t present it as a tutorial, because I believe there should be a proper way to enable advanced features on RBD storage. Another reason this should not be considered a tutorial for fixing the issue is that when you upgrade to a new version of PVE, the modified script will most likely be overwritten with the original version.
 
I made a ticket with Proxmox and asked for a "permanent" implementation option....
 
Hi, thanks for sharing your troubleshooting steps and good to hear that rxbounce fixes the Windows VM performance issues for you.
Yes, I posted it in the hope that it would gain more attention and eventually be fixed. That’s why I didn’t present it as a tutorial, because I believe there should be a proper way to enable advanced features on RBD storage. Another reason this should not be considered a tutorial for fixing the issue is that when you upgrade to a new version of PVE, the modified script will most likely be overwritten with the original version.
Yes, I'd also recommend against running patched code. As suggested, could you please open a feature request on our Bugzilla? [0]

Some first thoughts (I'll also post them to the feature request): Always setting rxbounce (as done by the patch) is probably not desirable, as it would also affect setups which don't see problems currently, and rxbounce may have an unnecessary performance impact then.

If you want to enable rxbounce for your pool or specific images without code changes, you can try adding it to rbd_default_map_options either on the pool or on the image level. See [1] [2] for more information. For a mapped device /dev/rbdN you can check the current map options in /sys/block/rbdN/device/config_info.
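
For example, something along these lines (pool and image names are placeholders; please check the linked man page for the exact syntax of your Ceph release):

Code:
# set rxbounce as the default map option for every image in a pool
rbd config pool set vm-pool rbd_default_map_options rxbounce

# or only for a single image
rbd config image set vm-pool/vm-100-disk-0 rbd_default_map_options rxbounce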

Just for the record, switching to librbd (disabling KRBD), which is apparently not affected by these issues, can also be a workaround.

[0] https://bugzilla.proxmox.com/
[1] https://docs.ceph.com/en/reef/man/8/rbd/#commands
[2] https://github.com/ceph/ceph/blob/b...26ac9b4c3/src/common/options/rbd.yaml.in#L507
 
Just for the record, switching to librbd (disabling KRBD), which is apparently not affected by these issues, can also be a workaround.
Thx for sharing and joining the discussion...

Disabling KRBD is what we often did in the past. This problem arises nearly everywhere when Proxmox Backup Server is used, KRBD is enabled and you back up Windows VMs... so I consider this a "known" issue....

But in our NVMe-only 3-node cluster... we lose about 20% IOPS just by disabling KRBD... on the other hand, besides the CRC errors in the logs, there are no known problems apart from the syslog getting flooded....

As the OP stated... maybe an "option" in the GUI is desirable....
 
Under heavy storage I/O load (not fully saturated—about 50%) on the PVE node hosting both Linux and Windows virtual machines, we sometimes experience hundreds or even thousands of these bad CRC/signature messages per second, leading to degraded performance. With the rxbounce option enabled, these issues are resolved. I also tried disabling KRBD, but the overall performance dropped by about 20–30%, and I/O latency increased. This problem consistently occurs when we run I/O-intensive scientific calculations inside our VMs.
 
I opened the request in Bugzilla (https://bugzilla.proxmox.com/show_bug.cgi?id=5779) and included a link to this forum post; I hope that is enough.
 
Yes, I'd also recommend against running patched code. As suggested, could you please open a feature request on our Bugzilla? [0]

Do you guys ever call anything a bug?

Some first thoughts (I'll also post them to the feature request): Always setting rxbounce (as done by the patch) is probably not desirable, as it would also affect setups which don't see problems currently, and rxbounce may have an unnecessary performance impact then.

I am not sure whether I am misunderstanding here, but:

Suggestion​

It would be useful to have a field in the RBD storage configuration for adding custom map options.

That's asking for something else.

If you want to enable rxbounce for your pool or specific images without code changes, you can try adding it to rbd_default_map_options either on the pool or on the image level. See [1] [2] for more information. For a mapped device /dev/rbdN you can check the current map options in /sys/block/rbdN/device/config_info.

Just for the record, switching to librbd (disabling KRBD), which is apparently not affected by these issues, can also be a workaround.

I also don't understand why there are two different ways staff happen to approach bug reports.

One is that they instantly create a new report THEMSELVES in BZ, make themselves the assignee and start submitting patches or an RFC.

The other is to tell the OP to go file it on BZ themselves.

The former usually happens with "crossing t's and dotting i's" kinds of reports.

 
Just for information... we changed two 3-node clusters with this "fix" and just had our PBS run.... about 170 VMs, around 110 of them running Windows... Not a single CRC error... smooth and fast.... so whatever it takes... this fix has to be implemented permanently as an option!

Before the change the logs were flooded, and sometimes the system got so "weird" that we had to split the backup times to keep the system reliably stable.... these issues are gone now....
 
This issue happens often when KRBD is enabled and PBS backup jobs run towards a fast PBS.
So I believe this is a more common issue than most people think and should be filed as a "bug"...
What is a fast PBS in your words? NVMe/SSD-only storage with sufficient bandwidth, e.g. 40G/25G/100G?
 
