At our university, we operate a large standalone Ceph cluster that provides storage for our virtual machines. We use this cluster for several purposes: virtual machine storage, S3 object storage, and CephFS for Samba.
Overall, the performance is stable, and the cluster operates smoothly for most workloads. However, we are encountering significant issues with Windows virtual machines when using RBD as the primary disk storage (both for boot and data disks).
Issue Description
During high I/O workloads on Windows VMs, we experience severe performance degradation accompanied by numerous errors in the PVE kernel logs. The errors appear as:
Code:
[Thu Oct 10 13:23:42 2024] libceph: read_partial_message 00000000d9278a57 data crc 623151463 != exp. 3643241286
[Thu Oct 10 13:23:42 2024] libceph: osd39 (1)158.*.*.*:7049 bad crc/signature
[Thu Oct 10 13:23:42 2024] libceph: osd76 (1)158.*.*.*:6978 bad crc/signature
[Thu Oct 10 13:23:42 2024] libceph: read_partial_message 00000000eeebfe2b data crc 3093401367 != exp. 937192709
[Thu Oct 10 13:23:42 2024] libceph: read_partial_message 00000000113dffc8 data crc 830701436 != exp. 4041960045
[Thu Oct 10 13:23:42 2024] libceph: osd80 (1)158.*.*.*:6805 bad crc/signature
These errors typically occur during I/O-intensive operations (e.g., large file transfers or database workloads) on Windows VMs. Interestingly, Linux VMs do not exhibit the same problem even under similar workloads, indicating a Windows-specific issue.
Root Cause Analysis
After investigating Ceph's RBD documentation and reviewing the kernel logs, we identified that these errors are likely related to how Windows handles large I/O operations. Specifically, Windows may map a temporary "dummy" page into the destination buffer to generate a single large I/O, so the buffer is not stable while data is being received. Because of this, the CRC that libceph computes over the received data no longer matches the expected value, resulting in bad crc/signature errors and performance degradation.
Solution: Enable rxbounce
According to the Ceph RBD documentation, enabling the rxbounce map option can resolve these issues. The rxbounce option forces the kernel RBD client to receive data into a bounce buffer, which keeps the destination buffer stable during the transfer. This is particularly necessary for Windows I/O behavior.
Source: Ceph RBD documentation
rxbounce: Use a bounce buffer when receiving data (introduced in kernel version 5.17). The default behavior is to read directly into the destination buffer, but this can cause problems if the destination buffer isn't stable. A bounce buffer is needed if the destination buffer may change during read operations, such as when Windows maps a temporary page to generate a single large I/O. Otherwise, libceph: bad crc/signature or libceph: integrity error messages can occur, along with performance degradation.
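Before changing anything in Proxmox itself, the effect can be sanity-checked by mapping a scratch RBD image with the option by hand. Below is a minimal Perl sketch of such a test; the pool and image names are placeholders, and the image must already exist:
Perl:
#!/usr/bin/perl
# Minimal test sketch: map an existing scratch RBD image with the rxbounce
# kernel map option, so a Windows VM (or another I/O test) can be pointed at
# it before patching RBDPlugin.pm. Pool/image names below are placeholders.
use strict;
use warnings;

my ($pool, $image) = ('testpool', 'rxbounce-test');

# Same invocation the storage plugin builds internally:
#   rbd map --options rxbounce <pool>/<image>
my @cmd = ('rbd', 'map', '--options', 'rxbounce', "$pool/$image");
system(@cmd) == 0
    or die "rbd map with rxbounce failed (exit code: " . ($? >> 8) . ")\n";

print "Mapped $pool/$image with rxbounce; unmap later with: rbd unmap $pool/$image\n";
If the bad crc/signature messages stay away under the same workload on the test mapping, the plugin change below is worth pursuing.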
Implementation
To enable this option, I modified RBDPlugin.pm to pass the rxbounce parameter via the @options variable in map_volume. After applying the change, the errors stopped appearing, and the performance of Windows VMs improved significantly. Here is a brief overview of the change:
Perl:
sub map_volume {
    my ($class, $storeid, $scfg, $volname, $snapname) = @_;

    my ($vtype, $img_name, $vmid) = $class->parse_volname($volname);

    my $name = $img_name;
    $name .= '@'.$snapname if $snapname;

    my $kerneldev = get_rbd_dev_path($scfg, $storeid, $name);

    return $kerneldev if -b $kerneldev; # already mapped

    # features can only be enabled/disabled for image, not for snapshot!
    $krbd_feature_update->($scfg, $storeid, $img_name);

    # added this @options variable as a proof of concept
    my @options = (
        '--options', 'rxbounce',
    );

    my $cmd = $rbd_cmd->($scfg, $storeid, 'map', @options, $name);
    run_rbd_command($cmd, errmsg => "can't map rbd volume $name");
    return $kerneldev;
}
Outcome
With rxbounce enabled, the performance of Windows VMs is now on par with Linux VMs, and the kernel logs are free of the bad crc/signature errors. This change has stabilized our Windows virtual machine workloads and made the Ceph cluster reliable for mixed-OS environments.
Suggestion
It would be useful to have a field in the RBD storage configuration for adding custom map options, rather than patching RBDPlugin.pm by hand; a rough sketch of what that could look like follows.
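As an illustration only (untested, and the property name krbd-map-options is purely hypothetical, not an existing Proxmox option), RBDPlugin.pm could expose such a value through its storage properties and hand it to map_volume:
Perl:
# Untested sketch - 'krbd-map-options' is a hypothetical property name.
# 1) Expose the field in the RBD storage schema (it would presumably also
#    need to be registered in the plugin's options() list, omitted here):
sub properties {
    return {
        # ... existing RBD properties (monhost, pool, krbd, ...) ...
        'krbd-map-options' => {
            description => "Additional options passed to 'rbd map' (e.g. 'rxbounce').",
            type => 'string',
        },
    };
}

# 2) In map_volume(), build @options from the storage configuration instead
#    of hard-coding rxbounce:
my @options = defined($scfg->{'krbd-map-options'})
    ? ('--options', $scfg->{'krbd-map-options'})
    : ();
my $cmd = $rbd_cmd->($scfg, $storeid, 'map', @options, $name);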
PVE version
Code:
proxmox-ve: 8.2.0 (running kernel: 6.5.11-8-pve)
pve-manager: 8.2.7 (running version: 8.2.7/3e0176e6bb2ade3b)
proxmox-kernel-helper: 8.1.0
pve-kernel-5.15: 7.4-9
proxmox-kernel-6.8: 6.8.12-2
proxmox-kernel-6.8.12-2-pve-signed: 6.8.12-2
proxmox-kernel-6.8.8-4-pve-signed: 6.8.8-4
proxmox-kernel-6.5.13-6-pve-signed: 6.5.13-6
proxmox-kernel-6.5: 6.5.13-6
proxmox-kernel-6.5.11-8-pve-signed: 6.5.11-8
pve-kernel-5.15.131-2-pve: 5.15.131-3
pve-kernel-5.15.108-1-pve: 5.15.108-2
pve-kernel-5.15.74-1-pve: 5.15.74-1
pve-kernel-5.15.30-2-pve: 5.15.30-3
ceph-fuse: 17.2.7-pve3
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx9
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-4
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.5.1
libproxmox-backup-qemu0: 1.4.1
libproxmox-rs-perl: 0.3.4
libpve-access-control: 8.1.4
libpve-apiclient-perl: 3.3.2
libpve-cluster-api-perl: 8.0.7
libpve-cluster-perl: 8.0.7
libpve-common-perl: 8.2.3
libpve-guest-common-perl: 5.1.4
libpve-http-server-perl: 5.1.1
libpve-network-perl: 0.9.8
libpve-rs-perl: 0.8.10
libpve-storage-perl: 8.2.5
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 6.0.0-1
lxcfs: 6.0.0-pve2
novnc-pve: 1.4.0-4
proxmox-backup-client: 3.2.7-1
proxmox-backup-file-restore: 3.2.7-1
proxmox-firewall: 0.5.0
proxmox-kernel-helper: 8.1.0
proxmox-mail-forward: 0.2.3
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.7
proxmox-widget-toolkit: 4.2.3
pve-cluster: 8.0.7
pve-container: 5.2.0
pve-docs: 8.2.3
pve-edk2-firmware: 4.2023.08-4
pve-esxi-import-tools: 0.7.2
pve-firewall: 5.0.7
pve-firmware: 3.13-2
pve-ha-manager: 4.0.5
pve-i18n: 3.2.3
pve-qemu-kvm: 8.1.5-6
pve-xtermjs: 5.3.0-3
qemu-server: 8.2.4
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.6-pve1