Copy progress stalls when moving drives from Ceph to local LVM

devaux

Hi there,
I'm taking my first steps with Ceph storage and I'm not sure whether something is wrong or this behaviour is expected.
When moving a disk from Ceph to local lvm-thin (2x business SSDs in RAID1 behind a RAID controller), the progress stalls multiple times for about a minute each.




Strangely enough, I can't reproduce this behaviour when I move the disk back to Ceph. It only happens when moving from Ceph to local LVM.

The VM is a freshly installed Ubuntu 24.04 Server with a 120GB disk and an LVM partition layout.

Update: Just noticed that the disk uses 120GB of space on the lvm-thin, although only about 5GB of data is used inside the Ubuntu VM.
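(For reference, the actual data usage of the thin volumes can be checked on the PVE host with lvs; a minimal sketch - the VG name "pve" is an assumption, adjust it to your setup:)
Code:
# on the PVE host: show size and actual data usage of the thin LVs
lvs -o lv_name,lv_size,data_percent pve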

Update 2: Did the following test: installed a new Ubuntu VM with a 120GB, LVM-partitioned disk on the lvm-thin. As expected, it took only a few GB of space on the lvm-thin. But after moving it to Ceph, it looks like it's thick-provisioned:
Code:
~# rbd du vms0_ceph/vm-901-disk-0
NAME           PROVISIONED  USED
vm-901-disk-0      120 GiB  120 GiB

Update 3: It looks like I can get some space back by running "fstrim -va" inside the VM - at least for the space that is allocated inside the VM:
Code:
~# rbd du vms0_ceph/vm-901-disk-0
NAME           PROVISIONED  USED
vm-901-disk-0      120 GiB  69 GiB
Code:
# pvs
  PV         VG        Fmt  Attr PSize    PFree
  /dev/sda3  ubuntu-vg lvm2 a--  <118.00g 59.00g
The workaround I found is to create a temporary logical volume from the unallocated space and discard it with blkdiscard:
Code:
# create a temporary LV spanning all free extents in the VG
lvcreate -l100%FREE -n blkdiscard ubuntu-vg
# discard every block of that LV
blkdiscard -v /dev/ubuntu-vg/blkdiscard
# remove the temporary LV again
lvremove ubuntu-vg/blkdiscard
Now it looks a lot better:
Code:
# rbd du vms0_ceph/vm-901-disk-0
NAME           PROVISIONED  USED  
vm-901-disk-0      120 GiB  9.8 GiB
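(Side note: fstrim/blkdiscard inside the guest can only release space on the backing storage if the virtual disk has the Discard option enabled - which seems to be the case here, since the fstrim above freed space. A minimal sketch of how that could be set from the CLI, assuming the disk is attached as scsi0 on VM 901:)
Code:
# enable discard pass-through on the virtual disk (scsi0 is an assumption; adjust to your config)
qm set 901 --scsi0 vms0_ceph:vm-901-disk-0,discard=on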

Update 4: OK, it's definitely trim-related. If I trim the disks of the VM before moving from lvm-thin to Ceph, it is a) faster and b) doesn't show these slowdowns while moving.
But after every move between these two storage types I have to trim again.
So I guess it is what it is(?)
 

Drive full clone, the process used to do a live storage migration with the VM powered on, has to copy every block from the source to the destination storage, thus losing thin provisioning. To recover it, install the QEMU Guest Agent in the VM, enable it in the options of the VM and tick "Run fstrim after moving a disk or migrating the VM.". PVE will then instruct the QEMU agent to run an fstrim after each storage migration.
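(The same can presumably also be set from the CLI; a minimal sketch, with the VM ID 901 assumed:)
Code:
# enable the guest agent and the automatic fstrim after a disk move / migration
qm set 901 --agent enabled=1,fstrim_cloned_disks=1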

The slowdowns you see might be caused by the destination drives not being able to keep up with the writes: writes stall until the drives / controller have flushed their buffers.
 
Drive full clone, the process used to do a live storage migration with the VM powered on, has to copy every block from the source to the destination storage, thus losing thin provisioning. To recover it, install the QEMU Guest Agent in the VM, enable it in the options of the VM and tick "Run fstrim after moving a disk or migrating the VM.". PVE will then instruct the QEMU agent to run an fstrim after each storage migration.
That's what I missed! Thanks a lot!

EDIT: Doesn't look like it's working as expected. Created a new Ubuntu VM with a 32GB, LVM-partitioned disk - fully provisioned - and moved it to local LVM and back to Ceph with the automatic trim function enabled.

Code:
$ df -h
Filesystem                         Size  Used Avail Use% Mounted on
tmpfs                              391M  1.2M  390M   1% /run
efivarfs                           256K   53K  199K  21% /sys/firmware/efi/efivars
/dev/mapper/ubuntu--vg-ubuntu--lv   29G  7.7G   20G  29% /
tmpfs                              2.0G     0  2.0G   0% /dev/shm
tmpfs                              5.0M     0  5.0M   0% /run/lock
/dev/sda2                          2.0G   96M  1.7G   6% /boot
/dev/sda1                          1.1G  6.2M  1.1G   1% /boot/efi
tmpfs                              391M   12K  391M   1% /run/user/1000

Code:
~$ sudo fstrim -av
/boot/efi: 1 GiB (1118564352 bytes) trimmed on /dev/sda1
/boot: 0 B (0 bytes) trimmed on /dev/sda2
/: 203.4 MiB (213245952 bytes) trimmed on /dev/mapper/ubuntu--vg-ubuntu--lv

Code:
~# rbd du vms0_ceph/vm-900-disk-2
NAME           PROVISIONED  USED 
vm-900-disk-2       32 GiB  30 GiB
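(One thing that might be worth double-checking here - not something confirmed in this thread - is whether the agent option and the Discard flag are actually set on the VM. A sketch, assuming VM ID 900:)
Code:
# on the PVE host: show the agent and disk options of the VM
qm config 900 | grep -E 'agent|scsi|virtio'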


The slowdowns you see might be caused by the destination drives not being able to keep up with the writes: writes stall until the drives / controller have flushed their buffers.
This only happens when I move a disk which was not trimmed. If it was trimmed before moving, the speed is normal.
 
This only happens when I move a disk which was not trimmed. If it was trimmed before moving, the speed is normal.
When the disk is trimmed you are actually moving a lot less data during a migration, maybe not triggering whatever causes the slowdown. Try to fill that 120G disk with data (use something like dd if=/dev/urandom of=/randomfile1 bs=1M count=100000) and check how it behaves in that case.
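(A slightly more complete version of that fill test, as a sketch - the file name and count are arbitrary, adjust the count to the free space reported by df:)
Code:
# inside the VM: fill most of the free space with incompressible data, then flush to disk
dd if=/dev/urandom of=/randomfile1 bs=1M count=100000 status=progress
sync
df -h /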
 
