disk move really slow

RobFantini

Lately we have seen that disk moves from Ceph to ZFS are very slow.

I just tested moving a 12GB KVM disk.

The KVM guest was stopped prior to the move.

The move started at 20:14. I am posting progress as it occurs.
20:24 13%
20:38 14%
20:43 14%
It is going very slowly.

Most of the time, iotop shows no move activity.

log:
Code:
less UPID:pve3:002F43FB:019B5E32:5BF9F759:qmmove:2165:root*

create full clone of drive scsi0 (ceph_vm:vm-2165-disk-1)

transferred: 0 bytes remaining: 12884901888 bytes total: 12884901888 bytes progression: 0.00 %
transferred: 130137509 bytes remaining: 12754764379 bytes total: 12884901888 bytes progression: 1.01 %
transferred: 260275018 bytes remaining: 12624626870 bytes total: 12884901888 bytes progression: 2.02 %
transferred: 390412527 bytes remaining: 12494489361 bytes total: 12884901888 bytes progression: 3.03 %
transferred: 520550036 bytes remaining: 12364351852 bytes total: 12884901888 bytes progression: 4.04 %
transferred: 650687545 bytes remaining: 12234214343 bytes total: 12884901888 bytes progression: 5.05 %
transferred: 779536564 bytes remaining: 12105365324 bytes total: 12884901888 bytes progression: 6.05 %
transferred: 909674073 bytes remaining: 11975227815 bytes total: 12884901888 bytes progression: 7.06 %
transferred: 1039811582 bytes remaining: 11845090306 bytes total: 12884901888 bytes progression: 8.07 %
transferred: 1169949091 bytes remaining: 11714952797 bytes total: 12884901888 bytes progression: 9.08 %
transferred: 1300086600 bytes remaining: 11584815288 bytes total: 12884901888 bytes progression: 10.09 %
transferred: 1430224109 bytes remaining: 11454677779 bytes total: 12884901888 bytes progression: 11.10 %
transferred: 1560361618 bytes remaining: 11324540270 bytes total: 12884901888 bytes progression: 12.11 %
transferred: 1690499127 bytes remaining: 11194402761 bytes total: 12884901888 bytes progression: 13.12 %
transferred: 1820636636 bytes remaining: 11064265252 bytes total: 12884901888 bytes progression: 14.13 %

The zpool is a mirror of two good data-center-grade Intel SSDs:
Code:
Model Family:     Intel 730 and DC S35x0/3610/3700 Series SSDs
Device Model:     INTEL SSDSC2BP480G4

The Ceph network is 10G.

The PVE host has 2 CPUs, 40 cores, and 200GB+ of memory.
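
To figure out whether the slow part is the Ceph (read) side or the ZFS (write) side, I may benchmark each one separately while no move is running. This is just a rough sketch of what I have in mind; the Ceph pool name and the ZFS dataset path below are placeholders for our setup:

Code:
# seed some objects, then run a sequential read test against the Ceph pool
# (replace "rbd" with the actual pool name backing the ceph_vm storage)
rados bench -p rbd 30 write --no-cleanup
rados bench -p rbd 30 seq
rados -p rbd cleanup

# raw write test straight onto the ZFS target, outside of any VM
# (dataset path is an example; ZoL 0.7 has no O_DIRECT, so sync with fdatasync)
dd if=/dev/zero of=/rpool/data/ddtest bs=1M count=4096 conv=fdatasync status=progress
rm /rpool/data/ddtest

# watch per-device activity on the SSD mirror during either test
zpool iostat -v 2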



ceph -s :
Code:
  cluster:
    id:     220b9a53-4556-48e3-a73c-28deff665e45
    health: HEALTH_WARN
           noout flag(s) set

  services:
    mon: 3 daemons, quorum pve3,sys8,pve10
    mgr: pve3(active), standbys: sys8, pve10
    osd: 65 osds: 65 up, 65 in
        flags noout

  data:
    pools:   2 pools, 1088 pgs
    objects: 105.93k objects, 373GiB
    usage:   1.10TiB used, 24.4TiB / 25.5TiB avail
    pgs:     1088 active+clean
  io:
    client:   194KiB/s wr, 0op/s rd, 35op/s wr

I'll try letting the move continue and post the result.

The move finished at 23:57. That is 3+ hours to move 12GB; something is wrong with our configuration, or there is a bug.

Does anyone have a suggestion for getting disk moves back to normal speed?
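
In the meantime, to narrow it down myself, I may try the same kind of copy by hand with qemu-img and see whether it is just as slow outside the move task (assuming the offline move is basically a full image copy). The pool, image, and target names below are only examples, and the Ceph pool name may differ from the storage ID ceph_vm:

Code:
# scratch zvol on the ZFS pool to copy into (name is just an example)
zfs create -V 12G rpool/data/vm-2165-test

# copy the RBD image into it with progress reporting
qemu-img convert -p -f raw \
    rbd:ceph_vm/vm-2165-disk-1:conf=/etc/pve/ceph.conf \
    -O raw /dev/zvol/rpool/data/vm-2165-test

# clean up the scratch volume afterwards
zfs destroy rpool/data/vm-2165-test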
 
Code:
# pveversion -v
proxmox-ve: 5.2-2 (running kernel: 4.15.18-8-pve)                                               
pve-manager: 5.2-10 (running version: 5.2-10/6f892b40)                                         
pve-kernel-4.15: 5.2-11                                                                                 
pve-kernel-4.15.18-8-pve: 4.15.18-28                                                                             
pve-kernel-4.15.18-7-pve: 4.15.18-27                                                                             
ceph: 12.2.8-pve1
corosync: 2.4.2-pve5
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.0-8
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-41
libpve-guest-common-perl: 2.0-18
libpve-http-server-perl: 2.0-11
libpve-storage-perl: 5.0-30
libqb0: 1.0.1-1
lvm2: 2.02.168-pve6
lxc-pve: 3.0.2+pve1-3
lxcfs: 3.0.2-2
novnc-pve: 1.0.0-2
proxmox-widget-toolkit: 1.0-20
pve-cluster: 5.0-30
pve-container: 2.0-29
pve-docs: 5.2-9
pve-firewall: 3.0-14
pve-firmware: 2.0-6
pve-ha-manager: 2.0-5
pve-i18n: 1.0-6
pve-libspice-server1: 0.14.1-1
pve-qemu-kvm: 2.12.1-1
pve-xtermjs: 1.0-5
pve-zsync: 1.7-1
qemu-server: 5.0-38
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.11-pve2~bpo1

ceph.conf:
Code:
[global]
         auth client required = none
         auth cluster required = none
         auth service required = none
         cluster network = 10.11.12.0/24
         debug asok = 0/0
         debug auth = 0/0
         debug buffer = 0/0
         debug client = 0/0
         debug context = 0/0
         debug crush = 0/0
         debug filer = 0/0
         debug filestore = 0/0
         debug finisher = 0/0
         debug heartbeatmap = 0/0
         debug journal = 0/0
         debug journaler = 0/0
         debug lockdep = 0/0
         debug mds = 0/0
         debug mds balancer = 0/0
         debug mds locker = 0/0
         debug mds log = 0/0
         debug mds log expire = 0/0
         debug mds migrator = 0/0
         debug mon = 0/0
         debug monc = 0/0
         debug ms = 0/0
         debug objclass = 0/0
         debug objectcacher = 0/0
         debug objecter = 0/0
         debug optracker = 0/0
         debug osd = 0/0
         debug paxos = 0/0
 
         debug perfcounter = 0/0
         debug rados = 0/0
         debug rbd = 0/0
         debug rgw = 0/0
         debug throttle = 0/0
         debug timer = 0/0
         debug tp = 0/0
         fsid = xx
         keyring = /etc/pve/priv/$cluster.$name.keyring
         mon allow pool delete = true
         osd deep scrub interval = 1209600
         osd journal size = 5120
         osd pool default min size = 2
         osd pool default size = 3
         osd scrub begin hour = 21
         osd scrub end hour = 6
         osd scrub sleep = 0.1
         public network = 10.11.12.0/24

[osd]
         keyring = /var/lib/ceph/osd/ceph-$id/keyring
         osd max backfills = 1
         osd recovery max active = 1

         #
         # 2018-11-21  see https://forum.proxmox.com/threads/ceph-slow-requests-are-blocked.48955/#post-229061
         # disable throttling
         #
         objecter_inflight_ops = 10240
         objecter_inflight_op_bytes = 1048576000
         ms_dispatch_throttle_bytes = 1048576000
         osd_client_message_size_cap = 0
         osd_client_message_cap = 0
         osd_enable_op_tracker = false

[mon.pve3]
         host = pve3
         mon addr = 10.11.12.3:6789

[mon.sys8]
         host = sys8
         mon addr = 10.11.12.8:6789

[mon.pve10]
         host = pve10
         mon addr = 10.11.12.10:6789
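
As a sanity check, the [osd] overrides above can be confirmed on a running OSD by asking the daemon directly (run on the node hosting that OSD; osd.0 is just an example):

Code:
# query individual running values
ceph daemon osd.0 config get osd_enable_op_tracker
ceph daemon osd.0 config get osd_client_message_cap

# or dump everything and grep for the overrides
ceph daemon osd.0 config show | grep -E 'op_tracker|message_cap|inflight'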
 
I just did an LXC move from ZFS to Ceph; next I will move the same LXC back to ZFS.

I was checking the log during the process.
Code:
/dev/rbd1
mke2fs 1.43.4 (31-Jan-2017)
Discarding device blocks: done                           
Creating filesystem with 1048576 4k blocks and 262144 inodes
Filesystem UUID: db048e5a-7ac5-4921-958f-f9fd5fcc7aba
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376, 294912, 819200, 884736

Allocating group tables: done                           
Writing inode tables: done                           
Creating journal (16384 blocks): done
Multiple mount protection is enabled with update interval 5 seconds.
Writing superblocks and filesystem accounting information: done

Number of files: 145,126 (reg: 102,269, dir: 22,983, link: 19,814, dev: 2, special: 58)
Number of created files: 145,124 (reg: 102,269, dir: 22,981, link: 19,814, dev: 2, special: 58)
Number of deleted files: 0
Number of regular files transferred: 30,029
Total file size: 3,600,828,164 bytes
Total transferred file size: 1,657,949,168 bytes
Literal data: 1,657,949,168 bytes
Matched data: 0 bytes
File list size: 2,948,767
File list generation time: 0.001 seconds
File list transfer time: 0.000 seconds
Total bytes sent: 1,671,368,068
Total bytes received: 6,830,918

sent 1,671,368,068 bytes  received 6,830,918 bytes  7,187,147.69 bytes/sec
total size is 3,600,828,164  speedup is 2.15
TASK OK

After the transfer was done, i.e. when the log wrote the 'total size ...' line, it took around 10 more minutes for 'TASK OK' to appear in the log.

During those 10 minutes, the PVE screen showed the task still in progress AND the root disk still on ZFS.
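
My guess (unconfirmed) is that those 10 minutes are spent cleaning up on the source side rather than transferring data. Next time it hangs there I will check what the worker is actually doing, roughly like this (the container ID and PID are placeholders):

Code:
# what is the move task's worker busy with?
ps faxww | grep -E 'rsync|zfs|rbd' | grep -v grep

# is the old ZFS subvolume still there / still being destroyed?
zfs list -t all | grep <ctid>

# kernel stack of a stuck worker process (replace <pid>)
cat /proc/<pid>/stack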
 
Moving from Ceph to ZFS was fast / normal speed.
It started at 9:02 and ended at 9:03, including showing as done on the PVE screen.

Code:
Number of files: 145,125 (reg: 102,269, dir: 22,983, link: 19,814, dev: 2, special: 57)
Number of created files: 145,124 (reg: 102,269, dir: 22,982, link: 19,814, dev: 2, special: 57)
Number of deleted files: 0
Number of regular files transferred: 30,029
Total file size: 3,600,870,523 bytes
Total transferred file size: 1,657,991,527 bytes
Literal data: 1,657,991,527 bytes
Matched data: 0 bytes
File list size: 2,817,580
File list generation time: 0.012 seconds
File list transfer time: 0.000 seconds
Total bytes sent: 1,671,415,307
Total bytes received: 6,830,971

sent 1,671,415,307 bytes  received 6,830,971 bytes  30,238,671.68 bytes/sec
total size is 3,600,870,523  speedup is 2.15

I'll repeat the move to Ceph and then back to ZFS.
 
ZFS to Ceph: 4 minutes.

Ceph to ZFS: 4 minutes. Note that the actual transfer took about a minute; the task was stuck for 3.5 minutes at 'Removing image: 97% complete...'.

So the results are not consistent.

This could be network related. We are changing from Intel 10G to Mellanox ConnectX-4 Ethernet this week.
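
For the 'Removing image' stall, next time it hangs I will check whether the RBD image still has a watcher or whether the delete itself is just slow, with something like this (pool and image names are placeholders):

Code:
# does anything still hold a watch on the image being removed?
rbd status ceph_vm/vm-<ctid>-disk-1

# is the image still listed while the task sits at 97%?
rbd ls -l ceph_vm

# any slow requests cluster-wide during the removal?
ceph health detail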
 
From a Linux desktop VM running on Ceph storage, I ran a disk benchmark.

It got stuck 1/3 of the way through.

Attached is the dmesg output from that time.

Here is the top part of the file:
Code:
[  187.471084] sd 0:0:0:1: [sdb] tag#10 abort
[  363.593385] INFO: task kworker/u8:0:6 blocked for more than 120 seconds.
[  363.593391]       Not tainted 4.9.0-8-amd64 #1 Debian 4.9.130-2
[  363.593393] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  363.593395] kworker/u8:0    D    0     6      2 0x00000000
[  363.593408] Workqueue: scsi_tmf_0 scmd_eh_abort_handler [scsi_mod]
[  363.593410]  ffff94346eb27a40 0000000000000000 ffff943473762e40 ffff943485d98980
[  363.593412]  ffff9434737961c0 ffffac0a818c7cb0 ffffffffbc414219 0000000000000001
[  363.593414]  00ffac0a818c7d28 ffff943485d98980 0000001301321092 ffff943473762e40
[  363.593417] Call Trace:
[  363.593424]  [<ffffffffbc414219>] ? __schedule+0x239/0x6f0
[  363.593426]  [<ffffffffbc414702>] ? schedule+0x32/0x80
[  363.593427]  [<ffffffffbc417a8d>] ? schedule_timeout+0x1dd/0x380
[  363.593429]  [<ffffffffbc415131>] ? wait_for_completion+0xf1/0x130
[  363.593432]  [<ffffffffbbea59a0>] ? wake_up_q+0x70/0x70
[  363.593434]  [<ffffffffc031f954>] ? virtscsi_tmf.constprop.12+0xa4/0x110 [virtio_scsi]
[  363.593438]  [<ffffffffc02e44c2>] ? scmd_eh_abort_handler+0x72/0x270 [scsi_mod]
[  363.593440]  [<ffffffffbbe9421a>] ? process_one_work+0x18a/0x420
[  363.593441]  [<ffffffffbbe944fd>] ? worker_thread+0x4d/0x490
[  363.593443]  [<ffffffffbbe944b0>] ? process_one_work+0x420/0x420
[  363.593444]  [<ffffffffbbe9a569>] ? kthread+0xd9/0xf0
[  363.593447]  [<ffffffffbc4190a4>] ? __switch_to_asm+0x34/0x70
[  363.593448]  [<ffffffffbbe9a490>] ? kthread_park+0x60/0x60
[  363.593449]  [<ffffffffbc419137>] ? ret_from_fork+0x57/0x70
[  363.593472] INFO: task gnome-disks:1393 blocked for more than 120 seconds.
[  363.593473]       Not tainted 4.9.0-8-amd64 #1 Debian 4.9.130-2
[  363.593474] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  363.593475] gnome-disks     D    0  1393   1274 0x00000000
[  363.593477]  ffff94346e0a59c0 0000000000000000 ffff94346fe93080 ffff943485d18980
[  363.593478]  ffff943473796f40 ffffac0a82cdba90 ffffffffbc414219 0000000000000000
[  363.593480]  00ff94346ec53850 ffff943485d18980 ffff94346e996210 ffff94346fe93080
[  363.593481] Call Trace:
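
The virtio_scsi abort and hung-task messages suggest I/O requests sitting unanswered in the guest for over 120 seconds. While a benchmark is hung like that, I plan to look at the host and cluster side at the same moment; a rough checklist (osd.0 and sdb are just examples):

Code:
# cluster-wide view while the guest is stuck
ceph -s
ceph health detail

# per-OSD blocked operations (note: with osd_enable_op_tracker = false
# from our ceph.conf this will not report much)
ceph daemon osd.0 dump_ops_in_flight

# inside the guest: the SCSI timeout on the benchmarked disk
cat /sys/block/sdb/device/timeout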
 

I moved the disk to ZFS [it took 10 min for a 1GB disk].

From ZFS, the benchmark worked: 2 GB/s read, 208 MB/s write.

Then, when done, I moved the disk back to Ceph. The entire process took less than 15 seconds, more like 10.
 
This sounds exactly like the behaviour I've been seeing. It started when I started using ZFS. I gave up.
 
I should add: I've moved back to EXT4, and the difference so far seems to be quite significant. I would not be surprised at all to find out there is some kind of bug, either in the code or perhaps just in a default config somewhere, that's causing this.
 
So with ZFS you had the issue. Interesting... is there a thread where you already posted info on your setup? I'd like to look for similarities. Thank you for the responses.
 
