kworker 99% IO

Andrey Dudin

New Member
When I migrate or restore a VM, I see that kworker uses 99% IO (per iotop), and other VMs on the destination node sometimes get stuck.


All running VMs on the destination node show this exception:
[screenshot: Снимок экрана 2019-02-09 в 15.08.00.png]

I use HW RAID1 with LVM-thin pools.
Hardware: Dell R540, Xeon Silver 4114
RAID controller: PERC H730
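
For anyone debugging the same setup, it's worth first checking the thin pool's fill level and chunk size, since a nearly full pool or heavy metadata churn makes sparse writes expensive. A minimal sketch; the VG name pve-hdd is taken from the dd path further down, and the pool name data is an assumption (the Proxmox default):

Code:
# show thin pool data/metadata fill and chunk size (pool name assumed)
lvs -a -o +data_percent,metadata_percent,chunk_size pve-hdd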


pveversion -v
Code:
# pveversion -v
proxmox-ve: 5.3-1 (running kernel: 4.15.18-10-pve)
pve-manager: 5.3-9 (running version: 5.3-9/ba817b29)
pve-kernel-4.15: 5.3-1
pve-kernel-4.15.18-10-pve: 4.15.18-32
corosync: 2.4.4-pve1
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.1-3
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-45
libpve-guest-common-perl: 2.0-20
libpve-http-server-perl: 2.0-11
libpve-storage-perl: 5.0-37
libqb0: 1.0.3-1~bpo9
lvm2: 2.02.168-pve6
lxc-pve: 3.1.0-2
lxcfs: 3.0.2-2
novnc-pve: 1.0.0-2
proxmox-widget-toolkit: 1.0-22
pve-cluster: 5.0-33
pve-container: 2.0-34
pve-docs: 5.3-2
pve-edk2-firmware: 1.20181023-1
pve-firewall: 3.0-17
pve-firmware: 2.0-6
pve-ha-manager: 2.0-6
pve-i18n: 1.0-9
pve-libspice-server1: 0.14.1-2
pve-qemu-kvm: 2.12.1-1
pve-xtermjs: 3.10.1-1
qemu-server: 5.0-46
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.12-pve1~bpo1

I think the problem may be with dd, because when I use live migration with local disks, the disks copy at a good speed and kworker stays idle.
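
One way to test that suspicion is to benchmark writes to a scratch thin volume, comparing buffered 64k writes (what the migration's dd does) against direct IO with a bigger block size. A rough sketch, assuming the pve-hdd VG and a pool named data; it writes ~1.3 GB of zeros each run and removes the scratch volume afterwards:

Code:
# create a throwaway thin volume (pool name 'data' is an assumption)
lvcreate -V 10G -T pve-hdd/data -n bench
# buffered 64k writes, mimicking the migration's dd
dd if=/dev/zero of=/dev/pve-hdd/bench bs=64k count=20000
# direct IO with 1M blocks for comparison, bypassing the page cache
dd if=/dev/zero of=/dev/pve-hdd/bench bs=1M count=1250 oflag=direct
lvremove -y pve-hdd/bench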

iotop:
Code:
Total DISK READ :      21.26 K/s | Total DISK WRITE :      84.38 M/s
Actual DISK READ:      21.26 K/s | Actual DISK WRITE:      85.33 M/s
  TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND                                                                   
 4173 be/4 root          0.00 B      0.00 B  0.00 % 99.27 % [kworker/u80:2]
 5091 be/4 root         89.00 K      0.00 B  0.00 % 87.92 % vgs --separator : --noheadings --units b ~-options vg_name,vg_size,vg_free,lv_count
 2647 be/4 root          0.00 B      0.00 B  0.00 % 11.58 % [kworker/u82:0]
39193 be/4 root          4.00 K      0.00 B  0.00 %  4.34 % [kworker/u80:1]
 4830 be/4 root          0.00 B      0.00 B  0.00 %  0.03 % [kworker/u81:1]
  635 be/3 root          0.00 B     36.00 K  0.00 %  0.03 % [jbd2/dm-1-8]
 4986 be/4 root          0.00 B      0.00 B  0.00 %  0.01 % [kworker/u81:4]
37696 be/4 root          0.00 B      0.00 B  0.00 %  0.01 % [kworker/u81:2]
 4739 be/4 root          0.00 B      0.00 B  0.00 %  0.00 % [kworker/u82:3]
 4352 be/4 root          0.00 B    929.50 M  0.00 %  0.00 % dd of=/dev/pve-hdd/vm-123-disk-2 conv=sparse bs=64k


perf report:
Code:
+   14.02%     0.00%  kworker/u80:2    [kernel.kallsyms]                        [k] ret_from_fork
+   14.02%     0.00%  kworker/u80:2    [kernel.kallsyms]                        [k] kthread            
+   14.02%     0.00%  kworker/u80:2    [kernel.kallsyms]                        [k] worker_thread
+   14.02%     0.00%  kworker/u80:2    [kernel.kallsyms]                        [k] process_one_work
+   14.02%     0.00%  kworker/u80:2    [kernel.kallsyms]                        [k] do_worker
+   14.02%     0.00%  kworker/u80:2    [kernel.kallsyms]                        [k] process_prepared
+   14.02%     0.01%  kworker/u80:2    [kernel.kallsyms]                        [k] process_prepared_mapping
+   13.60%     0.06%  kworker/u80:2    [kernel.kallsyms]                        [k] inc_remap_and_issue_cell
+   12.99%     0.02%  kworker/u80:2    [kernel.kallsyms]                        [k] remap_and_issue
+   12.93%     0.00%  kworker/u80:2    [kernel.kallsyms]                        [k] issue
+   12.86%     0.17%  kworker/u80:2    [kernel.kallsyms]                        [k] generic_make_request
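
For reference, that call chain (process_prepared, inc_remap_and_issue_cell, remap_and_issue) is dm-thin's worker thread processing provisioned blocks, which points at the thin pool layer rather than the RAID controller. To capture the same view, something like:

Code:
# sample kernel stacks system-wide for 15 s, then browse the result
perf record -a -g -- sleep 15
perf report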

From host dmesg:
Code:
[Sat Feb  9 02:57:43 2019] INFO: task dd:29806 blocked for more than 120 seconds.

[Sat Feb  9 02:57:43 2019]       Tainted: P           O     4.15.18-10-pve #1
[Sat Feb  9 02:57:43 2019] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Sat Feb  9 02:57:43 2019] dd              D    0 29806  29784 0x00000000
[Sat Feb  9 02:57:43 2019] Call Trace:
[Sat Feb  9 02:57:43 2019]  __schedule+0x3e0/0x870
[Sat Feb  9 02:57:43 2019]  schedule+0x36/0x80
[Sat Feb  9 02:57:43 2019]  io_schedule+0x16/0x40
[Sat Feb  9 02:57:43 2019]  wait_on_page_bit_common+0xf3/0x190
[Sat Feb  9 02:57:43 2019]  ? page_cache_tree_insert+0xe0/0xe0
[Sat Feb  9 02:57:43 2019]  __filemap_fdatawait_range+0xfa/0x160
[Sat Feb  9 02:57:43 2019]  filemap_write_and_wait+0x4d/0x90
[Sat Feb  9 02:57:43 2019]  __blkdev_put+0x7a/0x210
[Sat Feb  9 02:57:43 2019]  ? fsnotify+0x259/0x440
[Sat Feb  9 02:57:43 2019]  blkdev_put+0x4c/0xd0
[Sat Feb  9 02:57:43 2019]  blkdev_close+0x34/0x70
[Sat Feb  9 02:57:43 2019]  __fput+0xea/0x220
[Sat Feb  9 02:57:43 2019]  ____fput+0xe/0x10
[Sat Feb  9 02:57:43 2019]  task_work_run+0x9d/0xc0
[Sat Feb  9 02:57:43 2019]  exit_to_usermode_loop+0xc4/0xd0
[Sat Feb  9 02:57:43 2019]  do_syscall_64+0xf4/0x130
[Sat Feb  9 02:57:43 2019]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[Sat Feb  9 02:57:43 2019] RIP: 0033:0x7f3fec42ccf0
[Sat Feb  9 02:57:43 2019] RSP: 002b:00007ffd46e71a08 EFLAGS: 00000246 ORIG_RAX: 0000000000000003
[Sat Feb  9 02:57:43 2019] RAX: 0000000000000000 RBX: 00007f3fec904698 RCX: 00007f3fec42ccf0
[Sat Feb  9 02:57:43 2019] RDX: 00007ffd46e71a70 RSI: 00007ffd46e71a70 RDI: 0000000000000001
[Sat Feb  9 02:57:43 2019] RBP: 0000000000000000 R08: 0000000000006000 R09: 0000000000004000
[Sat Feb  9 02:57:43 2019] R10: 00000000000005e1 R11: 0000000000000246 R12: ffffffffffffffff
[Sat Feb  9 02:57:43 2019] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[Sat Feb  9 02:57:43 2019] INFO: task systemd-udevd:33283 blocked for more than 120 seconds.
[Sat Feb  9 02:57:43 2019]       Tainted: P           O     4.15.18-10-pve #1
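
The trace shows dd stuck in filemap_write_and_wait, i.e. flushing a large pile of dirty page cache when the block device is closed. One mitigation worth trying (values here are illustrative, not a tested recommendation) is lowering the writeback thresholds so flushing happens earlier and in smaller bursts:

Code:
# start background writeback at 64 MiB of dirty pages, stall writers at 256 MiB
sysctl -w vm.dirty_background_bytes=67108864
sysctl -w vm.dirty_bytes=268435456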


Monitoring during migration:
[screenshots: Снимок экрана 2019-02-09 в 21.31.13.png, 21.29.56.png, 21.32.45.png, 21.35.35.png]
 
I have set the bandwidth limit to 25MB/s, but I have the same problem. The migration of my 20 GB VM never ends :'(.
 
Strange.
Are you using SATA disks in the RAID 1, and what is the make and model?

Secondly, in iotop, after making the change suggested above, do you see it limiting bandwidth to 25 MB/s?
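
If the cap is being honored, something like this should show the migration's dd pinned at roughly 25 MB/s:

Code:
# only active processes, per-process view, accumulated totals
iotop -o -P -a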
 
For me, I used 2x 1 TB SSDs (RAID 1). The model is Crucial_CT1050MX300SSD1.

iotop result:

[screenshot: Screenshot from 2020-04-05 14-33-10.png]

Proxmox log:
Code:
2020-04-05 14:10:01 use dedicated network address for sending migration traffic (...)
2020-04-05 14:10:01 starting migration of VM 300 to node 'dedicated' (...)
2020-04-05 14:10:01 found local disk 'local-lvm:vm-300-disk-0' (in current VM config)
2020-04-05 14:10:01 copying local disk images
  Logical volume "vm-300-disk-0" created.
 
I don't understand where the trouble is.

I've been trying to find it for hours, but I still haven't figured out why it's doing this.
 
Seems fine. I've used Crucial SSDs before and they can't be the cause.

Could it be your network throughput? I assume the dedicated network is 1Gbps or even 10Gbps?

I checked our config and we have this in the /etc/pve/datacenter.cfg file:

Code:
cat /etc/pve/datacenter.cfg
keyboard: en-us
# use dedicated migration network
migration: secure,network=10.0.0.0/24
bwlimit: move=71680,restore=51200,default=51200,migration=51200,clone=51200
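
Note that, if I remember correctly, the bwlimit values are in KiB/s, so restore=51200 works out to 51200/1024 = 50 MiB/s, and move=71680 to 70 MiB/s.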

We don't have any issues using the above, and we run it on quite a few old and new servers. I think you need to find the bottleneck somewhere. Try lowering the settings even further.
 
The network throughput is good (yes, a 1 Gbps dedicated network). The transfer of the disk happens quite quickly on the destination machine.

But it seems blocked on the destination machine, with the dd of=/dev/pve/vm-300-~k-0 conv=sparse bs=64k process running.

It doesn't seem to me that bandwidth is the problem here; it looks like an IO problem on the disk, but I don't see where that could be.

The IO delay indicator in the Proxmox GUI stays between 8 and 10 during the migration.
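
To see which device is actually saturated during the migration, watching per-device utilization and latency can help; a sketch (iostat comes from the sysstat package):

Code:
# extended stats in MB, refreshed every second; watch %util and await on the dm-* devices
iostat -xm 1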

Screenshot from the Proxmox GUI:
[screenshot: Screenshot from 2020-04-05 15-08-16.png]

For /etc/pve/datacenter.cfg config i have this:

Code:
bwlimit: clone=25600,default=25600,migration=25600,move=25600,restore=25600
keyboard: fr
migration: secure
 
Having the same issue. I'm restoring one VM from a backup file and it takes forever. iotop shows only a few K/s of disk write, yet 99.99% IO. Strange.
 
The same problem.
Code:
progress 93% (read 31954567168 bytes, duration 53 sec)
progress 94% (read 32298172416 bytes, duration 53 sec)
progress 95% (read 32641777664 bytes, duration 54 sec)
progress 96% (read 32985382912 bytes, duration 55 sec)
progress 97% (read 33328988160 bytes, duration 56 sec)
progress 98% (read 33672593408 bytes, duration 56 sec)
progress 99% (read 34016198656 bytes, duration 56 sec)
progress 100% (read 34359738368 bytes, duration 56 sec)
-----------------------------
total bytes read 34359738368, sparse bytes 3074543616 (8.95%)
space reduction due to 4K zero blocks 1.03%
rescan volumes...
----------------------------

and to finish the 32 GB VM restore, the underlined step ('rescan volumes') took about 3.5 hours...

I tried:
VM on LVM-thin --> VM on LVM-thin, VM on ZFS, VM on qcow
VM on qcow --> VM on LVM-thin, VM on ZFS, VM on qcow

With Proxmox 6.3 there were no such troubles.

Added:

Tested once more with different variants on 6.3, and everything is fine. So I can say that this is a bug in Proxmox 6.4.
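
If someone wants to reproduce the comparison, timing an identical restore on 6.3 and 6.4 should make the regression obvious. A sketch; the archive path, VMID 999, and the storage name are placeholders:

Code:
# time a restore of the same backup archive to the same storage type on both versions
time qmrestore /path/to/vzdump-qemu-123.vma.zst 999 --storage local-lvm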
 
